TextQuests benchmark: LLMs as autonomous agents in text-based games reveal long-context reasoning challenges
AI Impact Summary
TextQuests benchmarks LLMs' ability to act as autonomous agents across 25 classic Infocom text-adventure games, emphasizing long-context reasoning, learning through exploration, and management of a growing action history. The report notes that models hallucinate about past interactions, repeat actions as context grows, and fail at spatial navigation, and that robust performance carries high compute and latency costs. These findings imply that deploying AI agents in dynamic, open-ended environments will require advanced memory, world modeling, and adaptive planning beyond vanilla prompt-based reasoning, along with careful cost/performance tradeoffs.
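The action-history management problem described above can be sketched as a bounded sliding window over past turns: keeping only recent (observation, action) pairs caps prompt growth, at the cost of discarding older context. This is a minimal illustrative sketch, not the TextQuests harness; all names here (`ActionHistory`, `record`, `as_prompt`) are hypothetical.

```python
from collections import deque


class ActionHistory:
    """Bounded sliding window over (observation, action) turns.

    Truncating to the most recent turns bounds prompt size, but drops
    older context -- the tradeoff behind the hallucination and
    repeated-action failures the report describes.
    """

    def __init__(self, max_turns: int = 50):
        # deque with maxlen evicts the oldest turn automatically
        self.turns = deque(maxlen=max_turns)

    def record(self, observation: str, action: str) -> None:
        self.turns.append((observation, action))

    def as_prompt(self) -> str:
        # Render retained turns in a transcript-like format
        return "\n".join(f"> {a}\n{o}" for o, a in self.turns)


# Hypothetical usage with a tiny window to show eviction
history = ActionHistory(max_turns=2)
history.record("West of House. There is a mailbox here.", "open mailbox")
history.record("Opening the mailbox reveals a leaflet.", "take leaflet")
history.record("Taken.", "go north")
print(history.as_prompt())  # only the two most recent turns remain
```

A fixed window like this is exactly the kind of naive strategy the benchmark stresses: once a landmark observation is evicted, the agent can no longer ground its spatial reasoning in it.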
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info