Hugging Face: OpenAI releases TextQuests benchmark for evaluating LLM agentic reasoning | SignalBreak