Structured CodeAgents: JSON-guided code actions improve tool use on OpenAI function calling API and capable models (Claude, Qwen)
AI Impact Summary
Researchers show that forcing LLMs to emit a JSON object containing thoughts and executable Python code improves tool orchestration and accuracy on the GAIA, MATH, SimpleQA, and Frames benchmarks compared with standard CodeAgent and function-calling approaches. Using smolagents to parse a strict thoughts/code JSON eliminates markdown/code parsing errors and enforces planning before execution, reducing cascading failures. The benefits appear strongest with capable models (32B+ parameters or frontier models); on smaller models (e.g., mistralai/Mistral-7B-Instruct-v0.3) the format may incur a 'structure tax' that degrades performance. Migration requires updating the agent’s output format to a JSON blob containing thoughts and code, and adjusting tool-invocation pipelines to parse and execute the code safely.
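As a rough illustration of the parsing side of that migration, the sketch below validates a model reply before anything is run. It uses only the Python standard library and is not the smolagents implementation. The function name `parse_structured_action`, the `StructuredAction` class, and the exact key spellings `"thoughts"` and `"code"` are assumptions made for this example.

```python
import ast
import json
from dataclasses import dataclass


@dataclass
class StructuredAction:
    """Hypothetical container for one parsed agent step."""
    thoughts: str
    code: str


def parse_structured_action(raw: str) -> StructuredAction:
    """Parse a reply expected to be a JSON object with 'thoughts' and 'code'.

    The key names are assumed for illustration; adjust them to the schema
    the agent is actually prompted with.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")

    for key in ("thoughts", "code"):
        if not isinstance(payload.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")

    # Syntax check only. Nothing is executed here; running the code
    # belongs in a sandboxed executor.
    try:
        ast.parse(payload["code"])
    except SyntaxError as exc:
        raise ValueError(f"code field is not valid Python: {exc}") from exc

    return StructuredAction(payload["thoughts"], payload["code"])


reply = '{"thoughts": "Add the two numbers.", "code": "result = 2 + 3"}'
action = parse_structured_action(reply)
print(action.thoughts)
print(action.code)
```

Rejecting malformed replies with a single exception type lets the calling loop feed the error back to the model and retry, rather than executing a partially parsed action.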
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info