Jupyter Agent: training LLMs to reason with notebooks and execute code in notebook environments
AI Impact Summary
Jupyter Agent lets LLMs execute code inside a notebook environment and reason step by step through data analysis tasks, so a natural-language prompt leads directly to executed results. Small models such as Qwen3-4B-Thinking-2507 and Qwen3-Coder are benchmarked on the DABStep tasks, and the scaffolding is streamlined around a final_answer tool; together these show measurable performance gains. The pipeline relies on large-scale notebook data curation (Datatrove, the Kaggle Notebooks dataset, Kaggle Datasets) and established tooling (smolagents, ReAct), signaling a practical path to production-grade data science agents built on lightweight models.
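The loop described above (run code in a shared notebook state, read the output, stop when a final_answer tool is called) can be illustrated with a minimal sketch. It is a toy, not the Jupyter Agent implementation and not the smolagents API: `NotebookSession`, `run_agent`, and `scripted_policy` are hypothetical names, and a hard-coded policy stands in for the LLM.

```python
import contextlib
import io


class NotebookSession:
    """Toy stand-in for a notebook kernel: all cells share one namespace."""

    def __init__(self):
        self.namespace = {}

    def execute_code(self, source):
        """Run one cell; return what it printed, or the error text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, self.namespace)
        except Exception as exc:
            # Errors are returned as observations so the model can react to them.
            return f"{type(exc).__name__}: {exc}"
        return buffer.getvalue().strip()


def run_agent(policy, max_steps=5):
    """Ask the policy for tool calls until it calls final_answer."""
    session = NotebookSession()
    observations = []
    for _ in range(max_steps):
        tool, argument = policy(observations)
        if tool == "final_answer":
            return argument
        observations.append(session.execute_code(argument))
    return None  # step budget exhausted without an answer


def scripted_policy(observations):
    """Hypothetical stand-in for the LLM: chooses the next tool call."""
    if len(observations) == 0:
        return ("execute_code", "fees = [1.5, 2.5, 3.5]")
    if len(observations) == 1:
        return ("execute_code", "print(sum(fees) / len(fees))")
    # The answer is taken from the last cell output, not guessed.
    return ("final_answer", observations[-1])


print(run_agent(scripted_policy))  # prints 2.5
```

The second cell can read `fees` only because both cells share one namespace, which is the property that makes a notebook a natural environment for multi-step analysis. A real agent would replace `scripted_policy` with model calls and would run the code in a sandbox rather than through `exec`.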
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info