Info | Capability

Hugging Face launches ScreenSuite, a GUI agent benchmarking suite for Vision-Language Models

AI Impact Summary

ScreenSuite is a GUI agent benchmarking suite built around vision-only evaluation. Its 13 benchmarks cover perception, grounding, single-step actions, and multi-step workflows. It evaluates models such as Qwen2.5-VL-72B, UI-Tars-1.5-7B, Holo1-7B, and GPT-4o, using the smolagents framework and dockerized Ubuntu Desktop or Android environments for reproducible, environment-agnostic comparisons. The result is a standardized baseline for measuring GUI navigation and click precision, which can inform model selection and integration decisions for GUI-enabled applications.
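To make "click precision" concrete, here is a minimal Python sketch of one common way a grounding benchmark can score clicks: a prediction counts as a hit when the predicted point falls inside the target element's bounding box. This is an illustration under stated assumptions, not ScreenSuite's actual scoring code; the names `BoundingBox` and `click_accuracy` are hypothetical and do not come from ScreenSuite or smolagents.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned target region in pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(
    predicted_clicks: Sequence[Tuple[float, float]],
    target_boxes: Sequence[BoundingBox],
) -> float:
    """Fraction of predicted clicks that land inside their target box.

    Assumption: one predicted click and one target box per example.
    """
    if len(predicted_clicks) != len(target_boxes):
        raise ValueError("predicted_clicks and target_boxes must have equal length")
    if not predicted_clicks:
        return 0.0
    hits = sum(
        box.contains(x, y)
        for (x, y), box in zip(predicted_clicks, target_boxes)
    )
    return hits / len(predicted_clicks)


if __name__ == "__main__":
    button = BoundingBox(left=100, top=200, right=180, bottom=230)
    clicks = [(140.0, 215.0), (50.0, 50.0)]
    print(click_accuracy(clicks, [button, button]))
```

A point-in-box rule is only one possible design; a real benchmark may instead use distance thresholds or normalized coordinates, so treat the rule above as a placeholder.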

Affected Systems

ScreenSuite, Qwen2.5-VL-72B

Date: Date not specified
Change type: capability
Severity: info
