March 2026

AI Browser & Computer Use

Benchmarks, datasets, and infrastructure for AI agents that use computers and browsers. Every datapoint links to its source.

7 Benchmarks · OSWorld, WebArena, GAIA...
8 Datasets · 75+ in full catalog
6 Platforms · Verified pricing & features

Benchmarks & Leaderboards

How well can AI agents actually use computers?

OSWorld

2024

First scalable real computer environment for multimodal agents. 369 tasks across Ubuntu, Windows, macOS involving real web and desktop apps.

369 tasks
Human baseline: 72.36%
AskUI VisionAgent: 66.2%
GTA1 w/ o3: 45.2%
OpenAI CUA o3: 42.9%
UI-TARS-1.5: 42.5%
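
The distance between each agent and the human baseline follows directly from the OSWorld figures listed above. A minimal sketch in Python, assuming only those listed scores; the names `osworld_scores` and `gap_to_human` are illustrative and not part of any OSWorld tooling:

```python
# OSWorld scores (percent of tasks completed), copied from the listing above.
HUMAN_BASELINE = 72.36

osworld_scores = {
    "AskUI VisionAgent": 66.2,
    "GTA1 w/ o3": 45.2,
    "OpenAI CUA o3": 42.9,
    "UI-TARS-1.5": 42.5,
}


def gap_to_human(score: float, baseline: float = HUMAN_BASELINE) -> float:
    """Return the shortfall versus the human baseline, in percentage points."""
    return round(baseline - score, 2)


# Illustrative mapping: agent name -> gap to the human baseline.
gaps = {agent: gap_to_human(score) for agent, score in osworld_scores.items()}

for agent, gap in gaps.items():
    print(f"{agent}: {gap:.2f} points below the human baseline")
```

The gaps are in percentage points, not relative percentages, so they can be compared directly across agents on the same benchmark.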

WebArena

2023

Self-hosted web environment with 812 realistic tasks across e-commerce, forums, project management, and content editing sites.

812 tasks
Gemini 2.5 Pro: 59.2%
GPT-4o: 36.4%
GPT-4 + browsing: 14.41%

WebChoreArena

2025

532 tedious web tasks requiring massive memory, calculation, and long-term cross-page reasoning. Exposes weaknesses hidden by standard WebArena.

532 tasks
Gemini 2.5 Pro: 44.9%
GPT-4o: 2.6%
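
One way to see the weaknesses WebChoreArena exposes is to compare each model's score here with its WebArena score. A rough sketch in Python, using only the figures listed for the two benchmarks; the dictionary and function names are hypothetical, and the comparison is indicative only, since the listed scores may come from different agent setups:

```python
# Scores (percent of tasks completed), copied from the two listings.
webarena = {"Gemini 2.5 Pro": 59.2, "GPT-4o": 36.4}
webchorearena = {"Gemini 2.5 Pro": 44.9, "GPT-4o": 2.6}


def score_drop(model: str) -> float:
    """Return the WebArena-to-WebChoreArena drop in percentage points."""
    return round(webarena[model] - webchorearena[model], 1)


# Only models that appear in both listings can be compared.
drops = {model: score_drop(model) for model in webarena if model in webchorearena}

for model, drop in drops.items():
    print(f"{model}: {drop:.1f} points lower on WebChoreArena")
```

With the figures listed, the drop is 14.3 points for Gemini 2.5 Pro and 33.8 points for GPT-4o.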

VisualWebArena

2024

Extension of WebArena with 910 tasks that require visual understanding (images and spatial reasoning on pages) in addition to navigation.

910 tasks

Scores not yet aggregated.

GAIA

2023

General AI Assistants benchmark from Meta AI & HuggingFace. Multi-step reasoning, tool use, and web browsing across 3 difficulty levels.

466 tasks
Human baseline: 92%
Spine Swarm: 67%
Writer Action Agent: 61%
OpenAI Deep Research: 47.6%

CUB (Computer Use Benchmark)

2025

First benchmark specifically for computer & browser use. 106 end-to-end workflows across 7 industries including finance, e-commerce, construction.

106 tasks
Writer Action Agent: 10.4%

Mind2Web

2023

Internet-scale dataset for web agents. 2,350 tasks across 137 websites in 31 domains with real web interaction.

2,350 tasks

Scores not yet aggregated.

Datasets

Training and evaluation data for computer-use and browser agents. Full catalog (75+)

Browser Infrastructure

Cloud browser platforms for AI agents. Pricing and features verified from vendor websites.

Agent-First Platforms

Adjacent Infrastructure