Webwright: A SWE-Style Browser Agent Framework
Source: Microsoft/Webwright on GitHub
Date Published: 2026-05-26
Author: Microsoft Research
TL;DR
Webwright is a minimal browser agent framework (about 1.5k lines of code) that reports state-of-the-art results on long-horizon web tasks. Its key insight is to separate the agent from the browser and treat the browser as a disposable environment that the agent spawns while developing a program. There are no multi-agent systems, graph engines, or plugin layers: just a terminal, a browser, and a model.
Core Philosophy
"Webwright gives LLM a terminal where it can launch multiple browser sessions to inspect the page and complete a web task."
"Separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session — it's the code and logs in the local workspace."
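The idea in these quotes can be sketched roughly as follows. This is an illustration under stated assumptions, not Webwright's actual code: the helper names `log_step` and `inspect_page`, the `log.jsonl` file name, and the workspace layout are all hypothetical. Only the Playwright calls (`sync_playwright`, `chromium.launch`, `new_page`, `goto`, `screenshot`, `title`, `close`) are real library API.

```python
import json
from pathlib import Path


def log_step(workspace: Path, record: dict) -> Path:
    """Append one JSON record to the workspace log, which outlives any browser."""
    workspace.mkdir(parents=True, exist_ok=True)
    log_path = workspace / "log.jsonl"  # hypothetical file name
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return log_path


def inspect_page(url: str, workspace: Path) -> dict:
    """Launch a throwaway browser, record what the page shows, then discard it."""
    # Imported lazily so log_step works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    workspace.mkdir(parents=True, exist_ok=True)
    shot = workspace / "page.png"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            page.screenshot(path=str(shot))
            record = {"url": url, "title": page.title(), "screenshot": shot.name}
        finally:
            browser.close()  # the session is disposable; no state is kept in it
    log_step(workspace, record)  # the durable artifact lives on disk
    return record
```

Calling `inspect_page("https://example.com", Path("outputs/run1"))` would leave a screenshot and a log line in the workspace and no running browser. Every later step starts from those files rather than from a live session.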
Architecture Comparison
| | Stagehand | browser-use | Webwright |
|---|---|---|---|
| Paradigm | Hybrid: code + NL primitives | Autonomous LLM loop over DOM/AX | Coding agent with terminal |
| Action Space | Playwright code / NL | Indexed click/type actions | Free-form Python (Playwright scripts) |
| State | Browser session | Browser session | Local workspace (code, screenshots, logs) |
| Loop shape | Imperative | observe→predict→execute→repeat | write code → execute → inspect → repair |
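The Webwright loop shape in the last row (write code → execute → inspect → repair) might look like the sketch below. It is a minimal illustration, not the framework's real agent loop: `run_script`, `agent_loop`, and `toy_propose` are hypothetical names, the model is replaced by a stub function, and treating one iteration as one "step" of the budget is an assumption.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional


def run_script(workspace: Path, code: str, step: int) -> subprocess.CompletedProcess:
    """Persist the script in the workspace, then execute it in a fresh process."""
    script = workspace / f"step_{step}.py"
    script.write_text(code, encoding="utf-8")
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, timeout=60,
    )


def agent_loop(propose: Callable[[str], str], workspace: Path,
               max_steps: int = 100) -> Optional[str]:
    """Write code, execute it, inspect the result, and repair on failure."""
    feedback = ""
    for step in range(max_steps):
        code = propose(feedback)                    # write (or repair) a script
        result = run_script(workspace, code, step)  # execute
        log = workspace / f"step_{step}.log"
        log.write_text(result.stdout + result.stderr, encoding="utf-8")
        if result.returncode == 0:                  # inspect
            return result.stdout.strip()
        feedback = result.stderr                    # input for the next repair
    return None                                     # budget exhausted


def toy_propose(feedback: str) -> str:
    """Hypothetical stand-in for the model: buggy first attempt, fixed second."""
    if "NameError" in feedback:
        return "print('done')"
    return "print(undefined_name)"
```

With the stub, `agent_loop(toy_propose, some_dir)` fails once with a `NameError`, feeds the traceback back to `toy_propose`, and succeeds on the second script. Both scripts and both logs remain in the workspace, which is the state the table attributes to Webwright.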
Key Dependencies (~1.5k LoC)
SOTA Performance
Webwright reports state-of-the-art results on two real-website benchmarks, each run with a 100-step budget:
- Odysseys (long-horizon evaluation)
- Online-Mind2Web (AutoEval)
Project Structure
webwright/
├── pyproject.toml
├── src/webwright/
│ ├── run/cli.py # CLI entrypoint
│ ├── agents/default.py # Core agent loop
│ ├── environments/ # Playwright browser workspace
│ ├── tools/ # image_qa, self_reflection
│ ├── models/ # OpenAI/Anthropic backends
│ └── config/ # Stackable YAML configs
├── tests/
└── outputs/ # Run artifacts
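The tree describes `config/` as holding stackable YAML configs. One plausible reading of "stackable" is that several config layers are merged in order, with later layers overriding earlier ones. The sketch below shows that reading only; it is an assumption, not Webwright's documented behavior. The function names and every config key are hypothetical, and the layers are plain dicts (as if already parsed from YAML) so the example needs no third-party parser.

```python
from copy import deepcopy


def _merge_into(base: dict, override: dict) -> None:
    """Recursively merge override into base, modifying base in place."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)  # recurse into nested sections
        else:
            base[key] = deepcopy(value)    # scalars and lists are replaced


def stack_configs(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win on conflicts."""
    merged: dict = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


# Hypothetical layers: a base config plus a small override.
base = {"model": {"name": "model-a", "temperature": 0.0},
        "agent": {"max_steps": 100}}
override = {"model": {"name": "model-b"}}
config = stack_configs(base, override)
```

Here `config` keeps `temperature` and `max_steps` from the base layer and takes `name` from the override, while `base` itself is left unmodified.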
Quick Start
Key Takeaways
- Webwright achieves SOTA by treating the browser as a disposable environment, not a persistent session
- The agent writes and executes code rather than predicting individual web actions
- Ultra-minimal codebase (~1.5k LoC) with no complex orchestration layers
- Ships as a plugin/skill for existing coding agents (Claude Code, Codex CLI)
- The "code-as-action" paradigm replaces the per-step action prediction of traditional browser agents (indexed click/type actions) with writing, running, and repairing whole scripts