Defending Code — Anthropic's Vulnerability Discovery Reference Harness
Overview
Anthropic has released an open-source reference implementation for autonomous vulnerability discovery and remediation, built on the operational lessons from Project Glasswing. The harness provides a complete, production-tested framework for running AI agents against codebases to find and fix security vulnerabilities safely.
The system follows a five-stage lifecycle:
- Recon — map the attack surface, identify entry points, understand the codebase architecture.
- Find — actively probe for vulnerabilities using static and dynamic analysis.
- Triage — assess severity, exploitability, and false positive likelihood for each finding.
- Report — generate structured findings with proof-of-concept exploit code and remediation guidance.
- Patch — implement and validate fixes.
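To make the ordering concrete, the lifecycle can be modeled as an ordered enumeration with a triage filter between discovery and reporting. This is a minimal sketch, not the harness's actual data model: the `Stage`, `Finding`, `next_stage`, and `survives_triage` names are hypothetical, and the triage rule is only an illustration of the three criteria listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five lifecycle stages, in the order the overview lists them."""
    RECON = 1
    FIND = 2
    TRIAGE = 3
    REPORT = 4
    PATCH = 5


@dataclass
class Finding:
    """Hypothetical record for one candidate vulnerability."""
    title: str
    severity: str                # e.g. "low", "medium", "high"
    exploitable: bool
    likely_false_positive: bool


def next_stage(stage):
    """Return the stage that follows `stage`, or None after PATCH."""
    members = list(Stage)        # Enum iteration preserves definition order
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None


def survives_triage(finding):
    """Illustrative rule: keep findings that look real and exploitable."""
    return finding.exploitable and not finding.likely_false_positive
```

Under these assumptions, `next_stage(Stage.TRIAGE)` returns `Stage.REPORT`, and only findings that pass `survives_triage` would move on to the Report and Patch stages.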
Built-In Skills
The harness ships with a set of composable skills activated through slash commands:
- /threat-model: given a codebase, produce a structured threat model identifying trust boundaries, data flows, and high-risk components.
- /vuln-scan: run targeted vulnerability scans against specific modules or the entire codebase.
- /triage: evaluate a finding against known vulnerability patterns and assess real-world exploitability.
- /patch: generate and apply a fix for a confirmed vulnerability, with regression tests.
- /customize: extend the harness with custom vulnerability signatures or domain-specific scanning logic.
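One common way to make slash commands composable is a registry that maps each command name to a handler. The sketch below illustrates that pattern only; it is not the harness's implementation. The `SKILLS` registry, the `skill` decorator, the `dispatch` function, and the handler bodies are all hypothetical, and only the command names come from the list above.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a slash command to a handler function.
SKILLS: Dict[str, Callable[[str], str]] = {}


def skill(name):
    """Decorator that registers a handler under a slash-command name."""
    def register(func):
        SKILLS[name] = func
        return func
    return register


@skill("/threat-model")
def threat_model(target):
    # Placeholder body; a real skill would analyze the target.
    return f"threat model requested for {target}"


@skill("/vuln-scan")
def vuln_scan(target):
    # Placeholder body; a real skill would run a scan.
    return f"scan requested for {target}"


def dispatch(line):
    """Split '/command argument' and call the matching handler."""
    command, _, argument = line.strip().partition(" ")
    handler = SKILLS.get(command)
    if handler is None:
        raise ValueError(f"unknown skill: {command}")
    return handler(argument.strip())
```

With this layout, adding a skill is a matter of registering one more decorated function, which is roughly the kind of extension point a command like /customize suggests.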
Strict Security Model
The harness enforces a rigorous security boundary between safe operations and sandbox-required operations:
- Safe ops (read/write only) — file reading, code analysis, report generation. These run without special isolation.
- Sandbox-required ops — code execution, dynamic analysis, and patch validation all run inside gVisor sandboxes, preventing escape or persistent damage to the host system.
This two-tier model is intended to make the harness suitable for continuous integration pipelines: anything that executes code runs inside the sandbox, so code run by a compromised agent is designed to stay confined there rather than reach the host.
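The two tiers can be pictured as a policy gate that classifies each operation before it runs. The sketch below is an assumption-laden illustration, not the harness's code: the operation names, `tier_for`, `build_command`, and the `scanner-image` placeholder are hypothetical, and it assumes the sandbox is reached through Docker configured with gVisor's `runsc` runtime, which the source does not state. It only builds the command line and does not execute anything.

```python
from enum import Enum


class Tier(Enum):
    SAFE = "safe"
    SANDBOX = "sandbox-required"


# Operation names are illustrative; the two categories come from the text above.
TIER_BY_OPERATION = {
    "read_file": Tier.SAFE,
    "analyze_code": Tier.SAFE,
    "generate_report": Tier.SAFE,
    "execute_code": Tier.SANDBOX,
    "dynamic_analysis": Tier.SANDBOX,
    "validate_patch": Tier.SANDBOX,
}


def tier_for(operation):
    """Classify an operation; anything unrecognized defaults to the sandbox."""
    return TIER_BY_OPERATION.get(operation, Tier.SANDBOX)


def build_command(operation, argv, image="scanner-image"):
    """Return the argv to run, wrapping sandbox-tier work in a gVisor container.

    Assumes Docker with the runsc runtime installed; `image` is a placeholder.
    """
    if tier_for(operation) is Tier.SAFE:
        return list(argv)
    return ["docker", "run", "--rm", "--runtime=runsc",
            "--network=none", image, *argv]
```

Defaulting unknown operations to the sandbox tier is a deliberate fail-closed choice in this sketch: a new or misspelled operation name gets the stricter treatment rather than silently running on the host.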
Getting Started: A 2-Week Ramp-Up Plan
The reference documentation outlines a practical 4-step ramp-up plan:
| Phase | Duration | Activity |
|---|---|---|
| Day 1 | 1 day | Run static scans on a familiar codebase to validate tooling |
| Day 2 | 1 day | Run the full autonomous pipeline on a C/C++ library |
| Days 3–5 | 3 days | Customize skills and signatures for your target repository |
| Week 2+ | Ongoing | Deploy autonomous scanning at scale with human-in-the-loop review |
This structured approach lets teams build confidence gradually, starting with low-risk static analysis and scaling up to full autonomous vulnerability discovery.