Defending Code — Anthropic's Vulnerability Discovery Reference Harness

Overview

Anthropic has released an open-source reference implementation for autonomous vulnerability discovery and remediation, built on operational lessons from Project Glasswing. The harness is a production-tested framework for letting AI agents find and fix security vulnerabilities in a codebase, with risky operations confined to a sandbox.

The system follows a five-stage lifecycle:

Recon — map the attack surface, identify entry points, understand the codebase architecture.
Find — actively probe for vulnerabilities using static and dynamic analysis.
Triage — assess severity, exploitability, and false positive likelihood for each finding.
Report — generate structured findings with proof-of-concept exploit code and remediation guidance.
Patch — implement and validate fixes.
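
The lifecycle above can be pictured as an ordered pipeline in which each stage receives the accumulated state and hands it to the next. The following is a minimal sketch under that assumption, not the harness's actual code: `Stage`, `RunState`, `run_lifecycle`, and `make_stub` are hypothetical names introduced only for illustration, and the stub handlers do no real analysis.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """The five stages, declared in the order they run."""
    RECON = "recon"
    FIND = "find"
    TRIAGE = "triage"
    REPORT = "report"
    PATCH = "patch"


@dataclass
class RunState:
    """Hypothetical container for what accumulates across stages."""
    target: str
    completed: List[Stage] = field(default_factory=list)
    notes: Dict[str, str] = field(default_factory=dict)


# A handler takes the current state, does its work, and returns the state.
Handler = Callable[[RunState], RunState]


def run_lifecycle(state: RunState, handlers: Dict[Stage, Handler]) -> RunState:
    """Run every stage in declaration order; fail fast if a handler is missing."""
    for stage in Stage:
        if stage not in handlers:
            raise KeyError(f"no handler registered for stage {stage.value!r}")
        state = handlers[stage](state)
        state.completed.append(stage)
    return state


def make_stub(stage: Stage) -> Handler:
    """Placeholder handler that only records that the stage ran."""
    def handler(state: RunState) -> RunState:
        state.notes[stage.value] = f"{stage.value} ran on {state.target}"
        return state
    return handler


handlers = {stage: make_stub(stage) for stage in Stage}
final = run_lifecycle(RunState(target="example-repo"), handlers)
print([stage.value for stage in final.completed])
```

Keeping the order in the enum rather than in the caller means a stage cannot be skipped or reordered by accident, which matters when later stages such as Patch depend on earlier triage results.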

Built-In Skills

The harness ships with a set of composable skills activated through slash commands:

/threat-model — given a codebase, produce a structured threat model identifying trust boundaries, data flows, and high-risk components.
/vuln-scan — run targeted vulnerability scans against specific modules or the entire codebase.
/triage — evaluate a finding against known vulnerability patterns and assess real-world exploitability.
/patch — generate and apply a fix for a confirmed vulnerability, with regression tests.
/customize — extend the harness with custom vulnerability signatures or domain-specific scanning logic.
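
One common way to make slash commands composable is a registry that maps each command name to a function. The sketch below illustrates that general pattern only; it is not how the harness itself is implemented. The `skill` decorator, `SKILLS` table, `parse_command`, and `dispatch` are hypothetical names, and the two registered handlers just echo their arguments.

```python
from typing import Callable, Dict, Tuple

# A skill takes the argument string that follows the command and returns text.
SkillFn = Callable[[str], str]

# Hypothetical registry mapping a slash command to its handler.
SKILLS: Dict[str, SkillFn] = {}


def skill(name: str) -> Callable[[SkillFn], SkillFn]:
    """Decorator that registers a function under a slash-command name."""
    def register(fn: SkillFn) -> SkillFn:
        SKILLS[name] = fn
        return fn
    return register


@skill("/threat-model")
def threat_model(args: str) -> str:
    # Placeholder: a real skill would analyze the codebase named in args.
    return f"threat model requested for: {args}"


@skill("/vuln-scan")
def vuln_scan(args: str) -> str:
    # Placeholder: a real skill would scan the module named in args.
    return f"scan requested for: {args}"


def parse_command(line: str) -> Tuple[str, str]:
    """Split '/command rest of line' into the command and its arguments."""
    command, _, rest = line.strip().partition(" ")
    return command, rest.strip()


def dispatch(line: str) -> str:
    """Look up the command and call its handler, rejecting unknown commands."""
    command, args = parse_command(line)
    if command not in SKILLS:
        raise ValueError(f"unknown skill: {command!r}")
    return SKILLS[command](args)


print(dispatch("/vuln-scan src/parser"))
```

Under this pattern, an extension point like /customize amounts to registering one more entry, so new scanning logic does not require changes to the dispatcher.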

Strict Security Model

The harness enforces a rigorous security boundary between safe operations and sandbox-required operations:

Safe ops (read/write only) — file reading, code analysis, report generation. These run without special isolation.
Sandbox-required ops — code execution, dynamic analysis, and patch validation all run inside gVisor sandboxes, which keep executed code from escaping to the host system or causing persistent damage to it.

This two-tier model is what makes the harness suitable for continuous integration pipelines: anything that executes code runs in the sandbox, so a compromised agent is contained rather than given access to the host.
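
The two-tier split can be expressed as a policy check that runs before any operation. The sketch below is a hedged illustration of that idea and does not model gVisor or the harness's real enforcement. The operation names, `OPERATION_TIERS`, `tier_of`, and `run_operation` are all assumptions, and `in_sandbox` is simply a flag the caller supplies.

```python
from enum import Enum
from typing import Callable, Dict


class Tier(Enum):
    SAFE = "safe"
    SANDBOX_REQUIRED = "sandbox-required"


# Hypothetical operation names, grouped the way the two tiers are described.
OPERATION_TIERS: Dict[str, Tier] = {
    "read_file": Tier.SAFE,
    "analyze_code": Tier.SAFE,
    "generate_report": Tier.SAFE,
    "execute_code": Tier.SANDBOX_REQUIRED,
    "dynamic_analysis": Tier.SANDBOX_REQUIRED,
    "validate_patch": Tier.SANDBOX_REQUIRED,
}


class SandboxRequiredError(RuntimeError):
    """Raised when a sandbox-required operation is attempted outside a sandbox."""


def tier_of(operation: str) -> Tier:
    """Return the tier for an operation; unknown operations fail closed."""
    return OPERATION_TIERS.get(operation, Tier.SANDBOX_REQUIRED)


def run_operation(operation: str, action: Callable[[], str], in_sandbox: bool) -> str:
    """Run the action only if the policy allows it in the current context."""
    if tier_of(operation) is Tier.SANDBOX_REQUIRED and not in_sandbox:
        raise SandboxRequiredError(f"{operation!r} must run inside a sandbox")
    return action()


# A safe operation runs anywhere; a risky one needs the sandbox flag.
print(run_operation("read_file", lambda: "file contents", in_sandbox=False))
print(run_operation("execute_code", lambda: "exit 0", in_sandbox=True))
```

Defaulting unknown operations to the stricter tier is a deliberate choice in this sketch: an operation that nobody classified is treated as risky instead of slipping through as safe.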

Getting Started: A 2-Week Ramp-Up Plan

The reference documentation outlines a practical four-phase ramp-up plan:

Phase	Duration	Activity
Day 1	1 day	Run static scans on a familiar codebase to validate tooling
Day 2	1 day	Run the full autonomous pipeline on a C/C++ library
Days 3–5	3 days	Customize skills and signatures for your target repository
Week 2+	Ongoing	Deploy autonomous scanning at scale with human-in-the-loop review

This structured approach lets teams build confidence gradually, starting with low-risk static analysis and scaling up to full autonomous vulnerability discovery.