Defending Code — Anthropic's Vulnerability Discovery Reference Harness

Overview

Anthropic has released an open-source reference implementation for autonomous vulnerability discovery and remediation, built on the operational lessons of Project Glasswing. The harness provides a complete, production-tested framework for running AI agents against codebases to find and fix security vulnerabilities safely.

The system follows a five-stage lifecycle:

  1. Recon — map the attack surface, identify entry points, understand the codebase architecture.
  2. Find — actively probe for vulnerabilities using static and dynamic analysis.
  3. Triage — assess severity, exploitability, and false positive likelihood for each finding.
  4. Report — generate structured findings with proof-of-concept exploit code and remediation guidance.
  5. Patch — implement and validate fixes.
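
To make the ordering of the lifecycle concrete, here is a minimal Python sketch of the five stages as an ordered enumeration with a simple runner. Every name in it (Stage, next_stage, run_lifecycle, the handlers mapping) is hypothetical and chosen for illustration; the harness's actual interfaces are not described here and may look quite different.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five lifecycle stages, in the order listed above (names are illustrative)."""
    RECON = 1
    FIND = 2
    TRIAGE = 3
    REPORT = 4
    PATCH = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None once PATCH is reached."""
    return Stage(stage + 1) if stage < Stage.PATCH else None


def run_lifecycle(handlers, context):
    """Run each stage's handler in order, passing a shared context dict along.

    `handlers` maps a Stage to a callable that takes and returns the context.
    Stages without a handler are recorded as completed but do no work.
    """
    for stage in Stage:
        handler = handlers.get(stage)
        if handler is not None:
            context = handler(context)
        context.setdefault("completed", []).append(stage.name)
    return context
```

A caller would supply one handler per stage, for example a Recon handler that stores a list of entry points in the context for the Find handler to consume.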

Built-In Skills

The harness ships with a set of composable skills activated through slash commands:

  • /threat-model — given a codebase, produce a structured threat model identifying trust boundaries, data flows, and high-risk components.
  • /vuln-scan — run targeted vulnerability scans against specific modules or the entire codebase.
  • /triage — evaluate a finding against known vulnerability patterns and assess real-world exploitability.
  • /patch — generate and apply a fix for a confirmed vulnerability, with regression tests.
  • /customize — extend the harness with custom vulnerability signatures or domain-specific scanning logic.
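
As an illustration of how slash commands like these might be recognized, the sketch below parses a command line into a skill name and its arguments. Only the five skill names come from the list above; the argument syntax (whitespace-separated tokens after the command) and the function name parse_command are assumptions, not the harness's documented behavior.

```python
# Skill names taken from the list above; everything else is a hypothetical sketch.
KNOWN_SKILLS = ("threat-model", "vuln-scan", "triage", "patch", "customize")


def parse_command(line):
    """Split '/skill arg1 arg2' into (skill, [args]).

    Raises ValueError if the line is not a slash command or names an unknown skill.
    The whitespace-separated argument format is assumed for illustration.
    """
    text = line.strip()
    if not text.startswith("/"):
        raise ValueError(f"expected a slash command, got: {line!r}")
    parts = text[1:].split()
    if not parts or parts[0] not in KNOWN_SKILLS:
        raise ValueError(f"unknown skill in command: {line!r}")
    return parts[0], parts[1:]
```

Rejecting unknown names up front keeps a typo such as /vulnscan from being silently ignored.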

Strict Security Model

The harness enforces a rigorous security boundary between safe operations and sandbox-required operations:

  • Safe ops (read/write only) — file reading, code analysis, report generation. These run without special isolation.
  • Sandbox-required ops — code execution, dynamic analysis, and patch validation all run inside gVisor sandboxes, preventing escape or persistent damage to the host system.

Because anything that executes code is confined to a sandbox, this two-tier model means the harness can be deployed in continuous integration pipelines with confidence that even a compromised agent remains contained.
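
The two-tier split can be pictured as a small policy check that runs before any operation is dispatched. The following Python sketch is an assumption-laden illustration: the operation names, required_tier, and check_allowed are all hypothetical, and treating unrecognized operations as sandbox-required is a design choice of this sketch rather than documented harness behavior.

```python
# Hypothetical operation names mirroring the two tiers described above.
SAFE_OPS = frozenset({"read_file", "analyze_code", "generate_report"})
SANDBOX_OPS = frozenset({"execute_code", "dynamic_analysis", "validate_patch"})


def required_tier(op):
    """Return 'safe' or 'sandbox' for an operation name.

    Unknown operations fall into the stricter tier (an assumption of this sketch),
    so a newly added operation is isolated until it is explicitly marked safe.
    """
    if op in SAFE_OPS:
        return "safe"
    return "sandbox"  # covers SANDBOX_OPS and anything unrecognized


def check_allowed(op, in_sandbox):
    """Raise PermissionError if a sandbox-required op is requested outside a sandbox."""
    if required_tier(op) == "sandbox" and not in_sandbox:
        raise PermissionError(f"{op!r} must run inside a sandbox")
    return True
```

Failing closed in this way means a mistake in the allowlist costs some isolation overhead instead of exposing the host.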

Getting Started: A 2-Week Ramp-Up Plan

The reference documentation outlines a practical four-phase ramp-up plan:

Phase       Duration   Activity
Day 1       1 day      Run static scans on a familiar codebase to validate tooling
Day 2       1 day      Run the full autonomous pipeline on a C/C++ library
Days 3–5    3 days     Customize skills and signatures for your target repository
Week 2+     Ongoing    Deploy autonomous scanning at scale with human-in-the-loop review

This structured approach lets teams build confidence gradually, starting with low-risk static analysis and scaling up to full autonomous vulnerability discovery.