Microsoft Agent Governance Toolkit: Deterministic Guardrails for AI Agents
Source: microsoft/agent-governance-toolkit
Date Published: 2026 (Public Preview)
Author/Org: Microsoft
Stars: 2.9k · Forks: 457 · License: MIT
Languages: Python 78.1%, TypeScript 9.8%, C# 3.5%, Rust 3.4%, Go 1.7%
TL;DR
Microsoft's Agent Governance Toolkit (AGT) is built on a fundamental premise: prompt-level safety is not a control surface. Relying on an LLM to "follow the rules" via prompts is structurally unreliable, because the model is stochastic and its inputs can be adversarial. Instead, AGT intercepts every tool call, message send, and delegation in deterministic application code before the model's intent reaches the wire, so an action the policy denies never executes.
Core Philosophy
"Prompt-level safety ('please follow the rules') is not a control surface. It is a polite request to a stochastic system."
"Actions the AGT kernel denies are not 'unlikely.' They are structurally impossible."
The toolkit cites sobering evidence:
- OWASP LLM01:2025: "it is unclear if there are fool-proof methods of prevention for prompt injection"
- JailbreakBench (Chao et al., NeurIPS 2024): Adaptive attacks reach near-100% attack success rates against frontier safety-aligned models
- Andriushchenko et al., 2024: 100% ASR on GPT-4, GPT-3.5, Claude 3, and Llama-3 using simple adaptive attacks
How It Works
Two-Line Integration
Example Policy (YAML)
apiVersion: governance.toolkit/v1
name: production-policy
default_action: allow
rules:
  - name: block-destructive
    condition: "action.type in ['drop', 'delete', 'truncate']"
    action: deny
    description: "Destructive operations require human approval"
  - name: require-approval-for-send
    condition: "action.type == 'send_email'"
    action: require_approval
    approvers: ["security-team"]
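To make the rule semantics concrete, here is a minimal sketch of how a policy shaped like the one above could be evaluated. It is not AGT's implementation: the `decide` function, the `Decision` tuple, the first-match-wins ordering, and the tiny condition parser are all assumptions made for illustration. The policy is written as a Python dict rather than loaded from YAML so the example has no third-party dependencies.

```python
import ast
import re
from typing import NamedTuple, Optional

# Hypothetical in-memory mirror of the YAML policy above (not produced by AGT).
POLICY = {
    "default_action": "allow",
    "rules": [
        {
            "name": "block-destructive",
            "condition": "action.type in ['drop', 'delete', 'truncate']",
            "action": "deny",
        },
        {
            "name": "require-approval-for-send",
            "condition": "action.type == 'send_email'",
            "action": "require_approval",
        },
    ],
}

# Assumed condition grammar: `action.type in [<literals>]` or `action.type == <literal>`.
_CONDITION = re.compile(r"^action\.type\s+(in|==)\s+(.+)$")


class Decision(NamedTuple):
    action: str          # "allow", "deny", or "require_approval"
    rule: Optional[str]  # name of the matching rule, or None if the default applied


def matches(condition: str, action_type: str) -> bool:
    """Evaluate one condition string without calling eval()."""
    m = _CONDITION.match(condition.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    operator = m.group(1)
    literal = ast.literal_eval(m.group(2))  # parses only Python literals
    if operator == "in":
        return action_type in literal
    return action_type == literal


def decide(policy: dict, action_type: str) -> Decision:
    """Return the first matching rule's action, else the policy default."""
    for rule in policy["rules"]:
        if matches(rule["condition"], action_type):
            return Decision(rule["action"], rule["name"])
    return Decision(policy["default_action"], None)


print(decide(POLICY, "read"))
print(decide(POLICY, "drop"))
print(decide(POLICY, "send_email"))
```

Under these assumptions, `read` falls through to the default `allow`, `drop` is denied by `block-destructive`, and `send_email` comes back as `require_approval`. The point of the sketch is that the outcome depends only on the action and the rule table, never on what the model was told in its prompt.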
Behaviour
>>> safe_tool(action="read", table="users") # Allowed
{'table': 'users', 'rows': 42}
>>> safe_tool(action="drop", table="users") # Blocked
GovernanceDenied: Action denied by policy rule 'block-destructive'
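The session above can be reproduced with a small wrapper that runs the policy check in ordinary code before the tool body. This is a sketch under stated assumptions, not AGT's API: the `governed` decorator, the `DENY_RULES` table, and this `GovernanceDenied` class are hypothetical stand-ins, and the tool's return value is a placeholder.

```python
from functools import wraps
from typing import Any, Callable, Dict


class GovernanceDenied(Exception):
    """Raised when a rule denies a call; the wrapped tool never runs."""


# Assumed rule table: action type -> name of the rule that denies it.
DENY_RULES: Dict[str, str] = {
    "drop": "block-destructive",
    "delete": "block-destructive",
    "truncate": "block-destructive",
}


def governed(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so the policy check runs before the tool body."""

    @wraps(tool)
    def wrapper(*, action: str, **kwargs: Any) -> Any:
        rule = DENY_RULES.get(action)
        if rule is not None:
            # Denial happens here, in deterministic code, not in the prompt.
            raise GovernanceDenied(f"Action denied by policy rule '{rule}'")
        return tool(action=action, **kwargs)

    return wrapper


calls = []  # records every call that actually reaches the tool


@governed
def safe_tool(action: str, table: str) -> dict:
    calls.append((action, table))
    return {"table": table, "rows": 42}  # placeholder result


print(safe_tool(action="read", table="users"))
try:
    safe_tool(action="drop", table="users")
except GovernanceDenied as exc:
    print(f"GovernanceDenied: {exc}")
print(calls)  # the denied "drop" call is absent
```

Because the check sits between the caller and the tool, a denied call never reaches the tool body at all, which is the property the "structurally impossible" claim rests on.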
Multi-Language Support
AGT provides SDKs for Python, TypeScript, C#, Rust, and Go, making it possible to enforce a single governance policy across heterogeneous agent ecosystems.
CLI Tools
agt doctor # Check installation
agt verify # OWASP compliance check
agt verify --evidence ./agt-evidence.json --strict # Fail CI on weak evidence
agt red-team scan ./prompts/ --min-grade B # Scan prompts for vulnerabilities
Key Takeaways
- AGT rejects prompt-level safety as unreliable and enforces governance at the tool-call layer using deterministic policy rules
- Policies are expressed as declarative YAML and can deny actions, require human approval, or log violations; the same policy is enforced across the Python, TypeScript, C#, Rust, and Go SDKs
- The toolkit was prompted in part by research showing near-100% jailbreak success rates against frontier models, making structural guardrails a necessity for production agent deployments