Microsoft Agent Governance Toolkit — Deterministic Guardrails for AI Agents

Source: microsoft/agent-governance-toolkit
Date Published: 2026 (Public Preview)
Author/Org: Microsoft
Stars: 2.9k · Forks: 457 · License: MIT
Languages: Python 78.1%, TypeScript 9.8%, C# 3.5%, Rust 3.4%, Go 1.7%


TL;DR

Microsoft's Agent Governance Toolkit (AGT) is built on a fundamental premise: prompt-level safety is not a control surface. Relying on an LLM to "follow the rules" via its prompt is structurally unreliable. AGT instead intercepts every tool call, message send, and delegation in deterministic application code before the model's intent reaches the wire, which, by the toolkit's account, makes a policy violation structurally impossible rather than merely unlikely.

Core Philosophy

"Prompt-level safety ('please follow the rules') is not a control surface. It is a polite request to a stochastic system."

"Actions the AGT kernel denies are not 'unlikely.' They are structurally impossible."

The toolkit cites sobering evidence:

  • OWASP LLM01:2025: "it is unclear if there are fool-proof methods of prevention for prompt injection"
  • JailbreakBench (Chao et al., NeurIPS 2024): Adaptive attacks reach near-100% attack success rates (ASR) against frontier safety-aligned models
  • Andriushchenko et al., 2024: 100% ASR on GPT-4, GPT-3.5, Claude 3, and Llama-3 using simple prompt-only attacks

How It Works

Two-Line Integration

from agentmesh.governance import govern

# my_tool is an existing tool callable; each call is checked against policy.yaml first
safe_tool = govern(my_tool, policy="policy.yaml")

Example Policy (YAML)

apiVersion: governance.toolkit/v1
name: production-policy
default_action: allow
rules:
  - name: block-destructive
    condition: "action.type in ['drop', 'delete', 'truncate']"
    action: deny
    description: "Destructive operations are denied outright"
  - name: require-approval-for-send
    condition: "action.type == 'send_email'"
    action: require_approval
    approvers: ["security-team"]
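
To make the rule semantics concrete, here is a minimal sketch of how a policy like the one above could be evaluated. It is not AGT's implementation: the `Rule` and `Decision` classes, the `evaluate` function, and the first-match-wins ordering are assumptions made for illustration, and the string conditions from the YAML are replaced by plain Python predicates to avoid writing an expression parser.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of one policy rule (not an AGT type)."""
    name: str
    condition: Callable[[dict], bool]  # stands in for the YAML condition string
    action: str                        # "deny" or "require_approval"


@dataclass(frozen=True)
class Decision:
    action: str                 # "allow", "deny", or "require_approval"
    rule: Optional[str] = None  # name of the rule that matched, if any


# Mirrors the two rules of the example policy.
RULES = [
    Rule("block-destructive",
         lambda a: a["type"] in ("drop", "delete", "truncate"),
         "deny"),
    Rule("require-approval-for-send",
         lambda a: a["type"] == "send_email",
         "require_approval"),
]


def evaluate(action: dict, rules=RULES, default_action: str = "allow") -> Decision:
    """Return the first matching rule's action (assumed first-match-wins),
    falling back to default_action when no rule matches."""
    for rule in rules:
        if rule.condition(action):
            return Decision(rule.action, rule.name)
    return Decision(default_action)


for action_type in ("read", "drop", "send_email"):
    print(action_type, "->", evaluate({"type": action_type}))
```

Because the decision is a pure function of the action and the rules, the same input always yields the same verdict, which is the property the toolkit contrasts with prompt-level instructions.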

Behaviour

>>> safe_tool(action="read", table="users")  # Allowed
{'table': 'users', 'rows': 42}

>>> safe_tool(action="drop", table="users")  # Blocked
GovernanceDenied: Action denied by policy rule 'block-destructive'
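
The behaviour above follows from where the check runs: in ordinary application code, before the tool body. The following is a minimal sketch under stated assumptions, not the `agentmesh` API: `govern_sketch`, `GovernanceDeniedSketch`, and the hard-coded `DENIED_ACTIONS` set are hypothetical stand-ins, whereas the real `govern` takes its rules from the policy file.

```python
import functools


class GovernanceDeniedSketch(Exception):
    """Hypothetical stand-in for the GovernanceDenied error shown above."""


# Assumption: hard-coded here; the real toolkit reads rules from policy.yaml.
DENIED_ACTIONS = {"drop", "delete", "truncate"}


def govern_sketch(tool):
    """Wrap a tool so the policy check runs before the tool body."""
    @functools.wraps(tool)
    def guarded(action, **kwargs):
        if action in DENIED_ACTIONS:
            # Raised before tool() is called, so the side effect never happens.
            raise GovernanceDeniedSketch(
                f"Action denied by policy rule 'block-destructive': {action}")
        return tool(action=action, **kwargs)
    return guarded


calls = []  # records every call that actually reached the tool


def my_tool(action, table):
    calls.append((action, table))
    return {"table": table, "rows": 42}


safe_tool = govern_sketch(my_tool)

print(safe_tool(action="read", table="users"))
try:
    safe_tool(action="drop", table="users")
except GovernanceDeniedSketch as exc:
    print("blocked:", exc)
print("calls that reached the tool:", calls)
```

The denied call never appears in `calls`: the branch that raises is chosen by the wrapper, not by anything the model wrote, which is the sense in which the toolkit calls enforcement deterministic.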

Multi-Language Support

AGT provides SDKs for Python, TypeScript, C#, Rust, and Go, making it possible to enforce a single governance policy across heterogeneous agent ecosystems.

CLI Tools

agt doctor                                          # Check installation
agt verify                                          # OWASP compliance check
agt verify --evidence ./agt-evidence.json --strict  # Fail CI on weak evidence
agt red-team scan ./prompts/ --min-grade B          # Scan prompts for vulnerabilities
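
In a CI pipeline, the strict check can be wrapped in a small gate script. This sketch assumes, based on the "Fail CI" comment above, that `agt verify` signals failure through a non-zero exit code. The helper names `build_verify_command` and `run_gate` are hypothetical, and only the flags listed above are used.

```python
import shutil
import subprocess
import sys
from typing import List


def build_verify_command(evidence_path: str, strict: bool = True) -> List[str]:
    """Assemble the `agt verify` invocation from the flags shown above."""
    cmd = ["agt", "verify", "--evidence", evidence_path]
    if strict:
        cmd.append("--strict")
    return cmd


def run_gate(evidence_path: str) -> int:
    """Run the check and return an exit code for the CI job."""
    if shutil.which("agt") is None:
        print("agt not found on PATH", file=sys.stderr)
        return 127  # conventional shell code for "command not found"
    # Assumption: a non-zero return code means verification failed.
    return subprocess.run(build_verify_command(evidence_path)).returncode


print(" ".join(build_verify_command("./agt-evidence.json")))
```

A CI entry point would then call `sys.exit(run_gate("./agt-evidence.json"))` so that the job fails whenever the check does.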

Key Takeaways

  1. AGT rejects prompt-level safety as unreliable and enforces governance at the tool-call layer using deterministic policy rules
  2. Policies are expressed as declarative YAML and can deny actions, require human approval, or log violations; the same policy is enforced across Python, TypeScript, C#, Rust, and Go
  3. The toolkit was prompted in part by research showing near-100% jailbreak success rates against frontier models, making structural guardrails a necessity for production agent deployments