LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

Source: LEAP — Google AI Research | arXiv:2606.03303

TL;DR

LEAP (LLM-in-Lean Environment Agentic Prover) is an agentic framework from Google AI Research for automated formal theorem proving with general-purpose foundation models. Key results: it solves all 12 problems of the 2025 Putnam Competition. On Lean-IMO-Bench (IMO-style problems in Lean), it raises the formal solve rate from below 10% in one-shot mode to 70%, surpassing the 48% achieved by a specialized IMO system. It also autonomously formalized proofs for open combinatorial challenges, including a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.

What Is LEAP?

LEAP is an agentic framework for formal theorem proving. Unlike previous approaches that relied on specialized provers or hand-crafted tactics, LEAP uses general-purpose foundation models (LLMs) as the reasoning core, wrapped in an agent loop that interacts with the Lean theorem prover.

The key insight: foundation models have impressive mathematical reasoning capabilities, but they struggle with formal verification because they make errors in syntax, type-checking, and proof step ordering. LEAP solves this by treating the LLM as a guided agent that proposes proof steps, receives feedback from Lean's type checker, and iterates.
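To make "formal verification" concrete, here is a minimal illustration of what a Lean 4 statement and tactic proof look like. These two theorems are toy examples chosen for this article, not taken from LEAP or its benchmarks; they only show the kind of object the agent has to produce and Lean has to accept.

```lean
-- A formal statement followed by a tactic proof that Lean accepts.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- `constructor` splits the conjunction into two goals,
-- and each goal is then closed by its own tactic.
theorem and_swap_example (p q : Prop) (hp : p) (hq : q) : q ∧ p := by
  constructor
  · exact hq
  · exact hp
```

If a step is wrong or missing, Lean rejects the proof and reports the failing position and the goals that remain open. That report is the feedback the agent iterates on.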

Architecture

LEAP's architecture consists of four components:

Lean Environment: The formal mathematics environment. LEAP interacts with Lean by submitting tactics and receiving feedback (success, type errors, unsolved goals).
LLM Agent: A general-purpose foundation model (e.g., Gemini or a similar model) that proposes proof steps. The agent is prompted with the current goal state, the available hypotheses, and the proof history.
Agent Loop: The control flow: propose a tactic, execute it in Lean, observe the result, update the proof state, repeat. The loop includes backtracking, so if a tactic fails, the agent can try an alternative approach.
Search and Exploration: LEAP can explore multiple proof paths in parallel, using the LLM to generate candidate tactics and Lean to verify them. Failed paths provide feedback that improves subsequent attempts.
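The propose, check, and backtrack cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not LEAP's implementation: `toy_check` stands in for Lean, `toy_propose` stands in for the LLM, and the goal names, tactic names, and the `Feedback` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Feedback:
    ok: bool                  # did the checker accept the tactic?
    new_goal: Optional[str]   # remaining goal, or None when the proof is complete
    message: str              # error text that would be fed back to the model


# Toy stand-in for Lean: (goal, tactic) -> next goal (None means no goals left).
TRANSITIONS: Dict[Tuple[str, str], Optional[str]] = {
    ("start", "bad_path"): "dead_end",
    ("start", "intro"): "mid",
    ("mid", "exact"): None,
}

# Toy stand-in for the LLM: candidate tactics per goal, in order of preference.
CANDIDATES: Dict[str, List[str]] = {
    "start": ["bad_path", "intro"],
    "mid": ["exact"],
    "dead_end": ["simp"],
}


def toy_check(goal: str, tactic: str) -> Feedback:
    """Accept a tactic only if the toy transition table knows it."""
    if (goal, tactic) not in TRANSITIONS:
        return Feedback(False, goal, f"tactic '{tactic}' failed on goal '{goal}'")
    return Feedback(True, TRANSITIONS[(goal, tactic)], "ok")


def toy_propose(goal: str, history: Tuple[str, ...],
                failed: Dict[str, str]) -> Optional[str]:
    """Return the next untried candidate. A real agent would instead build a
    prompt from the goal, the history, and the collected error messages."""
    for tactic in CANDIDATES.get(goal, []):
        if tactic not in failed:
            return tactic
    return None


def prove(goal, propose, check, history=(), max_depth=8,
          max_attempts=4) -> Optional[List[str]]:
    """Depth-first proof search with backtracking."""
    if len(history) >= max_depth:
        return None
    failed: Dict[str, str] = {}  # tactic -> reason, visible to the proposer
    for _ in range(max_attempts):
        tactic = propose(goal, history, failed)
        if tactic is None:
            break
        feedback = check(goal, tactic)
        if not feedback.ok:
            failed[tactic] = feedback.message  # checker rejected the step
            continue
        if feedback.new_goal is None:
            return [*history, tactic]  # no goals left: proof found
        proof = prove(feedback.new_goal, propose, check,
                      (*history, tactic), max_depth, max_attempts)
        if proof is not None:
            return proof
        # Backtrack: the step was accepted but led nowhere.
        failed[tactic] = "accepted, but the resulting goal could not be closed"
    return None


print(prove("start", toy_propose, toy_check))  # ['intro', 'exact']
```

In this run the proposer first picks `bad_path`, which the checker accepts but which leads to a goal nothing can close. The loop records that failure, backtracks, and succeeds with `intro` followed by `exact`.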

Results on the Putnam Competition

The 2025 William Lowell Putnam Mathematical Competition is one of the most prestigious undergraduate mathematics competitions in the world. LEAP solved all 12 problems, a first for an AI system. This shows that the agentic framework, combined with a capable foundation model, can match the performance of elite human mathematicians on competition-level problems.

Results on Lean-IMO-Bench

Lean-IMO-Bench is a benchmark of International Mathematical Olympiad (IMO) problems formalized in Lean. The results are striking:

Approach	Formal Solve Rate
One-shot (no LEAP)	Below 10%
Specialized IMO system	48%
LEAP	70%

LEAP improves the solve rate from under 10% (raw LLM with no agent framework) to 70%, surpassing a specialized IMO system by 22 percentage points. Because the baseline is below 10%, this is at least a 7x improvement over the baseline, and roughly a 1.46x improvement over the specialized system (70/48).
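The comparison mixes an absolute gap (percentage points) with relative ratios, so a short check of the arithmetic may help. The snippet uses only the three figures from the table. The variable names are illustrative, and the one-shot rate is treated as an upper bound because only "below 10%" is reported.

```python
# Solve rates from the table, in percent.
one_shot_upper_bound = 10   # "below 10%": only an upper bound is reported
specialized = 48
leap = 70

gap_points = leap - specialized                       # absolute gap, percentage points
ratio_vs_specialized = leap / specialized             # relative improvement
min_ratio_vs_one_shot = leap / one_shot_upper_bound   # lower bound on the improvement

print(gap_points)                      # 22
print(round(ratio_vs_specialized, 2))  # 1.46
print(min_ratio_vs_one_shot)           # 7.0
```

Since the true one-shot rate is below 10%, the actual ratio over the baseline is larger than 7.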

The magnitude of this improvement suggests a general lesson: the agentic framework can matter more than the model. The same foundation model that scores under 10% in one-shot mode achieves 70% with LEAP's agent loop, feedback mechanisms, and search infrastructure.
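One ingredient of that search infrastructure, checking several candidate proofs in parallel and keeping one that verifies, can be sketched as follows. This is an assumption-laden toy rather than LEAP's search procedure: `first_verified` and `toy_verify` are hypothetical names, and the verifier is a placeholder predicate where a real system would call Lean.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def first_verified(candidates: Sequence[str],
                   verify: Callable[[str], bool],
                   max_workers: int = 4) -> Optional[str]:
    """Check all candidates concurrently and return the first one, in
    submission order, that the verifier accepts (None if none pass)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(verify, candidates))  # map preserves input order
    for candidate, ok in zip(candidates, verdicts):
        if ok:
            return candidate
    return None


def toy_verify(candidate: str) -> bool:
    # Placeholder rule standing in for a Lean check: accept only proof
    # scripts that finish with "exact h".
    return candidate.endswith("exact h")


attempts = ["simp", "intro h; exact h", "exact h"]
print(first_verified(attempts, toy_verify))  # intro h; exact h
```

One-shot mode corresponds to a single candidate with no retry. Because the verifier is sound, adding candidates can only add correct proofs, never false positives.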

Open Problem: Knuth's Hamiltonian Decomposition

Perhaps the most impressive achievement: LEAP autonomously formalized proofs for open combinatorial challenges, including a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.

Cayley graphs are a fundamental concept in group theory and graph theory: the vertices are the elements of a group, and edges connect each element to its products with a fixed set of generators. A Hamiltonian decomposition is a partition of the edges into Hamiltonian cycles. Donald Knuth has studied the Hamiltonian decomposition of even-order Cayley graphs, a long-standing combinatorial problem. LEAP was able to formalize a proof for a key subproblem, meaning the AI system didn't just solve known exercises but contributed to advancing mathematical knowledge.
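A small worked example may make the definitions concrete. The sketch below checks a Hamiltonian decomposition of one tiny even-order Cayley graph: the cyclic group Z_8 with generators ±1 and ±3. This instance was picked for illustration only. It is not Knuth's problem or the subproblem LEAP formalized, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set

Edge = FrozenSet[int]


def cayley_edges(n: int, gens: Sequence[int]) -> Set[Edge]:
    """Undirected edges of the Cayley graph of Z_n for the given generators."""
    return {frozenset((v, (v + g) % n)) for v in range(n) for g in gens}


def cycle_edges(cycle: Sequence[int]) -> Set[Edge]:
    """Edges of a closed walk, including the edge from the last vertex to the first."""
    k = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % k])) for i in range(k)}


def is_hamiltonian_decomposition(n: int, gens: Sequence[int],
                                 cycles: List[List[int]]) -> bool:
    """True if the cycles are Hamiltonian, edge-disjoint, and cover every edge."""
    edges = cayley_edges(n, gens)
    used: Set[Edge] = set()
    for cycle in cycles:
        if sorted(cycle) != list(range(n)):   # must visit every vertex exactly once
            return False
        ce = cycle_edges(cycle)
        if len(ce) != n or not ce <= edges:   # n distinct edges, all in the graph
            return False
        if ce & used:                         # cycles must not share an edge
            return False
        used |= ce
    return used == edges                      # every edge is used


n = 8
gens = [1, -1, 3, -3]
step1 = [(i * 1) % n for i in range(n)]  # 0, 1, 2, ..., 7
step3 = [(i * 3) % n for i in range(n)]  # 0, 3, 6, 1, 4, 7, 2, 5

print(is_hamiltonian_decomposition(n, gens, [step1, step3]))  # True
```

The graph has 16 edges, and the two cycles use 8 each. The check works because 1 and 3 are both coprime to 8, so stepping by either generator visits every vertex before returning to 0.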

Implications

LEAP has several important implications for the future of AI in mathematics:

Agentic frameworks unlock latent capability: The same model that fails in one-shot mode can solve 70% of Lean-IMO-Bench problems with proper agent scaffolding. The capability was already there; the framework made it accessible.
Formal verification as feedback signal: Lean's type checker provides perfect feedback, in that every error is unambiguous and deterministic. This makes formal theorem proving an ideal domain for agentic approaches with iterative refinement.
Competition problems as stepping stones: Putnam and IMO problems are hard but well-posed. Success here suggests the approach can scale to harder, less structured mathematical problems.
Automated formalization of open problems: The Knuth result demonstrates that AI can contribute to open mathematical research, not just solve known problems.
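To illustrate why checker output works well as a feedback signal, the sketch below turns compiler-style diagnostics into structured records that an agent could feed back into its next prompt. It assumes messages of the form `file:line:column: severity: message` followed by continuation lines. The sample text was written by hand for this example rather than captured from a Lean run, and `Diagnostic`, `parse_diagnostics`, and `is_verified` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Diagnostic:
    line: int
    column: int
    severity: str
    message: str


# Assumed header shape: "<file>:<line>:<column>: <severity>: <message>"
HEADER = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<sev>error|warning|info): (?P<msg>.*)$"
)


def parse_diagnostics(output: str) -> List[Diagnostic]:
    """Group checker output into diagnostics; lines that are not headers
    are appended to the message of the preceding diagnostic."""
    diagnostics: List[Diagnostic] = []
    for raw in output.splitlines():
        match = HEADER.match(raw)
        if match:
            diagnostics.append(Diagnostic(int(match["line"]), int(match["col"]),
                                          match["sev"], match["msg"]))
        elif diagnostics:
            diagnostics[-1].message += "\n" + raw
    return diagnostics


def is_verified(diagnostics: List[Diagnostic]) -> bool:
    """Reject on any error, and on warnings about `sorry`, since a proof
    containing `sorry` is incomplete even though it compiles."""
    for d in diagnostics:
        if d.severity == "error":
            return False
        if d.severity == "warning" and "sorry" in d.message:
            return False
    return True


# Hand-written sample in the assumed format (illustrative only).
SAMPLE = """Demo.lean:2:2: error: unsolved goals
a b : Nat
⊢ a + b = b + a
Demo.lean:5:0: warning: declaration uses 'sorry'"""

diags = parse_diagnostics(SAMPLE)
print(len(diags), diags[0].severity, is_verified(diags))  # 2 error False
```

Each record carries a position and the open goal. That is far more specific than a pass/fail score, and the same input always yields the same records.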

Key Takeaways

LEAP is an agentic framework that wraps foundation models in an agent loop with the Lean theorem prover
Solves all 12 problems from the 2025 Putnam Competition
Boosts Lean-IMO-Bench solve rate from under 10% to 70% (vs 48% for a specialized IMO system)
Autonomously formalized proofs for open combinatorial challenges, including a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs
The agentic framework (feedback loop, search, backtracking) can matter more than the base model
Formal verification provides perfect feedback, making theorem proving ideal for agentic approaches
Demonstrates AI's growing capability to contribute to open mathematical research