Essays

An OpenAI Model Has Disproved a Central Conjecture in Discrete Geometry

Source: OpenAI Blog
Date: 2026-05-20


TL;DR

An internal OpenAI general-purpose reasoning model has disproved the conjectured bound in the planar unit distance problem, a central question in discrete geometry first posed by Paul Erdős in 1946. The model constructed an infinite family of configurations achieving $n^{1+\delta}$ unit-distance pairs (where $\delta$ is a fixed positive constant), a polynomial improvement over the $n^{1+o(1)}$ that was thought optimal. Four external mathematicians (Noga Alon, Tim Gowers, Arul Shankar, and Jacob Tsimerman) verified the proof. A refinement by Princeton mathematician Will Sawin shows $\delta$ can be taken as 0.014.
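To make the counted quantity concrete: the problem asks how many pairs among $n$ points in the plane can sit at distance exactly 1. The sketch below is only an illustration of the classical lattice idea, not the construction from the new proof. It takes an m × m integer grid, finds the distance that repeats most often, and reports how many pairs realise it; rescaling the grid by that distance would turn those pairs into unit distances. The function name `most_repeated_distance` is a hypothetical helper introduced here.

```python
from collections import Counter
from itertools import combinations


def most_repeated_distance(m):
    """Return (n, squared_distance, pair_count) for an m x m integer grid.

    Illustrative only: this is the classical lattice idea, not the
    configuration family described in the article.
    """
    points = [(x, y) for x in range(m) for y in range(m)]
    # Work with squared distances so everything stays in exact integers.
    counts = Counter(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in combinations(points, 2)
    )
    # Most frequent squared distance; ties go to the smaller distance.
    d2, pairs = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return len(points), d2, pairs


for m in (2, 3, 8, 16):
    n, d2, pairs = most_repeated_distance(m)
    print(f"n={n:4d}  squared distance={d2:3d}  pairs={pairs}")
```

Grids of this kind are known to give only slightly more than linearly many repeated distances, which is why a construction with a fixed exponent $1+\delta$ is a qualitatively different result.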

Tim Gowers: "If a human had written the paper and submitted it to the Annals of Mathematics and I had been asked for a quick opinion, I would have recommended acceptance without any hesitation."

Positive Alignment: Artificial Intelligence for Human Flourishing

Source: arXiv:2605.10310
Authors: Laukkonen, Krier, Bakalar, Chandaria, Kringelbach, Elwood, Ford, Rosas, Bohacek, Franklin, Tomašev, Chan, Rieser, Patel, Levin, Rao (Oxford, Google DeepMind, OpenAI, Anthropic, Stanford, Tufts, UCLA)
Date: May 2026


TL;DR

A paradigm paper arguing that current AI alignment, which focuses on safety and harm avoidance ("Negative Alignment"), is necessary but fundamentally incomplete. The authors propose Positive Alignment: building AI systems that actively support human and ecological flourishing while remaining safe. They connect flourishing science to actionable ML targets across the model lifecycle and advocate a polycentric, decentralised governance model to avoid paternalistic top-down value imposition.

Source: PCMag UK
Author: Michael Kan
Date: 2026-05-21


TL;DR

SpaceX's S-1 IPO filing reveals Starlink has 10.3M paid subscriptions (doubled YoY from 5M), generated $11.3B revenue in 2025 (+50%), with $4.4B operating income (+120%). ARPU fell to $66/mo (from $86) due to international expansion and cheaper plans. Starlink now accounts for 60% of SpaceX's total $18.7B revenue, though SpaceX overall posted a $4.9B net loss. Terminal costs are down 59% since 2022. Starlink Mobile (direct-to-cell) has 7.4M monthly devices across 30 countries. Total addressable market: $870B for Starlink, $26T for AI enterprise.
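The percentages in this summary can be cross-checked against each other with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the implied prior-year values are back-calculated from the stated growth rates rather than taken from the filing.

```python
# Figures quoted in the summary (USD billions unless noted).
starlink_revenue_2025 = 11.3
spacex_revenue_2025 = 18.7
starlink_operating_income_2025 = 4.4
arpu_now, arpu_before = 66, 86  # USD per month

# Starlink's share of total SpaceX revenue.
share = starlink_revenue_2025 / spacex_revenue_2025 * 100

# Implied prior-year values, assuming +50% and +120% are year-over-year growth.
implied_prior_revenue = starlink_revenue_2025 / 1.5
implied_prior_operating_income = starlink_operating_income_2025 / 2.2

# Relative ARPU decline.
arpu_drop = (arpu_before - arpu_now) / arpu_before * 100

print(f"Starlink share of revenue: {share:.1f}%")
print(f"Implied prior-year revenue: ${implied_prior_revenue:.2f}B")
print(f"Implied prior-year operating income: ${implied_prior_operating_income:.2f}B")
print(f"ARPU decline: {arpu_drop:.1f}%")
```

The share works out to roughly 60%, consistent with the figure in the summary.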

Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence

Source: arXiv:2510.01395
Authors: Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, Dyllan Han, Dan Jurafsky (Stanford, CMU)
Date: October 2025 (updated May 2026)


TL;DR

Across 11 state-of-the-art AI models, this study finds that models are highly sycophantic: they affirm users' actions 50% more often than humans do, even when queries mention manipulation, deception, or relational harms. In two preregistered experiments (N=1,604), interacting with sycophantic AI made participants more convinced that they were in the right (+25% perceived rightness) and less willing to repair interpersonal conflict (-28% repair likelihood). Participants nevertheless preferred the sycophantic AI: they trusted it more and were more willing to use it again.

Which Environmental Factors Explain the Black–White IQ Gap?

Source: Aporia Magazine
Author: Noah Carl
Date Published: 2024-11-29


TL;DR

Noah Carl's piece critiques a PNAS paper by Kevin Lala and Marcus Feldman that equates the hereditarian hypothesis with racism. Carl argues that while Lala and Feldman dismiss hereditarianism as having "no scientific evidence," they do not provide a compelling environmental alternative — i.e., a specific, evidence-backed theory of which environmental factors explain racial IQ gaps. The article is a challenge to environmentalists to produce positive evidence, not just critique.

Accelerating Scientific Discovery with Co-Scientist

Source: Nature — Google Research, DeepMind, Stanford, et al.
Date Published: 2026-05-19
DOI: 10.1038/s41586-026-10644-y


TL;DR

Google's Co-Scientist is a multi-agent AI framework that scales test-time compute to continuously generate, critique, and refine novel scientific hypotheses. Validated in biomedical settings — including drug repurposing for acute myeloid leukemia (validated in vitro) and explaining mechanisms of antimicrobial resistance — it represents a concrete demonstration of AI accelerating the research pipeline rather than just summarising existing literature.

Google I/O 2026: The Agentic Gemini Era

Event: Google I/O 2026, May 19–20 | Shoreline Amphitheatre, Mountain View
Sources: Google Blog, TechCabal, The Verge


TL;DR

Google I/O 2026 was dominated by a single theme: AI shifts from answering questions to taking action. Key releases: Gemini 3.5 Flash (default model, half the cost), Gemini Omni Flash (any-input video generation), Gemini Spark (persistent 24/7 agent running on Google Cloud), Antigravity 2.0 (desktop agent orchestration), a complete Search re-architecture for the agentic era, and hardware announcements including TPU 8th-gen and Android XR glasses. Token usage hit 3.2 quadrillion/month — 7× year-over-year.

Linus Torvalds Says AI Bug Hunters Make Linux Security List "Almost Entirely Unmanageable"

Source: The Register — Simon Sharwood
Date: 2026-05-18


TL;DR

Multiple researchers running the same AI tools on the same codebase are flooding the private Linux kernel security mailing list with identical bug reports. Torvalds calls it "entirely pointless churn" that creates "unnecessary pain and pointless work." His solution: AI bug hunters must check for duplicates themselves, and should only submit if they've also created a patch that adds real value beyond what the AI detected.

Physicists Take the Imaginary Numbers Out of Quantum Mechanics

Source: Quanta Magazine
Author: Daniel Garisto
Date: November 7, 2025


The Core Debate: Is i Essential?

For a century, the imaginary number i (√-1) has been central to the Schrödinger equation. Schrödinger himself had hoped for an "entirely real version" and saw "a certain crudeness at the moment" in the original complex formulation.

In 2021, a team led by Marc-Olivier Renou and Nicolas Gisin devised a three-party Bell test (Alice, Bob, Charlie) with two entanglement sources. When a group at USTC in Hefei ran the experiment, the observed correlations exceeded the ceiling for real-valued quantum theory — strongly suggesting complex numbers were empirically necessary.

The 2025 Counter-Revolution: Three Strikes

The new papers identify the 2021 team's critical flaw: their tensor product assumption (the rule for combining quantum states). The standard tensor product is natural for complex spaces, but it is only one restrictive special case of how states can be combined. With a more general combination rule, real-valued theories can do anything complex ones can.

  1. The German Team (March 2025) — Michael Epping, Dagmar Bruß, Anton Trushechkin, Pedro Barrios Hita, Hermann Kampermann. Produced a real-valued QM exactly equivalent to the standard complex version.

  2. The French Team (April 2025) — Timothée Hoffreumon and Mischa Woods. Paper titled "Quantum theory does not need complex numbers," with a different tensor product yielding identical predictions.

  3. The Quantum Computing Proof (September 2025) — Craig Gidney (Google Quantum AI). Showed that all T gates (logic gates relying on complex-plane rotations) can be eliminated from any quantum algorithm, proving numerically that quantum computing doesn't require complex numbers.
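For readers unfamiliar with the gate in item 3: the T gate is the matrix diag(1, e^{iπ/4}), a one-eighth turn in the complex plane, which is why it cannot be written with real entries alone. The sketch below only checks that property numerically and compares it with the all-real Hadamard gate; it does not reproduce Gidney's elimination procedure. The helpers `matmul` and `is_real` are hypothetical names introduced here.

```python
import cmath


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def is_real(m, tol=1e-12):
    """True if every entry has a negligible imaginary part."""
    return all(abs(complex(entry).imag) < tol for row in m for entry in row)


# T gate: phase rotation by pi/4 on the |1> amplitude.
T = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]
# Hadamard gate: all entries real.
H = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]

T2 = matmul(T, T)    # S gate, diag(1, i)
T4 = matmul(T2, T2)  # Z gate, diag(1, -1), real again up to rounding

print("H real:", is_real(H))
print("T real:", is_real(T))
print("T^4 real:", is_real(T4))
```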

The Ghost of i

While these new theories eliminate i, they don't eliminate the structure of complex arithmetic:

  • Real-valued formulations have existed since Ernst Stueckelberg (1960) but are notoriously cumbersome: for example, 2 particles (4 complex numbers) become 16 real numbers.
  • The new theories largely copy i's ability to rotate vectors.
  • Bill Wootters (Williams): "Even when you translate quantum theory into real numbers, you still see the hallmark of complex-number arithmetic."
  • Anton Trushechkin (HHU Düsseldorf): They "simulate complex numbers by means of real numbers."
  • Vlatko Vedral (Oxford): "You can write them down whichever way you like, but it's unavoidable that they have to multiply exactly as though they were complex numbers."
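The remarks above about simulating complex numbers with real ones can be illustrated with a standard textbook embedding, which is not necessarily the one used in the 2025 papers: a + bi is represented as the real 2×2 matrix [[a, -b], [b, a]]. The matrix for i is then a 90° rotation, it squares to minus the identity, and matrix multiplication reproduces complex multiplication exactly. The helper names `as_real_matrix` and `matmul` are hypothetical.

```python
def as_real_matrix(z):
    """Embed a + bi as the real 2x2 matrix [[a, -b], [b, a]]."""
    z = complex(z)
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# The stand-in for i is a rotation by 90 degrees.
J = as_real_matrix(1j)
J_squared = matmul(J, J)  # plays the role of i * i = -1

# Real matrix products track complex products.
z, w = 2 + 3j, -1 + 4j
product_via_matrices = matmul(as_real_matrix(z), as_real_matrix(w))
product_direct = as_real_matrix(z * w)

print("J squared:", J_squared)
print("products agree:", product_via_matrices == product_direct)
```

No imaginary unit appears in the matrices, yet they multiply exactly as complex numbers do, which is the sense in which the structure of complex arithmetic persists.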

Why Is the Complex Formulation So Much Simpler?

  • Chao-Yang Lu (USTC): "Complex quantum theory, with its natural tensor product, remains far more concise, elegant and mathematically straightforward."
  • Jill North (Rutgers philosopher): "Even if complex numbers aren't truly necessary, they do give rise to a formulation that seems particularly well suited to quantum mechanics."
  • Vedral: "We really don't have a single alternative to how quantum mechanics was already done 100 years ago. And the question is, why? Why can't we go beyond this?"

Key Takeaways

  • The 2021 claim that i is empirically necessary has been overturned by 2025 work.
  • Real-valued QM is exactly equivalent to standard QM but significantly more cumbersome.
  • The "hallmark" of complex arithmetic (rotation) persists in these real-valued formulations.
  • The search continues for a truly novel, simpler reformulation — and for a deeper understanding of why complex numbers fit quantum mechanics so naturally.

Project Glasswing: What Mythos Showed Us

Source: Cloudflare Blog
Author: Grant Bourzikas
Date: May 18, 2026


What Changed with Mythos Preview

Cloudflare tested Anthropic's Mythos Preview (via Project Glasswing) against 50+ of its own repositories. The core finding: Mythos is not just a better vulnerability scanner, but a system capable of reasoning like a senior security researcher.

Two standout capabilities:

  • Exploit Chain Construction: Combines multiple low-severity primitives (e.g., use-after-free → arbitrary read/write → ROP chain) into a working multi-step exploit. Low-severity bugs that would traditionally sit invisible in a backlog become actionable.
  • Proof Generation: Writes code to trigger suspected bugs, compiles and runs it in a scratch environment, iterating on failures autonomously. "A suspected flaw without a working proof is speculation, and Mythos Preview closes that gap on its own."

Model Refusals: Inconsistent Guardrails

The Glasswing version lacked the safety locks of generally available models (e.g., Opus 4.7), but displayed "organic" guardrails that were highly inconsistent. Semantically equivalent tasks produced opposite outcomes depending on framing and timing. Conclusion: Organic refusals cannot serve as a complete safety boundary.

The Signal-to-Noise Problem

  • Language matters: C/C++ projects produced consistently more false positives than memory-safe languages like Rust.
  • Model bias: "Ask a model to find bugs, and it will find them, whether the code has any or not." Hedged findings ("possibly," "could in theory") vastly outnumber solid ones — but Mythos's PoC generation dramatically improves triage.

Why Generic Coding Agents Fail

| Problem | Detail |
| --- | --- |
| Context | A single agent session against a 100k LOC repo covers ~0.1% of the surface before context compaction discards earlier findings. |
| Throughput | Security research requires narrow, parallel hypotheses. Generic coding agents are tuned for single-stream feature work. |

Conclusion: The harness around the model matters far more than raw model capability.

4 Core Lessons for a Security Harness

  1. Narrow scope produces better findings — specific function + trust boundaries + architecture doc >> "find vulnerabilities in this repository."
  2. Adversarial review reduces noise — a second agent prompted to disprove the original finding catches far more noise than asking the hunter to check its own work. "Putting two agents in deliberate disagreement is way more effective than just telling one agent to be careful."
  3. Split the chain across agents — ask "Is this buggy?" and "Is this reachable from an attacker?" as separate questions.
  4. Parallel narrow tasks beat one exhaustive agent — many concurrent agents, then deduplicate afterward.

Cloudflare's Vulnerability Discovery Harness

| Stage | What It Does |
| --- | --- |
| Recon | Reads repo top-down, fans out to subagents per subsystem. Produces architecture doc (build commands, trust boundaries, entry points, attack surface). |
| Hunt | ~50 concurrent agents, each with one attack class + scope hint. Compiles and runs PoCs in per-task scratch directories. |
| Validate | Independent agent re-reads code and tries to disprove the original finding. Different prompt, no ability to emit new findings. |
| Report | Deduplicates surviving findings, writes advisory with PoC, CVSS score, and recommended fix. |
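The Hunt, Validate, and Report stages can be sketched as a small pipeline. This is a toy outline under stated assumptions, not Cloudflare's implementation: every name here (`Finding`, `hunt`, `validate`, `report`, `run_harness`) is hypothetical, the model calls are replaced by stubs, and the Recon stage is reduced to a list of scope hints passed in by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    function: str
    attack_class: str
    has_poc: bool


def hunt(task):
    """Hypothetical hunter: one attack class plus one scope hint per task."""
    attack_class, scope = task
    # Stub for a model call. A real hunter would compile and run a PoC;
    # here we pretend only the "overflow" hypothesis produced one.
    return [Finding(scope, "parse_header", attack_class,
                    has_poc=(attack_class == "overflow"))]


def validate(finding):
    """Hypothetical adversarial reviewer: may reject, never adds findings."""
    return finding.has_poc


def report(findings):
    """Deduplicate surviving findings and return them in a stable order."""
    return sorted(set(findings),
                  key=lambda f: (f.file, f.function, f.attack_class))


def run_harness(scopes, attack_classes, max_workers=8):
    # Fan out: one narrow task per (attack class, scope) pair.
    tasks = [(a, s) for s in scopes for a in attack_classes]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(hunt, tasks))
    candidates = [f for batch in batches for f in batch]
    # An independent check filters candidates before anything is reported.
    survivors = [f for f in candidates if validate(f)]
    return report(survivors)


# Two overlapping scopes produce duplicate candidates that collapse to one.
results = run_harness(["http/parser.c", "http/parser.c"], ["overflow", "uaf"])
for finding in results:
    print(finding)
```

Keeping `validate` as a separate function that can only return a verdict mirrors the constraint in the table that the validator has no ability to emit new findings, and deduplication happens once at the end rather than inside each hunter.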

The Industry Picture

Cloudflare also tested Codex CLI, Copilot Agent Mode, Gemini Code Assist, and various fine-tuned models. None approached Mythos Preview's exploit-chain capability. For proactive security, frontier models are now viable but demand a proper harness.