Google Confirms the First AI-Built Zero-Day Used in the Wild

What Google Found: The First AI-Built Zero-Day in the Wild

On May 11, 2026, Google’s Threat Intelligence Group (GTIG) published a report confirming what security researchers had long expected: the first AI-built zero-day exploit operationalized for a real attack campaign. The target was a popular open-source, web-based system administration tool, and the vulnerability was a 2FA bypass. According to the report, attackers were preparing mass exploitation when Google’s team disrupted the campaign and began responsible disclosure with the vendor.

The flaw itself was not a memory corruption bug or an injection vulnerability. It was a semantic logic error — a hardcoded trust exception that quietly contradicted the application’s 2FA enforcement logic. To any automated scanner, the code looked correct. It did what it was written to do. The problem was that what it did and what the developer intended were not the same thing.
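
To make the shape of such a flaw concrete, here is a minimal Python sketch. It is purely illustrative and is not the code from the affected tool: every name (LoginAttempt, is_authenticated, TRUSTED_NETWORK_PREFIX) and the choice of a network-based exception are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical constant: a hardcoded trust exception, assumed for illustration.
TRUSTED_NETWORK_PREFIX = "10.0."


@dataclass
class LoginAttempt:
    username: str
    password_ok: bool
    otp_ok: bool
    source_ip: str
    user_has_2fa: bool


def is_authenticated(attempt: LoginAttempt) -> bool:
    """Intended policy: a user enrolled in 2FA must always present a valid OTP."""
    if not attempt.password_ok:
        return False
    # This early return is valid, type-safe code with no tainted input reaching
    # a dangerous sink, so pattern-based scanners have nothing to flag.
    # It nevertheless contradicts the policy stated in the docstring.
    if attempt.source_ip.startswith(TRUSTED_NETWORK_PREFIX):
        return True
    if attempt.user_has_2fa:
        return attempt.otp_ok
    return True


# A 2FA-enrolled user with a wrong OTP is still let in from the "trusted" range.
bypass = LoginAttempt("alice", True, False, "10.0.3.7", True)
normal = LoginAttempt("alice", True, False, "203.0.113.9", True)
print(is_authenticated(bypass), is_authenticated(normal))
```

Nothing in the function is malformed. The bug only exists relative to the stated intent, which is why finding it takes a reviewer, human or model, who reads the docstring and the branch together.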

Why LLMs Find What Scanners Miss

Traditional security tooling — fuzzers, SAST, DAST — is tuned for a well-understood class of bugs: buffer overflows, improper input sanitization, known injection patterns. Semantic logic flaws are invisible to those tools because they require understanding developer intent, not just code structure.

Frontier LLMs reason about intent. They can read a 2FA enforcement block, locate an exception condition three files over, and infer that the exception breaks the invariant the developer thought they were maintaining. This is exactly the class of vulnerability where human code review has always been expensive and inconsistent — and where AI now has a structural advantage on offense.

The exploit code made its origins clear. GTIG identified it as AI-generated from several markers: educational docstrings throughout the Python script, a hallucinated CVSS score, and textbook-clean formatting with structured help menus and ANSI color class implementations. The model didn’t just find the flaw. It packaged and documented a deployable attack tool.

The Defender’s New Blind Spot

This shifts the threat model for any team shipping server-side software. The vulnerabilities your scanner flags — the ones your pipeline catches on every PR — are now table stakes. Adversaries can automate discovery of the vulnerabilities your scanner cannot find.

GTIG’s report describes the defensive counterparts Google is running internally: Big Sleep, a DeepMind agent that proactively hunts unknown vulnerabilities in production code, and CodeMender, an agent that automatically patches critical vulnerability chains before they can be weaponized. These tools turn the same kind of LLM-based reasoning that attackers are now using toward defense.

For teams building production software for scale-ups, the immediate priority is auditing logic-layer trust assumptions in authentication and session management flows. 2FA implementations, privilege escalation paths, and role-based access checks are the highest-value targets for the kind of semantic analysis attackers are now deploying at scale. Periodic manual review on its own is no longer sufficient.
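
One low-cost way to start auditing a trust assumption is to state it as an explicit invariant and check it over every input combination the auth path can see. The sketch below assumes a hypothetical auth function with the signature (password_ok, otp_ok, user_has_2fa, source_ip) and two toy implementations. It complements model-assisted review rather than replacing it.

```python
from itertools import product
from typing import Callable, Iterable

# Hypothetical signature: (password_ok, otp_ok, user_has_2fa, source_ip) -> bool
AuthFn = Callable[[bool, bool, bool, str], bool]


def find_2fa_violations(auth: AuthFn, source_ips: Iterable[str]) -> list:
    """Return every input where a 2FA-enrolled user gets in without a valid OTP."""
    violations = []
    for password_ok, otp_ok, ip in product([True, False], [True, False], list(source_ips)):
        # Invariant under test: user_has_2fa=True and otp_ok=False must never pass.
        if auth(password_ok, otp_ok, True, ip) and not otp_ok:
            violations.append((password_ok, otp_ok, ip))
    return violations


def strict_auth(password_ok: bool, otp_ok: bool, user_has_2fa: bool, source_ip: str) -> bool:
    """Toy implementation that honors the invariant."""
    return password_ok and (otp_ok or not user_has_2fa)


def auth_with_exception(password_ok: bool, otp_ok: bool, user_has_2fa: bool, source_ip: str) -> bool:
    """Toy implementation with a hardcoded loopback exception (illustrative only)."""
    if password_ok and source_ip == "127.0.0.1":
        return True
    return strict_auth(password_ok, otp_ok, user_has_2fa, source_ip)


ips = ["127.0.0.1", "203.0.113.9"]
print(find_2fa_violations(strict_auth, ips))          # []
print(find_2fa_violations(auth_with_exception, ips))  # [(True, False, '127.0.0.1')]
```

The check is only as good as the inputs it enumerates. In a real codebase the source addresses, roles, and session states would come from the application's own configuration, not a hand-written list.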

State Actors Are Ahead of Criminal Groups

The criminal campaign is the headline, but GTIG’s data makes clear it represents the least sophisticated actor in this space. State-linked clusters are operating at a different level entirely. UNC2814, a China-nexus cluster, uses expert-persona prompting with Gemini to guide vulnerability research. APT45, which Mandiant tracks as North Korea-nexus, sent thousands of recursive prompts analyzing published CVEs for exploitation angles. APT27, also China-nexus, is using AI for active malware development.

North Korean and Russia-nexus actors are similarly active. GTIG identified PROMPTSPY, an Android backdoor with a GeminiAutomationAgent module that autonomously navigates device UI to exfiltrate data without triggering conventional behavioral detection.

The supply chain vector is also live. TeamPCP (UNC6780) compromised LiteLLM — an AI gateway library common in production AI stacks — alongside PyPI packages and GitHub Actions workflows, deploying the SANDCLOCK credential stealer to harvest AWS keys and GitHub tokens. If your infrastructure routes traffic through LiteLLM or any AI middleware with similar exposure, that dependency tree is worth auditing now.
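
A first pass at that audit can be done with the Python standard library alone. The sketch below reports whether litellm is installed in the current environment and which installed distributions declare it as a requirement. The helper names are hypothetical, and the script does not detect compromise: it only tells you where to look, so any version it reports still has to be compared against the vendor's advisory.

```python
import re
from importlib import metadata


def _normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string like 'litellm[proxy]>=1.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return _normalize(match.group(1)) if match else ""


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_dependents(package: str) -> dict:
    """Map each installed distribution that requires `package` to its own version."""
    target = _normalize(package)
    dependents = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or "<unknown>"
        for req in dist.requires or []:
            if requirement_name(req) == target:
                dependents[name] = dist.version
                break
    return dependents


if __name__ == "__main__":
    print("litellm installed:", installed_version("litellm"))
    for name, version in sorted(find_dependents("litellm").items()):
        print(f"  required by {name} {version}")
```

This only inspects the environment it runs in. Lockfiles, container images, and GitHub Actions workflows need their own review.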

What We’re Watching

The gap between vulnerability discovery and working exploit is compressing, and the constraint is no longer researcher availability — it’s compute. The meaningful defense is applying the same LLM-based semantic analysis to your own codebase before attackers do, which means treating AI-assisted code review as an operational security control rather than a productivity tool.

We’re watching for the first public report of an AI-built zero-day with no human in the loop at any stage of development and deployment. Based on the trajectory GTIG is describing, that case is closer than most organizations have planned for.

Sources

GTIG AI Threat Tracker: Adversaries Leverage AI for Vulnerability Exploitation, Augmented Operations, and Initial Access — Google Cloud Blog, May 11, 2026
Defending Your Enterprise When AI Models Can Find Vulnerabilities Faster Than Ever — Google Cloud Blog, 2026
AI-assisted hacking is already here, Google warns — Android Authority, May 2026
Hackers Observed Using AI to Develop Zero-Day for the First Time — Infosecurity Magazine, May 2026
