Google COSMO: On-Device AI Agents Are Coming to Android
The Leak That Escaped
On May 1, Google published COSMO to the Google Play Store — labelled experimental — then pulled it within hours. The timing, two and a half weeks before Google I/O 2026, made the withdrawal look more like a slip than a strategy. But what COSMO exposed before disappearing is worth understanding before the keynote.
COSMO is a 1.13 GB on-device AI assistant built on Gemini Nano, Google’s on-device inference model. It is not a wrapper around the cloud-hosted Gemini API: it runs models locally, on the handset, and treats a cloud connection as optional rather than required.
On-Device First: Why Gemini Nano Changes the Baseline
Previous Gemini integrations in Android — the keyboard, Chrome, Photos — treat on-device inference as a speed optimization. The cloud model is primary; Nano fills in when the network is slow or unavailable.
COSMO inverts this. Its three fulfillment modes, available in settings, make the hierarchy explicit:
- Nano Only — Gemini Nano exclusively, no network required
- Hybrid — Gemini Nano when offline, server model when online
- PI Only — server model only
The order matters. Nano Only comes first. Google is designing an assistant that treats fully local inference as a supported, intentional mode — not a compromise. For any developer building AI features into an Android app, this is the clearest signal yet that on-device AI is becoming a first-class distribution target, not a future nice-to-have.
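As a concrete (and entirely hypothetical) sketch of that hierarchy from an app’s point of view, here is what a fulfillment policy mirroring the three leaked modes could look like. InferenceClient, localModel, and cloudModel are illustrative stand-ins, not a real Google API:

// Hypothetical fulfillment policy mirroring COSMO's three leaked modes
enum class FulfillmentMode { NANO_ONLY, HYBRID, PI_ONLY }

interface InferenceClient {
    suspend fun generate(prompt: String): String
}

suspend fun fulfill(
    prompt: String,
    mode: FulfillmentMode,
    isOnline: Boolean,
    localModel: InferenceClient,  // on-device path, i.e. Gemini Nano
    cloudModel: InferenceClient   // server-side path
): String = when (mode) {
    // Fully local: works in airplane mode, nothing leaves the device
    FulfillmentMode.NANO_ONLY -> localModel.generate(prompt)
    // Server model when a connection exists, Nano as the offline fallback
    FulfillmentMode.HYBRID ->
        if (isOnline) cloudModel.generate(prompt) else localModel.generate(prompt)
    // Server only: mirrors the pre-COSMO, cloud-first assistant design
    FulfillmentMode.PI_ONLY -> cloudModel.generate(prompt)
}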
AccessibilityService as the AI Runtime
COSMO’s second significant architectural choice: it uses Android’s AccessibilityService for real-time screen awareness. Rather than requiring an explicit invocation, COSMO observes on-screen content and surfaces proactive skills — summarizing an open conversation thread, drafting a reply, pulling up a relevant photo — based on what the device is currently showing.
AccessibilityService is not a new API. Screen readers, password managers, and automation tools have used it for years. What changes when a proactive on-device AI agent uses it is the scope of observation: COSMO doesn’t respond to a single view node; it interprets the full semantic structure of whatever your app renders on screen.
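To make “interprets the full semantic structure” concrete, here is a minimal sketch of an AccessibilityService walking the node tree the way any screen-aware agent must. It only logs what it sees, and assumes the usual accessibility-service declaration in the manifest plus the user enabling the service in settings:

import android.accessibilityservice.AccessibilityService
import android.util.Log
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

class ScreenObserverService : AccessibilityService() {

    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        // Fired on the window/content-change events declared in the
        // service's accessibility-service XML configuration
        rootInActiveWindow?.let { dump(it, depth = 0) }
    }

    // Recursively walk the node tree: text plus contentDescription is,
    // in essence, what a screen-aware agent has to work with
    private fun dump(node: AccessibilityNodeInfo, depth: Int) {
        val label = node.text ?: node.contentDescription
        Log.d("ScreenObserver", " ".repeat(depth * 2) + "${node.className}: $label")
        for (i in 0 until node.childCount) {
            node.getChild(i)?.let { dump(it, depth + 1) }
        }
    }

    override fun onInterrupt() { /* no ongoing feedback to interrupt */ }
}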
This creates two concrete implications for app developers:
Your app’s UI is already legible to COSMO without any integration work. For most apps that’s a feature — your content becomes usable by an AI that can help users act on it without leaving your screen.
Sensitive content is also legible, unless you opt out. The standard safeguard is WindowManager.LayoutParams.FLAG_SECURE, a window-level flag that blocks screenshots and screen recording; it does not by itself hide the view hierarchy from accessibility services, which is what View.setAccessibilityDataSensitive (Android 14, API 34 and later) addresses for services that don’t declare themselves assistive tools. If your app surfaces financial data, health records, messages, or credentials, and you haven’t audited these protections recently, COSMO is a good reason to do it now.
// Window-level: block screenshots and screen recording for an Activity
window.addFlags(WindowManager.LayoutParams.FLAG_SECURE)
// FLAG_SECURE is scoped to a window, not a view; for a window you add
// yourself through WindowManager, set it on that window's LayoutParams
val params = WindowManager.LayoutParams().apply {
    flags = flags or WindowManager.LayoutParams.FLAG_SECURE
}
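FLAG_SECURE governs capture of a whole window. For shielding an individual view’s content from accessibility-driven agents, a minimal sketch of the per-view control added in Android 14 follows; balanceField is a hypothetical view displaying sensitive data:

// API 34+: hide this view's content from accessibility services that do
// not declare themselves accessibility tools in their service metadata
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.UPSIDE_DOWN_CAKE) {
    balanceField.setAccessibilityDataSensitive(View.ACCESSIBILITY_DATA_SENSITIVE_YES)
}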
Proactive UX Changes Your App’s Job
Traditional mobile UX is pull-based: users tap, apps respond. COSMO is push-based: it watches, infers intent, and surfaces actions unprompted. That gap redefines who is responsible for guiding users through your product.
When an on-device AI agent can summarize the support thread open in your app, draft a response, and propose sending it — without the user navigating anywhere — your app’s role as the sole orchestrator of user intent is over.
Whether that’s good or disruptive for your product depends on how legible your app is to external observers. Apps with well-structured layouts, meaningful contentDescription labels, and clean accessibility semantics will be interpreted correctly. Apps with custom unlabeled icon buttons, deeply nested view hierarchies, or canvas-rendered UI will be either invisible or misread.
The practical takeaway: the same work that makes your app usable with TalkBack makes it a better citizen in an AI-agent world. This isn’t new advice. The COSMO leak just adds concrete urgency to it.
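As a small illustration, assuming a hypothetical custom send button: one line of metadata separates an interpretable action from an anonymous icon, for TalkBack and screen-aware agents alike:

// `context` and R.drawable.ic_send are assumed; the label is the point
val sendButton = ImageButton(context).apply {
    setImageResource(R.drawable.ic_send)
    // Without this, an observer sees only an unlabeled ImageButton
    contentDescription = "Send reply"
}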
What We’re Watching Before I/O
Google I/O runs May 19–20. The Android Show: I/O Edition streams on May 12, a week before. Based on what COSMO revealed, we expect Google to announce Gemini Nano as a formal system service available to all apps — analogous to how Core ML is exposed on iOS — rather than a capability reserved for Google’s own apps.
If that ships, Android apps will be able to invoke on-device inference directly: no API costs, no network round-trip latency, no dependency on the user’s connection. For the kinds of healthcare and fintech apps we build, where offline reliability and data privacy both matter, that’s one of the more meaningful capability unlocks in recent years. We’ll update once the I/O sessions confirm it.
Sources
- Google releases experimental ‘COSMO’ AI assistant app on Play Store — 9to5Google, May 1, 2026
- Google just dropped a new ‘experimental AI assistant’ app exclusively for Android — Android Authority, May 1, 2026
- Google Releases, Pulls COSMO AI App From Google Play — Droid-Life, May 1, 2026
- Google Quietly Drops ‘COSMO’: A 1.13 GB Experimental AI Assistant with On-Device Gemini Nano Processing — BigGo Finance, May 2, 2026
- Google I/O 2026: Date, time, potential announcements and everything else you need to know — Tom’s Guide, 2026