Why Gemini-Powered Siri Runs on Nvidia Blackwell, Not Apple's Servers

Apple announced last year that Siri would gain access to Google’s Gemini models for complex queries. A new report from The Information fills in the infrastructure story: Gemini-powered Siri will run not on Apple’s own Private Cloud Compute hardware but on Google’s fleet of Nvidia Blackwell B200 data center GPUs, with launch planned for September.

That’s a bigger admission than it sounds.

The hardware Apple couldn’t build fast enough

Apple spent the better part of two years building Private Cloud Compute (PCC) — a dedicated server infrastructure running on Apple Silicon (M2 Ultra, now M5), with a hardened iOS/macOS-based OS and hardware security features borrowed from iPhone: Secure Enclave, Secure Boot, cryptographic attestation. The entire architecture was designed so that Apple, not a cloud provider, held every layer of the stack.

When Apple negotiated its Gemini license with Google, the plan was apparently to run Gemini inference on PCC hardware. According to the report, that plan failed: Apple tried to get Gemini working on its in-house servers and found it ran too slowly. The B200-class GPU clusters that Google operates for Gemini inference are simply faster than Apple’s M5 servers for this class of workload. Transformer inference at scale is still where data center GPUs, with their high-bandwidth memory and fast interconnects, dominate chips derived from client silicon, however well optimized.

So September’s Siri ships on Google’s hardware. Apple’s dedicated AI server chips — its real answer to Nvidia — aren’t in production until 2027.

What Nvidia Blackwell confidential compute actually does

Apple can’t tell users their queries are being processed on third-party hardware without answering the privacy question. Its answer is Nvidia’s confidential compute feature on the Blackwell B200.

What this means in practice: data is encrypted both in transit and while it sits in GPU memory; GPU-to-GPU communication over NVLink and NVSwitch is also encrypted (a gap that only closed with the Blackwell generation); and remote attestation lets the client verify the exact hardware and firmware configuration before sending data. The performance cost at production scale is reported to be near-zero, with large-model workloads seeing throughput close to that of unencrypted runs.
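To make the attestation step concrete, here is a minimal Python sketch of a client that refuses to send a query until the server's report checks out. Everything in it is an assumption for illustration: the `AttestationReport` fields, the function names, and the use of a shared-key HMAC as the signature. Real deployments verify a certificate chain rooted in the hardware vendor rather than a shared secret, and this is not Apple's, Google's, or Nvidia's API.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AttestationReport:
    """Hypothetical report a server returns before the client sends data."""
    gpu_model: str
    firmware_hash: str
    nonce: str       # echoed back so an old report cannot be replayed
    signature: str   # hex HMAC here; a stand-in for a vendor-signed report


def _payload(gpu_model: str, firmware_hash: str, nonce: str) -> bytes:
    # Canonical byte encoding of the signed fields.
    fields = {"gpu_model": gpu_model, "firmware_hash": firmware_hash, "nonce": nonce}
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def sign_report(key: bytes, gpu_model: str, firmware_hash: str, nonce: str) -> AttestationReport:
    """Server-side helper used only to build test reports in this sketch."""
    sig = hmac.new(key, _payload(gpu_model, firmware_hash, nonce), hashlib.sha256).hexdigest()
    return AttestationReport(gpu_model, firmware_hash, nonce, sig)


def verify_report(
    report: AttestationReport,
    key: bytes,
    expected_nonce: str,
    expected_model: str,
    trusted_firmware: frozenset[str],
) -> bool:
    """Accept only a fresh, correctly signed report for known hardware and firmware."""
    expected_sig = hmac.new(
        key, _payload(report.gpu_model, report.firmware_hash, report.nonce), hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(expected_sig, report.signature)
        and report.nonce == expected_nonce
        and report.gpu_model == expected_model
        and report.firmware_hash in trusted_firmware
    )


def send_if_attested(
    report: AttestationReport,
    key: bytes,
    expected_nonce: str,
    expected_model: str,
    trusted_firmware: frozenset[str],
    query: str,
    send: Callable[[str], None],
) -> bool:
    """Call `send` only after verification succeeds; otherwise the query never leaves."""
    if not verify_report(report, key, expected_nonce, expected_model, trusted_firmware):
        return False
    send(query)
    return True
```

The design point is the ordering: verification happens before any user data is transmitted, and a failed check means the query stays on the client rather than falling back to an unverified server.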

This is production-ready technology. Multiple cloud providers deployed Blackwell confidential compute in early 2026. Apple is using it to argue: even though the chips belong to Google, the data belongs to the user. Whether that argument survives regulatory scrutiny — particularly under the EU AI Act — remains to be seen.

The on-device/cloud split that mirrors good app architecture

The Information describes a hybrid inference model: lighter Siri tasks stay on-device, running on the Neural Engine in Apple Silicon; heavier conversational and reasoning queries route to Google Cloud running Nvidia B200s.

This architecture is identical to what thoughtful app developers should already be building. Fast, predictable, cheap tasks — classification, local context retrieval, short generation — belong on device. Complex reasoning, large-context synthesis, and tasks requiring a full frontier model belong in the cloud. The split is not a compromise; it’s the correct design.

The hard part is routing. Apple needs to decide, in real time and without user-visible latency, which tier handles each query. Getting the threshold wrong means either slow responses (sending too much to the cloud) or weak responses (keeping too much on-device). It’s a calibration problem that will evolve with every model update.
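A rough sketch of what such a router might look like in an app, written in Python for brevity. The task labels, the scoring weights, the 8,000-token context cap, and the default threshold are all invented for illustration; nothing here reflects how Apple actually routes Siri queries.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on_device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Query:
    text: str
    task: str            # hypothetical label, e.g. "classify" or "reasoning"
    context_tokens: int  # how much context the model would need to read


# Assumed set of fast, cheap tasks that a small local model handles well.
LIGHT_TASKS = frozenset({"classify", "retrieve", "short_generation"})
MAX_CONTEXT_TOKENS = 8000  # assumed cap used to normalise context size


def complexity_score(query: Query) -> float:
    """Crude score in [0, 1]: task type contributes 0.6, context size up to 0.4."""
    task_score = 0.0 if query.task in LIGHT_TASKS else 0.6
    context_score = min(query.context_tokens / MAX_CONTEXT_TOKENS, 1.0) * 0.4
    return task_score + context_score


def route(query: Query, threshold: float = 0.5) -> Tier:
    """Send a query to the cloud once its score reaches the threshold."""
    return Tier.CLOUD if complexity_score(query) >= threshold else Tier.ON_DEVICE


def cloud_share(queries: list[Query], threshold: float) -> float:
    """Fraction of a sample that a given threshold would send to the cloud."""
    if not queries:
        return 0.0
    cloud = sum(1 for q in queries if route(q, threshold) is Tier.CLOUD)
    return cloud / len(queries)
```

Running `cloud_share` over a sample of real queries at several thresholds is one way to see the trade-off described above: lowering the threshold pushes more traffic to the slower, stronger tier, and raising it keeps more on the faster, weaker one. The sample and the weights would need revisiting whenever either model changes.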

Private Cloud Compute’s quiet sidelining

The honest read of this news is that PCC has been sidelined for Siri’s flagship use case. Apple’s press materials describe PCC as central to Apple Intelligence’s privacy story, and the M5 upgrade earlier this year was framed as a capacity expansion. But if Gemini-powered Siri — the product Apple ships to hundreds of millions of users — runs on Google Cloud, PCC shrinks to handling Apple’s own foundation models: the smaller models that manage email summarization, notification previews, and local context.

That’s still a meaningful workload. But it’s not the headline feature, and it’s not what Apple implied when it introduced PCC.

The broader point: Apple built an entire server infrastructure strategy around the premise that it could control all the critical ingredients. Gemini’s inference requirements broke that premise before the strategy fully deployed.

What Gemini-Powered Siri means for the apps we build

A materially improved Siri changes the calculus for anything that integrates App Intents. If the September relaunch delivers on the reasoning capability Gemini can provide, voice-triggered app features become viable in ways they haven’t been since Siri launched in 2011.

At Dracode, where we build mobile products for founders and scale-ups, we’re watching specifically how Apple surfaces that reasoning to third-party apps through the App Intents framework. If the new Siri can maintain multi-turn context, resolve ambiguous references across apps, and execute compound actions, the gap between a conversational AI experience and a native iOS one narrows considerably. That changes which features are worth scoping into a product roadmap.

For now, the architecture principle Apple is applying — on-device for speed and privacy, cloud for depth, routing logic invisible to the user — is the same one we’d recommend for any AI feature in a mobile app. The infrastructure is different, but the design is sound.

Sources

Report details Apple’s plan to use Nvidia chips for the Gemini-powered Siri — 9to5Mac, June 3 2026
Apple’s Overhauled Siri Will Reportedly Run on Nvidia’s Blackwell Chips — MacRumors, June 4 2026
Private Cloud Compute: A new frontier for AI privacy in the cloud — Apple Security Research
Apple plans M5-based Private Cloud Compute architecture for Apple Intelligence — 9to5Mac, February 2026
Confidential Computing Meets NVIDIA HGX B200: Secure AI Without the Performance Trade-Off — Corvex, 2026