GOOGL: Google Releases Drafter Models to...

Google has released Multi-Token Prediction (MTP) drafter models for its Gemma 4 family of open AI models. The company says the drafters can speed up inference by up to three times without degrading output quality. This addresses a key bottleneck in deploying large language models, where performance is often limited by memory bandwidth rather than raw computing power.

The new drafters use a technique called speculative decoding, which pairs the main Gemma 4 model with a smaller, faster "drafter" model that proposes several future tokens at once. The larger model then verifies this batch of tokens in a single parallel pass, significantly reducing latency compared to the standard one-token-at-a-time generation process. The MTP drafters are available under an open-source license and are integrated into popular AI development platforms.
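
To make the draft-then-verify loop concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Google's implementation and does not use the Gemma 4 or MTP APIs: `target_next` and `drafter_next` are hypothetical toy functions standing in for the large model and the drafter, and the verification step is simulated with a loop, whereas a real model would score all drafted positions in one batched forward pass.

```python
def target_next(seq):
    # Hypothetical stand-in for the large model: a deterministic toy rule.
    return (seq[-1] * 3 + len(seq)) % 7


def drafter_next(seq):
    # Hypothetical stand-in for the drafter: agrees with the target
    # except when the prefix length is a multiple of 4.
    guess = target_next(seq)
    return (guess + 1) % 7 if len(seq) % 4 == 0 else guess


def greedy_decode(prompt, n_new):
    # Baseline: one target call per generated token.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt, n_new, k=4):
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The drafter proposes k tokens, one after another (cheap).
        draft = []
        for _ in range(k):
            draft.append(drafter_next(seq + draft))
        # 2. The target checks all k + 1 positions; counted as one pass
        #    because a real model would do this in parallel.
        target_passes += 1
        verified = [target_next(seq + draft[:i]) for i in range(k + 1)]
        # 3. Keep the longest prefix of the draft that the target agrees with.
        accepted = 0
        while accepted < k and draft[accepted] == verified[accepted]:
            accepted += 1
        seq.extend(draft[:accepted])
        # 4. Append the target's own token: a correction after a mismatch,
        #    or a bonus token when the whole draft was accepted.
        seq.append(verified[accepted])
    return seq[:goal], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2], 20)
    print(out == greedy_decode([1, 2], 20), passes)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone, which is the sense in which quality is preserved in this sketch. The saving comes from needing fewer target passes than tokens generated; how large that saving is in practice depends on how often the drafter's guesses are accepted.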
