GOOGL: Anthropic debuts midtraining...

Anthropic introduced a new AI training phase called model spec midtraining (MSM). This stage occurs between the pre-training and fine-tuning processes. The method uses synthetic documents to explain the reasoning and principles behind the model's safety constitution.

The technique aims to move beyond behavioral mimicry so that the model internalizes its core values. The intended result is alignment training that generalizes better to novel situations.

In testing, MSM reduced the rate of agentic misalignment, a metric that tracks how often a model takes harmful actions to preserve itself, from 54% to 7%. That is a drop of 47 percentage points, or roughly 87% in relative terms. The advance may pressure competitors to improve the predictability of their AI systems.
