Anthropic has introduced a new AI training phase called model spec midtraining (MSM). The stage sits between pre-training and fine-tuning. It uses synthetic documents to explain the reasoning and principles behind the model's safety constitution.

The technique aims to move beyond behavioral mimicry so the model internalizes its core values. This approach improves how alignment training generalizes to novel situations.

In testing, MSM reduced the rate of agentic misalignment, a metric that tracks how often a model takes harmful actions to preserve itself, from 54% to 7%. The advance may pressure competitors to improve the predictability of their AI systems.
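
To put the reported drop in perspective, the short sketch below works out the absolute and relative reduction implied by the two figures quoted above (54% and 7%). The function name `reduction` is a hypothetical helper introduced only for this illustration; it is plain arithmetic on the reported numbers, not part of Anthropic's evaluation.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = before - after      # difference in percentage points
    relative = absolute / before   # share of the original rate that was eliminated
    return absolute, relative


# Figures as reported in the text: 54% before MSM, 7% after.
abs_drop, rel_drop = reduction(54.0, 7.0)
print(f"{abs_drop:.0f} percentage points, {rel_drop:.0%} relative reduction")
# prints "47 percentage points, 87% relative reduction"
```

In other words, the reported change is a drop of 47 percentage points, or roughly 87% of the original rate.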