Gemini · Google · Alignment Forum
Synthetic document finetuning for instilling positive traits
Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.
◎ Multiple-sources
This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas.
Key facts
- But the other two panels show each filtering did change the model's structure as expected: the BLUF-filtered model produces less BLUF (52% -> 41%), and the emotional-validation-filtered model
- Explanation of the capability evals: LMSYS SxS is measured relative to the baseline of SFT-only, 0% synthetic data
- hence why that datapoint is near 50%, because this is the model measured
- The team took two patterns with >20% frequency in the data: emotional-validation buffering, and BLUF (Bottom Line Up Front), where the opening sentence is a direct response either agreeing with or refuting
- Their MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits they wanted the model to exhibit) as their universe context, with a checkpoint of Gemini 3 Flash post-trained
Summary
This work closely follows Li et al (model spec midtraining, or MSM), who show that by training a model on synthetic documents before chat finetuning starts, they can shape how the model generalizes. Their MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits they wanted the model to exhibit) as their universe context, with a checkpoint of Gemini 3 Flash post-trained only on the Flash SFT mixture as their starting point. The team created synthetic datasets in similar ways for both pipelines, again heavily inspired by the pipeline in Kutasov et al, as well as Marks et al.:. When trained on this data, they removed the system prompts used to generate it, similar to Guan et al.