Gemini · Google · Alignment Forum

Synthetic document finetuning for instilling positive traits

Tue, Jun 16 · 12:04 AM UTC 2 min read

Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.

◎ Multiple-sources

This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent areas.

Key facts

But the other two panels show each filtering did change the model's structure as expected: the BLUF-filtered model produces less BLUF (52% -> 41%), and the emotional-validation-filtered model
Explanation of the capability evals: LMSYS SxS is measured relative to the baseline of SFT-only, 0% synthetic data
hence why that datapoint is near 50%, because this is the model measured
The team took two patterns with >20% frequency in the data: emotional-validation buffering, and BLUF (Bottom Line Up Front), where the opening sentence is a direct response either agreeing with or refuting
Their MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits they wanted the model to exhibit) as their universe context, with a checkpoint of Gemini 3 Flash post-trained

Summary

This work closely follows Li et al (model spec midtraining, or MSM), who show that by training a model on synthetic documents before chat finetuning starts, they can shape how the model generalizes. Their MVP pipeline used a "traits document" (a short bullet-pointed list of positive traits they wanted the model to exhibit) as their universe context, with a checkpoint of Gemini 3 Flash post-trained only on the Flash SFT mixture as their starting point. The team created synthetic datasets in similar ways for both pipelines, again heavily inspired by the pipeline in Kutasov et al, as well as Marks et al.:. When trained on this data, they removed the system prompts used to generate it, similar to Guan et al.

#Gemini #Google