Google · Google Research
ConvApparel: Measuring and bridging the realism gap in user simulators
Ofer Meshi and Sally Goldman, Research Scientists, Google Research.
Key facts
- Each simulator was tasked with generating 600 conversations, 300 with the "good" agent and 300 with the "bad" agent, allowing the researchers to compare each simulator's performance against the human baseline
- The research was conducted in collaboration with co-authors Krisztian Balog, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, and Craig Boutilier
- As a scalable alternative, the AI research community has increasingly turned to user simulators — LLM-powered agents explicitly instructed to roleplay as human users
Summary
The team introduces ConvApparel, a new human-AI conversation dataset and a comprehensive evaluation framework designed to quantify the "realism gap" in LLM-based user simulators and to improve the training of robust conversational agents. Modern conversational AI agents can handle complex, multi-turn tasks, such as asking clarifying questions and proactively assisting users, but evaluating such agents with real human users is slow and costly to scale. As a scalable alternative, the AI research community has increasingly turned to user simulators: LLM-powered agents explicitly instructed to roleplay as human users. ConvApparel is designed to measure how faithfully these simulators reproduce real human behavior.
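
A user simulator of the kind described above is, at its core, an LLM prompted with a persona and the dialog so far, then asked for the next user utterance. The sketch below illustrates that pattern only; the `call_llm` stub, the persona wording, and the prompt format are illustrative assumptions, not the paper's actual implementation.

```python
def call_llm(prompt: str) -> str:
    # Placeholder standing in for a real LLM completion call;
    # returns a canned reply so the sketch is self-contained.
    return "I'm looking for a waterproof jacket under $100."


def simulate_user_turn(persona: str, history: list[tuple[str, str]]) -> str:
    """Build a roleplay prompt from a persona and the dialog history,
    then ask the LLM for the simulated user's next message."""
    lines = [
        f"You are roleplaying a shopper: {persona}.",
        "Reply with the user's next message only.",
        "",
    ]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append("User:")
    return call_llm("\n".join(lines))


# Example: one simulated turn against an apparel-shopping agent.
turn = simulate_user_turn(
    "budget-conscious hiker",
    [("Agent", "Hi! What can I help you find today?")],
)
print(turn)
```

In a study like the one summarized here, a loop over such turns, alternating with an agent's replies, would generate the simulated conversations that are then compared against the human-authored ones.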