Google · Google Research
ConvApparel: Measuring and bridging the realism gap in user simulators
Ofer Meshi and Sally Goldman, Research Scientists, Google Research.
Key facts
- Each simulator was tasked with generating 600 conversations, 300 with the "good" agent and 300 with the "bad" agent, allowing the researchers to compare each simulator's performance against the human baseline
- The research was conducted in collaboration with co-authors Krisztian Balog, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, and Craig Boutilier
- As a scalable alternative, the AI research community has increasingly turned to user simulators — LLM-powered agents explicitly instructed to roleplay as human users
Summary
The team introduces ConvApparel, a new human-AI conversation dataset and a comprehensive evaluation framework designed to quantify the "realism gap" in LLM-based user simulators and to improve the training of robust conversational agents. Modern conversational AI agents can handle complex, multi-turn tasks, such as asking clarifying questions and proactively assisting users, but evaluating such agents with real human users is slow and costly to scale. As a scalable alternative, the AI research community has increasingly turned to user simulators: LLM-powered agents explicitly instructed to roleplay as human users. ConvApparel is designed to measure how faithfully these simulators reproduce real human behavior.
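
A user simulator of the kind described above is, at its core, an LLM prompted with a persona and the dialog so far, then asked for the next user utterance. The sketch below illustrates that pattern only; the `call_llm` stub, the persona wording, and the prompt format are illustrative assumptions, not the paper's actual implementation.

```python
def call_llm(prompt: str) -> str:
    # Placeholder standing in for a real LLM completion call;
    # returns a canned reply so the sketch is self-contained.
    return "I'm looking for a waterproof jacket under $100."


def simulate_user_turn(persona: str, history: list[tuple[str, str]]) -> str:
    """Build a roleplay prompt from a persona and the dialog history,
    then ask the LLM for the simulated user's next message."""
    lines = [
        f"You are roleplaying a shopper: {persona}.",
        "Reply with the user's next message only.",
        "",
    ]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append("User:")
    return call_llm("\n".join(lines))


# Example: one simulated turn against an apparel-shopping agent.
turn = simulate_user_turn(
    "budget-conscious hiker",
    [("Agent", "Hi! What can I help you find today?")],
)
print(turn)
```

In a study like the one summarized here, a loop over such turns, alternating with an agent's replies, would generate the simulated conversations that are then compared against the human-authored ones.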