Skip to footer
Home Research Artificial Intelligence / Machine Learning Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning

Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning


We present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language. Using DSTC2 as seed data, we trained natural language understanding (NLU) and generation (NLG) networks for each agent and let the agents interact online. We model the interaction as a stochastic collaborative game where each agent (player) has a role (“assistant”, “tourist”, “eater”, etc.) and their own objectives, and can only interact via natural language they generate. Each agent, therefore, needs to learn to operate optimally in an environment with multiple sources of uncertainty (its own NLU and NLG, the other agent’s NLU, Policy, and NLG). In our evaluation, we show that the stochastic-game agents outperform deep learning based supervised baselines.


Alexandros Papangelis, Yi-Chia Wang, Piero Molino, Gokhan Tur



Full Paper

‘Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning’ (PDF)

Uber AI

Previous article Stakeholders as Researchers: Empowering non-researchers to interact directly with consumers
Next article Probabilistic Programming for Birth-Death Models of Evolution Using an Alive Particle Filter with Delayed Sampling
Alexandros Papangelis is a senior research scientist at Uber AI, on the Conversational AI team; his interests include statistical dialogue management, natural language processing, and human-machine social interactions. Prior to Uber, he was with Toshiba Research Europe, leading the Cambridge Research Lab team on Statistical Spoken Dialogue. Before joining Toshiba, he was a postdoctoral fellow at CMU's Articulab, working with Justine Cassell on designing and developing the next generation of socially-skilled virtual agents. He received his PhD from the University of Texas at Arlington, MSc from University College London, and BSc from the University of Athens.
Yi-Chia Wang is a research scientist at Uber AI, focusing on the conversational AI. She received her Ph.D. from the Language Technologies Institute in School of Computer Science at Carnegie Mellon University. Her research interests and skills are to combine language processing technologies, machine learning methodologies, and social science theories to statistically analyze large-scale data and model human-human / human-bot behaviors. She has published more than 20 peer-reviewed papers in top-tier conferences/journals and received awards, including the CHI Honorable Mention Paper Award, the CSCW Best Paper Award, and the AIED Best Student Paper Nomination.
Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.
Gokhan Tur is a Director of Engineering with Uber AI.