Plug and Play Language Models: A Simple Approach to Controlled Text Generation

    Abstract

    Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the Plug and Play Language Model (PPLM) for controllable language generation, which combines a pretrained LM with one or more simple attribute classifiers that guide text generation without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM’s hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which will allow for diverse and creative applications beyond the examples given in this paper.

    Authors

    Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

    Conference

    ICLR 2020

    Full Paper

    ‘Plug and Play Language Models: A Simple Approach to Controlled Text Generation’ (PDF)

    Uber AI

    Comments
    Previous articleFully Automated HTML and JavaScript Rewriting for Constructing a Self‐healing Web Proxy
    Next articleDiscovering Essential Multiple Gene Effects through Large Scale Optimization: an Application to Human Cancer Metabolism
    Janice Lan
    Janice Lan is a research scientist with Uber AI.
    Jane Hung
    Jane Hung is a research scientist with Uber AI Labs.
    Eric Frank
    Before joining Uber AI Labs as a researcher, Eric invented AI oriented toys for Kite and Rocket Research. He was also a research assistant at the University of Rochester and makes art in his free time.
    Piero Molino
    Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.
    Jason Yosinski
    Jason Yosinski is a founding member of Uber AI Labs and there leads the Deep Collective research group. He is known for contributions to understanding neural network modeling, representations, and training. Prior to Uber, Jason worked on robotics at Caltech, co-founded two web companies, and started a robotics program in Los Angeles middle schools that now serves over 500 students. He completed his PhD working at the Cornell Creative Machines Lab, University of Montreal, JPL, and Google DeepMind. He is a recipient of the NASA Space Technology Research Fellowship, has co-authored over 50 papers and patents, and was VP of ML at Geometric Intelligence, which Uber acquired. His work has been profiled by NPR, the BBC, Wired, The Economist, Science, and the NY Times. In his free time, Jason enjoys cooking, reading, paragliding, and pretending he's an artist.
    Rosanne Liu
    Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She is currently working on the multiple fronts where machine learning and neural networks are mysterious. She attempts to write in her spare time.