Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

    Abstract

    The recent “Lottery Ticket Hypothesis” paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keep the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds the performance of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results. Ablating these factors leads to new insights for why LT networks perform as well as they do. We show why setting weights to zero is important, how signs are all you need to make the re-initialized network train, and why masking behaves like training. Finally, we discover the existence of Supermasks, or masks that can be applied to an untrained, randomly initialized network to produce a model with performance far better than chance (86% on MNIST, 41% on CIFAR-10).
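The "keep the large weights" criterion described above can be sketched in a few lines. This is an illustrative example only, not the paper's implementation: the function names, the toy weight matrices, and the keep fraction are all assumptions made for this sketch.

```python
import numpy as np

def magnitude_mask(weights, keep_fraction=0.25):
    """Illustrative keep-largest-magnitude mask: 1 where |w| is in the
    top keep_fraction of magnitudes, 0 at pruned positions."""
    flat = np.abs(weights).ravel()
    k = int(np.ceil(keep_fraction * flat.size))
    threshold = np.sort(flat)[-k]  # smallest magnitude that survives
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# Toy stand-ins for initial and trained weights (hypothetical values).
rng = np.random.default_rng(0)
w_init = rng.standard_normal((4, 4)).astype(np.float32)
w_trained = w_init + 0.1 * rng.standard_normal((4, 4)).astype(np.float32)

# Mask is derived from the trained weights but applied to the *initial*
# weights, as in the LT procedure; a lottery ticket is then retrained
# from this sparse starting point.
mask = magnitude_mask(w_trained, keep_fraction=0.25)
ticket_init = mask * w_init

# A Supermask, by contrast, is a mask chosen so that mask * w_init
# already performs well with no training at all.
```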

    Authors

    Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski

    Conference

    NeurIPS 2019

    Full Paper

    ‘Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask’ (PDF)

    Uber AI

    Hattie Zhou
    Hattie Zhou is a data scientist with Uber's Marketing Analytics team.
    Janice Lan
    Janice Lan is a research scientist with Uber AI.
    Rosanne Liu
    Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She currently works on the many fronts where machine learning and neural networks remain mysterious. She attempts to write in her spare time.
    Jason Yosinski
    Jason Yosinski is a founding member of Uber AI Labs and there leads the Deep Collective research group. He is known for contributions to understanding neural network modeling, representations, and training. Prior to Uber, Jason worked on robotics at Caltech, co-founded two web companies, and started a robotics program in Los Angeles middle schools that now serves over 500 students. He completed his PhD working at the Cornell Creative Machines Lab, University of Montreal, JPL, and Google DeepMind. He is a recipient of the NASA Space Technology Research Fellowship, has co-authored over 50 papers and patents, and was VP of ML at Geometric Intelligence, which Uber acquired. His work has been profiled by NPR, the BBC, Wired, The Economist, Science, and the NY Times. In his free time, Jason enjoys cooking, reading, paragliding, and pretending he's an artist.