Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

    Abstract

    Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer; this was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. Our second contribution is the finding that a small modification to these normalization schemes, in conjunction with a sparse regularizer on the activations, leads to significant benefits over standard normalization techniques. We demonstrate the effectiveness of our unified divisive normalization framework in the context of convolutional neural nets and recurrent neural networks, showing improvements over baselines in image classification, language modeling, and super-resolution.
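
    Viewed through this lens, the schemes differ mainly in which set of activations they pool statistics over. Below is a minimal NumPy sketch of a divisive normalizer of this form; the function name, the placement of the smoothing constant sigma, and the omission of learned gain/bias parameters and of the paper's sparse (L1) regularizer are all simplifications for illustration, not the authors' exact formulation.

    import numpy as np

    def divisive_normalize(x, axes, sigma=1.0):
        # Center by the mean over the pooled axes, then divide by a
        # smoothed root-mean-square of the centered activations.
        mu = x.mean(axis=axes, keepdims=True)
        v = x - mu
        divisor = np.sqrt(sigma ** 2 + (v ** 2).mean(axis=axes, keepdims=True))
        return v / divisor

    x = np.random.randn(32, 64)  # (batch, features)

    # Batch-norm-like: pool statistics over the batch axis, per feature.
    bn_like = divisive_normalize(x, axes=0)

    # Layer-norm-like: pool statistics over the feature axis, per example.
    ln_like = divisive_normalize(x, axes=1)

    Choosing the pooled axes recovers the two familiar special cases, while other choices (e.g., local spatial or channel neighborhoods in a convolutional net) interpolate between them.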

    Authors

    Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard Zemel

    Conference

    ICLR 2017

    Full Paper

    ‘Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes’ (PDF)

    Mengye Ren
    Mengye Ren is a research scientist at Uber ATG Toronto. He is also a PhD student in the machine learning group of the Department of Computer Science at the University of Toronto, where he also completed his undergraduate degree in Engineering Science. His research interests are machine learning, neural networks, and computer vision. He is originally from Shanghai, China.
    Renjie Liao
    Renjie Liao is a PhD student in the Machine Learning Group of the Department of Computer Science at the University of Toronto, supervised by Prof. Raquel Urtasun and Prof. Richard Zemel. He is also a research scientist at Uber Advanced Technology Group Toronto and is affiliated with the Vector Institute. He received his M.Phil. degree from the Department of Computer Science and Engineering at the Chinese University of Hong Kong, under the supervision of Prof. Jiaya Jia, and his B.Eng. degree from the School of Automation Science and Electrical Engineering at Beihang University (formerly Beijing University of Aeronautics and Astronautics).
    Raquel Urtasun
    Raquel Urtasun is the Chief Scientist for Uber ATG and the Head of Uber ATG Toronto. She is also a Professor at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision, and a co-founder of the Vector Institute for AI. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, a Fallona Family Research Award, and two Best Paper Runner-Up Prizes awarded at CVPR in 2013 and 2017. She was also named Chatelaine's 2018 Woman of the Year and one of 2018's top influencers in Toronto by Adweek magazine.