An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

    Abstract

    Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x, y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task. CoordConv solves the coordinate transform problem with perfect generalization, 150 times faster, and with 10–100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse, as the transform between high-level spatial latents and pixels became easier to learn. A Faster R-CNN detection model trained on MNIST showed 24% better IoU when using CoordConv, and in the RL domain, agents playing Atari games benefited significantly from the use of CoordConv layers.
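
    To make the mechanism concrete, here is a minimal sketch of a CoordConv layer, written in PyTorch (the paper's reference implementation is in TensorFlow); the class name CoordConv2d and its exact signature are illustrative, not the authors' code. The layer simply concatenates two extra channels holding normalized x and y coordinates to its input, then applies an ordinary convolution:

        # Hypothetical PyTorch sketch of a CoordConv layer; names are ours, not the paper's.
        import torch
        import torch.nn as nn

        class CoordConv2d(nn.Module):
            def __init__(self, in_channels, out_channels, **kwargs):
                super().__init__()
                # Two extra input channels carry the coordinate information.
                self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

            def forward(self, x):
                b, _, h, w = x.shape
                # Coordinate channels, linearly scaled to the range [-1, 1].
                ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
                xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
                # Concatenate coordinates onto the input, then convolve as usual.
                return self.conv(torch.cat([x, ys, xs], dim=1))

        # Drop-in replacement for nn.Conv2d:
        layer = CoordConv2d(3, 8, kernel_size=3, padding=1)
        out = layer(torch.randn(4, 3, 64, 64))  # -> shape (4, 8, 64, 64)

    Because the coordinate channels enter through learned convolution weights, the network can drive those weights to zero and recover an ordinary, fully translation-invariant convolution, or weight them heavily when the task demands translation dependence.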

    Authors

    Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski

    Conference

    NeurIPS 2018

    Link

    arXiv

    Full Paper

    ‘An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution’ (PDF)

    Rosanne Liu
    Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She is currently working on the many fronts along which machine learning and neural networks remain mysterious. She attempts to write in her spare time.
    Joel Lehman
    Joel Lehman was previously an assistant professor at the IT University of Copenhagen, and researches neural networks, evolutionary algorithms, and reinforcement learning.
    Piero Molino
    Piero Molino is a Senior Research Scientist at Uber AI. He works on natural language understanding and conversational AI. He is a co-founder of Uber AI.
    Felipe Petroski Such
    Felipe Petroski Such is a research scientist focusing on deep neuroevolution, reinforcement learning, and HPC. Prior to joining Uber AI Labs, he obtained a BS/MS from RIT, where he developed deep learning architectures for graph applications and ICR, as well as hardware acceleration using FPGAs.
    Eric Frank
    Before joining Uber AI Labs as a researcher, Eric invented AI-oriented toys for Kite and Rocket Research. He was also a research assistant at the University of Rochester and makes art in his free time.
    Alex Sergeev
    Alex Sergeev is a deep learning engineer on the Machine Learning Platform team.
    Jason Yosinski
    Jason Yosinski is a founding member of Uber AI Labs, where he leads the Deep Collective research group. He is known for contributions to understanding neural network modeling, representations, and training. Prior to Uber, Jason worked on robotics at Caltech, co-founded two web companies, and started a robotics program in Los Angeles middle schools that now serves over 500 students. He completed his PhD working at the Cornell Creative Machines Lab, the University of Montreal, JPL, and Google DeepMind. He is a recipient of the NASA Space Technology Research Fellowship, has co-authored over 50 papers and patents, and was VP of ML at Geometric Intelligence, which Uber acquired. His work has been profiled by NPR, the BBC, Wired, The Economist, Science, and the NY Times. In his free time, Jason enjoys cooking, reading, paragliding, and pretending he's an artist.