Faster Neural Networks Straight from JPEG

    Abstract

    The simple, elegant approach of training convolutional neural networks (CNNs) directly from RGB pixels has enjoyed overwhelming empirical success. But can more performance be squeezed out of networks by using different input representations? In this paper we propose and explore a simple idea: train CNNs directly on the blockwise discrete cosine transform (DCT) coefficients computed and available in the middle of the JPEG codec. Intuitively, when processing JPEG images using CNNs, it seems unnecessary to decompress a blockwise frequency representation to an expanded pixel representation, shuffle it from CPU to GPU, and then process it with a CNN that will learn something similar to a transform back to frequency representation in its first layers. Why not skip both steps and feed the frequency domain into the network directly? In this paper we modify \libjpeg to produce DCT coefficients directly, modify a ResNet-50 network to accommodate the differently sized and strided input, and evaluate performance on ImageNet. We find networks that are both faster and more accurate, as well as networks with about the same accuracy but 1.77x faster than ResNet-50.

    Authors

    Lionel Gueguen, Alex Sergeev, Ben Kadlec, Rosanne Liu, Jason Yosinski

    Conference

    NeurIPS 2018

    Full Paper

    ‘Faster Neural Networks Straight from JPEG’ (PDF)

    Uber AI

    Comments
    Previous articleProfiling Android Applications with Nanoscope
    Next articleAn Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
    Lionel Gueguen
    Lionel Gueguen is a senior software engineer with Uber ATG.
    Alex Sergeev
    Alex Sergeev is a deep learning engineer on the Machine Learning Platform team.
    Rosanne Liu
    Rosanne is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She is currently working on the multiple fronts where machine learning and neural networks are mysterious. She attempts to write in her spare time.
    Jason Yosinski
    Jason Yosinski is a founding member of Uber AI Labs and there leads the Deep Collective research group. He is known for contributions to understanding neural network modeling, representations, and training. Prior to Uber, Jason worked on robotics at Caltech, co-founded two web companies, and started a robotics program in Los Angeles middle schools that now serves over 500 students. He completed his PhD working at the Cornell Creative Machines Lab, University of Montreal, JPL, and Google DeepMind. He is a recipient of the NASA Space Technology Research Fellowship, has co-authored over 50 papers and patents, and was VP of ML at Geometric Intelligence, which Uber acquired. His work has been profiled by NPR, the BBC, Wired, The Economist, Science, and the NY Times. In his free time, Jason enjoys cooking, reading, paragliding, and pretending he's an artist.