Characterizing how Visual Question Answering models scale with the world

    Abstract

    Detecting differences in generalization ability between models for visual question answering tasks has proven to be surprisingly difficult. We propose a new statistic, asymptotic sample complexity, for model comparison, and construct a synthetic data distribution to compare a strong baseline CNN-LSTM model to a structured neural network with powerful inductive biases. Our metric identifies a clear improvement in the structured model’s generalization ability relative to the baseline despite their similarity under existing metrics.

    Authors

    Eli Bingham, Piero Molino, Paul Szerlip, Fritz Obermeyer, Noah D. Goodman

    Conference

    ViGIL @ NeurIPS 2017

    Full Paper

    Characterizing how Visual Question Answering models scale with the world (PDF at GitHub)

    Uber AI

    Eli Bingham
    Eli is a research scientist working on probabilistic programming, approximate Bayesian inference, and grounded language understanding. He has previously worked on condensed matter physics, computational biology, climatology, multiscale dictionary learning, and deep learning for computer vision. In his spare time he hangs out in a lab tinkering with his nanopore DNA sequencer.
    Piero Molino
    Piero Molino is a Senior Research Scientist at Uber AI. He works on natural language understanding and conversational AI. He is a co-founder of Uber AI.
    Paul Szerlip
    Paul Szerlip earned his PhD with Dr. Kenneth Stanley at the University of Central Florida, focusing on open-source infrastructure for collaborative evolutionary software. This open-source platform enables researchers to quickly integrate crowd-sourced human contributions with automated algorithms, while making the results easily accessible online. His later research highlighted new ways to integrate neuroevolutionary techniques like HyperNEAT and Novelty Search into deep learning frameworks.
    Fritz Obermeyer
    Fritz is a research engineer at Uber AI focusing on probabilistic programming. He is the engineering lead for the Pyro team.
    Noah Goodman
    In addition to working at Uber AI Labs, Noah is an Associate Professor of Psychology, Computer Science, and Linguistics at Stanford University, where he runs the Computation and Cognition Lab.