
Characterizing how Visual Question Answering models scale with the world


Detecting differences in generalization ability between models for visual question answering tasks has proven to be surprisingly difficult. We propose a new statistic, asymptotic sample complexity, for model comparison, and construct a synthetic data distribution to compare a strong baseline CNN-LSTM model to a structured neural network with powerful inductive biases. Our metric identifies a clear improvement in the structured model’s generalization ability relative to the baseline despite their similarity under existing metrics.


Eli Bingham, Piero Molino, Paul Szerlip, Fritz Obermeyer, Noah D. Goodman


ViGIL @ NeurIPS 2017

Full Paper

Characterizing how Visual Question Answering models scale with the world (PDF)

Uber AI

Eli is a research scientist working on probabilistic programming, approximate Bayesian inference, and grounded language understanding. He has previously worked on condensed matter physics, computational biology, climatology, multiscale dictionary learning, and deep learning for computer vision. In his spare time he hangs out in a lab tinkering with his nanopore DNA sequencer.
Piero is a Staff Research Scientist in the Hazy research group at Stanford University. He is a former founding member of Uber AI where he created Ludwig, worked on applied projects (COTA, Graph Learning for Uber Eats, Uber’s Dialogue System) and published research on NLP, Dialogue, Visualization, Graph Learning, Reinforcement Learning and Computer Vision.
Paul Szerlip earned his PhD with Dr. Kenneth Stanley at the University of Central Florida, focusing on open-source infrastructure for collaborative evolutionary software. This open-source platform enables researchers to quickly integrate crowd-sourced human contributions with automated algorithms while making the results easily accessible online. His later research highlighted new ways to integrate neuroevolutionary techniques like HyperNEAT and Novelty Search into deep learning frameworks.
Fritz is a research engineer at Uber AI focusing on probabilistic programming. He is the engineering lead for the Pyro team.
In addition to working at Uber AI Labs, Noah is an Associate Professor of Psychology, Computer Science, and Linguistics at Stanford University, where he runs the Computation and Cognition Lab.