Pathwise Derivatives for Multivariate Distributions


    We exploit the link between the transport equation and derivatives of expectations to construct efficient pathwise gradient estimators for multivariate distributions. We focus on two main threads. First, we use null solutions of the transport equation to construct adaptive control variates that reduce the variance of gradient estimators. Second, we consider multivariate mixture distributions; in particular, we show how to compute pathwise derivatives for mixtures of multivariate Normal distributions with arbitrary means and diagonal covariances. We demonstrate in a variety of experiments in the context of variational inference that our gradient estimators can outperform other methods, especially in high dimensions.
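    As a minimal sketch of the first thread (not the paper's implementation), assume q = N(μ, σ²I) is isotropic and use the convention ∂q/∂θ + ∇·(q v) = 0. Under that convention, ∂/∂θ E_q[f] = E_q[∇f · v]. For the mean component μ_0, the reparameterization z = μ + σε gives v = e_0. For any antisymmetric matrix A, the field ṽ(z) = A(z − μ) satisfies ∇·(q ṽ) = 0, because tr A = 0 and (z − μ)ᵀA(z − μ) = 0. Integration by parts then gives E_q[∇f · ṽ] = 0, so adding such terms keeps the estimator unbiased. The sketch adapts their coefficients by least squares on a separate pilot batch. The test function f(z) = cos(w·z) + ½ zᵀBz and all function names are hypothetical. They were chosen only because the exact gradient has a closed form.

```python
import numpy as np

def antisymmetric_basis(d):
    """The d(d-1)/2 matrices E_ij - E_ji (i < j), a basis for antisymmetric A."""
    basis = []
    for i in range(d):
        for j in range(i + 1, d):
            A = np.zeros((d, d))
            A[i, j], A[j, i] = 1.0, -1.0
            basis.append(A)
    return basis

def grad_f(z, w, B):
    # Gradient of the hypothetical test function f(z) = cos(w.z) + 0.5 z^T B z (B symmetric).
    return -np.sin(z @ w)[:, None] * w + z @ B

def estimator_terms(mu, sigma, w, B, n, rng):
    eps = rng.standard_normal((n, mu.shape[0]))
    z = mu + sigma * eps                 # reparameterized samples z ~ N(mu, sigma^2 I)
    g = grad_f(z, w, B)                  # (n, d) samples of grad f(z)
    base = g[:, 0]                       # pathwise estimator of d/dmu_0 E[f] (velocity e_0)
    # Null-solution terms grad f(z) . A_m (z - mu): zero mean when q is isotropic.
    H = np.stack([np.einsum("nd,nd->n", g, (z - mu) @ A.T)
                  for A in antisymmetric_basis(mu.shape[0])], axis=1)
    return base, H

def cv_gradient(mu, sigma, w, B, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Pilot batch: choose coefficients c minimizing the sample variance of base + H c.
    base_p, H_p = estimator_terms(mu, sigma, w, B, n, rng)
    c, *_ = np.linalg.lstsq(H_p - H_p.mean(axis=0), -(base_p - base_p.mean()), rcond=None)
    # Fresh batch: c does not depend on it, so the combined estimator stays unbiased.
    base, H = estimator_terms(mu, sigma, w, B, n, rng)
    combined = base + H @ c
    return base.mean(), combined.mean(), base.var(), combined.var()

mu = np.array([0.5, -0.3, 0.2])
sigma = 0.7
w = np.array([1.0, 0.5, -0.8])
B = np.array([[1.0, 0.4, 0.0], [0.4, 2.0, -0.3], [0.0, -0.3, 0.5]])
base_mean, cv_mean, base_var, cv_var = cv_gradient(mu, sigma, w, B)
print(base_mean, cv_mean, base_var, cv_var)
```

    Whether the fitted terms actually lower the variance depends on how strongly ∇f · A(z − μ) correlates with the base estimator for a given f. The sketch guarantees only unbiasedness, which holds because the coefficients are fitted on a batch that is independent of the one used for the final estimate.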
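    For the second thread, here is a hedged sketch of one valid velocity field for the component means of a mixture q(z) = Σ_k π_k q_k(z) of diagonal Normals. We derive it here and do not claim it is the paper's exact construction. Suppose v_k solves the transport equation for q_k alone; for the mean μ_kj, v_k = e_j. Then v = π_k q_k(z) v_k / q(z) solves the transport equation for q, because ∇·(q v) = π_k ∇·(q_k v_k) = −π_k ∂q_k/∂μ_kj = −∂q/∂μ_kj. The resulting estimator weights ∂f/∂z_j by the responsibility of component k. This sketch covers only the means. Gradients with respect to the mixture weights, which the paper also handles, are not shown. The function names are hypothetical.

```python
import numpy as np

def log_normal_diag(z, mu, sigma):
    # log N(z; mu, diag(sigma^2)) for each row of z, shape (n,)
    return -0.5 * np.sum(((z - mu) / sigma) ** 2 + 2 * np.log(sigma) + np.log(2 * np.pi), axis=-1)

def mixture_mean_grads(pi, mus, sigmas, grad_f, n=200_000, seed=0):
    """Estimate d/d mu_k E_q[f(z)] for q = sum_k pi_k N(mu_k, diag(sigma_k^2)); returns (K, d)."""
    rng = np.random.default_rng(seed)
    K, d = mus.shape
    k = rng.choice(K, size=n, p=pi)                        # sample component labels
    z = mus[k] + sigmas[k] * rng.standard_normal((n, d))   # then sample z given the label
    # log(pi_k q_k(z)) for every component, shape (n, K)
    log_joint = np.stack([np.log(pi[j]) + log_normal_diag(z, mus[j], sigmas[j])
                          for j in range(K)], axis=1)
    # Responsibilities r_k(z) = pi_k q_k(z) / q(z), computed with a stable log-sum-exp.
    m = log_joint.max(axis=1, keepdims=True)
    log_q = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    r = np.exp(log_joint - log_q)
    g = grad_f(z)                                          # (n, d) samples of grad f(z)
    # Velocity field for mu_kj is r_k(z) e_j, so the estimator averages r_k(z) * df/dz_j.
    return np.einsum("nk,nd->kd", r, g) / n

pi = np.array([0.3, 0.7])
mus = np.array([[0.0, 1.0], [1.5, -0.5]])
sigmas = np.array([[0.5, 1.0], [0.8, 0.6]])
# Hypothetical test function f(z) = sum_i cos(z_i), whose gradient is -sin(z).
est = mixture_mean_grads(pi, mus, sigmas, lambda z: -np.sin(z))
print(est)
```

    For this f, the exact gradient is −π_k sin(μ_kj) exp(−σ_kj²/2), so the estimate can be checked directly. Because samples come from the full mixture, every sample contributes to every component's gradient in proportion to its responsibility.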


    Martin Jankowiak, Theofanis Karaletsos


    AISTATS 2019

    Full Paper

    ‘Pathwise Derivatives for Multivariate Distributions’ (PDF)

    Uber AI

    Martin Jankowiak
    Martin Jankowiak is a senior research scientist at Uber whose research focuses on probabilistic machine learning. He is a co-creator of the Pyro probabilistic programming language.
    Theofanis Karaletsos
    Theofanis took his first steps as a machine learner at the Max Planck Institute for Intelligent Systems, in collaboration with Microsoft Research Cambridge, with work focused on unsupervised knowledge extraction from unstructured data, such as generative modeling of images and phenotyping for biology. He then moved to Memorial Sloan Kettering Cancer Center in New York, where he worked on machine learning in the context of cancer therapeutics. He joined the small AI startup Geometric Intelligence in 2016 and, with his colleagues, formed the new Uber AI Labs. Theofanis' research interests focus on rich probabilistic modeling, approximate inference, and probabilistic programming. His main passion is structured models, such as spatio-temporal processes, models of image formation, and deep probabilistic models, together with the tools needed to make them work on real data. His background in the life sciences has also made him keenly interested in making models interpretable and quantifying their uncertainty, in non-traditional learning settings such as weakly supervised learning, and in model criticism.