# Zoubin Ghahramani

## Engineering Blog Articles

### First Uber Science Symposium: Discussing the Next Generation of RL, NLP, ConvAI, and DL

The Uber Science Symposium featured talks from members of the broader scientific community about the latest innovations in RL, NLP, and other fields.

### Announcing the 2019 Uber AI Residency

The Uber AI Residency is a 12-month training program for academics and professionals interested in becoming AI researchers with Uber AI Labs or Uber ATG.

### Introducing the Uber AI Residency

Interested in accelerating your career by tackling some of Uber’s most challenging AI problems? Apply for the Uber AI Residency, a research fellowship dedicated to fostering the next generation of AI talent.

### Welcoming Peter Dayan to Uber AI Labs

Arriving now: Uber's Chief Scientist Zoubin Ghahramani introduces Uber AI Labs' newest team member, award-winning neuroscientist Peter Dayan.

## Research Papers

### Probabilistic Meta-Representations Of Neural Networks

T. Karaletsos, P. Dayan, **Z. Ghahramani**

Existing Bayesian treatments of neural networks are typically characterized by weak prior and approximate posterior distributions according to which all the weights are drawn independently. Here, we consider a richer prior distribution in which units in the network are represented by latent variables, and the weights between units are drawn conditionally on the values of the collection of those variables. [...]

**[PDF at arXiv]**

*UAI 2018 Uncertainty In Deep Learning Workshop*

(**UDL**), 2018

### Functional Programming for Modular Bayesian Inference

A. Ścibior, O. Kammar, **Z. Ghahramani**

We present an architectural design of a library for Bayesian modelling and inference in modern functional programming languages. The novel aspect of our approach is its modular implementation of existing state-of-the-art inference algorithms. Our design relies on three inherently functional features: higher-order functions, inductive data-types, and support for either type-classes or an expressive module system. [...]

**[PDF at University of Cambridge]**

*2019*

### Discovering Interpretable Representations for Both Deep Generative and Discriminative Models

T. Adel, **Z. Ghahramani**, A. Weller

Interpretability of representations in both deep generative and discriminative models is highly desirable. Current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. We propose two interpretability frameworks. First, we provide an interpretable lens for an existing model. We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. [...]

**[PDF at Proceedings of Machine Learning Research]**

*International Conference on Machine Learning* (**ICML**), 2018

### Variational Bayesian dropout: pitfalls and fixes

J. Hron, A. Matthews, **Z. Ghahramani**

Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm [...]

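The reinterpretation the abstract refers to (MC dropout) keeps dropout active at test time and averages many stochastic forward passes to obtain a predictive mean and an uncertainty estimate. A minimal NumPy sketch; the weights, layer sizes, and keep probability are illustrative placeholders, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary "trained" one-hidden-layer network (placeholder weights).
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1)) / np.sqrt(50)

def mc_dropout_predict(x, T=200, p_keep=0.9):
    """Keep dropout on at test time; average T stochastic passes."""
    h = np.maximum(x @ W1, 0.0)                      # ReLU features
    preds = []
    for _ in range(T):
        mask = rng.binomial(1, p_keep, size=h.shape) / p_keep
        preds.append((h * mask) @ W2)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)     # predictive mean, uncertainty

mean, std = mc_dropout_predict(np.array([[0.5]]))
```

The spread of the stochastic passes (`std`) is what the Bayesian reading interprets as model uncertainty; the paper above analyses where this interpretation breaks down.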
**[PDF at arXiv]**

*International Conference on Machine Learning* (**ICML**), 2018

### Gaussian Process Behaviour in Wide Deep Neural Networks

Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, **Zoubin Ghahramani**

Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. [...]

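For the single-hidden-layer case the correspondence is easy to check numerically: with i.i.d. standard-normal weights and ReLU units, the covariance of the network outputs at two inputs, averaged over weight draws, equals the degree-1 arc-cosine kernel. A sketch with arbitrary widths and sample counts (the deeper, recursive case studied in the paper is analogous):

```python
import numpy as np

rng = np.random.default_rng(0)
H, M = 200, 5000                         # hidden width, number of random networks
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])                 # orthogonal unit inputs

W = rng.normal(size=(M, H, 2))           # input weights ~ N(0, 1)
V = rng.normal(size=(M, H))              # output weights ~ N(0, 1)
fx = (np.maximum(W @ x, 0.0) * V).sum(axis=1) / np.sqrt(H)
fy = (np.maximum(W @ y, 0.0) * V).sum(axis=1) / np.sqrt(H)
emp_cov = np.mean(fx * fy)               # empirical output covariance

theta = np.pi / 2                        # angle between x and y
arccos_kernel = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
```

`emp_cov` matches `arccos_kernel` up to Monte Carlo noise, illustrating the Gaussian-process limit of a wide random network.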
**[PDF at OpenReview.net]**

*International Conference on Learning Representations* (**ICLR**), 2018

### The Mirage of Action-Dependent Baselines in Reinforcement Learning

G. Tucker, S. Bhupatiraju, S. Gu, R. Turner, **Z. Ghahramani**, S. Levine

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. [...]

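The variance-reduction role of a baseline is easy to see in a one-step toy problem: subtracting a baseline from the reward leaves the score-function gradient estimator unbiased but can shrink its variance dramatically. The Gaussian policy and reward below are invented for illustration; the paper's analysis concerns the further action-dependent extension:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0                                  # policy: a ~ N(mu, 1); reward r(a) = a + 10

def grad_estimates(n, baseline):
    a = rng.normal(mu, 1.0, size=n)
    r = a + 10.0
    return (a - mu) * (r - baseline)      # score-function (REINFORCE) estimator

g_plain = grad_estimates(100_000, baseline=0.0)
g_base = grad_estimates(100_000, baseline=10.0)   # baseline = E[r] at mu
# Both estimate the true gradient d/dmu E[r] = 1; the baseline slashes variance.
```

Comparing `g_plain.var()` with `g_base.var()` shows the effect; the question the paper asks is how much *additional* variance an action-dependent baseline really removes.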
**[PDF]**

*International Conference on Machine Learning* (**ICML**), 2018

### Weakly supervised collective feature learning from curated media

Y. Mukuta, A. Kimura, D. Adrian, **Z. Ghahramani**

The current state-of-the-art in feature learning relies on the supervised learning of large-scale datasets consisting of target content items and their respective category labels. However, constructing such large-scale fully-labeled datasets generally requires painstaking manual effort. One possible solution to this problem is to employ community-contributed text tags as weak labels; however, the concepts underlying a single text tag depend strongly on the users. [...]

**[PDF at arXiv]**

*AAAI Conference on Artificial Intelligence* (**AAAI**), 2018

### Variational Gaussian Dropout is not Bayesian

J. Hron, A. Matthews, **Z. Ghahramani**

Gaussian multiplicative noise is commonly used as a stochastic regularisation technique in training of deterministic neural networks. A recent paper reinterpreted the technique as a specific algorithm for approximate inference in Bayesian neural networks; several extensions ensued. [...]

**[PDF at Bayesian Deep Learning]**

*Bayesian Deep Learning Workshop @ NeurIPS, 2017*

### Lost Relatives of the Gumbel Trick

M. Balog, N. Tripuraneni, **Z. Ghahramani**, A. Weller

The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. [...]

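The trick itself fits in a few lines: perturb each log-probability with independent standard Gumbel noise and take the argmax; the argmax is then an exact sample from the distribution. A small NumPy check (the distribution below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])            # target discrete distribution

def gumbel_max_sample(log_p):
    g = rng.gumbel(size=log_p.shape)     # i.i.d. standard Gumbel(0, 1) noise
    return np.argmax(log_p + g)          # "solve for the most likely configuration"

samples = [gumbel_max_sample(np.log(p)) for _ in range(20_000)]
freqs = np.bincount(samples, minlength=3) / len(samples)   # ~ [0.5, 0.3, 0.2]
```

The "relatives" of the paper's title come from applying other functions than the max to the same perturbed values, yielding a family of partition-function estimators.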
**[PDF at Proceedings of Machine Learning Research]**

*International Conference on Machine Learning* (**ICML**), 2017

### A birth-death process for feature allocation

K. Palla, D. Knowles, **Z. Ghahramani**

We propose a Bayesian nonparametric prior over feature allocations for sequential data, the birth-death feature allocation process (BDFP). The BDFP models the evolution of the feature allocation of a set of N objects across a covariate (e.g. time) by creating and deleting features. [...]

**[PDF at Proceedings of Machine Learning Research]**

*International Conference on Machine Learning* (**ICML**), 2017

### Automatic Discovery of the Statistical Types of Variables in a Dataset

I. Valera, **Z. Ghahramani**

A common practice in statistics and machine learning is to assume that the statistical data types (e.g., ordinal, categorical or real-valued) of variables, and usually also the likelihood model, are known. However, as the availability of real-world data increases, this assumption becomes too restrictive. [...]

**[PDF at Proceedings of Machine Learning Research]**

*International Conference on Machine Learning* (**ICML**), 2017

### General Latent Feature Modeling for Data Exploration Tasks

I. Valera, M. Pradier, **Z. Ghahramani**

This paper introduces a general Bayesian nonparametric latent feature model suitable for performing automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be discrete, continuous, or mixed variables. The proposed model presents several important properties. [...]

**[PDF at OpenReview.net]**

*ICML Workshop on Human Interpretability in Machine Learning*, 2017

### Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

S. Gu, T. Lillicrap, R. Turner, **Z. Ghahramani**, B. Schölkopf, S. Levine

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. [...]

**[PDF at NIPS Proceedings]**

*Advances in Neural Information Processing Systems* (**NeurIPS**), 2017

### Deep Bayesian Active Learning with Image Data

Y. Gal, R. Islam, **Z. Ghahramani**

Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. [...]

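One such difficulty is scoring which images to label next; a common acquisition function in this setting is the BALD score, the mutual information between predictions and model parameters, estimated from stochastic (e.g. MC-dropout) forward passes. A sketch over hand-made probability arrays (the numbers are purely illustrative):

```python
import numpy as np

def bald(probs):
    """probs: (T, N, C) class probabilities from T stochastic forward passes."""
    mean_p = probs.mean(axis=0)
    h_mean = -(mean_p * np.log(mean_p + 1e-12)).sum(-1)        # predictive entropy
    mean_h = -(probs * np.log(probs + 1e-12)).sum(-1).mean(0)  # expected entropy
    return h_mean - mean_h                                      # mutual information

# All passes agree on one point; they disagree sharply on the other.
agree = np.tile(np.array([[0.9, 0.1]]), (4, 1, 1))
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]], [[0.9, 0.1]], [[0.1, 0.9]]])
# bald(disagree) is large, bald(agree) ~ 0: label the disagreed-on point first.
```

Points where the stochastic passes disagree carry the most information about the model parameters, so they are queried first.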
**[PDF at Proceedings of Machine Learning Research]**

*International Conference on Machine Learning* (**ICML**), 2017

### Bayesian inference on random simple graphs with power law degree distributions

J. Lee, C. Heaukulani, **Z. Ghahramani**, L. James, S. Choi

We present a model for random simple graphs with a degree distribution that obeys a power law (i.e., is heavy-tailed). To attain this behavior, the edge probabilities in the graph are constructed from Bertoin-Fujita-Roynette-Yor (BFRY) random variables, which have been recently utilized in Bayesian statistics for the construction of power law models in several applications. [...]

**[PDF at arXiv]**

*International Conference on Machine Learning* (**ICML**), 2017

### Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

S. Gu, T. Lillicrap, **Z. Ghahramani**, R. Turner, S. Levine

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is its high sample complexity. [...]

**[PDF at OpenReview.net]**

*International Conference on Learning Representations* (**ICLR**), 2017

### Magnetic Hamiltonian Monte Carlo

N. Tripuraneni, M. Rowland, **Z. Ghahramani**, R. Turner

Hamiltonian Monte Carlo (HMC) exploits Hamiltonian dynamics to construct efficient proposals for Markov chain Monte Carlo (MCMC). In this paper, we present a generalization of HMC which exploits *non-canonical* Hamiltonian dynamics. [...]

**[PDF at arXiv]**

*International Conference on Machine Learning* (**ICML**), 2017