
Raquel Urtasun

3 BLOG ARTICLES 50 RESEARCH PAPERS
Raquel Urtasun is the Chief Scientist for Uber ATG and the Head of Uber ATG Toronto. She is also a Professor at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision, and a co-founder of the Vector Institute for AI. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, a Fallona Family Research Award, and two Best Paper Runner-Up Prizes awarded at CVPR in 2013 and 2017. She was also named Chatelaine's 2018 Woman of the Year and one of 2018's top Toronto influencers by Adweek magazine.

Engineering Blog Articles

Announcing the 2019 Uber AI Residency

The Uber AI Residency is a 12-month training program for academics and professionals interested in becoming AI researchers with Uber AI Labs or Uber ATG.

Introducing the Uber AI Residency

Interested in accelerating your career by tackling some of Uber’s most challenging AI problems? Apply for the Uber AI Residency, a research fellowship dedicated to fostering the next generation of AI talent.

SBNet: Leveraging Activation Block Sparsity for Speeding up Convolutional Neural Networks

Uber ATG Toronto developed Sparse Blocks Network (SBNet), an open source algorithm for TensorFlow, to speed up inference of our 3D vehicle detection systems while lowering computational costs.
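The gather/compute/scatter idea behind SBNet can be sketched in a few lines of NumPy. This is a toy illustration only, not the library's API: `sbnet_conv`, the block size, and the stand-in `conv_fn` are all hypothetical names.

```python
import numpy as np

def sbnet_conv(feature_map, mask, conv_fn, block=4):
    """Sparse-blocks convolution sketch: apply conv_fn only on active blocks.

    feature_map: (H, W, C) array; mask: (H, W) boolean computation mask.
    conv_fn stands in for a convolutional layer. Inactive blocks are left
    as zeros, mirroring the gather/compute/scatter structure.
    """
    H, W, C = feature_map.shape
    out = np.zeros_like(feature_map)
    for i in range(0, H, block):
        for j in range(0, W, block):
            if mask[i:i+block, j:j+block].any():  # gather only active blocks
                out[i:i+block, j:j+block] = conv_fn(feature_map[i:i+block, j:j+block])
    return out

# toy usage: the "convolution" is a 2x scaling; mask activates the top-left corner
x = np.ones((8, 8, 1))
m = np.zeros((8, 8), dtype=bool)
m[:4, :4] = True
y = sbnet_conv(x, m, lambda b: 2 * b)
print(y[0, 0, 0], y[7, 7, 0])  # active block doubled, inactive stays zero
```

The savings come from skipping `conv_fn` entirely on inactive blocks, which dominates when the mask is sparse.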

Research Papers

Learning to Localize through Compressed Binary Maps

X. Wei, I. A. Bârsan, J. Martinez, S. Wang, R. Urtasun
One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps. In this paper we propose to learn to compress the map representation such that it is optimal for the localization task. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Convolutional Recurrent Network for Road Boundary Extraction

J. Liang, N. Homayounfar, S. Wang, W.-C. Ma, R. Urtasun
Creating high definition maps that contain precise information of static elements of the scene is of utmost importance for enabling self driving cars to drive safely. In this paper, we tackle the problem of drivable road boundary extraction from LiDAR and camera imagery. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Multi-Task Multi-Sensor Fusion for 3D Object Detection

M. Liang, B. Yang, Y. Chen, R. Hu, R. Urtasun
In this paper we propose to exploit multiple related tasks for accurate multi-sensor 3D object detection. Towards this goal we present an end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Deep Rigid Instance Scene Flow

W.-C. Ma, S. Wang, R. Hu, Y. Xiong, R. Urtasun
In this paper we tackle the problem of scene flow estimation in the context of self-driving. We leverage deep learning techniques as well as strong priors as in our application domain the motion of the scene can be composed by the motion of the robot and the 3D motion of the actors in the scene. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Dimensionality Reduction for Representing the Knowledge of Probabilistic Models

M. T. Law, J. Snell, A.-M. Farahmand, R. Urtasun, R. S. Zemel
Most deep learning models rely on expressive high-dimensional representations to achieve good performance on tasks such as classification. However, the high dimensionality of these representations makes them difficult to interpret and prone to over-fitting. We propose a simple, intuitive and scalable dimension reduction framework that takes into account the soft probabilistic interpretation of standard deep models for classification. [...] [PDF]
International Conference on Learning Representations (ICLR), 2019

DARNet: Deep Active Ray Network for Building Segmentation

D. Cheng, R. Liao, S. Fidler, R. Urtasun
In this paper, we propose a Deep Active Ray Network (DARNet) for automatic building segmentation. Taking an image as input, it first exploits a deep convolutional neural network (CNN) as the backbone to predict energy maps, which are further utilized to construct an energy function. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

UPSNet: A Unified Panoptic Segmentation Network

Y. Xiong, R. Liao, H. Zhao, R. Hu, M. Bai, E. Yumer, R. Urtasun
[PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

DeepSignals: Predicting Intent of Drivers Through Visual Attributes

D. Frossard, E. Kee, R. Urtasun
Detecting the intention of drivers is an essential task in self-driving, necessary to anticipate sudden events like lane changes and stops. Turn signals and emergency flashers communicate such intentions, providing seconds of potentially critical reaction time. In this paper, we propose to detect these signals in video sequences by using a deep neural network that reasons about both spatial and temporal information. [...] [PDF]
International Conference on Robotics and Automation (ICRA), 2019

Neural Guided Constraint Logic Programming for Program Synthesis

L. Zhang, G. Rosenblatt, E. Fetaya, R. Liao, W. Byrd, M. Might, R. Urtasun, R. Zemel
Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. We present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. [...] [PDF]
Advances in Neural Information Processing Systems (NeurIPS), 2018

LanczosNet: Multi-Scale Deep Graph Convolutional Networks

R. Liao, Z. Zhao, R. Urtasun, R. Zemel
Relational data can generally be represented as graphs. For processing such graph structured data, we propose LanczosNet, which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. [...] [PDF]
Neural Information Processing Systems (NeurIPS), 2018
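For readers unfamiliar with the underlying tool, the Lanczos algorithm reduces a symmetric matrix (here a graph Laplacian) to a small tridiagonal matrix whose eigenvalues approximate the extreme spectrum, which is what enables low-rank approximations for graph convolution. A generic sketch of the iteration, not the LanczosNet layer itself:

```python
import numpy as np

def lanczos(A, k, seed=0):
    """k-step Lanczos iteration on symmetric A: returns an orthonormal basis
    Q (n x k) and tridiagonal T (k x k) with Q.T @ A @ Q ~ T, giving a
    rank-k view of A's spectrum."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k)); alpha = np.zeros(k); beta = np.zeros(k)
    q_prev = np.zeros(n)
    for i in range(k):
        Q[:, i] = q
        z = A @ q
        alpha[i] = q @ z
        z -= alpha[i] * q + (beta[i-1] * q_prev if i > 0 else 0)
        beta[i] = np.linalg.norm(z)
        q_prev, q = q, z / (beta[i] + 1e-12)
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Q, T

# graph Laplacian of a 4-cycle; its eigenvalues are {0, 2, 2, 4}
L = np.array([[2., -1, 0, -1], [-1, 2, -1, 0], [0, -1, 2, -1], [-1, 0, -1, 2]])
Q, T = lanczos(L, 3)
print(np.round(sorted(np.linalg.eigvalsh(T)), 2))  # Ritz values near 0, 2, 4
```

With only three distinct eigenvalues, three Lanczos steps already recover the spectrum; on large graphs a small k gives a cheap low-rank surrogate for the Laplacian.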

Graph HyperNetworks for Neural Architecture Search

C. Zhang, M. Ren, R. Urtasun
Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive as the search requires training thousands of different networks, while each can last for hours. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. [...] [PDF]
Meta Learning workshop @ Neural Information Processing Systems (NeurIPS), 2018

Learning to Localize Using a LiDAR Intensity Map

I. Bârsan, S. Wang, A. Pokrovsky, R. Urtasun
In this paper we propose a real-time, calibration-agnostic and effective localization system for self-driving cars. Our method learns to embed the online LiDAR sweeps and intensity map into a joint deep embedding space. [...] [PDF]
Conference on Robot Learning (CoRL), 2018

Deep Multi-Sensor Lane Detection

M. Bai, G. Mattyus, N. Homayounfar, S. Wang, S. K. Lakshmikanth, R. Urtasun
Reliable and accurate lane detection has been a long-standing problem in the field of autonomous driving. In recent years, many approaches have been developed that use images (or videos) as input and reason in image space. In this paper we argue that accurate image estimates do not translate to precise 3D lane boundaries, which are the input required by modern motion planning algorithms. [...] [PDF]
International Conference on Intelligent Robots and Systems (IROS), 2018

HDNET: Exploiting HD Maps for 3D Object Detection

B. Yang, M. Liang, R. Urtasun
In this paper we show that High-Definition (HD) maps provide strong priors that can boost the performance and robustness of modern 3D object detectors. Towards this goal, we design a single stage detector that extracts geometric and semantic features from the HD maps. [...] [PDF]
Conference on Robot Learning (CoRL), 2018

IntentNet: Learning to Predict Intention from Raw Sensor Data

S. Casas, W. Luo, R. Urtasun
In order to plan a safe maneuver, self-driving vehicles need to understand the intent of other traffic participants. We define intent as a combination of discrete high level behaviors as well as continuous trajectories describing future motion. In this paper we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment. [...] [PDF]
Conference on Robot Learning (CoRL), 2018

Efficient Convolutions for Real-Time Semantic Segmentation of 3D Point Clouds

C. Zhang, W. Luo, R. Urtasun
[PDF]
International Conference on 3D Vision (3DV), 2018

Deep Continuous Fusion for Multi-Sensor 3D Object Detection

M. Liang, B. Yang, S. Wang, R. Urtasun
In this paper, we propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. [...] [PDF]
European Conference on Computer Vision (ECCV), 2018

Single Image Intrinsic Decomposition Without a Single Intrinsic Image

W. Ma, H. Chu, B. Zhou, R. Urtasun, A. Torralba
[PDF]
European Conference on Computer Vision (ECCV), 2018

End-to-End Deep Structured Models for Drawing Crosswalks

J. Liang, R. Urtasun
In this paper we address the problem of detecting crosswalks from LiDAR and camera imagery. Towards this goal, given multiple Li-DAR sweeps and the corresponding imagery, we project both inputs onto the ground surface to produce a top down view of the scene. [...] [PDF]
European Conference on Computer Vision (ECCV), 2018

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

M. Teichmann, M. Weber, M. Zöllner, R. Cipolla, R. Urtasun
While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. [...] [PDF]
IEEE Intelligent Vehicles Symposium (IV), 2018

Reviving and Improving Recurrent Back Propagation

R. Liao, Y. Xiong, E. Fetaya, L. Zhang, K. Yoon, X. Pitkow, R. Urtasun, R. Zemel
In this paper, we revisit the recurrent back-propagation (RBP) algorithm, discuss the conditions under which it applies as well as how to satisfy them in deep neural networks. We show that RBP can be unstable and propose two variants based on conjugate gradient on the normal equations (CG-RBP) and Neumann series (Neumann-RBP). [...] [PDF]
International Conference on Machine Learning (ICML), 2018
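The Neumann-series idea can be made concrete: at a fixed point, the implicit gradient involves the inverse (I - J^T)^{-1}, which Neumann-RBP approximates with a truncated power series. A minimal sketch on a linear contractive map; the function name is illustrative:

```python
import numpy as np

def neumann_rbp(J, grad_h, n_terms=20):
    """Approximate g = (I - J^T)^{-1} @ grad_h via the truncated Neumann
    series sum_t (J^T)^t @ grad_h. J is the Jacobian of the fixed-point map
    at convergence; the series converges when its spectral radius is < 1."""
    v = grad_h.copy()
    g = grad_h.copy()
    for _ in range(n_terms):
        v = J.T @ v
        g += v
    return g

# contractive linear map: the exact answer is solve(I - J.T, grad)
J = np.array([[0.3, 0.1], [0.0, 0.2]])
grad = np.array([1.0, -1.0])
approx = neumann_rbp(J, grad, n_terms=50)
exact = np.linalg.solve(np.eye(2) - J.T, grad)
print(np.allclose(approx, exact))  # True
```

Each series term costs only a Jacobian-vector product, so the truncation depth trades accuracy for compute without ever forming the inverse.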

Learning to Reweight Examples for Robust Deep Learning

M. Ren, W. Zeng, B. Yang, R. Urtasun
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. [...] [PDF]
International Conference on Machine Learning (ICML), 2018
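The guiding intuition (upweight training examples whose gradients agree with those on a small clean validation set) can be sketched as follows. This is a heavily simplified single-step illustration of that idea, not the authors' full meta-learning algorithm:

```python
import numpy as np

def reweight(train_grads, val_grad):
    """Single-step reweighting sketch: weight each training example by the
    (clamped) alignment between its loss gradient and the gradient on a
    small clean validation set, then normalize the weights."""
    w = np.maximum(0.0, train_grads @ val_grad)  # clamp negative alignment
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

# two "clean" examples aligned with the validation gradient and one
# noisy-label example pointing the opposite way, which gets zero weight
g_train = np.array([[1.0, 0.0], [0.8, 0.2], [-1.0, 0.0]])
g_val = np.array([1.0, 0.0])
w = reweight(g_train, g_val)
print(np.round(w, 3))
```

The clamp-to-zero step is what lets the rule suppress mislabeled examples entirely rather than merely downweighting them.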

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

X. Qi, R. Liao, Z. Liu, R. Urtasun, J. Jia
In this paper, we propose Geometric Neural Network (GeoNet) to jointly predict depth and surface normal maps from a single image. Building on top of two-stream CNNs, our GeoNet incorporates geometric relation between depth and surface normal via the new depth-to-normal and normal-to-depth networks. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
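The depth-to-normal direction can be illustrated with plain finite differences: treating the depth map as a surface z(x, y), the normal is the cross product of its tangent vectors. A simplified stand-in (orthographic assumption, no camera intrinsics, no learned components):

```python
import numpy as np

def depth_to_normals(depth):
    """Estimate per-pixel surface normals from a depth map. The tangents
    along the image axes are (1, 0, dz/dx) and (0, 1, dz/dy); their cross
    product (-dz/dx, -dz/dy, 1) is the (unnormalized) surface normal."""
    dz_dy, dz_dx = np.gradient(depth)  # np.gradient returns axis-0 then axis-1
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# a tilted plane z = 0.5 * x has a constant normal proportional to (-0.5, 0, 1)
x = np.arange(16, dtype=float)
depth = np.tile(0.5 * x, (16, 1))
print(np.round(depth_to_normals(depth)[8, 8], 3))  # [-0.447  0.     0.894]
```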

Matching Adversarial Networks

G. Mattyus, R. Urtasun
Generative Adversarial Nets (GANs) and Conditional GANs (CGANs) show that using a trained network as a loss function (discriminator) makes it possible to synthesize highly structured outputs (e.g. natural images). However, applying a discriminator network as a universal loss function for common supervised tasks (e.g. semantic segmentation, line detection, depth estimation) is considerably less successful. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Deep Parametric Continuous Convolutional Neural Networks

S. Wang, S. Suo, W. Ma, A. Pokrovsky, R. Urtasun
[PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
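The titular operator replaces grid-based discrete kernels with a learned function of the continuous offset between points. A minimal sketch, where the toy `kernel_mlp` lambda stands in for the learned kernel network:

```python
import numpy as np

def continuous_conv(points, feats, kernel_mlp, radius=1.0):
    """Parametric continuous convolution sketch: each output is a sum of
    neighbor features weighted by a learned function of the continuous
    offset, instead of a fixed grid of discrete kernel weights."""
    out = np.zeros(len(points))
    for i, p in enumerate(points):
        for q, f in zip(points, feats):
            d = q - p
            if np.linalg.norm(d) <= radius:  # restrict to a local neighborhood
                out[i] += kernel_mlp(d) * f
    return out

# toy "MLP": the kernel weight decays with the squared offset norm
pts = np.array([[0.0], [0.5], [2.0]])
f = np.array([1.0, 1.0, 1.0])
out = continuous_conv(pts, f, lambda d: np.exp(-(d @ d)))
print(np.round(out, 3))  # isolated point at 2.0 only sees itself
```

Because the kernel is a function of the raw offset vector, the operator applies directly to irregular point sets such as LiDAR sweeps, with no voxelization step.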

Learning deep structured active contours end-to-end

D. Marcos, D. Tuia, B. Kellenberger, L. Zhang, M. Bai, R. Liao, R. Urtasun
The world is covered with millions of buildings, and precisely knowing each instance's position and extents is vital to a multitude of applications. Recently, automated building footprint segmentation models have shown superior detection accuracy thanks to the usage of Convolutional Neural Networks (CNN). [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

SBNet: Sparse Blocks Network for Fast Inference

M. Ren, A. Pokrovsky, B. Yang, R. Urtasun
Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

H. Chu, W. Ma, K. Kundu, R. Urtasun, S. Fidler
The last few years have seen approaches trying to combine the increasing popularity of depth sensors and the success of the convolutional neural networks. Using depth as additional channel alongside the RGB input has the scale variance problem present in image convolution based approaches. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a...

W. Luo, B. Yang, R. Urtasun
In this paper we propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse data at range. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

End-to-end Learning of Multi-sensor 3D Tracking by Detection

D. Frossard, R. Urtasun
In this paper we propose a novel approach to tracking by detection that can exploit both cameras as well as LIDAR data to produce very accurate 3D trajectories. Towards this goal, we formulate the problem as a linear program that can be solved exactly, and learn convolutional networks for detection as well as matching in an end-to-end manner. [...] [PDF]
International Conference on Robotics and Automation (ICRA), 2018

Hierarchical Recurrent Attention Networks for Structured Online Maps

N. Homayounfar, W. Ma, S. Lakshmikanth, R. Urtasun
In this paper, we tackle the problem of online road network extraction from sparse 3D point clouds. Our method is inspired by how an annotator builds a lane graph, by first identifying how many lanes there are and then drawing each one in turn. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

PIXOR: Real-time 3D Object Detection from Point Clouds

B. Yang, W. Luo, R. Urtasun
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
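Detectors in this family operate on a bird's-eye-view rasterization of the point cloud. A simplified sketch of that input representation (a single occupancy channel; the actual input also encodes height slices and reflectance, and the ranges here are illustrative):

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), res=0.5):
    """Rasterize a LiDAR point cloud (N x 3, columns x/y/z in meters) into a
    bird's-eye-view occupancy grid at the given resolution. Points outside
    the ranges are dropped."""
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((H, W))
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (xi >= 0) & (xi < H) & (yi >= 0) & (yi < W)
    grid[xi[keep], yi[keep]] = 1.0
    return grid

pts = np.array([[10.0, 0.0, 1.2], [100.0, 0.0, 0.0]])  # second point out of range
bev = points_to_bev(pts)
print(bev.sum())  # 1.0
```

The appeal of the BEV view is that object sizes are metric and translation-invariant, so a standard 2D convolutional detector can run on the grid in real time.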

Leveraging Constraint Logic Programming for Neural Guided Program Synthesis

L. Zhang, G. Rosenblatt, E. Fetaya, R. Liao, W. Byrd, R. Urtasun, R. Zemel
We present a method for solving Programming by Example (PBE) problems that tightly integrates a neural network with a constraint logic programming system called miniKanren. Internally, miniKanren searches for a program that satisfies the recursive constraints imposed by the provided examples. [...] [PDF]
International Conference on Learning Representations (ICLR), 2018

Graph Partition Neural Networks for Semi-Supervised Classification

R. Liao, M. Brockschmidt, D. Tarlow, A. Gaunt, R. Urtasun, R. Zemel
We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. [...] [PDF]
Workshop @ International Conference on Learning Representations (ICLR), 2018

Sports Field Localization via Deep Structured Models

N. Homayounfar, S. Fidler, R. Urtasun
In this work, we propose a novel way of efficiently localizing a soccer field from a single broadcast image of the game. Related work in this area relies on manually annotating a few key frames and extending the localization to similar images, or installing fixed specialized cameras in the stadium from which the layout of the field can be obtained. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Inference in Probabilistic Graphical Models by Graph Neural Networks

K. Yoon, R. Liao, Y. Xiong, L. Zhang, E. Fetaya, R. Urtasun, R. Zemel, X. Pitkow
A fundamental computation for statistical inference and accurate decision-making is to compute the marginal probabilities or most probable states of task-relevant variables. Probabilistic graphical models can efficiently represent the structure of such complex data, but performing these inferences is generally difficult. [...] [PDF]
Workshop @ International Conference on Learning Representations (ICLR), 2018

The Reversible Residual Network: Backpropagation Without Storing Activations

A. Gomez, M. Ren, R. Urtasun, R. Grosse
Residual Networks (ResNets) have demonstrated significant improvement over traditional Convolutional Neural Networks (CNNs) on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck as one needs to store all the intermediate activations for calculating gradients using backpropagation. [...] [PDF]
Advances in Neural Information Processing Systems (NeurIPS), 2017
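The reversible block at the heart of the method splits the channels into two halves so that inputs can be recomputed exactly from outputs, removing the need to store intermediate activations. A minimal NumPy sketch with toy residual functions standing in for the sub-networks:

```python
import numpy as np

def rev_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + f(x2), y2 = x2 + g(y1).
    Because the update is additive, the block is exactly invertible."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs, so activations need not be
    stored for backpropagation; they are recomputed on the fly."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

f = lambda v: np.tanh(v)  # stand-ins for the residual sub-networks
g = lambda v: 0.5 * v
x1, x2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
y1, y2 = rev_forward(x1, x2, f, g)
x1r, x2r = rev_inverse(y1, y2, f, g)
print(np.allclose(x1, x1r) and np.allclose(x2, x2r))  # True
```

Note that invertibility holds for arbitrary `f` and `g`; the memory saving comes purely from the additive coupling structure.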

Be Your Own Prada: Fashion Synthesis With Structural Coherence

S. Zhu, R. Urtasun, S. Fidler, D. Lin, C. Loy
We present a novel and effective approach for generating new clothing on a wearer through generative adversarial learning. Given an input image of a person and a sentence describing a different outfit, our model "redresses" the person as desired, while at the same time keeping the wearer and her/his pose unchanged. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

3D Graph Neural Networks for RGBD Semantic Segmentation

X. Qi, R. Liao, J. Jia, S. Fidler, R. Urtasun
RGBD semantic segmentation requires joint reasoning about 2D appearance and 3D geometric information. In this paper we propose a 3D graph neural network (3DGNN) that builds a k-nearest neighbor graph on top of 3D point cloud. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

SGN: Sequential Grouping Networks for Instance Segmentation

S. Liu, J. Jia, S. Fidler, R. Urtasun
In this paper, we propose Sequential Grouping Networks (SGN) to tackle the problem of object instance segmentation. SGNs employ a sequence of neural networks, each solving a sub-grouping problem of increasing semantic complexity in order to gradually compose objects out of pixels. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

Situation Recognition With Graph Neural Networks

R. Li, M. Tapaswi, R. Liao, J. Jia, R. Urtasun, S. Fidler
We address the problem of recognizing situations in images. Given an image, the task is to predict the most salient verb (action), and fill its semantic roles such as who is performing the action, what is the source and target of the action, etc. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

Deep Spectral Clustering Learning

M. T. Law, R. Urtasun, R. S. Zemel
Clustering is the task of grouping a set of examples so that similar examples are grouped into the same cluster while dissimilar examples are in different clusters. The quality of a clustering depends on two problem-dependent factors which are i) the chosen similarity metric and ii) the data representation. Supervised clustering approaches, which exploit labeled partitioned datasets have thus been proposed, for instance to learn a metric optimized to perform clustering. [...] [PDF]
International Conference on Machine Learning (ICML), 2017

Efficient Multiple Instance Metric Learning Using Weakly Supervised Data

M. T. Law, Y. Yu, R. Urtasun, R. S. Zemel, E. P. Xing
We consider learning a distance metric in a weakly supervised setting where “bags” (or sets) of instances are labeled with “bags” of labels. A general approach is to formulate the problem as a Multiple Instance Learning (MIL) problem where the metric is learned so that the distances between instances inferred to be similar are smaller than the distances between instances inferred to be dissimilar. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Few-Shot Learning Through an Information Retrieval Lens

E. Triantafillou, R. Zemel, R. Urtasun
Few-shot learning refers to understanding new concepts from only a few examples. We propose an information retrieval-inspired approach for this problem that is motivated by the increased importance of maximally leveraging all the available information in this low-data regime. [PDF]
Code: [LINK]
Advances in Neural Information Processing Systems (NeurIPS), 2017

Find Your Way by Observing the Sun and Other Semantic Cues

W.-C. Ma, S. Wang, M. Brubaker, S. Fidler, R. Urtasun
In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world. Towards this goal, we utilize freely available cartographic maps and derive a probabilistic model that exploits semantic cues in the form of sun direction, presence of an intersection, road type, speed limit as well as the ego-car trajectory in order to produce very reliable localization results. [...] [PDF]
International Conference on Robotics and Automation (ICRA), 2017

Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

M. Ren, R. Liao, R. Urtasun, F. H. Sinz, R. Zemel
Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. [...] [PDF]
International Conference on Learning Representations (ICLR), 2017
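The unified view can be made concrete: both schemes subtract a mean and divide by a standard deviation, differing only in the set of activations the statistics are pooled over. A minimal sketch (omitting the learned scale and shift parameters):

```python
import numpy as np

def divisive_norm(x, axis, eps=1e-5):
    """Divisive normalization over the given axis. Batch norm and layer norm
    fall out as special cases of the choice of pooling axis."""
    mu = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 3))  # (batch, features)
bn = divisive_norm(x, axis=0)  # batch-norm-like: per-feature, over the batch
ln = divisive_norm(x, axis=1)  # layer-norm-like: per-example, over features
print(np.allclose(bn.mean(axis=0), 0, atol=1e-7),
      np.allclose(ln.mean(axis=1), 0, atol=1e-7))  # True True
```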

Annotating Object Instances with a Polygon-RNN

L. Castrejón, K. Kundu, R. Urtasun, S. Fidler
We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Towards Diverse and Natural Image Descriptions via a Conditional GAN

B. Dai, S. Fidler, R. Urtasun, D. Lin
Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

TorontoCity: Seeing the World With a Million Eyes

S. Wang, M. Bai, G. Mattyus, H. Chu, W. Luo, B. Yang, J. Liang, J. Cheverie, R. Urtasun, D. Lin
In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 km² of land, 8439 km of road and around 400,000 buildings. Our benchmark provides different perspectives of the world captured from airplanes, drones and cars driving around the city. [...] [PDF]
International Conference on Computer Vision (ICCV), 2017

Deep Watershed Transform for Instance Segmentation

M. Bai, R. Urtasun
Most contemporary approaches to instance segmentation use complex pipelines involving conditional random fields, recurrent neural networks, object proposals, or template matching schemes. In our paper, we present a simple yet powerful end-to-end convolutional neural network to tackle this task. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017
