
Results for Computer Vision

Identifying Unknown Instances for Autonomous Driving

K. Wong, S. Wang, M. Ren, M. Liang, R. Urtasun
We propose a novel open-set instance segmentation algorithm for point clouds that identifies instances from both known and unknown classes. In particular, we train a deep convolutional neural network that projects points belonging to the same instance together in a category-agnostic embedding space. [PDF]
Conference on Robot Learning (CoRL), 2019
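
As a rough illustration of the idea above, the sketch below pairs an assumed point-embedding network with off-the-shelf density-based clustering; `embed_fn` and all parameter values are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_instances(embed_fn, points, eps=0.5):
    """`embed_fn` maps (N, 3) points to (N, D) instance embeddings (assumption)."""
    embeddings = embed_fn(points)
    # Points that embed close together are grouped into one instance; because the
    # embedding is category-agnostic, clusters from unknown classes emerge too.
    return DBSCAN(eps=eps).fit_predict(np.asarray(embeddings))  # -1 marks noise
```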

Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction

A. Jain, S. Casas, R. Liao, Y. Xiong, S. Feng, S. Segal, R. Urtasun
Our research shows that non-parametric distributions can capture erratic pedestrian behavior extremely well. We propose Discrete Residual Flow, a convolutional neural network for human motion prediction that accurately models temporal dependencies and captures the uncertainty inherent in long-range motion forecasting. In particular, our method realistically captures multi-modal posteriors over future human motion. [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2019
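
A minimal sketch of the residual-update idea described above, assuming a hypothetical `residual_net` that predicts logit corrections over a spatial grid; the actual model conditions on rich scene context and is trained end to end.

```python
import torch

def discrete_residual_rollout(residual_net, context, init_logits, horizon=10):
    """`residual_net(context, logits, t)` is an assumed module returning logit residuals."""
    logits, posteriors = init_logits, []
    for t in range(horizon):
        logits = logits + residual_net(context, logits, t)   # residual refinement
        # Normalizing over all grid cells yields a non-parametric distribution
        # over the pedestrian's position at timestep t.
        posteriors.append(torch.softmax(logits.reshape(-1), dim=0))
    return posteriors
```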

Learning Joint 2D-3D Representations for Depth Completion

Y. Chen, B. Yang, M. Liang, R. Urtasun
We design a simple yet effective architecture that fuses information between 2D and 3D representations at multiple levels to learn fully fused joint representations, and show state-of-the-art results on the KITTI depth completion benchmark. [PDF]
International Conference on Computer Vision (ICCV), 2019

DAGMapper: Learning to Map by Discovering Lane Topology

N. Homayounfar, W.-C. Ma*, J. Liang*, X. Wu, J. Fan, R. Urtasun
We map complex lane topologies in highways by formulating the problem as a deep directed graphical model. Interestingly, we can train our model on the I-40 highway and generalize to unseen highways in San Francisco. [PDF]
International Conference on Computer Vision (ICCV), 2019

DMM-Net: Differentiable Mask-Matching Network for Video Instance Segmentation

X. Zeng, R. Liao, L. Gu, Y. Xiong, S. Fidler, R. Urtasun
We propose the differentiable mask-matching network (DMM-Net) for solving the video instance segmentation problem where the initial instance masks are provided. On the DAVIS 2017 dataset, DMM-Net achieves the best performance without online learning on the first frames and the second-best with it. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. [PDF]
International Conference on Computer Vision (ICCV), 2019

LCA: Loss Change Allocation for Neural Network Training

J. Lan, R. Liu, H. Zhou, J. Yosinski
Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. […] [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2019
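
A minimal sketch of the LCA decomposition: the loss change over one training step is split into per-parameter credits of gradient times parameter movement. The paper integrates gradients along the training path more carefully; the midpoint rule here is a simplification, and all names are illustrative.

```python
import torch

def loss_change_allocation(model, loss_fn, inputs, targets, params_before, params_after):
    """Split the loss change over one step into per-parameter credits."""
    # Evaluate the gradient at the midpoint of the step (a simple quadrature rule;
    # the paper integrates along the path more accurately).
    with torch.no_grad():
        for p, a, b in zip(model.parameters(), params_before, params_after):
            p.copy_((a + b) / 2)
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    # Credit for parameter i is grad_i * movement_i; the credits sum (to first
    # order) to the total loss change, so helpful and harmful moves are visible.
    return [g * (b - a) for g, a, b in zip(grads, params_before, params_after)]
```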

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

H. Zhou, J. Lan, R. Liu, J. Yosinski
The recent lottery ticket hypothesis of Frankle & Carbin showed that sparse subnetworks found by magnitude pruning can be trained in isolation to match the accuracy of the original network, but only when reset to their original initialization. In this paper we ablate the components of this procedure to understand why it works, and along the way uncover “supermasks”: binary masks that, applied to a randomly initialized and otherwise untrained network, already produce accuracy far better than chance. […] [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2019
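
A minimal sketch of the supermask evaluation described above, assuming the simple keep-largest-magnitude criterion (the paper studies several); the function names are illustrative, not the authors' code.

```python
import torch

def supermask(weight, keep_fraction=0.2):
    """Binary mask keeping the largest-magnitude weights (one criterion of several)."""
    k = max(1, int(weight.numel() * keep_fraction))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    return (weight.abs() > threshold).float()

def apply_supermasks(model, keep_fraction=0.2):
    """Mask a randomly initialized, untrained model in place; then measure accuracy."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
                module.weight.mul_(supermask(module.weight, keep_fraction))
```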

DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch

S. Duggal, S. Wang, W.-C. Ma, R. Hu, R. Urtasun
We propose a real-time dense depth estimation approach using stereo image pairs, which utilizes differentiable PatchMatch to progressively prune the stereo matching search space. Our model achieves competitive performance on the KITTI benchmark despite running in real time. [PDF]
International Conference on Computer Vision (ICCV), 2019
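
A rough sketch of the pruning loop, under assumptions: `cost_fn` stands in for the learned matching cost, and the fixed range shrinkage replaces the paper's learned confidence-range predictor. Soft arg-min keeps the step differentiable.

```python
import torch

def prune_disparity_range(cost_fn, d_min, d_max, n_samples=8, temperature=1.0):
    """One pruning iteration. `cost_fn` maps (H, W, S) disparity samples to costs."""
    steps = torch.linspace(0, 1, n_samples).view(1, 1, -1)
    candidates = d_min.unsqueeze(-1) + (d_max - d_min).unsqueeze(-1) * steps
    # Soft arg-min over sampled candidates keeps everything differentiable.
    weights = torch.softmax(-cost_fn(candidates) / temperature, dim=-1)
    center = (weights * candidates).sum(dim=-1)
    # Shrink the per-pixel search range around the current best estimate. The
    # paper learns this range with a confidence predictor; halving is a stand-in.
    half = (d_max - d_min) / 4
    return center - half, center + half
```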

DSIC: Deep Stereo Image Compression

J. Liu, S. Wang, R. Urtasun
We design a novel architecture for compressing a stereo image pair that extracts as much shared information as possible from the first image in order to reduce the bitrate of the second image. We demonstrate an impressive 30-50% reduction in the second image's bitrate at low bitrates. [PDF]
International Conference on Computer Vision (ICCV), 2019

LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving

G. P. Meyer, A. Laddha, E. Kee, C. Vallespi-Gonzalez, C. Wellington
In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of the sensor, where the input data is naturally compact. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
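
For intuition about the range view mentioned above, a toy conversion from an unordered point cloud to the sensor's compact 2D image; channel layout and sizes are illustrative, not LaserNet's actual input format.

```python
import numpy as np

def range_view(points, laser_ids, n_lasers=64, n_cols=2048):
    """points: (N, 3) xyz; laser_ids: (N,) row index reported by the sensor."""
    image = np.zeros((n_lasers, n_cols), dtype=np.float32)
    azimuth = np.arctan2(points[:, 1], points[:, 0])              # horizontal angle
    cols = ((azimuth + np.pi) / (2 * np.pi) * (n_cols - 1)).astype(int)
    image[laser_ids, cols] = np.linalg.norm(points, axis=1)       # range channel
    return image
```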

End-to-end Interpretable Neural Motion Planner

W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, R. Urtasun
In this paper, we propose a neural motion planner for learning to drive autonomously in complex urban scenarios that include traffic-light handling, yielding, and interactions with multiple road users. Towards this goal, we design a holistic model that takes as input raw LIDAR data and an HD map and produces interpretable intermediate representations in the form of 3D detections and their future trajectories, as well as a cost volume defining the goodness of each position that the self-driving car can take within the planning horizon. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019
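
A minimal sketch of how such a cost volume could be consumed at inference time: score sampled candidate trajectories against the volume and keep the cheapest. The candidate sampler and all shapes are assumptions for illustration.

```python
import numpy as np

def pick_trajectory(cost_volume, candidates):
    """cost_volume: (T, H, W) learned costs; candidates: (K, T, 2) integer (row, col)."""
    t = np.arange(cost_volume.shape[0])
    # Each candidate's total cost is the sum of the volume entries it passes through.
    totals = cost_volume[t[None, :], candidates[..., 0], candidates[..., 1]].sum(axis=1)
    return candidates[np.argmin(totals)]           # cheapest trajectory wins
```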

Learning to Localize through Compressed Binary Maps

X. Wei, I. A. Bârsan, S. Wang, J. Martinez, R. Urtasun
One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps. In this paper we propose to learn to compress the map representation such that it is optimal for the localization task. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Convolutional Recurrent Network for Road Boundary Extraction

J. Liang, N. Homayounfar, S. Wang, W.-C. Ma, R. Urtasun
Creating high-definition maps that contain precise information about the static elements of the scene is of utmost importance for enabling self-driving cars to drive safely. In this paper, we tackle the problem of drivable road boundary extraction from LiDAR and camera imagery. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Multi-Task Multi-Sensor Fusion for 3D Object Detection

M. Liang, B. Yang, Y. Chen, R. Hu, R. Urtasun
In this paper we propose to exploit multiple related tasks for accurate multi-sensor 3D object detection. Towards this goal we present an end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Deep Rigid Instance Scene Flow

W.-C. Ma, S. Wang, R. Hu, Y. Xiong, R. Urtasun
In this paper we tackle the problem of scene flow estimation in the context of self-driving. We leverage deep learning techniques as well as strong priors, as in our application domain the motion of the scene can be decomposed into the motion of the robot and the 3D motion of the actors in the scene. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

DARNet: Deep Active Ray Network for Building Segmentation

D. Cheng, R. Liao, S. Fidler, R. Urtasun
In this paper, we propose a Deep Active Ray Network (DARNet) for automatic building segmentation. Taking an image as input, it first exploits a deep convolutional neural network (CNN) as the backbone to predict energy maps, which are further utilized to construct an energy function. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

UPSNet: A Unified Panoptic Segmentation Network

Y. Xiong, R. Liao, H. Zhao, R. Hu, M. Bai, E. Yumer, R. Urtasun
In this paper we propose a unified panoptic segmentation network (UPSNet) for tackling the panoptic segmentation task. On top of a single backbone network, we design a deformable-convolution-based semantic segmentation head and a Mask R-CNN style instance segmentation head that solve the two subtasks simultaneously, together with a parameter-free panoptic head that fuses their outputs into the final panoptic prediction. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Learning a Generative Model for Multi-Step Human-Object Interactions from Videos

H. Wang, S. Pirk, V. Kim, E. Yumer, L. Guibas
Creating dynamic virtual environments consisting of humans interacting with objects is a fundamental problem in computer graphics. While it is well-accepted that agent interactions play an essential role in synthesizing such scenes, most extant techniques exclusively focus on static scenes, leaving the dynamic component out. In this paper, we present a generative model to synthesize plausible multi-step dynamic human–object interactions. […] [PDF]
European Association for Computer Graphics (Eurographics), 2019

DeepSignals: Predicting Intent of Drivers Through Visual Attributes

D. Frossard, E. Kee, R. Urtasun
Detecting the intention of drivers is an essential task in self-driving, necessary to anticipate sudden events like lane changes and stops. Turn signals and emergency flashers communicate such intentions, providing seconds of potentially critical reaction time. In this paper, we propose to detect these signals in video sequences by using a deep neural network that reasons about both spatial and temporal information. […] [PDF]
International Conference on Robotics and Automation (ICRA), 2019

Metropolis-Hastings Generative Adversarial Networks

R. Turner, J. Hung, Y. Saatci, J. Yosinski
We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN’s discriminator-generator pair, as opposed to sampling in a standard GAN which draws samples from the distribution defined by the generator. […] [PDF]
International Conference on Machine Learning (ICML), 2019
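
A minimal sketch of the MH-GAN sampling loop described above: the generator acts as an independence proposal and a (calibrated) discriminator supplies the density ratio in the acceptance test. Signatures are assumed for illustration.

```python
import torch

def mh_gan_sample(generator, discriminator, n_proposals=100, z_dim=128):
    """Draw one MH-GAN sample; `generator` and `discriminator` signatures are assumed."""
    x = generator(torch.randn(1, z_dim))             # initialize the chain
    for _ in range(n_proposals):
        x_new = generator(torch.randn(1, z_dim))     # independent proposal
        # A calibrated discriminator outputs D(x) in (0, 1); the implied density
        # ratio is D(x) / (1 - D(x)), giving the independence-sampler MH rule.
        d_old = discriminator(x).item()
        d_new = discriminator(x_new).item()
        accept = (d_new / (1 - d_new)) / (d_old / (1 - d_old))
        if torch.rand(()).item() < min(1.0, accept):
            x = x_new
    return x
```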

Understanding Neural Networks via Feature Visualization: A survey

A. Nguyen, J. Yosinski, J. Clune
One neuroscience method for understanding the brain is to find and study the preferred stimuli that highly activate an individual cell or groups of cells. Recent advances in machine learning enable a family of methods to synthesize preferred stimuli that cause a neuron in an artificial or biological brain to fire strongly. […] [PDF]
Interpretable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019
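
A minimal sketch of activation maximization, the core method family surveyed: gradient ascent on an input to maximize a chosen unit's activation. The priors and regularizers that make the results interpretable are omitted, and `layer_activations` is an assumed hook.

```python
import torch

def visualize_unit(layer_activations, unit, steps=200, lr=0.05):
    """`layer_activations(x)` is an assumed hook returning the target layer's output."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)    # start from noise
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        activation = layer_activations(x)[0, unit].mean()  # strength of the chosen unit
        (-activation).backward()                           # gradient ascent on it
        optimizer.step()
    return x.detach()
```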

Photo-Sketching: Inferring Contour Drawings from Images

M. Li, Z. Lin, R. Mech, E. Yumer, D. Ramanan
Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On the one hand, they are the 2D elements that convey 3D shapes; on the other hand, they are indicative of occlusion events and thus the separation of objects or semantic concepts. In this paper, we aim to generate contour drawings: boundary-like drawings that capture the outline of the visual scene. Prior art often casts this problem as boundary detection. […] [PDF]
Winter Conference on Applications of Computer Vision (WACV), 2019

Predicting Motion of Vulnerable Road Users using High-Definition Maps and Efficient ConvNets

F. Chou, T.-H. Lin, H. Cui, V. Radosavljevic, T. Nguyen, T. Huang, M. Niedoba, J. Schneider, N. Djuric
Following detection and tracking of traffic actors, prediction of their future motion is the next critical component of a self-driving vehicle (SDV), allowing the SDV to move safely and efficiently in its environment. This is particularly important when it comes to vulnerable road users (VRUs), such as pedestrians and bicyclists. We present a deep learning method for predicting VRU movement in which we rasterize high-definition maps and each actor’s surroundings into a bird’s-eye-view image used as input to convolutional networks. […] [PDF]
MLITS workshop @ Neural Information Processing Systems (NeurIPS), 2018
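
A toy sketch of the bird's-eye-view rasterization step, assuming map geometry is given as world-frame polylines; a real rasterizer draws full lines and polygons with proper orientation handling.

```python
import numpy as np

def rasterize_bev(polylines, center, resolution=0.1, size=300):
    """Draw map polylines (lists of (x, y) world points) into a (size, size) raster."""
    image = np.zeros((size, size), dtype=np.float32)
    for line in polylines:
        for x, y in line:
            col = int((x - center[0]) / resolution) + size // 2
            row = int((y - center[1]) / resolution) + size // 2
            if 0 <= row < size and 0 <= col < size:
                image[row, col] = 1.0   # a real rasterizer draws lines and polygons
    return image
```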

Rotated Rectangles for Symbolized Building Footprint Extraction

M. Dickenson, L. Gueguen
Building footprints (BFP) provide useful visual context for users of digital maps when navigating in space. This paper proposes a method for extracting and symbolizing building footprints from satellite imagery using a convolutional neural network (CNN). […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Faster Neural Networks Straight from JPEG

L. Gueguen, A. Sergeev, B. Kadlec, R. Liu, J. Yosinski
The simple, elegant approach of training convolutional neural networks (CNNs) directly from RGB pixels has enjoyed overwhelming empirical success. But can more performance be squeezed out of networks by using different input representations? In this paper we propose and explore a simple idea: train CNNs directly on the blockwise discrete cosine transform (DCT) coefficients computed and available in the middle of the JPEG codec. […] [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2018
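
A toy illustration of the input representation explored above: blockwise 8×8 DCT coefficients in place of RGB pixels. This recomputes the DCT with scipy for clarity; the paper's point is that the JPEG codec already provides these coefficients mid-decode.

```python
import numpy as np
from scipy.fft import dctn

def blockwise_dct(gray_image):
    """Map an (H, W) grayscale image to (H//8, W//8, 64) DCT coefficients."""
    h, w = (s - s % 8 for s in gray_image.shape)
    blocks = gray_image[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = dctn(blocks, axes=(2, 3), norm="ortho")  # JPEG-style type-II DCT
    return coeffs.reshape(h // 8, w // 8, 64)         # spatially smaller, deeper input
```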

Joint Mapping and Calibration via Differentiable Sensor Fusion

J. Chen, F. Obermeyer, V. Lyapunov, L. Gueguen, N. Goodman
We leverage automatic differentiation (AD) and probabilistic programming to develop an end-to-end optimization algorithm for batch triangulation of a large number of unknown objects. Given noisy detections extracted from noisily geo-located street level imagery without depth information, we jointly estimate the number and location of objects of different types, together with parameters for sensor noise characteristics and prior distribution of objects conditioned on side information. […] [PDF]
Computing Research Repository (CoRR), 2018

Deep Multi-Sensor Lane Detection

M. Bai, G. Mattyus, N. Homayounfar, S. Wang, S. K. Lakshmikanth, R. Urtasun
Reliable and accurate lane detection has been a long-standing problem in the field of autonomous driving. In recent years, many approaches have been developed that use images (or videos) as input and reason in image space. In this paper we argue that accurate image estimates do not translate to precise 3D lane boundaries, which are the input required by modern motion planning algorithms. […] [PDF]
International Conference on Intelligent Robots and Systems (IROS), 2018

HDNET: Exploiting HD Maps for 3D Object Detection

B. Yang, M. Liang, R. Urtasun
In this paper we show that High-Definition (HD) maps provide strong priors that can boost the performance and robustness of modern 3D object detectors. Towards this goal, we design a single stage detector that extracts geometric and semantic features from the HD maps. […] [PDF]
Conference on Robot Learning (CoRL), 2018

IntentNet: Learning to Predict Intention from Raw Sensor Data

S. Casas, W. Luo, R. Urtasun
In order to plan a safe maneuver, self-driving vehicles need to understand the intent of other traffic participants. We define intent as a combination of discrete high level behaviors as well as continuous trajectories describing future motion. In this paper we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment. […] [PDF]
Conference on Robot Learning (CoRL), 2018

Efficient Convolutions for Real-Time Semantic Segmentation of 3D Point Clouds

C. Zhang, W. Luo, R. Urtasun
We propose efficient convolutional architectures for real-time semantic segmentation of 3D point clouds in the context of self-driving. […] [PDF]
International Conference on 3D Vision (3DV), 2018
