
Results for Computer Vision

Deep Continuous Fusion for Multi-Sensor 3D Object Detection

M. Liang, B. Yang, S. Wang, R. Urtasun
In this paper, we propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. […] [PDF]
European Conference on Computer Vision (ECCV), 2018

Single Image Intrinsic Decomposition Without a Single Intrinsic Image

W. Ma, H. Chu, B. Zhou, R. Urtasun, A. Torralba
We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. […] [PDF]
European Conference on Computer Vision (ECCV), 2018

Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks

H. Cui, V. Radosavljevic, F. Chou, T.-H. Lin, T. Nguyen, T. Huang, J. Schneider, N. Djuric
Autonomous driving presents one of the largest problems that the robotics and artificial intelligence communities are facing at the moment, both in terms of difficulty and potential societal impact. Self-driving vehicles (SDVs) are expected to prevent road accidents and save millions of lives while improving the livelihood and life quality of many more. […] [PDF]
International Conference on Robotics and Automation (ICRA), 2019

LSQ++: lower running time and higher recall in multi-codebook quantization

J. Martinez, S. Zakhmi, H. Hoos, and J. Little
Multi-codebook quantization (MCQ) is the task of expressing a set of vectors as accurately as possible in terms of discrete entries in multiple bases. Work in MCQ is heavily focused on lowering quantization error, thereby improving distance estimation and recall on benchmarks of visual descriptors at a fixed memory budget. […] [PDF]
European Conference on Computer Vision (ECCV), 2018

End-to-End Deep Structured Models for Drawing Crosswalks

J. Liang, R. Urtasun
In this paper we address the problem of detecting crosswalks from LiDAR and camera imagery. Towards this goal, given multiple LiDAR sweeps and the corresponding imagery, we project both inputs onto the ground surface to produce a top down view of the scene. […] [PDF]
European Conference on Computer Vision (ECCV), 2018

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

M. Teichmann, M. Weber, M. Zöllner, R. Cipolla, R. Urtasun
While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. […] [PDF]
IEEE Intelligent Vehicles Symposium (IV), 2018

An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution

R. Liu, J. Lehman, P. Molino, F. Such, E. Frank, A. Sergeev, J. Yosinski
Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space. […] [PDF]
Advances in Neural Information Processing Systems (NeurIPS), 2018
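The CoordConv idea summarized above amounts to concatenating coordinate channels onto a convolution's input so the network can reason about position. A minimal NumPy sketch of that preprocessing step (function and variable names are ours for illustration, not from the paper's code release):

```python
import numpy as np

def add_coord_channels(feature_map):
    """Append normalized coordinate channels to a feature map.

    feature_map: array of shape (height, width, channels).
    Returns shape (height, width, channels + 2), where the two extra
    channels hold row and column coordinates scaled to [-1, 1].
    """
    h, w, _ = feature_map.shape
    ys = np.linspace(-1.0, 1.0, h).reshape(h, 1).repeat(w, axis=1)
    xs = np.linspace(-1.0, 1.0, w).reshape(1, w).repeat(h, axis=0)
    return np.concatenate(
        [feature_map, ys[..., None], xs[..., None]], axis=-1
    )

# A 4x4 single-channel map gains two coordinate channels.
fmap = np.zeros((4, 4, 1))
out = add_coord_channels(fmap)
print(out.shape)  # (4, 4, 3)
```

A standard convolution applied after this step can then learn position-dependent mappings, which is what the coordinate transform experiments in the paper probe.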

GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation

X. Qi, R. Liao, Z. Liu, R. Urtasun, J. Jia
In this paper, we propose Geometric Neural Network (GeoNet) to jointly predict depth and surface normal maps from a single image. Building on top of two-stream CNNs, our GeoNet incorporates geometric relation between depth and surface normal via the new depth-to-normal and normal-to-depth networks. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Matching Adversarial Networks

G. Mattyus, R. Urtasun
Generative Adversarial Nets (GANs) and Conditional GANs (CGANs) show that using a trained network as a loss function (discriminator) makes it possible to synthesize highly structured outputs (e.g. natural images). However, applying a discriminator network as a universal loss function for common supervised tasks (e.g. semantic segmentation, line detection, depth estimation) is considerably less successful. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Deep Parametric Continuous Convolutional Neural Networks

S. Wang, S. Suo, W. Ma, A. Pokrovsky, R. Urtasun
We propose an approach for semi-automatic annotation of object instances. While most current methods treat object segmentation as a pixel-labeling problem, we here cast it as a polygon prediction task, mimicking how most current datasets have been annotated. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Learning deep structured active contours end-to-end

D. Marcos, D. Tuia, B. Kellenberger, L. Zhang, M. Bai, R. Liao, R. Urtasun
The world is covered with millions of buildings, and precisely knowing each instance’s position and extents is vital to a multitude of applications. Recently, automated building footprint segmentation models have shown superior detection accuracy thanks to the usage of Convolutional Neural Networks (CNN). […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

SBNet: Sparse Blocks Network for Fast Inference

M. Ren, A. Pokrovsky, B. Yang, R. Urtasun
Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers – this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

iMapper: Interaction-guided Scene Mapping from Monocular Videos

A. Monszpart, P. Guerrero, D. Ceylan, E. Yumer, N. Mitra
A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made in the context of human pose reconstruction from monocular video, including image-space feature point detection and 3D pose recovery. These methods, however, start to fail under moderate to heavy occlusion as the problem becomes severely under-constrained. We approach the problems differently. We observe that people interact similarly in similar scenes. […] [PDF]
Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH), 2018

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

H. Chu, W. Ma, K. Kundu, R. Urtasun, S. Fidler
The last few years have seen approaches trying to combine the increasing popularity of depth sensors and the success of convolutional neural networks. Using depth as an additional channel alongside the RGB input, however, suffers from the scale-variance problem present in image-convolution-based approaches. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

W. Luo, B. Yang, R. Urtasun
In this paper we propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse data at range. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

End-to-end Learning of Multi-sensor 3D Tracking by Detection

D. Frossard, R. Urtasun
In this paper we propose a novel approach to tracking by detection that can exploit both cameras as well as LIDAR data to produce very accurate 3D trajectories. Towards this goal, we formulate the problem as a linear program that can be solved exactly, and learn convolutional networks for detection as well as matching in an end-to-end manner. […] [PDF]
International Conference on Robotics and Automation (ICRA), 2018

Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning

M. Norouzzadeh, A. Nguyen, M. Kosmala, A. Swanson, M. Palmer, C. Parker, J. Clune
Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would revolutionize our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could transform many fields of biology, ecology, and zoology into “big data” sciences. […] [PDF]
PNAS, Vol. 115, No. 25, 2018

Hierarchical Recurrent Attention Networks for Structured Online Maps

N. Homayounfar, W. Ma, S. Lakshmikanth, R. Urtasun
In this paper, we tackle the problem of online road network extraction from sparse 3D point clouds. Our method is inspired by how an annotator builds a lane graph, by first identifying how many lanes there are and then drawing each one in turn. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

PIXOR: Real-time 3D Object Detection from Point Clouds

B. Yang, W. Luo, R. Urtasun
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Robust Dense Mapping for Large-Scale Dynamic Environments

I. Bârsan, P. Liu, M. Pollefeys, A. Geiger
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. […] [PDF]
Video: [LINK]
Project Page: [LINK]
International Conference on Robotics and Automation (ICRA), 2018

Measuring the Intrinsic Dimension of Objective Landscapes

C. Li, H. Farkhoor, R. Liu, J. Yosinski
Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. […] [PDF]
International Conference on Learning Representations (ICLR), 2018
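The abstract's notion of training in a smaller, randomly oriented subspace can be written as reparameterizing the native weights as theta = theta_0 + P @ theta_d, where only the low-dimensional theta_d is trained. A minimal illustration of that reparameterization (the dimensions and names here are ours, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Native parameter space has D dimensions; the trainable subspace has d.
D, d = 1000, 10
theta_0 = rng.standard_normal(D)      # frozen random initialization
P = rng.standard_normal((D, d))
P /= np.linalg.norm(P, axis=0)        # unit-norm projection columns

def native_params(theta_d):
    """Map trainable subspace coordinates to the full parameter vector."""
    return theta_0 + P @ theta_d

# Only the d subspace coordinates would be optimized; theta_0 and P stay fixed.
theta_d = np.zeros(d)
print(native_params(theta_d).shape)  # (1000,)
```

The smallest d at which training in this subspace reaches good performance is what the paper calls the intrinsic dimension of the objective landscape.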

Sports Field Localization via Deep Structured Models

N. Homayounfar, S. Fidler, R. Urtasun
In this work, we propose a novel way of efficiently localizing a soccer field from a single broadcast image of the game. Related work in this area relies on manually annotating a few key frames and extending the localization to similar images, or installing fixed specialized cameras in the stadium from which the layout of the field can be obtained. […] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Characterizing how Visual Question Answering models scale with the world

E. Bingham, P. Molino, P. Szerlip, F. Obermeyer, N. Goodman
Detecting differences in generalization ability between models for visual question answering tasks has proven to be surprisingly difficult. We propose a new statistic, asymptotic sample complexity, for model comparison, and construct a synthetic data distribution to compare a strong baseline CNN-LSTM model to a structured neural network with powerful inductive biases. […] [PDF]
ViGIL Workshop at NeurIPS, 2017

Automated Identification of Northern Leaf Blight-Infected Maize Plants from Field Imagery Using Deep Learning

C. DeChant, T. Wiesner-Hanks, S. Chen, E. Stewart, J. Yosinski, M. Gore, R. Nelson, and H. Lipson
Northern leaf blight (NLB) can cause severe yield loss in maize; however, scouting large areas to accurately diagnose the disease is time consuming and difficult. We demonstrate a system capable of automatically identifying NLB lesions in field-acquired images of maize plants with high reliability. […] [PDF]
Phytopathology, 2017

Be Your Own Prada: Fashion Synthesis With Structural Coherence

S. Zhu, R. Urtasun, S. Fidler, D. Lin, C. Loy
We present a novel and effective approach for generating new clothing on a wearer through generative adversarial learning. Given an input image of a person and a sentence describing a different outfit, our model “redresses” the person as desired, while at the same time keeping the wearer and her/his pose unchanged. […] [PDF]
International Conference on Computer Vision (ICCV), 2017

DeepRoadMapper: Extracting Road Topology From Aerial Images

G. Máttyus, W. Luo, R. Urtasun
Creating road maps is essential for applications such as autonomous driving and city planning. Most approaches in industry focus on leveraging expensive sensors mounted on top of a fleet of cars. This results in very accurate estimates when exploiting a user in the loop. […] [PDF]
International Conference on Computer Vision (ICCV), 2017

3D Graph Neural Networks for RGBD Semantic Segmentation

X. Qi, R. Liao, J. Jia, S. Fidler, R. Urtasun
RGBD semantic segmentation requires joint reasoning about 2D appearance and 3D geometric information. In this paper we propose a 3D graph neural network (3DGNN) that builds a k-nearest neighbor graph on top of 3D point cloud. […] [PDF]
International Conference on Computer Vision (ICCV), 2017

SGN: Sequential Grouping Networks for Instance Segmentation

S. Liu, J. Jia, S. Fidler, R. Urtasun
In this paper, we propose Sequential Grouping Networks (SGN) to tackle the problem of object instance segmentation. SGNs employ a sequence of neural networks, each solving a sub-grouping problem of increasing semantic complexity in order to gradually compose objects out of pixels. […] [PDF]
International Conference on Computer Vision (ICCV), 2017

Situation Recognition With Graph Neural Networks

R. Li, M. Tapaswi, R. Liao, J. Jia, R. Urtasun, S. Fidler
We address the problem of recognizing situations in images. Given an image, the task is to predict the most salient verb (action), and fill its semantic roles such as who is performing the action, what is the source and target of the action, etc. […] [PDF]
International Conference on Computer Vision (ICCV), 2017

End-To-End Instance Segmentation With Recurrent Attention

M. Ren, R. Zemel
While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. […] [PDF]
Supplementary Materials: [LINK]
Code: [LINK]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017
