
Ersin Yumer

1 BLOG ARTICLES 5 RESEARCH PAPERS
Ersin Yumer is a Staff Research Scientist, leading the San Francisco research team within Uber ATG R&D. Prior to joining Uber, he led the perception machine learning team at Argo AI, and before that he spent three years at Adobe Research. He completed his PhD studies at Carnegie Mellon University, during which he spent several summers at Google Research as well. His current research interests lie at the intersection of machine learning, 3D computer vision, and graphics. He develops end-to-end learning systems and holistic machine learning applications that bring signals of the visual world together: images, point clouds, videos, 3D shapes and depth scans.

Engineering Blog Articles

Announcing the 2019 Uber AI Residency

The Uber AI Residency is a 12-month training program for academics and professionals interested in becoming AI researchers with Uber AI Labs or Uber ATG.

Research Papers

UPSNet: A Unified Panoptic Segmentation Network

Y. Xiong, R. Liao, H. Zhao, R. Hu, M. Bai, E. Yumer, R. Urtasun
In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Learning a Generative Model for Multi-Step Human-Object Interactions from Videos

H. Wang, S. Pirk, V. Kim, E. Yumer, L. Guibas
Creating dynamic virtual environments consisting of humans interacting with objects is a fundamental problem in computer graphics. While it is well-accepted that agent interactions play an essential role in synthesizing such scenes, most extant techniques exclusively focus on static scenes, leaving the dynamic component out. In this paper, we present a generative model to synthesize plausible multi-step dynamic human–object interactions. [...] [PDF]
European Association for Computer Graphics (Eurographics), 2019

Exploratory Stage Lighting Design using Visual Objectives

E. Shimizu, S. Paris, M. Fisher, E. Yumer, K. Fatahalian
Lighting is a critical element of theater. A lighting designer is responsible for drawing the audience’s attention to a specific part of the stage, setting time of day, creating a mood, and conveying emotions. Designers often begin the lighting design process by collecting reference visual imagery that captures different aspects of their artistic intent. Then, they experiment with various lighting options to determine which ideas work best on stage. However, modern stages contain tens to hundreds of lights, and setting each light source’s parameters individually to realize an idea is both tedious and requires expert skill. In this paper, we describe an exploratory lighting design tool based on feedback from professional designers. [...] [PDF]
European Association for Computer Graphics (Eurographics), 2019

Photo-Sketching: Inferring Contour Drawings from Images

M. Li, Z. Lin, R. Mech, E. Yumer, D. Ramanan
Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On one hand, they are the 2D elements that convey 3D shapes; on the other hand, they are indicative of occlusion events and thus separation of objects or semantic concepts. In this paper, we aim to generate contour drawings, boundary-like drawings that capture the outline of the visual scene. Prior art often casts this problem as boundary detection. [...] [PDF]
Winter Conference on Applications of Computer Vision (WACV), 2019

iMapper: Interaction-guided Scene Mapping from Monocular Videos

A. Monszpart, P. Guerrero, D. Ceylan, E. Yumer, N. Mitra
A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made in the context of human pose reconstruction from monocular video, including image-space feature point detection and 3D pose recovery. These methods, however, start to fail under moderate to heavy occlusion as the problem becomes severely under-constrained. We approach the problems differently. We observe that people interact similarly in similar scenes. [...] [PDF]
Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH), 2018
