
Mengye Ren

Mengye Ren is a research scientist at Uber ATG Toronto. He is also a PhD student in the machine learning group of the Department of Computer Science at the University of Toronto, where he also completed his undergraduate degree in Engineering Science. His research interests are machine learning, neural networks, and computer vision. He is originally from Shanghai, China.

Engineering Blog Articles

SBNet: Leveraging Activation Block Sparsity for Speeding up Convolutional Neural Networks


By applying convolutional neural networks (CNNs) and other deep learning techniques, researchers at Uber ATG Toronto are committed to developing technologies that power safer and more reliable transportation solutions.

CNNs are widely used for analyzing visual imagery and data from […]

Research Papers

Identifying Unknown Instances for Autonomous Driving

K. Wong, S. Wang, M. Ren, M. Liang, R. Urtasun
We propose a novel open-set instance segmentation algorithm for point clouds that identifies instances from both known and unknown classes. In particular, we train a deep convolutional neural network that projects points belonging to the same instance together in a category-agnostic embedding space. [PDF]
The Conference on Robot Learning (CoRL), 2019
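The abstract describes projecting points of the same instance close together in a category-agnostic embedding space, after which instances can be recovered by grouping nearby embeddings. A minimal NumPy sketch of that grouping step (not the paper's actual clustering procedure; `group_by_embedding` and the threshold `tau` are illustrative assumptions):

```python
import numpy as np

def group_by_embedding(emb, tau=0.5):
    """Greedy grouping: a point joins the first instance whose seed
    embedding is within distance tau; otherwise it starts a new instance."""
    instance = -np.ones(len(emb), dtype=int)
    seeds = []
    for i, e in enumerate(emb):
        d = [np.linalg.norm(e - s) for s in seeds]
        if d and min(d) < tau:
            instance[i] = int(np.argmin(d))
        else:
            seeds.append(e.copy())
            instance[i] = len(seeds) - 1
    return instance

# Toy embeddings: two well-separated instances
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
inst = group_by_embedding(emb)
```

Because the embedding is category-agnostic, the same grouping applies to points from classes never seen in training, which is what enables the open-set behavior.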

Incremental Few-Shot Learning with Attention Attractor Networks

M. Ren, R. Liao, E. Fetaya, R. Zemel
This paper addresses incremental few-shot learning, where a regular classification network has already been trained to recognize a set of base classes, and several extra novel classes are being considered, each with only a few labeled examples. After learning the novel classes, the model is evaluated on overall classification performance across both base and novel classes. To this end, we propose a meta-learning model, the Attention Attractor Networks, which regularizes the learning of novel classes. [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2019

Graph HyperNetworks for Neural Architecture Search

C. Zhang, M. Ren, R. Urtasun
Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive as the search requires training thousands of different networks, while each can last for hours. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. [...] [PDF]
Meta Learning workshop @ Neural Information Processing Systems (NeurIPS), 2018

Learning to Reweight Examples for Robust Deep Learning

M. Ren, W. Zeng, B. Yang, R. Urtasun
Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noises. [...] [PDF]
International Conference on Machine Learning (ICML), 2018

SBNet: Sparse Blocks Network for Fast Inference

M. Ren, A. Pokrovsky, B. Yang, R. Urtasun
Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers; this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. [...] [PDF]
Conference on Computer Vision and Pattern Recognition (CVPR), 2018
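The idea in the abstract, using a computation mask to skip convolution over inactive regions, can be sketched as a gather/compute/scatter over tiles. This is a simplified NumPy illustration only (`sparse_block_conv` and `conv3x3` are hypothetical names; the real SBNet gathers tiles with halo overlap and runs on GPU, which this sketch omits):

```python
import numpy as np

def conv3x3(tile, kernel):
    """Plain 'same' 3x3 convolution on a single-channel tile (zero padding)."""
    padded = np.pad(tile, 1)
    out = np.zeros_like(tile)
    h, w = tile.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def sparse_block_conv(feat, block_mask, kernel, block=8):
    """Convolve only the tiles whose mask bit is on; leave the rest zero."""
    out = np.zeros_like(feat)
    nby, nbx = block_mask.shape
    for by in range(nby):
        for bx in range(nbx):
            if block_mask[by, bx]:
                ys, xs = by * block, bx * block
                tile = feat[ys:ys + block, xs:xs + block]          # gather
                out[ys:ys + block, xs:xs + block] = conv3x3(tile, kernel)  # scatter
    return out

# Toy run: only the top-left 8x8 tile is active
feat = np.ones((16, 16))
block_mask = np.array([[1, 0], [0, 0]], dtype=bool)
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
out = sparse_block_conv(feat, block_mask, identity)
```

When the mask is sparse, the work scales with the number of active tiles rather than the full feature-map area, which is the source of the speedup the paper measures.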

Understanding Short-Horizon Bias in Stochastic Meta-Optimization

Y. Wu, M. Ren, R. Liao, R. Grosse
Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training. There has been much recent interest in gradient-based meta-optimization, where one tunes hyperparameters, or even learns an optimizer, in order to minimize the expected loss when the training procedure is unrolled. [...] [PDF]
International Conference on Learning Representations (ICLR), 2018

Meta-Learning for Semi-Supervised Few-Shot Classification

M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J. Tenenbaum, H. Larochelle, R. Zemel
In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its corresponding test set. [...] [PDF]
Code & Datasets: [LINK]
International Conference on Learning Representations (ICLR), 2018
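The episodic setup described in the abstract, many small classification problems each with a tiny labeled support set and a matching query set, can be sketched in a few lines. A minimal sampler, assuming a flat array of integer labels (`sample_episode` and its parameter names are illustrative, not the paper's API):

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=1, n_query=5, rng=None):
    """Sample one few-shot episode: n_way classes, k_shot labeled support
    examples and n_query query examples per class."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query)

# Toy dataset: 10 classes, 20 examples each
labels = np.repeat(np.arange(10), 20)
support, query = sample_episode(labels, rng=np.random.default_rng(0))
```

The semi-supervised extension in the paper additionally places unlabeled examples (some from distractor classes) into each episode; the sampler above covers only the standard labeled case.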

The Reversible Residual Network: Backpropagation Without Storing Activations

A. Gomez, M. Ren, R. Urtasun, R. Grosse
Residual Networks (ResNets) have demonstrated significant improvement over traditional Convolutional Neural Networks (CNNs) on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck as one needs to store all the intermediate activations for calculating gradients using backpropagation. [...] [PDF]
Advances in Neural Information Processing Systems (NeurIPS), 2017
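The memory saving in the abstract comes from making each residual block invertible, so activations can be recomputed from the layer's outputs during the backward pass instead of being stored. A minimal NumPy sketch of the reversible block structure (the paper's coupling form; `F` and `G` here are arbitrary stand-in residual functions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in residual functions F and G; any deterministic maps work.
W_f = rng.standard_normal((4, 4))
W_g = rng.standard_normal((4, 4))
F = lambda x: np.tanh(x @ W_f)
G = lambda x: np.tanh(x @ W_g)

def rev_block_forward(x1, x2):
    """One reversible block: the outputs determine the inputs exactly."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2):
    """Recover the inputs from the outputs -- no activations stored."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
y1, y2 = rev_block_forward(x1, x2)
r1, r2 = rev_block_inverse(y1, y2)
```

Because each block's inputs are reconstructed on the fly, activation memory becomes independent of depth, at the cost of one extra forward computation per block during backprop.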

End-To-End Instance Segmentation With Recurrent Attention

M. Ren, R. Zemel
While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. [...] [PDF]
Supplementary Materials: [LINK]
Code: [LINK]
Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes

M. Ren, R. Liao, R. Urtasun, F. H. Sinz, R. Zemel
Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better models. However, its success has been very limited when dealing with recurrent neural networks. On the other hand, layer normalization normalizes the activations across all activities within a layer. This was shown to work well in the recurrent setting. In this paper we propose a unified view of normalization techniques, as forms of divisive normalization, which includes layer and batch normalization as special cases. [...] [PDF]
International Conference on Learning Representations (ICLR), 2017
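The unified view in the abstract can be made concrete: both batch and layer normalization subtract a mean and divide by a standard deviation, differing only in which set of units the statistics are pooled over. A minimal NumPy sketch (omitting the learned scale/shift parameters and the paper's more general smoothing terms):

```python
import numpy as np

def divisive_norm(x, axis, eps=1e-5):
    """Subtract the mean and divide by the std over the chosen set of units."""
    mu = x.mean(axis=axis, keepdims=True)
    sigma = x.std(axis=axis, keepdims=True)
    return (x - mu) / (sigma + eps)

x = np.random.default_rng(1).standard_normal((32, 64))  # (batch, features)

bn = divisive_norm(x, axis=0)  # batch norm: pool over the batch, per feature
ln = divisive_norm(x, axis=1)  # layer norm: pool over features, per sample
```

Choosing the pooling axis is the only difference between the two special cases, which is why a single divisive-normalization formulation can interpolate between them.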