
DMM-Net: Differentiable Mask-Matching Network for Video Instance Segmentation



In this paper, we propose the differentiable mask-matching network (DMM-Net) for solving the video object segmentation problem in which the initial object masks are provided. Relying on a Mask R-CNN backbone, we extract mask proposals per frame and formulate the matching between object templates and proposals at each time step as a linear assignment problem whose cost matrix is predicted by a CNN. We propose a differentiable matching layer by unrolling a projected gradient descent algorithm in which the projection exploits Dykstra’s algorithm. We prove that, under mild conditions, the matching is guaranteed to converge to the optimum. In practice, it performs similarly to the Hungarian algorithm during inference, while allowing us to back-propagate through it to learn the cost matrix. After matching, a refinement head improves the quality of the matched mask. Our DMM-Net achieves competitive results on YouTube-VOS, the largest video object segmentation dataset. On DAVIS 2017, DMM-Net achieves the best performance among methods without online learning on the first frames. Without any fine-tuning, DMM-Net performs comparably to state-of-the-art methods on the SegTrack v2 dataset. Finally, our matching layer is very simple to implement; we attach the PyTorch code (<50 lines) in the supplementary material. Our code is released at this https URL.
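To make the matching layer concrete, here is a minimal NumPy sketch of the idea described above: unrolled projected gradient descent on a linear assignment objective, where each projection step runs Dykstra's algorithm over the row-sum, column-sum, and nonnegativity constraints. This is an illustrative sketch, not the authors' PyTorch implementation: it assumes a square cost matrix (equal numbers of templates and proposals, so the feasible set is the doubly stochastic polytope), and the step size and iteration counts are arbitrary choices.

```python
import numpy as np

def dykstra_project(X, n_iters=100):
    """Project X onto the doubly stochastic polytope via Dykstra's algorithm.

    Alternates projections onto three convex sets, with correction terms
    P, Q, R so the iterates converge to the true projection (unlike plain
    alternating projections).
    """
    P = np.zeros_like(X)
    Q = np.zeros_like(X)
    R = np.zeros_like(X)
    for _ in range(n_iters):
        # Affine projection onto {rows sum to 1}.
        Y = X + P
        Y = Y - (Y.sum(axis=1, keepdims=True) - 1.0) / X.shape[1]
        P = X + P - Y
        # Affine projection onto {columns sum to 1}.
        Z = Y + Q
        Z = Z - (Z.sum(axis=0, keepdims=True) - 1.0) / X.shape[0]
        Q = Y + Q - Z
        # Projection onto the nonnegative orthant.
        X = np.maximum(Z + R, 0.0)
        R = Z + R - X
    return X

def matching(C, alpha=0.1, pgd_iters=50):
    """Approximately solve min_X <C, X> over doubly stochastic X
    by unrolled projected gradient descent (gradient of <C, X> is C)."""
    X = np.full(C.shape, 1.0 / C.shape[1])  # uniform initialization
    for _ in range(pgd_iters):
        X = dykstra_project(X - alpha * C)
    return X
```

Because every operation above is differentiable almost everywhere (linear maps and element-wise max), porting the same loop to PyTorch tensors lets gradients flow back into the predicted cost matrix `C`, which is the point of unrolling rather than calling a black-box Hungarian solver at training time.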


Xiaohui Zeng, Renjie Liao, Li Gu, Yuwen Xiong, Sanja Fidler, Raquel Urtasun


ICCV 2019

Full Paper

‘DMM-Net: Differentiable Mask-Matching Network for Video Instance Segmentation’ (PDF)

Uber ATG

Renjie Liao
Renjie Liao is a PhD student in the Machine Learning Group, Department of Computer Science, University of Toronto, supervised by Prof. Raquel Urtasun and Prof. Richard Zemel. He is also a Research Scientist at Uber Advanced Technology Group Toronto and is affiliated with the Vector Institute. He received his M.Phil. degree from the Department of Computer Science and Engineering at the Chinese University of Hong Kong, under the supervision of Prof. Jiaya Jia, and his B.Eng. degree from the School of Automation Science and Electrical Engineering at Beihang University (formerly Beijing University of Aeronautics and Astronautics).
Wei-Chiu Ma
Wei-Chiu Ma is a PhD student at MIT advised by Prof. Antonio Torralba. His research interests lie at the intersection of computer vision and machine learning, in particular low-level vision and 3D vision. He also works part-time at Uber ATG Toronto with Prof. Raquel Urtasun, applying his research to self-driving vehicles.
Raquel Urtasun
Raquel Urtasun is the Chief Scientist for Uber ATG and the Head of Uber ATG Toronto. She is also a Professor at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision, and a co-founder of the Vector Institute for AI. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, a Fallona Family Research Award, and two Best Paper Runner-up Prizes awarded at CVPR in 2013 and 2017. She was also named Chatelaine's 2018 Woman of the Year and one of 2018's top influencers in Toronto by Adweek magazine.