While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation using a unified architecture where the encoder is shared amongst the three tasks. Our approach is very simple, can be trained end-to-end and performs extremely well in the challenging KITTI dataset. Our approach is also very efficient, allowing us to perform inference at more then 23 frames per second. Training scripts and trained weights to reproduce our results can be found here: https://github.com/MarvinTeichmann/MultiNet.
Martin Teichmann, Michael Weber, Marius Zöllner, Roberto Cipolla, Raquel Urtasun