TL;DR: We present CUPS, a Scene-Centric Unsupervised Panoptic Segmentation method leveraging motion and depth from stereo pairs to generate pseudo-labels. Using these labels, we train a monocular ...