Abstract: Open-vocabulary object detection aims to identify and localize some seen and unseen classes in the training stage without extra annotations. A key limitation in prevailing methods is their ...