Towards Generalized 3D Object Detection for Autonomous Driving: Enhancing Data Efficiency in Novel Domains
Robust 3D object detection is fundamental to the advancement of robotics and autonomous driving, allowing autonomous systems to navigate complex environments safely and efficiently. Current 3D object detection algorithms rely on supervised learning with large-scale annotated datasets, which are meticulously designed to reflect diverse real-world scenarios and object variations. Creating and annotating such datasets is notoriously labor-intensive and time-consuming, requiring significant effort to ensure precision and comprehensiveness. However, owing to the rapid evolution of vehicle designs and their geographic variation, deploying these carefully tuned systems in real-world environments still introduces domain gaps that compromise performance. These challenges are often addressed by continuously collecting and annotating additional data, which is unsustainable due to the high costs involved. Consequently, there is a pressing need for more generalized 3D object detection capabilities that ensure consistent and reliable performance across diverse and evolving domains. In this thesis, we identify mis-localization as the primary cause of reduced 3D object detection performance in novel domains. That is, 3D objects (e.g., cars, cyclists, and pedestrians) in novel domains can generally be correctly identified by unadapted 3D object detectors, but their predicted sizes, locations, and orientations are often inaccurate. Previous studies, however, tend to treat object detection in novel domains as an entirely new task, overlooking the fact that objects of the same class often share similar shapes across domains and that bounding boxes are consistently defined to be tight around the objects. We argue that addressing such mis-localization does not necessarily require extensive data collection and annotation. In fact, successful adaptation can be achieved with minimal or even no prior knowledge of the new domain.
To this end, this thesis develops three approaches: StatNorm, DRIFT, and DiffuBox. StatNorm significantly enhances domain adaptation performance by simulating new domains using only their object dimension statistics; DRIFT markedly improves both the efficiency and performance of unsupervised object discovery in novel domains by aligning 3D object detectors with heuristic reward functions; and DiffuBox consistently boosts existing domain adaptation methods through domain-agnostic bounding box refinement that operates in the bounding-box-relative view. Through comprehensive experiments, we demonstrate that incorporating shape knowledge and heuristic priors can effectively mitigate domain gaps, achieving more robust, efficient, and generalized 3D object detection across diverse environments without the need for extensive domain-specific data collection and annotation. Overall, we offer a more practical and scalable solution for deploying autonomous driving systems in diverse real-world scenarios.
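To make the underlying intuition concrete, the sketch below illustrates, in a purely schematic way, how target-domain object dimension statistics might be used to adjust the sizes of predicted 3D boxes while leaving centers and orientations untouched. This is an illustrative toy example, not the actual StatNorm, DRIFT, or DiffuBox algorithms; the box layout `[x, y, z, l, w, h, yaw]`, the blend weight `alpha`, and the statistics themselves are assumptions made for the sake of the example.

```python
import numpy as np

# Hypothetical target-domain mean car dimensions (length, width, height, meters).
# In practice such statistics might come from published dataset summaries.
TARGET_CAR_DIMS = np.array([4.7, 1.9, 1.7])

def rescale_boxes(boxes: np.ndarray, target_dims: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Blend predicted 3D box sizes toward target-domain mean dimensions.

    boxes: (N, 7) array of [x, y, z, l, w, h, yaw] predictions.
    Centers (x, y, z) and yaw are preserved; only the size channels
    (l, w, h) are pulled toward the target statistics by weight alpha.
    """
    out = boxes.copy()
    out[:, 3:6] = (1.0 - alpha) * out[:, 3:6] + alpha * target_dims
    return out

# A single source-domain-sized prediction, rescaled toward target statistics.
preds = np.array([[10.0, 2.0, -1.0, 3.9, 1.6, 1.5, 0.1]])
adapted = rescale_boxes(preds, TARGET_CAR_DIMS)
```

Even this crude prior pushes under-sized source-domain predictions toward plausible target-domain extents, which is the kind of shape knowledge the thesis argues can substitute for costly new annotation.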