
How Panopticus Uses AI to Detect Objects in 3D

Written by @omnidirectional | Published on 2025/3/2

TL;DR
Panopticus is implemented using CUDA, TensorRT, and PyTorch, optimizing 3D detection with modularized neural networks, parallel processing, and efficient memory management.

ABSTRACT

1 INTRODUCTION

2 BACKGROUND: OMNIDIRECTIONAL 3D OBJECT DETECTION

3 PRELIMINARY EXPERIMENT

3.1 Experiment Setup

3.2 Observations

3.3 Summary and Challenges

4 OVERVIEW OF PANOPTICUS

5 MULTI-BRANCH OMNIDIRECTIONAL 3D OBJECT DETECTION

5.1 Model Design

5.2 Model Adaptation

6 SPATIAL-ADAPTIVE EXECUTION

6.1 Performance Prediction

6.2 Execution Scheduling

7 IMPLEMENTATION

8 EVALUATION

8.1 Testbed and Dataset

8.2 Experiment Setup

8.3 Performance

8.4 Robustness

8.5 Component Analysis

8.6 Overhead

9 RELATED WORK

10 DISCUSSION AND FUTURE WORK

11 CONCLUSION AND REFERENCES

7 IMPLEMENTATION

We implemented Panopticus in Python and CUDA for GPU-based acceleration. All neural networks were developed in PyTorch [41] and trained on the training set of the nuScenes dataset [2]. Note that Panopticus is also compatible with other 3D perception datasets, such as Waymo [46], that have sensor configurations similar to nuScenes. The neural networks for 3D object detection are built on MMDetection3D [35]. For the camera motion network, we customized and trained the network from [53] to produce consistent relative poses between two consecutive frames for any given camera view. For the object tracker, we modified SimpleTrack [36] to use velocity in the state transition model of its 3D Kalman filter. We used the GPU-accelerated XGBoost library [12] and the linear model in scikit-learn [43] for the performance predictors, and PuLP [17] for the ILP solver. In the model adaptation stage, memory usage is profiled via tegrastats.

Figure 9: Setup for mobile testbed and dataset.
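To give a feel for how an off-the-shelf ILP solver fits here, below is a minimal PuLP sketch that assigns one detection branch to each camera view so that predicted accuracy is maximized under a per-frame latency budget. The variable names, the view and branch counts, the accuracy/latency numbers, and the simple additive latency model are all illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical branch-assignment ILP sketch: each camera view v gets
# exactly one branch b; acc[v][b] and lat[v][b] are placeholder values
# standing in for the outputs of the performance predictors.
import pulp

views = range(6)              # e.g., six omnidirectional camera views
branches = range(3)           # e.g., three branches of differing capacity
acc = [[0.5, 0.7, 0.9]] * 6   # predicted accuracy gain per (view, branch)
lat = [[5.0, 9.0, 16.0]] * 6  # predicted latency in ms per (view, branch)
deadline = 60.0               # per-frame latency budget in ms

prob = pulp.LpProblem("branch_assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (views, branches), cat="Binary")

# Objective: maximize total predicted accuracy across all views.
prob += pulp.lpSum(acc[v][b] * x[v][b] for v in views for b in branches)

# Each view must be processed by exactly one branch.
for v in views:
    prob += pulp.lpSum(x[v][b] for b in branches) == 1

# Total (here: naively additive) latency must fit the frame deadline.
prob += pulp.lpSum(lat[v][b] * x[v][b]
                   for v in views for b in branches) <= deadline

prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = {v: next(b for b in branches if x[v][b].value() > 0.5)
              for v in views}
print(assignment)
```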

We optimized the performance of our multi-branch model in several ways. We used TensorRT [11] to modularize the neural networks and to accelerate inference. Networks converted to TensorRT are optimized with 16-bit floating-point (FP16) quantization, layer fusion, and other techniques. Camera view images assigned to the same network, such as a backbone or a DepthNet, are batch-processed together. Additionally, we used CUDA Multi-Stream [5] to run these networks in parallel on an edge GPU. Meanwhile, we noticed an accuracy loss due to the naive modularization of our multi-branch model. Specifically, DepthNets and a BEV head trained with one backbone network are incompatible with the others. A one-size-fits-all DepthNet is impractical because each backbone generates a 2D feature map of a different size. Our model therefore includes DepthNet variants tailored to each backbone, ensuring accurate depth estimation. In contrast, the BEV head takes inputs of a consistent shape regardless of the preceding networks, i.e., the backbones and DepthNets. This allows us to train a universal BEV head compatible with any combination of preceding networks. We trained the BEV head from scratch, while the pre-trained backbones and DepthNets were fine-tuned.
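To make the batching and multi-stream idea concrete, here is a minimal PyTorch sketch that batches the views assigned to each branch and launches two branches on separate CUDA streams so their kernels can overlap. It uses plain PyTorch modules as stand-ins for the TensorRT-converted engines, and the module definitions and tensor shapes are illustrative assumptions.

```python
# Sketch: overlap two network branches on one GPU using CUDA streams.
import torch
import torch.nn as nn

device = torch.device("cuda")
# Stand-ins for two TensorRT-converted branches (assumed architectures).
branch_a = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(device)
branch_b = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()).to(device)

# Batch the camera views assigned to each branch into a single tensor.
views_a = torch.randn(4, 3, 256, 704, device=device)  # 4 views -> branch A
views_b = torch.randn(2, 3, 256, 704, device=device)  # 2 views -> branch B

stream_a, stream_b = torch.cuda.Stream(), torch.cuda.Stream()

with torch.no_grad():
    # Launch each branch on its own stream so execution can overlap.
    with torch.cuda.stream(stream_a):
        out_a = branch_a(views_a)
    with torch.cuda.stream(stream_b):
        out_b = branch_b(views_b)

# Wait for both streams before consuming the feature maps.
torch.cuda.synchronize()
print(out_a.shape, out_b.shape)
```

In the real system the branches are TensorRT engines rather than nn.Module objects, but the scheduling pattern, one stream per branch with a synchronization point before the shared BEV head, is the same.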

This paper is available on arXiv under the CC BY 4.0 DEED (Attribution 4.0 International) license.

[story continues]


Written by @omnidirectional
The stories behind the systems that capture signals in all directions. We publish research on Omnidirectional.Tech

Topics and tags
edge-ai|panopticus-ai|edge-ai-optimization|ai-for-autonomous-vehicles|spatial-adaptive-ai-models|low-cost-ai-perception|bevdet-neural-networks|lidar-vs.-camera-detection
This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: 0evGoI814PUaz5b0v33LkD_90-aSg2bXpRFolsVcaH0