# FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction (ACM MM 2024)
FusionOcc is a new multi-modal fusion network for 3D occupancy prediction that fuses features from LiDAR point clouds and surround-view images. The features of the two modalities are fused in both 2D and 3D space. In 2D space, a semi-supervised method is used to generate dense depth maps, which are integrated with the BEV images via a cross-modal fusion module. In 3D space, features of the voxelized point cloud are aligned and merged with the BEV image features produced by a view transformer. FusionOcc establishes a new baseline for further research on multi-modal fusion for 3D occupancy prediction and achieves new state-of-the-art results on the Occ3D-nuScenes dataset.
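To make the 3D-space fusion step more concrete, here is a minimal NumPy sketch, assuming the voxelized LiDAR features and the view-transformed image features already lie on the same voxel grid. It concatenates the two volumes along the channel axis and applies a per-voxel linear projection (equivalent to a 1x1x1 convolution) followed by ReLU. The function name `fuse_voxel_features` and the concatenate-then-project design are illustrative assumptions, not FusionOcc's actual fusion module.

```python
import numpy as np


def fuse_voxel_features(lidar_vox: np.ndarray, cam_vox: np.ndarray,
                        weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical fusion of two voxel feature volumes on the same 3D grid.

    lidar_vox: (C_l, X, Y, Z) features of the voxelized point cloud.
    cam_vox:   (C_c, X, Y, Z) image features lifted to 3D by a view transformer.
    weight:    (C_out, C_l + C_c) per-voxel projection (a 1x1x1 convolution).
    bias:      (C_out,)
    """
    # "Aligned" here simply means both volumes share the same voxel grid.
    if lidar_vox.shape[1:] != cam_vox.shape[1:]:
        raise ValueError("both volumes must share the same voxel grid")
    # Channel-wise concatenation: every voxel now carries both modalities.
    stacked = np.concatenate([lidar_vox, cam_vox], axis=0)  # (C_l + C_c, X, Y, Z)
    # A 1x1x1 convolution is a matrix product over the channel axis.
    fused = np.einsum("oc,cxyz->oxyz", weight, stacked) + bias[:, None, None, None]
    return np.maximum(fused, 0.0)  # ReLU


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lidar = rng.standard_normal((16, 8, 8, 4))
    cam = rng.standard_normal((32, 8, 8, 4))
    w = rng.standard_normal((24, 48)) * 0.1
    b = np.zeros(24)
    print(fuse_voxel_features(lidar, cam, w, b).shape)  # (24, 8, 8, 4)
```

Concatenation keeps both modalities intact until the learned projection decides how to mix them; a real network would typically follow this with further 3D convolutions.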
# Main prerequisites
- Python = 3.8
- nuscenes-devkit = 1.1.11
- PyTorch = 1.10.0
- torch-scatter = 2.0.9
- opencv-python = 4.9.0
- Pillow = 10.0.1
- mmcv-full = 1.5.3
- mmdetection = 2.25.1
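A quick way to compare an environment against these pins is sketched below with the standard-library `importlib.metadata`. The PyPI distribution names (`torch` for PyTorch, `mmdet` for mmdetection) are assumptions about how the packages were installed, and the helper names are hypothetical. A pin counts as met when its version components match, ignoring extra trailing components (e.g. `4.9.0.80` for opencv-python) and local tags such as `+cu113`.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins from the list above, keyed by (assumed) pip distribution name.
REQUIRED: Dict[str, str] = {
    "nuscenes-devkit": "1.1.11",
    "torch": "1.10.0",
    "torch-scatter": "2.0.9",
    "opencv-python": "4.9.0",
    "Pillow": "10.0.1",
    "mmcv-full": "1.5.3",
    "mmdet": "2.25.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def matches(found: str, wanted: str) -> bool:
    """Compare only the components given in the pin, after dropping '+local' tags."""
    base = found.split("+")[0].split(".")
    want = wanted.split(".")
    return base[:len(want)] == want


def find_mismatches(required: Dict[str, str],
                    lookup: Callable[[str], Optional[str]] = installed_version
                    ) -> Dict[str, Optional[str]]:
    """Return {dist: installed version or None} for every pin that is not met."""
    mismatches: Dict[str, Optional[str]] = {}
    for dist, wanted in required.items():
        found = lookup(dist)
        if found is None or not matches(found, wanted):
            mismatches[dist] = found
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(expected 3.8)")
    for dist, found in find_mismatches(REQUIRED).items():
        print(f"{dist}: expected {REQUIRED[dist]}, found {found or 'not installed'}")
```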
Organize the data as follows:
```
FusionOcc
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── lidarseg
│   │   ├── imgseg
│   │   ├── gts
│   │   ├── v1.0-trainval
│   │   ├── fusionocc-nuscenes_infos_train.pkl
│   │   ├── fusionocc-nuscenes_infos_val.pkl
```
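The hypothetical helper below checks that the entries in the data tree exist before training or evaluation. It only tests for presence under `data/nuscenes` relative to a repository root you pass in; it does not validate file contents or the internal layout of each folder.

```python
from pathlib import Path
from typing import List

# Entries expected under data/nuscenes, taken from the directory tree.
EXPECTED_ENTRIES: List[str] = [
    "maps", "samples", "sweeps", "lidarseg", "imgseg", "gts", "v1.0-trainval",
    "fusionocc-nuscenes_infos_train.pkl", "fusionocc-nuscenes_infos_val.pkl",
]


def missing_entries(repo_root: str) -> List[str]:
    """List the expected data/nuscenes entries that do not exist under repo_root."""
    base = Path(repo_root) / "data" / "nuscenes"
    return [name for name in EXPECTED_ENTRIES if not (base / name).exists()]


if __name__ == "__main__":
    # Run from the FusionOcc repository root.
    missing = missing_entries(".")
    print("missing:", missing if missing else "none")
```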
| Backbone | Config | Mask | Pretrain | mIoU | Checkpoints |
|---|---|---|---|---|---|
| Swin-Base | Base | ✖️ | ImageNet, nuImages | 56.62 | BaseWoMask |
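The mIoU column averages the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c). A minimal sketch of that computation from a confusion matrix follows. It is a generic implementation: the class set and visibility mask used by the official Occ3D-nuScenes evaluation are not reproduced here, and classes to exclude are passed through the hypothetical `ignore_classes` argument.

```python
import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> np.ndarray:
    """(num_classes, num_classes) counts: rows = ground truth, columns = prediction."""
    idx = gt.astype(np.int64).ravel() * num_classes + pred.astype(np.int64).ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf: np.ndarray, ignore_classes=()) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), reported in percent."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class c but predicted otherwise
    denom = tp + fp + fn
    # Skip ignored classes and classes absent from both labels and predictions.
    keep = [c for c in range(conf.shape[0]) if c not in ignore_classes and denom[c] > 0]
    return float(np.mean(tp[keep] / denom[keep]) * 100.0)


if __name__ == "__main__":
    gt = np.array([0, 0, 1, 1, 2])
    pred = np.array([0, 1, 1, 1, 2])
    conf = confusion_matrix(gt, pred, 3)
    print(round(mean_iou(conf), 2))  # 72.22
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so their mean is about 72.22%.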
We provide instructions for evaluating our pretrained models. First download the checkpoints listed above. The config file is `fusion_occ.py`.
Run:
```shell
./tools/dist_test.sh $config $checkpoint $num_gpu
```
To train, modify the `load_from` path at the end of the config file to load pre-trained weights, then run:
```shell
./tools/dist_train.sh $config $num_gpu
```
To obtain the version without mask, set the `use_mask` field in the config file to `False` and train for several epochs.
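For reference, the two config edits for training might look like the excerpt below. mmcv configs are plain Python files, and `load_from` is a standard top-level field in mmcv/mmdetection configs. Where `use_mask` lives in `fusion_occ.py` is an assumption (it may be nested inside the model settings), and the checkpoint path is a placeholder.

```python
# Hypothetical excerpt of fusion_occ.py (mmcv-style config, plain Python).

# Assumption: use_mask is a top-level variable; in the real file it may be
# nested inside the model settings. False trains the variant without mask.
use_mask = False

# Checkpoint used to initialise training; this path is a placeholder.
load_from = "ckpts/pretrained.pth"
```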
You can also use pre-trained weights from BEVDet to train the model from the very beginning.
Many thanks to the following excellent open-source projects, on which our code is based:
Other related projects on Occ3D prediction:
If this work is helpful for your research, please consider citing the following paper:
```bibtex
@inproceedings{zhang2024fusionocc,
  title={FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction},
  author={Shuo Zhang and Yupeng Zhai and Jilin Mei and Yu Hu},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://openreview.net/forum?id=xX66hwZJWa}
}
```
