CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion

Bin Dou1, Tianyu Zhang1, Yongjia Ma1, Zhaohui Wang1, Zejian Yuan1*
1Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University
Teaser figure (panels: RGB, Semantic, Panoptic): semantic and panoptic segmentation results.

Abstract

We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact, 3D-consistent scene segmentation at fast rendering speed that takes only RGB images as input. Previous NeRF-based segmentation methods rely on time-consuming neural scene optimization. Although recent 3D Gaussian Splatting has notably improved speed, existing Gaussian-based segmentation methods struggle to produce compact masks, especially in zero-shot segmentation. This issue probably stems from their straightforward assignment of learnable parameters to each Gaussian, which leaves them with little robustness against 2D machine-generated labels that are inconsistent across views. Our method addresses this problem by employing a Dual Feature Fusion Network as the Gaussians' segmentation field. Specifically, we first optimize 3D Gaussians under RGB supervision. After Gaussian Locating, DINO features extracted from images are applied to the Gaussians through explicit unprojection and are then combined with spatial features from an efficient point cloud processing network. Feature aggregation fuses the two in a global-to-local strategy to produce compact segmentation features. Experimental results show that our model outperforms baselines on both semantic and panoptic zero-shot segmentation tasks while consuming less than 10% of the inference time of NeRF-based methods.

Method

Given only posed RGB images of a 3D scene, our method aims to build an expressive representation that captures the geometry, appearance, and compact segmentation identity of the scene. Our proposed model, CoSSegGaussians, enables compact, 3D-consistent segmentation of novel views while consuming much less rendering time than NeRF-based methods. The overview figure shows the CoSSegGaussians architecture.
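To make the idea of fusing two per-Gaussian feature sources more concrete, here is a minimal NumPy sketch. It is an illustration only and not the Dual Feature Fusion Network itself: the function name `fuse_global_to_local`, the use of max pooling for the global descriptor, and the single linear classification layer are all assumptions chosen to keep the example short.

```python
import numpy as np

def fuse_global_to_local(dino_feats, spatial_feats, weight, bias):
    """Hypothetical fusion of two per-Gaussian feature sets.

    dino_feats:    (N, Dd) features unprojected from images (assumed given)
    spatial_feats: (N, Ds) features from a point cloud network (assumed given)
    weight:        (2 * (Dd + Ds), C) linear layer weights (assumed)
    bias:          (C,) linear layer bias (assumed)
    Returns (N, C) per-Gaussian segmentation logits.
    """
    # Local descriptor: both feature sources side by side for each Gaussian.
    local = np.concatenate([dino_feats, spatial_feats], axis=1)
    # Global descriptor: one scene-level vector (max pooling is an assumption).
    global_desc = local.max(axis=0, keepdims=True)
    # Attach the global context to every local descriptor.
    tiled = np.broadcast_to(global_desc, local.shape)
    fused = np.concatenate([local, tiled], axis=1)
    # A single linear layer stands in for the real segmentation head.
    return fused @ weight + bias

rng = np.random.default_rng(0)
logits = fuse_global_to_local(
    rng.normal(size=(100, 16)),      # stand-in for DINO features
    rng.normal(size=(100, 8)),       # stand-in for spatial features
    rng.normal(size=(48, 5)),        # 2 * (16 + 8) inputs, 5 classes
    np.zeros(5),
)
labels = logits.argmax(axis=1)       # one class index per Gaussian
```

Because the segmentation identity is computed from shared features rather than stored as a free parameter per Gaussian, nearby Gaussians with similar features tend to receive similar logits, which is the intuition behind the compact masks described in the abstract.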


Comparisons

(a) shows a qualitative comparison of our method with other methods on the Replica and ScanNet datasets. (b) and (c) present quantitative comparisons on semantic and panoptic segmentation, respectively, averaged over all scenes in each dataset.


Animation

Scene Segmentation

Rendered semantic and panoptic segmentation maps from different methods, shown from various viewpoints.


Panels: RGB, Semantic (Gaussian Grouping), Panoptic (Gaussian Grouping), Semantic (Ours), Panoptic (Ours)


Segmented 3D Gaussians


DINO Feature Field

We visualize the feature field obtained through DINO feature unprojection.
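As a rough illustration of what unprojecting 2D features onto 3D Gaussians can look like, the sketch below projects Gaussian centers into each posed view, reads the feature at the nearest pixel, and averages over the views in which the center is visible. This is a simplified assumption, not the page's exact procedure: the function name `unproject_features`, the pinhole camera model, nearest-pixel sampling, plain averaging, and the absence of occlusion handling are all choices made for brevity.

```python
import numpy as np

def unproject_features(centers, feat_maps, intrinsics, world_to_cams):
    """Hypothetical unprojection of per-view feature maps onto 3D points.

    centers:       (N, 3) Gaussian centers in world coordinates
    feat_maps:     list of (H, W, D) feature maps, one per view
    intrinsics:    list of (3, 3) pinhole matrices at feature-map resolution
    world_to_cams: list of (4, 4) world-to-camera transforms
    Returns the (N, D) mean feature per point and the (N,) view counts.
    """
    n = centers.shape[0]
    d = feat_maps[0].shape[2]
    accum = np.zeros((n, d))
    counts = np.zeros(n)
    homog = np.concatenate([centers, np.ones((n, 1))], axis=1)
    for fmap, K, w2c in zip(feat_maps, intrinsics, world_to_cams):
        h, w, _ = fmap.shape
        cam = (w2c @ homog.T).T[:, :3]           # points in camera space
        z = cam[:, 2]
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)      # avoid dividing by <= 0
        pix = (K @ cam.T).T
        u = np.round(pix[:, 0] / safe_z).astype(int)
        v = np.round(pix[:, 1] / safe_z).astype(int)
        # Keep points in front of the camera and inside the image bounds.
        # Occlusion is ignored here, which a real pipeline would handle.
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        accum[visible] += fmap[v[visible], u[visible]]
        counts[visible] += 1
    mean = accum / np.maximum(counts, 1)[:, None]
    return mean, counts
```

Points that are never visible keep a zero feature and a count of zero, so callers can mask them out before visualizing or further processing the field.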


Applications

Language-guided Segmentation

As one application of our method, we provide language-guided segmentation results based on Text2Seg, a 2D language-guided segmentation method.

Text Prompt: Blue Toy

Text Prompt: Potted Plant

Scene Manipulation

Scene manipulation results are provided as another application.
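Once every Gaussian carries a segmentation label, object-level edits such as translation and removal reduce to masked array operations. The sketch below shows this under simplifying assumptions: the helper names `translate_object` and `remove_object` are hypothetical, only the Gaussian centers and labels are modeled, and floating-point centers are assumed. A full pipeline would apply the same mask to the remaining Gaussian attributes (covariance, opacity, color).

```python
import numpy as np

def translate_object(centers, labels, target_label, offset):
    """Return a copy of centers with one labeled object shifted by offset."""
    moved = centers.copy()
    moved[labels == target_label] += np.asarray(offset, dtype=centers.dtype)
    return moved

def remove_object(centers, labels, target_label):
    """Return centers and labels with one labeled object dropped."""
    keep = labels != target_label
    return centers[keep], labels[keep]

# Toy scene: three Gaussians, two of which belong to object 1.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
labels = np.array([0, 1, 1])

shifted = translate_object(centers, labels, target_label=1, offset=(1.0, 0.0, 0.0))
remaining_centers, remaining_labels = remove_object(centers, labels, target_label=1)
```

The quality of such edits depends directly on how compact the segmentation is: stray Gaussians with the wrong label are either left behind or dragged along with the object.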

Interactive & Detailed Results
Translating (office scene). Panels: Before Manipulation, Gaussian Grouping, Ours.

Erasing (room scene). Panels: Before Manipulation, Gaussian Grouping, Ours.

Multi-view Results

Translation of Seat

Removal of Stool


Related Links

Related works on 3D scene segmentation include Semantic-NeRF, DM-NeRF, Panoptic-Lifting, and Gaussian Grouping, among others.

Citation

@article{dou2024cosseggaussians,
  title={CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature Fusion},
  author={Dou, Bin and Zhang, Tianyu and Ma, Yongjia and Wang, Zhaohui and Yuan, Zejian},
  journal={arXiv preprint arXiv:2401.05925},
  year={2024}
}