C-GenReg

C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Ben-Gurion University, Beer-Sheva, Israel

CVPR 2026

C-GenReg enables training-free 3D registration by transforming point clouds into multi-view consistent RGB images, allowing Vision Foundation Model features to augment conventional 3D geometric features for more robust registration.

Abstract

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. C-GenReg augments a conventional geometric registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel: a World Foundation Model synthesizes multi-view-consistent RGB representations directly from the input geometry.

This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondences extracts matches, which are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a Match-then-Fuse probabilistic cold-fusion scheme that combines the generated-RGB and geometric correspondence posteriors. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play, and experiments on 3DMatch, ScanNet, and Waymo demonstrate strong zero-shot performance, superior cross-domain generalization, and successful operation on real outdoor LiDAR data where imagery is unavailable.

C-GenReg teaser figure showing the generated-RGB branch, geometric branch, and probabilistic fusion.
C-GenReg operates with a generated-RGB branch, a geometric branch, and a probabilistic fusion stage that combines both correspondence sources before estimating the final rigid transformation.

Method

Overview of the C-GenReg pipeline.
Method overview. The generated-RGB branch and the geometric branch produce independent correspondence posteriors that are fused by the Match-then-Fuse module before pose estimation.

1. Generated-RGB Branch

Source and target point clouds are represented as depth-frame sequences, temporally concatenated, and processed by a frozen World Foundation Model to generate RGB views that are geometrically coherent with the input depth and cross-view consistent. A subset of K frames per domain is then fed to a frozen, task-specific VFM, and the resulting dense pixel features are lifted back to 3D using the original depth maps.
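The final lifting step can be illustrated with standard pinhole back-projection. The sketch below is a minimal, hypothetical implementation (the paper's exact procedure and calibration are not reproduced here): each valid depth pixel is unprojected to a 3D point using assumed intrinsics `K`, and the VFM feature at that pixel is attached to the resulting point.

```python
import numpy as np

def lift_features_to_3d(depth, features, K, valid_min=1e-3):
    """Back-project dense per-pixel features to 3D via a depth map.

    depth:    (H, W) depth in meters
    features: (H, W, C) dense VFM features
    K:        (3, 3) pinhole intrinsics (hypothetical calibration)
    Returns (N, 3) 3D points and (N, C) features for valid pixels.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel grids, shape (H, W)
    valid = depth > valid_min                       # mask out empty pixels
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]          # X = (u - cx) * Z / fx
    y = (v[valid] - K[1, 2]) * z / K[1, 1]          # Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1)
    return points, features[valid]
```

The same depth map that conditioned the RGB generation serves as the lifting geometry, so the lifted features stay aligned with the original point cloud by construction.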

2. Geometric Branch

In parallel, the raw point clouds are processed by a pretrained geometric feature extractor that produces dense geometric descriptors directly in 3D. This branch preserves the registration-oriented geometric cues that complement the generated-image representation.

3. Match-then-Fuse Probabilistic Fusion

Each modality produces a posterior correspondence map, p_img and p_geo, from its similarity scores. These are fused by the proposed Match-then-Fuse probabilistic module into a unified posterior p_fuse, from which the final rigid transformation is estimated.
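One simple way to combine two row-stochastic correspondence posteriors is a log-linear (product-of-experts) mixture. The sketch below is a hypothetical stand-in for the paper's Match-then-Fuse rule, whose exact form is not reproduced here; `w` is an assumed modality weight.

```python
import numpy as np

def fuse_posteriors(p_img, p_geo, w=0.5, eps=1e-12):
    """Log-linear fusion of two correspondence posteriors.

    p_img, p_geo: (N, M) row-stochastic matchability matrices
    w:            assumed weight on the image modality
    Returns a fused row-stochastic posterior of shape (N, M).
    """
    # Weighted sum in log space = weighted geometric mean of the posteriors.
    log_p = w * np.log(p_img + eps) + (1.0 - w) * np.log(p_geo + eps)
    # Subtract the row max before exponentiating for numerical stability.
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)
```

A geometric-mean fusion suppresses correspondences that only one modality supports, which is the qualitative behavior a cold fusion of independent posteriors is meant to achieve.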

Virtual camera projection for adapting LiDAR scans into the depth-image input format.
LiDAR adaptation. For outdoor LiDAR data, C-GenReg first converts each raw point cloud into a depth-image representation by projecting it onto a virtual camera. This depth image is then fed to the same World Foundation Model used for indoor data, so the method can generate an aligned RGB view and run the unchanged downstream registration pipeline.
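The virtual-camera step can be sketched as a pinhole projection with z-buffering. The intrinsics `K` and image size below are assumptions for illustration, not the paper's actual virtual-camera parameters.

```python
import numpy as np

def project_to_depth_image(points, K, H, W):
    """Render a point cloud into a virtual-camera depth image.

    points: (N, 3) points in the virtual camera frame (z forward)
    K:      (3, 3) hypothetical pinhole intrinsics
    Keeps the nearest point per pixel (z-buffering).
    """
    depth = np.zeros((H, W))
    front = points[:, 2] > 1e-3          # drop points behind the camera
    p = points[front]
    u = np.round(K[0, 0] * p[:, 0] / p[:, 2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * p[:, 1] / p[:, 2] + K[1, 2]).astype(int)
    inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z = u[inb], v[inb], p[inb, 2]
    # z-buffer: write far-to-near so nearer points overwrite farther ones
    order = np.argsort(-z)
    depth[v[order], u[order]] = z[order]
    return depth
```

Sparse LiDAR returns leave many empty pixels; in practice such depth images are typically densified or masked before being passed to a generative model.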

Results

3DMatch Benchmark

C-GenReg achieves the best overall performance across most reported indoor metrics while remaining fully zero-shot.

Table 1 from the paper showing the 3DMatch benchmark results.

ScanNet Benchmarks

Cross-dataset generalization on unseen indoor scenes, including the ScanNet Hard and ScanNet SuperGlue splits.

Table 2 from the paper showing ScanNet benchmark results.

Waymo Outdoor Benchmark

Real outdoor LiDAR registration where C-GenReg substantially outperforms geometric baselines trained on KITTI.

Table 3 from the paper showing Waymo outdoor benchmark results.

Ablation Studies

Task-specific VFMs outperform general-purpose ones, and probabilistic fusion consistently improves over simple concatenation across geometric backbones.

Table 4 from the paper showing ablation study results on 3DMatch.

Prompt Robustness

Replacing a detailed scene description with a general one results in negligible degradation, while even a minimal prompt remains reasonably strong; in contrast, a semantically incorrect prompt substantially degrades registration accuracy.

Prompt robustness figure from the paper.

Effect of View Selection

Registration performance measured by Relative Rotation Error and Relative Translation Error as a function of the number of selected views K. Performance saturates for K ≥ 4, indicating that only a few representative views are sufficient for stable registration.

Figure 6 from the paper showing the effect of view selection K.

Citation

@article{haitman2026cgenreg,
  title     = {C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion},
  author    = {Haitman, Yuval and Efraim, Amit and Francos, Joseph M.},
  journal   = {arXiv preprint arXiv:2604.16680},
  year      = {2026},
  doi       = {10.48550/arXiv.2604.16680},
  url       = {https://arxiv.org/abs/2604.16680}
}