Yilong Liu, Wanhua Li, Chen Zhu-Tian, Hanspeter Pfister
[Paper] [Project Page]
LangFlash is a feed-forward framework for 3D language Gaussian splatting from sparse, unposed input images.
Create a Python 3.10 environment and install the required dependencies (other torch versions should also work):
conda create -y -n langflash python=3.10
conda activate langflash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
Download the geometry and semantic checkpoints, then place both files in ckpt/:
- Geometry checkpoint: ckpt/mixRe10kDl3dv.ckpt (from NoPoSplat)
- Semantic checkpoint: ckpt/re10k.ckpt
The final directory layout should look like this:
ckpt/
    mixRe10kDl3dv.ckpt
    re10k.ckpt
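Before launching the demo, it can help to confirm that both files are where the layout above expects them. The following is a minimal sketch, not part of LangFlash itself: the helper name `missing_checkpoints` is hypothetical, and it assumes the script is run from the repository root so that `ckpt/` resolves correctly.

```python
from pathlib import Path

# File names taken from the directory layout above.
# The helper below is a hypothetical convenience, not a LangFlash API.
EXPECTED_CHECKPOINTS = ("mixRe10kDl3dv.ckpt", "re10k.ckpt")


def missing_checkpoints(ckpt_dir="ckpt", expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint file names that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints in ckpt/: " + ", ".join(missing))
    else:
        print("Both checkpoints found.")
```

The check only looks at file names; it does not validate that the downloaded files are complete or loadable.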
Run the Gradio demo with:
python -m LangFlash.model.gradio
The demo uses the bundled examples under sample/ and loads checkpoints from ckpt/.
The preprocessed dataset is available on Hugging Face.
The dataset release contains:
- scannet_extracted_sp: ScanNet data and metadata
- clip_test: RE10K test language features and corresponding masks
- clip_train_part{i}: RE10K training language features and corresponding masks
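Because the training features are split across several `clip_train_part{i}` entries, a small helper can gather them in index order after download. This is a sketch under assumptions: it treats `{i}` as an integer index, matches any local file or directory whose name starts with `clip_train_part` followed by digits, and takes a `dataset_root` path that you supply. The function name `list_train_parts` is hypothetical and not part of the release.

```python
import re
from pathlib import Path

# Assumption: "{i}" in clip_train_part{i} is a non-negative integer index.
PART_PATTERN = re.compile(r"^clip_train_part(\d+)")


def list_train_parts(dataset_root):
    """Return the clip_train_part{i} entries under dataset_root, sorted by i."""
    parts = []
    for entry in Path(dataset_root).iterdir():
        match = PART_PATTERN.match(entry.name)
        if match:
            parts.append((int(match.group(1)), entry))
    # Sort numerically so that part10 comes after part2.
    return [entry for _, entry in sorted(parts, key=lambda item: item[0])]
```

Sorting on the parsed integer rather than the raw name avoids the lexicographic ordering in which `clip_train_part10` would precede `clip_train_part2`.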
This project builds on several excellent open-source projects: NoPoSplat, EfficientViT, LangSplat, SAM2, lseg-minimal, and FG-CLIP.
If you find LangFlash useful, please cite:
@misc{liu2026langflash,
title = {LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images},
author = {Yilong Liu and Wanhua Li and Chen Zhu-Tian and Hanspeter Pfister},
year = {2026},
eprint = {2605.23287},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}