
LingBot-Map 3D Reconstruction Skill


Skill by ara.so — Daily 2026 Skills collection.
LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from streaming image or video data using a Geometric Context Transformer. With paged KV cache attention, it sustains ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames.

What It Does


  • Streaming 3D reconstruction from image sequences or video
  • Feed-forward inference (no iterative optimization needed)
  • Outputs: point clouds with per-point confidence, camera poses, depth maps
  • Key features: anchor context, pose-reference window, trajectory memory for drift correction

Installation



1. Create environment


```bash
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map
```

2. Install PyTorch (CUDA 12.8)


```bash
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
```

3. Install lingbot-map


```bash
git clone https://github.com/Robbyant/lingbot-map.git
cd lingbot-map
pip install -e .
```

4. Install FlashInfer for fast paged KV cache attention (recommended)


```bash
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
```

5. Optional: visualization support


```bash
pip install -e ".[vis]"
```

6. Optional: sky masking for outdoor scenes


```bash
pip install onnxruntime      # CPU
pip install onnxruntime-gpu  # GPU
```

Model Download


Models available on HuggingFace and ModelScope:

Download via huggingface_hub


```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="robbyant/lingbot-map",
    filename="checkpoint.pt"
)
```

Or manually download from:
- HuggingFace: `https://huggingface.co/robbyant/lingbot-map`
- ModelScope: `https://www.modelscope.cn/models/Robbyant/lingbot-map`

CLI Commands


Demo with Interactive 3D Viewer (browser at localhost:8080)



From image folder


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/
```

From video file


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10
```

Outdoor scene with sky masking


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky
```

Example scenes included in repo


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/church --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/oxford --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/university4 --mask_sky
```

Long Sequence Handling



Keyframe interval: store every Nth frame in KV cache (saves memory)


Use when sequence > 320 frames


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6
```
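A rough sketch of why this saves memory: if every Nth frame is the one kept (the selection rule implied by the flag name; the model's exact keyframe policy is not documented here), the cache shrinks by roughly a factor of N:

```python
def keyframe_indices(num_frames: int, interval: int = 6):
    """Indices kept in the KV cache under an assumed
    every-Nth-frame selection rule."""
    return list(range(0, num_frames, interval))

# For a 1200-frame sequence, interval 6 keeps only 200 frames in cache
cached = keyframe_indices(1200, 6)
print(len(cached))   # 200
print(cached[:4])    # [0, 6, 12, 18]
```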

Windowed mode: for very long sequences (>3000 frames)


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
```
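Windowed mode bounds memory by processing the stream in fixed-size chunks. A minimal sketch of the chunking arithmetic, assuming non-overlapping windows (the real mode may overlap windows or carry state between them):

```python
def window_bounds(num_frames: int, window_size: int = 64):
    """Split a frame index range into consecutive windows
    (assumed non-overlapping for illustration)."""
    return [(start, min(start + window_size, num_frames))
            for start in range(0, num_frames, window_size)]

windows = window_bounds(5000, 64)
print(len(windows))   # 79 windows
print(windows[-1])    # (4992, 5000): the last window is partial
```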

Without FlashInfer (SDPA fallback)


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa
```

Sky Masking with Custom Paths


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky \
    --sky_mask_dir /path/to/cached_masks/ \
    --sky_mask_visualization_dir /path/to/mask_viz/
```

CLI Arguments Reference


Input


| Argument | Description |
|---|---|
| `--model_path` | Path to model checkpoint (`.pt` file) |
| `--image_folder` | Directory of input images |
| `--video_path` | Input video file path |
| `--fps` | Frames per second to sample from video |

Inference Mode


| Argument | Default | Description |
|---|---|---|
| `--mode` | `streaming` | `streaming` or `windowed` |
| `--window_size` | `64` | Window size for windowed mode |
| `--keyframe_interval` | `1` | Store every Nth frame in KV cache |
| `--use_sdpa` | `False` | Use PyTorch SDPA instead of FlashInfer |

Sky Masking


| Argument | Description |
|---|---|
| `--mask_sky` | Enable sky segmentation and masking |
| `--sky_mask_dir` | Custom directory for cached sky masks |
| `--sky_mask_visualization_dir` | Save side-by-side mask visualizations |

Visualization


| Argument | Default | Description |
|---|---|---|
| `--port` | `8080` | Viser viewer port |
| `--conf_threshold` | `1.5` | Filter out low-confidence points |
| `--point_size` | `0.00001` | Point cloud point size |
| `--downsample_factor` | `10` | Spatial downsampling for display |
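`--conf_threshold` and `--downsample_factor` are plain point filters. A small illustrative sketch with made-up points and confidences (the viewer's actual filtering code is not shown in this document):

```python
# Hypothetical point cloud: (x, y, z) points with per-point confidence
points = [(0.0, 0.0, 1.0), (0.1, 0.2, 0.9), (0.2, 0.1, 1.1), (0.3, 0.3, 0.8)]
conf   = [2.1, 0.9, 1.7, 3.0]

conf_threshold = 1.5
downsample_factor = 2

# Keep only points whose confidence exceeds the threshold
kept = [p for p, c in zip(points, conf) if c > conf_threshold]
# Then keep every Nth surviving point for display
shown = kept[::downsample_factor]

print(len(kept))   # 3 points pass the confidence filter
print(len(shown))  # 2 points remain after downsampling
```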

Python API Usage


Basic Streaming Inference


```python
import torch
from lingbot_map import LingBotMap  # adjust import to actual module structure
```

Load model


```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LingBotMap.from_pretrained("/path/to/checkpoint.pt")
model = model.to(device).eval()
```

Streaming inference over image list


```python
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((378, 518)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image_paths = sorted(Path("/path/to/images").glob("*.jpg"))

with torch.no_grad():
    for img_path in image_paths:
        img = Image.open(img_path).convert("RGB")
        frame = transform(img).unsqueeze(0).to(device)
        output = model.stream(frame)  # output contains: pointmap, confidence, camera pose
```

Loading and Running the Demo Programmatically



The demo.py script is the primary entry point


Run it as a subprocess or study it for API patterns


```python
import subprocess

result = subprocess.run([
    "python", "demo.py",
    "--model_path", "/path/to/checkpoint.pt",
    "--image_folder", "example/church",
    "--mask_sky",
    "--port", "8080",
], check=True)
```

Video Input Pattern


```python
import cv2
import torch
```

Extract frames from video for batch processing


```python
def extract_frames(video_path: str, fps: int = 10):
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    interval = max(1, int(video_fps / fps))

    frames = []
    frame_idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % interval == 0:
            # Convert BGR to RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
        frame_idx += 1

    cap.release()
    return frames

frames = extract_frames("video.mp4", fps=10)
```
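The `interval = max(1, int(video_fps / fps))` computation above is what decides how many frames get dropped; a few worked cases:

```python
def sampling_interval(video_fps: float, target_fps: int) -> int:
    # Same arithmetic as extract_frames: truncating division, floored at 1
    return max(1, int(video_fps / target_fps))

print(sampling_interval(30.0, 10))  # 3: keep every 3rd frame
print(sampling_interval(24.0, 10))  # 2: int(2.4) truncates to 2
print(sampling_interval(10.0, 30))  # 1: never drops below every frame
```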

Common Patterns


Pattern 1: Outdoor Scene Reconstruction



Always use --mask_sky for outdoor scenes to remove noisy sky points


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./outdoor_images \
    --mask_sky \
    --conf_threshold 2.0 \
    --downsample_factor 5
```

Pattern 2: Long Indoor Sequence



Use keyframe_interval to manage KV cache for sequences 320-3000 frames


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./long_sequence \
    --keyframe_interval 6 \
    --conf_threshold 1.5
```

Pattern 3: Very Long Video (>3000 frames)



Use windowed mode for extremely long sequences


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --video_path long_video.mp4 \
    --fps 5 \
    --mode windowed \
    --window_size 64
```

Pattern 4: High Quality Dense Reconstruction



A lower conf_threshold keeps more points; a smaller downsample_factor shows more detail


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --conf_threshold 1.0 \
    --downsample_factor 1 \
    --point_size 0.00005
```

Pattern 5: CPU / No FlashInfer Fallback



When FlashInfer is unavailable, use SDPA


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --use_sdpa
```

Architecture Concepts


| Component | Role |
|---|---|
| Anchor Context | Coordinate grounding to prevent drift |
| Pose-Reference Window | Dense geometric cues from recent frames |
| Trajectory Memory | Long-range drift correction across the sequence |
| Paged KV Cache | Efficient attention over long streaming sequences |
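A toy sketch of the paging idea behind the KV cache: entries land in fixed-size pages, so memory is allocated page by page rather than per frame. This is illustrative bookkeeping only; FlashInfer's actual layout and attention kernels are considerably more involved.

```python
class PagedCache:
    """Toy paged store: appended entries fill fixed-size pages,
    and a new page is allocated only when the last one is full."""
    def __init__(self, page_size: int = 16):
        self.page_size = page_size
        self.pages = []  # each page is a list of entries

    def append(self, entry):
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])  # allocate a fresh page on demand
        self.pages[-1].append(entry)

    def __len__(self):
        return sum(len(p) for p in self.pages)

cache = PagedCache(page_size=16)
for t in range(40):
    cache.append(("kv", t))  # stand-in for a frame's keys/values

print(len(cache))        # 40 entries
print(len(cache.pages))  # 3 pages (16 + 16 + 8)
```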

Troubleshooting


FlashInfer Not Available



Error: FlashInfer not found


Solution: Install or use SDPA fallback


```bash
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
```

Or add --use_sdpa to any command


```bash
python demo.py --model_path ./checkpoint.pt --image_folder ./imgs --use_sdpa
```

CUDA Out of Memory on Long Sequences



Reduce memory with keyframe interval


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --keyframe_interval 6
```

Or switch to windowed mode


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mode windowed --window_size 32
```

Sky Mask Model Download Fails



If the automatic download fails, download `skyseg.onnx` manually, then place it in the expected path or point `--sky_mask_dir` at its location.

Low Quality / Noisy Point Cloud



Increase confidence threshold to filter noisy points


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --conf_threshold 2.5
```

For outdoor scenes, always add sky masking


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mask_sky --conf_threshold 2.0
```

Port Already in Use



Change the viewer port


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --port 8090
```
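Alternatively, you can ask the OS for a free port before launching the viewer and pass the result via `--port`. A small stdlib sketch (the helper name is ours, not part of the package):

```python
import socket

def find_free_port() -> int:
    """Bind to port 0 so the OS assigns an unused ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = find_free_port()
print(port)  # pass this value to demo.py via --port
```

Note that the port is released when the socket closes, so in principle another process could grab it before the viewer binds; for a local demo this race is rarely an issue.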

Images Not Loading



Ensure images are sorted and in supported formats (jpg, png)


```bash
ls /path/to/images | head -5
```

Supported: .jpg, .jpeg, .png, .bmp, .webp


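To preview exactly which files a folder will contribute, here is a small sketch that filters by the extension list above (demo.py's actual loader logic may differ, e.g. in case handling):

```python
import tempfile
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def list_frames(folder: str):
    """Sorted image filenames with supported extensions (case-insensitive)."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.suffix.lower() in SUPPORTED)

# Demonstrate on a throwaway folder with mixed contents
with tempfile.TemporaryDirectory() as d:
    for name in ["0002.jpg", "0001.PNG", "notes.txt", "0003.webp"]:
        (Path(d) / name).touch()
    frames = list_frames(d)

print(frames)  # ['0001.PNG', '0002.jpg', '0003.webp']
```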

Performance Guidelines


| Sequence Length | Recommended Mode | Notes |
|---|---|---|
| < 320 frames | Default streaming | Full KV cache |
| 320–3000 frames | `--keyframe_interval 6` | Reduces cache by 6x |
| > 3000 frames | `--mode windowed --window_size 64` | Sliding window |

  • Target resolution: 518×378 for ~20 FPS throughput
  • GPU: CUDA-capable GPU required for practical speeds
  • Model size: ~4.63 GB checkpoint
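The table above folds into a small helper that picks CLI flags from the sequence length. The thresholds come straight from the table; the function itself is our own convenience, not part of the package:

```python
def recommend_flags(num_frames: int) -> str:
    """Map sequence length to the recommended flags per the table above."""
    if num_frames < 320:
        return ""  # default streaming, full KV cache
    if num_frames <= 3000:
        return "--keyframe_interval 6"
    return "--mode windowed --window_size 64"

print(repr(recommend_flags(100)))   # '' (default streaming)
print(recommend_flags(1500))        # --keyframe_interval 6
print(recommend_flags(12000))       # --mode windowed --window_size 64
```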

Citation


```bibtex
@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}
```