# LingBot-Map 3D Reconstruction Skill

Skill by ara.so — Daily 2026 Skills collection.

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from streaming image or video data using a Geometric Context Transformer. It achieves ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames via paged KV cache attention.
## What It Does

- Streaming 3D reconstruction from image sequences or video
- Feed-forward inference (no iterative optimization needed)
- Outputs: point clouds with per-point confidence, camera poses, depth maps
- Key features: anchor context, pose-reference window, trajectory memory for drift correction
## Installation

```bash
# 1. Create environment
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

# 2. Install PyTorch (CUDA 12.8)
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128

# 3. Install lingbot-map
git clone https://github.com/Robbyant/lingbot-map.git
cd lingbot-map
pip install -e .

# 4. Install FlashInfer for fast paged KV cache attention (recommended)
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

# 5. Optional: visualization support
pip install -e ".[vis]"

# 6. Optional: sky masking for outdoor scenes
pip install onnxruntime      # CPU
pip install onnxruntime-gpu  # GPU
```
## Model Download

Models are available on HuggingFace and ModelScope:

```python
# Download via huggingface_hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="robbyant/lingbot-map",
    filename="checkpoint.pt"
)
```

Or download manually from:
- HuggingFace: `https://huggingface.co/robbyant/lingbot-map`
- ModelScope: `https://www.modelscope.cn/models/Robbyant/lingbot-map`

## CLI Commands
### Demo with Interactive 3D Viewer (browser at localhost:8080)

```bash
# From an image folder
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/

# From a video file
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10

# Outdoor scene with sky masking
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

# Example scenes included in the repo
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/church --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/oxford --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/university4 --mask_sky
```
### Long Sequence Handling

```bash
# Keyframe interval: store every Nth frame in the KV cache (saves memory).
# Use when the sequence exceeds 320 frames.
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6

# Windowed mode: for very long sequences (>3000 frames)
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
```
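The memory saving from `--keyframe_interval` is easy to estimate with a back-of-envelope helper. This sketch is illustrative only; the exact cache layout is internal to the model:

```python
def cached_frames(total_frames: int, keyframe_interval: int = 1) -> int:
    """Frames retained in the KV cache when every Nth frame is stored."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    # Frames 0, N, 2N, ... are kept, so the count is ceil(total / N)
    return (total_frames + keyframe_interval - 1) // keyframe_interval

# A 3000-frame sequence with --keyframe_interval 6 caches 500 frames,
# roughly a 6x reduction in KV cache memory.
print(cached_frames(3000, 6))
```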
### Without FlashInfer (SDPA fallback)

```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa
```

### Sky Masking with Custom Paths

```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky \
    --sky_mask_dir /path/to/cached_masks/ \
    --sky_mask_visualization_dir /path/to/mask_viz/
```

## CLI Arguments Reference
### Input

| Argument | Description |
|---|---|
| `--model_path` | Path to the model checkpoint (.pt file) |
| `--image_folder` | Directory of input images |
| `--video_path` | Input video file path |
| `--fps` | Frames per second to sample from the video |

### Inference Mode

| Argument | Description |
|---|---|
| `--mode` | Inference mode; use `windowed` for very long sequences |
| `--window_size` | Window size for windowed mode |
| `--keyframe_interval` | Store every Nth frame in the KV cache |
| `--use_sdpa` | Use PyTorch SDPA instead of FlashInfer |

### Sky Masking

| Argument | Description |
|---|---|
| `--mask_sky` | Enable sky segmentation and masking |
| `--sky_mask_dir` | Custom directory for cached sky masks |
| `--sky_mask_visualization_dir` | Save side-by-side mask visualizations |

### Visualization

| Argument | Description |
|---|---|
| `--port` | Viser viewer port (the demo serves at localhost:8080) |
| `--conf_threshold` | Filter out points below this confidence |
| `--point_size` | Point cloud point size |
| `--downsample_factor` | Spatial downsampling factor for display |
## Python API Usage

### Basic Streaming Inference

```python
import torch
from lingbot_map import LingBotMap  # adjust import to the actual module structure

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LingBotMap.from_pretrained("/path/to/checkpoint.pt")
model = model.to(device).eval()

# Streaming inference over an image list
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((378, 518)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225])
])

image_paths = sorted(Path("/path/to/images").glob("*.jpg"))

with torch.no_grad():
    for img_path in image_paths:
        img = Image.open(img_path).convert("RGB")
        frame = transform(img).unsqueeze(0).to(device)
        output = model.stream(frame)
        # output contains: pointmap, confidence, camera pose
```
### Loading and Running the Demo Programmatically

```python
# demo.py is the primary entry point.
# Run it as a subprocess, or study it for API patterns.
import subprocess

result = subprocess.run([
    "python", "demo.py",
    "--model_path", "/path/to/checkpoint.pt",
    "--image_folder", "example/church",
    "--mask_sky",
    "--port", "8080",
], check=True)
```
### Video Input Pattern

```python
import cv2

# Extract frames from a video for batch processing
def extract_frames(video_path: str, fps: int = 10):
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    interval = max(1, int(video_fps / fps))
    frames = []
    frame_idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % interval == 0:
            # OpenCV reads BGR; convert to RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
        frame_idx += 1
    cap.release()
    return frames

frames = extract_frames("video.mp4", fps=10)
```
## Common Patterns

### Pattern 1: Outdoor Scene Reconstruction

```bash
# Always use --mask_sky outdoors to remove noisy sky points
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./outdoor_images \
    --mask_sky \
    --conf_threshold 2.0 \
    --downsample_factor 5
```
### Pattern 2: Long Indoor Sequence

```bash
# Use --keyframe_interval to manage the KV cache for 320-3000 frame sequences
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./long_sequence \
    --keyframe_interval 6 \
    --conf_threshold 1.5
```
### Pattern 3: Very Long Video (>3000 frames)

```bash
# Use windowed mode for extremely long sequences
python demo.py \
    --model_path ./checkpoint.pt \
    --video_path long_video.mp4 \
    --fps 5 \
    --mode windowed \
    --window_size 64
```
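Windowed mode conceptually processes the stream in fixed-size chunks. A minimal sketch of the chunking (not the model's actual implementation, which also carries trajectory memory across windows):

```python
def split_into_windows(num_frames: int, window_size: int = 64):
    """Return (start, end) index ranges covering the sequence in order."""
    return [(start, min(start + window_size, num_frames))
            for start in range(0, num_frames, window_size)]

windows = split_into_windows(3200, 64)
# 3200 frames -> 50 windows of 64 frames each
```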
### Pattern 4: High-Quality Dense Reconstruction

```bash
# A lower conf_threshold keeps more points; a smaller downsample_factor shows more detail
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --conf_threshold 1.0 \
    --downsample_factor 1 \
    --point_size 0.00005
```
### Pattern 5: CPU / No-FlashInfer Fallback

```bash
# When FlashInfer is unavailable, fall back to SDPA
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --use_sdpa
```
## Architecture Concepts

| Component | Role |
|---|---|
| Anchor Context | Coordinate grounding to prevent drift |
| Pose-Reference Window | Dense geometric cues from recent frames |
| Trajectory Memory | Long-range drift correction across the sequence |
| Paged KV Cache | Efficient attention over long streaming sequences |
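To build intuition for the paged KV cache row above: instead of one contiguous buffer, keys and values are stored in fixed-size pages allocated on demand, so memory grows in page-sized steps as the stream lengthens. A toy bookkeeping sketch; the page size and structure here are illustrative, not LingBot-Map's internals:

```python
class PagedCache:
    """Toy page-table bookkeeping for a streaming KV cache."""

    def __init__(self, page_size: int = 16):
        self.page_size = page_size
        self.pages = []          # each page holds up to page_size entries
        self.num_entries = 0

    def append(self, kv_entry):
        # Allocate a new page on demand when the last one is full
        if self.num_entries % self.page_size == 0:
            self.pages.append([])
        self.pages[-1].append(kv_entry)
        self.num_entries += 1

cache = PagedCache(page_size=16)
for t in range(100):
    cache.append(("key", "value", t))
# 100 entries occupy ceil(100/16) = 7 pages; the last page is partially filled
```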
## Troubleshooting

### FlashInfer Not Available

```bash
# Error: FlashInfer not found
# Solution: install FlashInfer, or use the SDPA fallback
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

# Or add --use_sdpa to any command
python demo.py --model_path ./checkpoint.pt --image_folder ./imgs --use_sdpa
```
### CUDA Out of Memory on Long Sequences

```bash
# Reduce memory with a keyframe interval
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --keyframe_interval 6

# Or switch to windowed mode
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mode windowed --window_size 32
```
### Sky Mask Model Download Fails

Manually download skyseg.onnx, then place it in the expected path or point to it via `--sky_mask_dir`.
### Low-Quality / Noisy Point Cloud

```bash
# Increase the confidence threshold to filter noisy points
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --conf_threshold 2.5

# Outdoors, always add sky masking
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mask_sky --conf_threshold 2.0
```
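The same confidence filtering can be applied offline to exported results. A minimal NumPy sketch, where the array names are hypothetical stand-ins for the model's documented outputs (points plus per-point confidence):

```python
import numpy as np

def filter_by_confidence(points: np.ndarray, conf: np.ndarray,
                         threshold: float = 2.0) -> np.ndarray:
    """Keep only points whose per-point confidence meets the threshold."""
    return points[conf >= threshold]

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [2.0, 2.0, 2.0]])
conf = np.array([0.5, 2.0, 3.0])

kept = filter_by_confidence(points, conf, threshold=2.0)
# Keeps the two points with confidence >= 2.0
```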
### Port Already in Use

```bash
# Change the viewer port
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --port 8090
```
### Images Not Loading

```bash
# Ensure images are sorted and in supported formats
ls /path/to/images | head -5
```

Supported formats: .jpg, .jpeg, .png, .bmp, .webp
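The same check can be done in Python: collect the supported image files in sorted (frame) order and confirm the folder is non-empty. The extension list is taken from the supported formats above:

```python
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def list_images(folder: str) -> list[Path]:
    """Return supported image paths in sorted (frame) order."""
    paths = sorted(p for p in Path(folder).iterdir()
                   if p.suffix.lower() in SUPPORTED)
    if not paths:
        raise FileNotFoundError(f"no supported images in {folder}")
    return paths
```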
## Performance Guidelines

| Sequence Length | Recommended Mode | Notes |
|---|---|---|
| < 320 frames | Default streaming | Full KV cache |
| 320–3000 frames | `--keyframe_interval 6` | Reduces cache size ~6x |
| > 3000 frames | `--mode windowed` | Sliding window |

- Target resolution: 518×378 for ~20 FPS throughput
- GPU: a CUDA-capable GPU is required for practical speeds
- Model size: ~4.63 GB checkpoint
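The table above can be encoded as a small helper that picks demo flags from the sequence length. The thresholds follow the table, and the flag names mirror the CLI reference:

```python
def recommended_flags(num_frames: int) -> list[str]:
    """Map sequence length to the mode recommended in the guidelines table."""
    if num_frames > 3000:
        return ["--mode", "windowed", "--window_size", "64"]
    if num_frames >= 320:
        return ["--keyframe_interval", "6"]
    return []  # default streaming with a full KV cache

# e.g. recommended_flags(500) -> ["--keyframe_interval", "6"]
```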
## Citation

```bibtex
@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}
```