
LingBot-Map 3D Reconstruction Skill


Skill by ara.so — Daily 2026 Skills collection.
LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from streaming image or video data using a Geometric Context Transformer. With paged KV cache attention, it sustains ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames.

What It Does


  • Streaming 3D reconstruction from image sequences or video
  • Feed-forward inference (no iterative optimization needed)
  • Outputs: point clouds with per-point confidence, camera poses, depth maps
  • Key features: anchor context, pose-reference window, trajectory memory for drift correction

Installation



1. Create environment


```bash
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map
```

2. Install PyTorch (CUDA 12.8)


```bash
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
```

3. Install lingbot-map


```bash
git clone https://github.com/Robbyant/lingbot-map.git
cd lingbot-map
pip install -e .
```

4. Install FlashInfer for fast paged KV cache attention (recommended)


```bash
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
```

5. Optional: visualization support


```bash
pip install -e ".[vis]"
```

6. Optional: sky masking for outdoor scenes


```bash
pip install onnxruntime      # CPU
pip install onnxruntime-gpu  # GPU
```

Model Download


Models available on HuggingFace and ModelScope:

Download via huggingface_hub


```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="robbyant/lingbot-map",
    filename="checkpoint.pt"
)
```

Or manually download from:
- HuggingFace: `https://huggingface.co/robbyant/lingbot-map`
- ModelScope: `https://www.modelscope.cn/models/Robbyant/lingbot-map`

CLI Commands


Demo with Interactive 3D Viewer (browser at localhost:8080)



From image folder


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/
```

From video file


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10
```

Outdoor scene with sky masking


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky
```

Example scenes included in repo


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/church --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/oxford --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder example/university4 --mask_sky
```

Long Sequence Handling



Keyframe interval: store every Nth frame in KV cache (saves memory)


Use when sequence > 320 frames


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --keyframe_interval 6
```
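A rough sketch of why this saves memory: if every Nth frame is the one kept (the selection rule implied by the flag name; the model's exact keyframe policy is not documented here), the cache shrinks by roughly a factor of N:

```python
def keyframe_indices(num_frames: int, interval: int = 6):
    """Indices kept in the KV cache under an assumed
    every-Nth-frame selection rule."""
    return list(range(0, num_frames, interval))

# For a 1200-frame sequence, interval 6 keeps only 200 frames in cache
cached = keyframe_indices(1200, 6)
print(len(cached))   # 200
print(cached[:4])    # [0, 6, 12, 18]
```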

Windowed mode: for very long sequences (>3000 frames)


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 64
```
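Windowed mode bounds memory by processing the stream in fixed-size chunks. A minimal sketch of the chunking arithmetic, assuming non-overlapping windows (the real mode may overlap windows or carry state between them):

```python
def window_bounds(num_frames: int, window_size: int = 64):
    """Split a frame index range into consecutive windows
    (assumed non-overlapping for illustration)."""
    return [(start, min(start + window_size, num_frames))
            for start in range(0, num_frames, window_size)]

windows = window_bounds(5000, 64)
print(len(windows))   # 79 windows
print(windows[-1])    # (4992, 5000): the last window is partial
```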

Without FlashInfer (SDPA fallback)


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa
```

Sky Masking with Custom Paths


```bash
python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky \
    --sky_mask_dir /path/to/cached_masks/ \
    --sky_mask_visualization_dir /path/to/mask_viz/
```

CLI Arguments Reference


Input


| Argument | Description |
|---|---|
| `--model_path` | Path to model checkpoint (`.pt` file) |
| `--image_folder` | Directory of input images |
| `--video_path` | Input video file path |
| `--fps` | Frames per second to sample from video |

Inference Mode


| Argument | Default | Description |
|---|---|---|
| `--mode` | `streaming` | `streaming` or `windowed` |
| `--window_size` | `64` | Window size for windowed mode |
| `--keyframe_interval` | `1` | Store every Nth frame in KV cache |
| `--use_sdpa` | `False` | Use PyTorch SDPA instead of FlashInfer |

Sky Masking


| Argument | Description |
|---|---|
| `--mask_sky` | Enable sky segmentation and masking |
| `--sky_mask_dir` | Custom directory for cached sky masks |
| `--sky_mask_visualization_dir` | Save side-by-side mask visualizations |

Visualization


| Argument | Default | Description |
|---|---|---|
| `--port` | `8080` | Viser viewer port |
| `--conf_threshold` | `1.5` | Filter out low-confidence points |
| `--point_size` | `0.00001` | Point cloud point size |
| `--downsample_factor` | `10` | Spatial downsampling for display |
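`--conf_threshold` and `--downsample_factor` are plain point filters. A small illustrative sketch with made-up points and confidences (the viewer's actual filtering code is not shown in this document):

```python
# Hypothetical point cloud: (x, y, z) points with per-point confidence
points = [(0.0, 0.0, 1.0), (0.1, 0.2, 0.9), (0.2, 0.1, 1.1), (0.3, 0.3, 0.8)]
conf   = [2.1, 0.9, 1.7, 3.0]

conf_threshold = 1.5
downsample_factor = 2

# Keep only points whose confidence exceeds the threshold
kept = [p for p, c in zip(points, conf) if c > conf_threshold]
# Then keep every Nth surviving point for display
shown = kept[::downsample_factor]

print(len(kept))   # 3 points pass the confidence filter
print(len(shown))  # 2 points remain after downsampling
```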

Python API Usage


Basic Streaming Inference


```python
import torch
from lingbot_map import LingBotMap  # adjust import to actual module structure
```

Load model


```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LingBotMap.from_pretrained("/path/to/checkpoint.pt")
model = model.to(device).eval()
```

Streaming inference over image list


```python
from pathlib import Path
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((378, 518)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image_paths = sorted(Path("/path/to/images").glob("*.jpg"))

with torch.no_grad():
    for img_path in image_paths:
        img = Image.open(img_path).convert("RGB")
        frame = transform(img).unsqueeze(0).to(device)
        output = model.stream(frame)  # output contains: pointmap, confidence, camera pose
```

Loading and Running the Demo Programmatically



The demo.py script is the primary entry point


Run it as a subprocess or study it for API patterns


```python
import subprocess

result = subprocess.run([
    "python", "demo.py",
    "--model_path", "/path/to/checkpoint.pt",
    "--image_folder", "example/church",
    "--mask_sky",
    "--port", "8080",
], check=True)
```

Video Input Pattern


```python
import cv2
import torch
```

Extract frames from video for batch processing


```python
def extract_frames(video_path: str, fps: int = 10):
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    interval = max(1, int(video_fps / fps))

    frames = []
    frame_idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % interval == 0:
            # Convert BGR to RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
        frame_idx += 1

    cap.release()
    return frames

frames = extract_frames("video.mp4", fps=10)
```
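The `interval = max(1, int(video_fps / fps))` computation above is what decides how many frames get dropped; a few worked cases:

```python
def sampling_interval(video_fps: float, target_fps: int) -> int:
    # Same arithmetic as extract_frames: truncating division, floored at 1
    return max(1, int(video_fps / target_fps))

print(sampling_interval(30.0, 10))  # 3: keep every 3rd frame
print(sampling_interval(24.0, 10))  # 2: int(2.4) truncates to 2
print(sampling_interval(10.0, 30))  # 1: never drops below every frame
```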

Common Patterns


Pattern 1: Outdoor Scene Reconstruction



Always use --mask_sky for outdoor scenes to remove noisy sky points


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./outdoor_images \
    --mask_sky \
    --conf_threshold 2.0 \
    --downsample_factor 5
```

Pattern 2: Long Indoor Sequence



Use keyframe_interval to manage KV cache for sequences 320-3000 frames


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./long_sequence \
    --keyframe_interval 6 \
    --conf_threshold 1.5
```

Pattern 3: Very Long Video (>3000 frames)



Use windowed mode for extremely long sequences


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --video_path long_video.mp4 \
    --fps 5 \
    --mode windowed \
    --window_size 64
```

Pattern 4: High Quality Dense Reconstruction



A lower conf_threshold keeps more points; a smaller downsample_factor shows more detail


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --conf_threshold 1.0 \
    --downsample_factor 1 \
    --point_size 0.00005
```

Pattern 5: CPU / No FlashInfer Fallback



When FlashInfer is unavailable, use SDPA


```bash
python demo.py \
    --model_path ./checkpoint.pt \
    --image_folder ./images \
    --use_sdpa
```

Architecture Concepts


| Component | Role |
|---|---|
| Anchor Context | Coordinate grounding to prevent drift |
| Pose-Reference Window | Dense geometric cues from recent frames |
| Trajectory Memory | Long-range drift correction across the sequence |
| Paged KV Cache | Efficient attention over long streaming sequences |
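A toy sketch of the paging idea behind the KV cache: entries land in fixed-size pages, so memory is allocated page by page rather than per frame. This is illustrative bookkeeping only; FlashInfer's actual layout and attention kernels are considerably more involved.

```python
class PagedCache:
    """Toy paged store: appended entries fill fixed-size pages,
    and a new page is allocated only when the last one is full."""
    def __init__(self, page_size: int = 16):
        self.page_size = page_size
        self.pages = []  # each page is a list of entries

    def append(self, entry):
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])  # allocate a fresh page on demand
        self.pages[-1].append(entry)

    def __len__(self):
        return sum(len(p) for p in self.pages)

cache = PagedCache(page_size=16)
for t in range(40):
    cache.append(("kv", t))  # stand-in for a frame's keys/values

print(len(cache))        # 40 entries
print(len(cache.pages))  # 3 pages (16 + 16 + 8)
```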

Troubleshooting


FlashInfer Not Available



Error: FlashInfer not found


Solution: Install or use SDPA fallback


```bash
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
```

Or add --use_sdpa to any command


```bash
python demo.py --model_path ./checkpoint.pt --image_folder ./imgs --use_sdpa
```

CUDA Out of Memory on Long Sequences



Reduce memory with keyframe interval


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --keyframe_interval 6
```

Or switch to windowed mode


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mode windowed --window_size 32
```

Sky Mask Model Download Fails



If the automatic download fails, download `skyseg.onnx` manually, then place it in the expected path or point `--sky_mask_dir` at its location.

Low Quality / Noisy Point Cloud



Increase confidence threshold to filter noisy points


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --conf_threshold 2.5
```

For outdoor scenes, always add sky masking


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --mask_sky --conf_threshold 2.0
```

Port Already in Use



Change the viewer port


```bash
python demo.py --model_path ./checkpoint.pt \
    --image_folder ./images --port 8090
```
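Alternatively, you can ask the OS for a free port before launching the viewer and pass the result via `--port`. A small stdlib sketch (the helper name is ours, not part of the package):

```python
import socket

def find_free_port() -> int:
    """Bind to port 0 so the OS assigns an unused ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = find_free_port()
print(port)  # pass this value to demo.py via --port
```

Note that the port is released when the socket closes, so in principle another process could grab it before the viewer binds; for a local demo this race is rarely an issue.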

Images Not Loading



Ensure images are sorted and in supported formats (jpg, png)


```bash
ls /path/to/images | head -5
```

Supported: .jpg, .jpeg, .png, .bmp, .webp


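To preview exactly which files a folder will contribute, here is a small sketch that filters by the extension list above (demo.py's actual loader logic may differ, e.g. in case handling):

```python
import tempfile
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def list_frames(folder: str):
    """Sorted image filenames with supported extensions (case-insensitive)."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.suffix.lower() in SUPPORTED)

# Demonstrate on a throwaway folder with mixed contents
with tempfile.TemporaryDirectory() as d:
    for name in ["0002.jpg", "0001.PNG", "notes.txt", "0003.webp"]:
        (Path(d) / name).touch()
    frames = list_frames(d)

print(frames)  # ['0001.PNG', '0002.jpg', '0003.webp']
```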

Performance Guidelines


| Sequence Length | Recommended Mode | Notes |
|---|---|---|
| < 320 frames | Default streaming | Full KV cache |
| 320–3000 frames | `--keyframe_interval 6` | Reduces cache by 6x |
| > 3000 frames | `--mode windowed --window_size 64` | Sliding window |

  • Target resolution: 518×378 for ~20 FPS throughput
  • GPU: CUDA-capable GPU required for practical speeds
  • Model size: ~4.63 GB checkpoint
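The table above folds into a small helper that picks CLI flags from the sequence length. The thresholds come straight from the table; the function itself is our own convenience, not part of the package:

```python
def recommend_flags(num_frames: int) -> str:
    """Map sequence length to the recommended flags per the table above."""
    if num_frames < 320:
        return ""  # default streaming, full KV cache
    if num_frames <= 3000:
        return "--keyframe_interval 6"
    return "--mode windowed --window_size 64"

print(repr(recommend_flags(100)))   # '' (default streaming)
print(recommend_flags(1500))        # --keyframe_interval 6
print(recommend_flags(12000))       # --mode windowed --window_size 64
```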

Citation


```bibtex
@article{chen2026geometric,
  title={Geometric Context Transformer for Streaming 3D Reconstruction},
  author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
  journal={arXiv preprint arXiv:2604.14141},
  year={2026}
}
```