Feed-forward 3D foundation model for streaming scene reconstruction using Geometric Context Transformer
npx skill4agent add aradotso/trending-skills lingbot-map-3d-reconstruction

Skill by ara.so, part of the Daily 2026 Skills collection.

# 1. Create environment
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map
# 2. Install PyTorch (CUDA 12.8)
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
# 3. Install lingbot-map
git clone https://github.com/Robbyant/lingbot-map.git
cd lingbot-map
pip install -e .
# 4. Install FlashInfer for fast paged KV cache attention (recommended)
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
# 5. Optional: visualization support
pip install -e ".[vis]"
# 6. Optional: sky masking for outdoor scenes
pip install onnxruntime # CPU
pip install onnxruntime-gpu # GPU

# Download via huggingface_hub
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="robbyant/lingbot-map",
    filename="checkpoint.pt"
)

Hugging Face: https://huggingface.co/robbyant/lingbot-map
ModelScope: https://www.modelscope.cn/models/Robbyant/lingbot-map

# From image folder
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/
# From video file
python demo.py --model_path /path/to/checkpoint.pt \
--video_path video.mp4 --fps 10
# Outdoor scene with sky masking
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --mask_sky
# Example scenes included in repo
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder example/church --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder example/oxford --mask_sky
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder example/university4 --mask_sky

# Keyframe interval: store every Nth frame in KV cache (saves memory)
# Use when sequence > 320 frames
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --keyframe_interval 6
# Windowed mode: for very long sequences (>3000 frames)
python demo.py --model_path /path/to/checkpoint.pt \
--video_path video.mp4 --fps 10 \
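--mode windowed --window_size 64

# Without FlashInfer: fall back to PyTorch SDPA
python demo.py --model_path /path/to/checkpoint.pt \
--image_folder /path/to/images/ --use_sdpa

# Sky masking with a custom mask cache and mask visualizations
python demo.py --model_path /path/to/checkpoint.pt \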
--image_folder /path/to/images/ --mask_sky \
--sky_mask_dir /path/to/cached_masks/ \
--sky_mask_visualization_dir /path/to/mask_viz/

| Argument | Description |
|---|---|
| `--model_path` | Path to model checkpoint (.pt file) |
| `--image_folder` | Directory of input images |
| `--video_path` | Input video file path |
| `--fps` | Frames per second to sample from video |
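
The input arguments can also be assembled programmatically before launching `demo.py`. The following is a minimal sketch, not part of the repository: `build_demo_command` is a hypothetical helper, and the rule that exactly one of the image folder or the video path is passed per run is an assumption based on the example commands.

```python
from typing import List, Optional


def build_demo_command(
    model_path: str,
    image_folder: Optional[str] = None,
    video_path: Optional[str] = None,
    fps: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for demo.py (hypothetical helper)."""
    # Assumption: exactly one input source is given per run.
    if (image_folder is None) == (video_path is None):
        raise ValueError("pass exactly one of image_folder or video_path")
    cmd = ["python", "demo.py", "--model_path", model_path]
    if image_folder is not None:
        cmd += ["--image_folder", image_folder]
    else:
        cmd += ["--video_path", video_path]
        # --fps only matters when frames are sampled from a video
        if fps is not None:
            cmd += ["--fps", str(fps)]
    return cmd


cmd = build_demo_command("/path/to/checkpoint.pt", video_path="video.mp4", fps=10)
print(" ".join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.
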
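
| Argument | Default | Description |
|---|---|---|
| `--mode` | | Inference mode; `windowed` selects sliding-window processing |
| `--window_size` | | Window size for windowed mode |
| `--keyframe_interval` | | Store every Nth frame in KV cache |
| `--use_sdpa` | | Use PyTorch SDPA instead of FlashInfer |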
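
To make the memory effect of the keyframe interval concrete, here is a small sketch of which frames would stay in the KV cache when only every Nth frame is stored. `keyframe_indices` is a hypothetical helper, and starting the count at frame 0 is an assumption; the selection rule inside the repository may differ.

```python
from typing import List


def keyframe_indices(num_frames: int, keyframe_interval: int) -> List[int]:
    """Indices of frames kept in the cache when every Nth frame is stored."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    # Assumption: frame 0 is always kept and counting starts from it.
    return list(range(0, num_frames, keyframe_interval))


kept = keyframe_indices(num_frames=600, keyframe_interval=6)
print(len(kept))  # 100 of 600 frames cached, a 6x reduction
```
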

| Argument | Description |
|---|---|
| `--mask_sky` | Enable sky segmentation and masking |
| `--sky_mask_dir` | Custom directory for cached sky masks |
| `--sky_mask_visualization_dir` | Save side-by-side mask visualizations |

| Argument | Default | Description |
|---|---|---|
| `--port` | | Viser viewer port |
| `--conf_threshold` | | Filter low-confidence points |
| `--point_size` | | Point cloud point size |
| `--downsample_factor` | | Spatial downsampling for display |

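
The interaction between the confidence threshold and the downsampling factor can be illustrated with a toy filter over an H x W pointmap. This is only a sketch of the idea, not the viewer's code: `filter_for_display` is a hypothetical function, and both the strict `>` comparison and the strided-grid interpretation of spatial downsampling are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def filter_for_display(
    points: Sequence[Sequence[Point]],
    confidence: Sequence[Sequence[float]],
    conf_threshold: float,
    downsample_factor: int,
) -> List[Point]:
    """Keep confident points from an H x W pointmap, sampled on a strided grid."""
    kept = []
    # Assumption: downsampling visits every `downsample_factor`-th row and column.
    for row in range(0, len(points), downsample_factor):
        for col in range(0, len(points[row]), downsample_factor):
            # Assumption: points at or below the threshold are dropped.
            if confidence[row][col] > conf_threshold:
                kept.append(points[row][col])
    return kept


# Toy 4 x 4 pointmap whose confidence grows with row + column
grid = [[(float(r), float(c), 1.0) for c in range(4)] for r in range(4)]
conf = [[float(r + c) for c in range(4)] for r in range(4)]
visible = filter_for_display(grid, conf, conf_threshold=1.5, downsample_factor=2)
print(len(visible))  # 3
```

A lower threshold or a smaller factor keeps more points, at the cost of a noisier and heavier display.
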
from pathlib import Path

import torch
import torchvision.transforms as T
from PIL import Image
from lingbot_map import LingBotMap  # adjust import to actual module structure

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LingBotMap.from_pretrained("/path/to/checkpoint.pt")
model = model.to(device).eval()

# Preprocess each frame: resize, convert to tensor, apply ImageNet normalization
transform = T.Compose([
    T.Resize((378, 518)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# Streaming inference over a sorted image list
image_paths = sorted(Path("/path/to/images").glob("*.jpg"))
with torch.no_grad():
    for img_path in image_paths:
        img = Image.open(img_path).convert("RGB")
        frame = transform(img).unsqueeze(0).to(device)  # add batch dimension
        output = model.stream(frame)
        # output contains: pointmap, confidence, camera pose

# The demo.py script is the primary entry point
# Run it as a subprocess or study it for API patterns
import subprocess
result = subprocess.run([
    "python", "demo.py",
    "--model_path", "/path/to/checkpoint.pt",
    "--image_folder", "example/church",
    "--mask_sky",
    "--port", "8080",
], check=True)

import cv2
import torch
# Extract frames from video for batch processing
def extract_frames(video_path: str, fps: int = 10):
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    # Keep every `interval`-th frame to approximate the target fps
    interval = max(1, int(video_fps / fps))
    frames = []
    frame_idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % interval == 0:
            # Convert BGR to RGB
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame_rgb)
        frame_idx += 1
    cap.release()
    return frames

frames = extract_frames("video.mp4", fps=10)

# Always use --mask_sky for outdoor scenes to remove noisy sky points
python demo.py \
--model_path ./checkpoint.pt \
--image_folder ./outdoor_images \
--mask_sky \
--conf_threshold 2.0 \
--downsample_factor 5

# Use keyframe_interval to manage KV cache for sequences 320-3000 frames
python demo.py \
--model_path ./checkpoint.pt \
--image_folder ./long_sequence \
--keyframe_interval 6 \
--conf_threshold 1.5

# Use windowed mode for extremely long sequences
python demo.py \
--model_path ./checkpoint.pt \
--video_path long_video.mp4 \
--fps 5 \
--mode windowed \
--window_size 64

# Lower conf_threshold keeps more points, smaller downsample shows more detail
python demo.py \
--model_path ./checkpoint.pt \
--image_folder ./images \
--conf_threshold 1.0 \
--downsample_factor 1 \
--point_size 0.00005

# When FlashInfer is unavailable, use SDPA
python demo.py \
--model_path ./checkpoint.pt \
--image_folder ./images \
--use_sdpa

| Component | Role |
|---|---|
| Anchor Context | Coordinate grounding to prevent drift |
| Pose-Reference Window | Dense geometric cues from recent frames |
| Trajectory Memory | Long-range drift correction across the sequence |
| Paged KV Cache | Efficient attention over long streaming sequences |
# Error: FlashInfer not found
# Solution: Install or use SDPA fallback
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
# Or add --use_sdpa to any command
python demo.py --model_path ./checkpoint.pt --image_folder ./imgs --use_sdpa

# Reduce memory with keyframe interval
python demo.py --model_path ./checkpoint.pt \
--image_folder ./images --keyframe_interval 6
# Or switch to windowed mode
python demo.py --model_path ./checkpoint.pt \
--image_folder ./images --mode windowed --window_size 32

# Manual download of skyseg.onnx
wget https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx
# Place in expected path or specify via --sky_mask_dir

# Increase confidence threshold to filter noisy points
python demo.py --model_path ./checkpoint.pt \
--image_folder ./images --conf_threshold 2.5
# For outdoor, always add sky masking
python demo.py --model_path ./checkpoint.pt \
--image_folder ./images --mask_sky --conf_threshold 2.0

# Change the viewer port
python demo.py --model_path ./checkpoint.pt \
--image_folder ./images --port 8090

# Ensure images are sorted and in a supported format
ls /path/to/images | head -5
# Supported: .jpg, .jpeg, .png, .bmp, .webp

| Sequence Length | Recommended Mode | Notes |
|---|---|---|
| < 320 frames | Default streaming | Full KV cache |
| 320–3000 frames | `--keyframe_interval 6` | Reduces cache by 6x |
| > 3000 frames | `--mode windowed` | Sliding window |
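
The length thresholds in this table (320 and 3000 frames) can be turned into a small selection helper. The sketch below is illustrative only: `recommended_flags` is a hypothetical function, and the values `6` and `64` are taken from the example commands in this document rather than from documented defaults.

```python
from typing import List


def recommended_flags(num_frames: int) -> List[str]:
    """Map a sequence length to the extra demo.py flags suggested for it."""
    if num_frames < 320:
        return []  # default streaming with the full KV cache
    if num_frames <= 3000:
        # Store every 6th frame, shrinking the cache about 6x
        return ["--keyframe_interval", "6"]
    # Very long sequences: switch to the sliding window
    return ["--mode", "windowed", "--window_size", "64"]


for n in (100, 1200, 5000):
    print(n, recommended_flags(n))
```
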
@article{chen2026geometric,
title={Geometric Context Transformer for Streaming 3D Reconstruction},
author={Chen, Lin-Zhuo and Gao, Jian and Chen, Yihang and Cheng, Ka Leong and Sun, Yipengjing and Hu, Liangxiao and Xue, Nan and Zhu, Xing and Shen, Yujun and Yao, Yao and Xu, Yinghao},
journal={arXiv preprint arXiv:2604.14141},
year={2026}
}