Found 41 Skills
Analyze images: segment and detect objects, run OCR, generate descriptions, and answer visual questions via fal.ai vision models.
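Image analysis and recognition for local images, web images, videos, and files. Suited to OCR, object recognition, scene understanding, and similar tasks. This skill must be used whenever the user sends an image or asks for an image to be analyzed.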
Feed-forward 3D foundation model for streaming scene reconstruction using a Geometric Context Transformer.
Masked Auto-Encoder (MAE) for self-supervised pretraining and fine-tuning. Masks random patches and reconstructs them to learn visual representations; supports pretrain and finetune stages. Use when training, evaluating, exporting, or running inference for a TAO MAE backbone. Trigger phrases include "pretrain MAE", "self-supervised vision pretraining", "Masked Autoencoder", "Mask Auto-Encoder", "MAE fine-tune".
NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation (semantic, instance, and panoptic masks) with occupancy completion. Built on a VGGT backbone with a Mask2Former-style head and 3D frustum reconstruction. Use when training, evaluating, exporting, or running inference for a TAO NVPanoptix3D model. Trigger phrases include "train NVPanoptix3D", "panoptic 3D reconstruction", "3D scene segmentation", "occupancy completion".