# TRIBE v2 Brain Encoding Model

Skill by ara.so — Daily 2026 Skills collection
TRIBE v2 is Meta's multimodal foundation model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines LLaMA 3.2 (text), V-JEPA2 (video), and Wav2Vec-BERT (audio) encoders into a unified Transformer architecture that maps multimodal representations onto the cortical surface (fsaverage5, ~20k vertices).
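As a quick orientation to the output space: fsaverage5 has 10,242 vertices per hemisphere (20,484 total), which is the "~20k vertices" dimension mentioned above. A minimal NumPy sketch of the prediction layout; the left-hemisphere-first ordering is an assumption for illustration, not something the model documentation states:

```python
import numpy as np

# fsaverage5: 10,242 vertices per hemisphere, 20,484 total
N_PER_HEMI = 10_242
n_timesteps = 100  # e.g., one row per fMRI TR

# Mock array with the shape model.predict() is documented to return
preds = np.zeros((n_timesteps, 2 * N_PER_HEMI))

# Splitting into hemispheres (left-first ordering assumed here)
left, right = preds[:, :N_PER_HEMI], preds[:, N_PER_HEMI:]
print(preds.shape, left.shape)  # (100, 20484) (100, 10242)
```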
## Installation

```bash
# Inference only
pip install -e .

# With brain visualization (PyVista & Nilearn)
pip install -e ".[plotting]"

# Full training dependencies (PyTorch Lightning, W&B, etc.)
pip install -e ".[training]"
```
## Quick Start — Inference

**Load pretrained model and predict from video**

```python
from tribev2 import TribeModel

# Load from HuggingFace (downloads weights to cache)
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# Build events dataframe from a video file
df = model.get_events_dataframe(video_path="path/to/video.mp4")

# Predict brain responses
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices) on fsaverage5
```
**Multimodal input — video + audio + text**

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# All modalities together (text is auto-converted to speech and transcribed)
df = model.get_events_dataframe(
    video_path="path/to/video.mp4",
    audio_path="path/to/audio.wav",  # optional, overrides video audio
    text_path="path/to/script.txt",  # optional, auto-timed
)
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```
**Text-only prediction**

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(text_path="path/to/narration.txt")
preds, segments = model.predict(events=df)
```

## Brain Visualization
```python
from tribev2 import TribeModel
from tribev2.plotting import plot_brain_surface

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)

# Plot a single timepoint on the cortical surface
plot_brain_surface(preds[0], backend="nilearn")  # or backend="pyvista"
```
## Training a Model from Scratch

1. Set environment variables

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_slurm_partition"
```

2. Authenticate with HuggingFace (required for LLaMA 3.2)

```bash
huggingface-cli login
# Paste a HuggingFace read token when prompted
# Request access at: https://huggingface.co/meta-llama/Llama-3.2-3B
```

3. Local test run

```bash
python -m tribev2.grids.test_run
```

4. Full grid search on Slurm

```bash
# Cortical surface model
python -m tribev2.grids.run_cortical

# Subcortical regions
python -m tribev2.grids.run_subcortical
```
## Key API — TribeModel

```python
from tribev2 import TribeModel

# Load pretrained weights
model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder="./cache",  # local cache for HuggingFace weights
)

# Build events dataframe (word-level timings, chunking, etc.)
df = model.get_events_dataframe(
    video_path=None,  # str path to .mp4
    audio_path=None,  # str path to .wav
    text_path=None,   # str path to .txt
)

# Run prediction
preds, segments = model.predict(events=df)
# preds: np.ndarray of shape (n_timesteps, n_vertices)
# segments: list of segment metadata dicts
```
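The `(n_timesteps, n_vertices)` array returned by `predict` is typically compared against measured fMRI. A common downstream step (standard NumPy, not part of the TRIBE v2 API) is z-scoring each vertex's predicted time course before computing encoding accuracy:

```python
import numpy as np

# Mock predictions in the documented shape (n_timesteps, n_vertices)
rng = np.random.default_rng(0)
preds = rng.standard_normal((120, 20484))

# Z-score each vertex's time course (zero mean, unit variance per column)
z = (preds - preds.mean(axis=0)) / preds.std(axis=0)
print(z.shape)  # (120, 20484)
```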
## Project Structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and inference helpers
├── eventstransforms.py  # Event transforms (word extraction, chunking)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization backends
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

## Configuration — Defaults
Edit `tribev2/grids/defaults.py` or set environment variables:

```python
# tribev2/grids/defaults.py (key fields)
{
    "datapath": "/path/to/studies",     # override with DATAPATH env var
    "savepath": "/path/to/output",      # override with SAVEPATH env var
    "slurm_partition": "learnfair",     # override with SLURM_PARTITION env var
    "model": "FmriEncoder",
    "modalities": ["video", "audio", "text"],
    "surface": "fsaverage5",            # ~20k vertices
}
```
## Custom Experiment with PyTorch Lightning

```python
import pytorch_lightning as pl

from tribev2.main import Data, TribeExperiment
from tribev2.pl_module import TribePLModule

# Configure experiment
experiment = TribeExperiment(
    datapath="/path/to/studies",
    savepath="/path/to/output",
    modalities=["video", "audio", "text"],
)
data = Data(experiment)
module = TribePLModule(experiment)

trainer = pl.Trainer(
    max_epochs=50,
    accelerator="gpu",
    devices=4,
)
trainer.fit(module, data)
```
## Working with fMRI Surfaces

```python
from tribev2.utils_fmri import project_to_fsaverage, get_roi_mask

# Project MNI coordinates to fsaverage5 surface
surface_data = project_to_fsaverage(mni_data, target="fsaverage5")

# Get a specific ROI mask (e.g., early visual cortex)
roi_mask = get_roi_mask(roi_name="V1", surface="fsaverage5")
v1_responses = preds[:, roi_mask]
print(v1_responses.shape)  # (n_timesteps, n_v1_vertices)
```
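Assuming `get_roi_mask` returns a per-vertex boolean mask (an assumption based on the indexing above), the masking itself is plain NumPy fancy indexing, and a mean ROI time course follows directly:

```python
import numpy as np

n_timesteps, n_vertices = 100, 20484
preds = np.random.default_rng(0).standard_normal((n_timesteps, n_vertices))

# Stand-in for get_roi_mask: boolean mask marking ROI vertices
roi_mask = np.zeros(n_vertices, dtype=bool)
roi_mask[:500] = True  # pretend the first 500 vertices form the ROI

roi_responses = preds[:, roi_mask]           # select ROI columns
roi_timecourse = roi_responses.mean(axis=1)  # one value per timestep
print(roi_responses.shape, roi_timecourse.shape)  # (100, 500) (100,)
```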
## Common Patterns

**Batch prediction over multiple videos**

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

video_paths = ["video1.mp4", "video2.mp4", "video3.mp4"]
all_predictions = []
for vp in video_paths:
    df = model.get_events_dataframe(video_path=vp)
    preds, segments = model.predict(events=df)
    all_predictions.append(preds)

# all_predictions: list of (n_timesteps_i, n_vertices) arrays
```
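Since each video yields its own `(n_timesteps_i, n_vertices)` array, the per-video results share the vertex axis and can be stacked along time for pooled analyses (illustrated here with mock arrays):

```python
import numpy as np

# Mock per-video predictions with different durations, shared vertex axis
all_predictions = [np.zeros((t, 20484)) for t in (50, 80, 65)]

# Concatenate along the time axis to treat all videos as one long run
combined = np.concatenate(all_predictions, axis=0)
print(combined.shape)  # (195, 20484)
```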
**Extract predictions for a specific brain region**

```python
from tribev2 import TribeModel
from tribev2.utils_fmri import get_roi_mask

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="video.mp4")
preds, segments = model.predict(events=df)

# Focus on auditory cortex
ac_mask = get_roi_mask("auditory_cortex", surface="fsaverage5")
auditory_responses = preds[:, ac_mask]  # (n_timesteps, n_ac_vertices)
```
**Access segment timing metadata**

```python
preds, segments = model.predict(events=df)
for i, seg in enumerate(segments):
    print(f"Segment {i}: onset={seg['onset']:.2f}s, duration={seg['duration']:.2f}s")
    print(f"  Brain response shape: {preds[i].shape}")
```
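The segment dicts can also be used to rebuild the stimulus timeline. Assuming only the `onset` and `duration` keys shown above, a minimal sketch with mock metadata:

```python
# Mock segments in the format printed above (times in seconds)
segments = [
    {"onset": 0.0, "duration": 10.0},
    {"onset": 10.0, "duration": 8.5},
]

# End time of the final segment gives the total stimulus duration
total = segments[-1]["onset"] + segments[-1]["duration"]
print(f"Total stimulus time: {total:.1f}s")  # Total stimulus time: 18.5s
```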
## Troubleshooting
**LLaMA 3.2 access denied**

```bash
# Must request access at https://huggingface.co/meta-llama/Llama-3.2-3B
# Then authenticate:
huggingface-cli login
# Use a HuggingFace token with read permissions
```
**CUDA out of memory during inference**

```python
# Use CPU for inference on smaller machines
import torch

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
model.to("cpu")
```

**Missing visualization dependencies**

```bash
# Installs pyvista and nilearn backends
pip install -e ".[plotting]"
```
**Slurm training not submitting**

```bash
# Check env vars are set
echo $DATAPATH $SAVEPATH $SLURM_PARTITION
# Or edit tribev2/grids/defaults.py directly
```
**Video without audio track causes error**

```python
# Provide audio separately or use text-only mode
df = model.get_events_dataframe(
    video_path="silent_video.mp4",
    audio_path="separate_audio.wav",
)
```
## Citation

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon
          and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```