tribev2-brain-encoding

TRIBE v2 Brain Encoding Model

Skill by ara.so — Daily 2026 Skills collection
TRIBE v2 is Meta's multimodal foundation model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines LLaMA 3.2 (text), V-JEPA2 (video), and Wav2Vec-BERT (audio) encoders into a unified Transformer architecture that maps multimodal representations onto the cortical surface (fsaverage5, ~20k vertices).

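The input/output geometry can be sketched in plain NumPy. The block below is a toy illustration of the late-fusion idea (made-up feature dimensions and a random linear readout standing in for the Transformer), not the actual TRIBE v2 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality features over 10 timesteps (dimensions are made up)
text_feats = rng.normal(size=(10, 64))
video_feats = rng.normal(size=(10, 128))
audio_feats = rng.normal(size=(10, 32))

# Late fusion: concatenate modality features at each timestep
fused = np.concatenate([text_feats, video_feats, audio_feats], axis=1)  # (10, 224)

# A random linear readout standing in for the Transformer head: features -> vertices
n_vertices = 20484  # fsaverage5: 10242 vertices per hemisphere
readout = rng.normal(size=(fused.shape[1], n_vertices))
preds = fused @ readout  # (n_timesteps, n_vertices), the shape TRIBE v2 outputs

print(preds.shape)
```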
Installation

```bash
# Inference only
pip install -e .

# With brain visualization (PyVista & Nilearn)
pip install -e ".[plotting]"

# Full training dependencies (PyTorch Lightning, W&B, etc.)
pip install -e ".[training]"
```

Quick Start — Inference

Load pretrained model and predict from video

```python
from tribev2 import TribeModel

# Load from HuggingFace (downloads weights to cache)
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# Build events dataframe from a video file
df = model.get_events_dataframe(video_path="path/to/video.mp4")

# Predict brain responses
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices) on fsaverage5
```

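Since `preds` is a plain NumPy array, it can be cached to disk and reloaded without rerunning the model. A minimal sketch using a stand-in array with the same layout:

```python
import os
import tempfile

import numpy as np

# Stand-in for model output: (n_timesteps, n_vertices) on fsaverage5
preds = np.zeros((5, 20484), dtype=np.float32)

# Save predictions next to other run artifacts and reload them later
out = os.path.join(tempfile.mkdtemp(), "preds.npy")
np.save(out, preds)

reloaded = np.load(out)
print(reloaded.shape)  # (5, 20484)
```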
Multimodal input — video + audio + text

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# All modalities together (text is auto-converted to speech and transcribed)
df = model.get_events_dataframe(
    video_path="path/to/video.mp4",
    audio_path="path/to/audio.wav",  # optional, overrides video audio
    text_path="path/to/script.txt",  # optional, auto-timed
)
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```

Text-only prediction

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(text_path="path/to/narration.txt")
preds, segments = model.predict(events=df)
```

Brain Visualization

```python
from tribev2 import TribeModel
from tribev2.plotting import plot_brain_surface

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)

# Plot a single timepoint on the cortical surface
plot_brain_surface(preds[0], backend="nilearn")  # or backend="pyvista"
```

Training a Model from Scratch

1. Set environment variables

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_slurm_partition"
```

2. Authenticate with HuggingFace (required for LLaMA 3.2)

```bash
huggingface-cli login
# Paste a HuggingFace read token when prompted
```

3. Local test run

```bash
python -m tribev2.grids.test_run
```

4. Full grid search on Slurm

```bash
# Cortical surface model
python -m tribev2.grids.run_cortical

# Subcortical regions
python -m tribev2.grids.run_subcortical
```

Key API — TribeModel

```python
from tribev2 import TribeModel

# Load pretrained weights
model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder="./cache",  # local cache for HuggingFace weights
)

# Build events dataframe (word-level timings, chunking, etc.)
df = model.get_events_dataframe(
    video_path=None,  # str path to .mp4
    audio_path=None,  # str path to .wav
    text_path=None,   # str path to .txt
)

# Run prediction
preds, segments = model.predict(events=df)
# preds: np.ndarray of shape (n_timesteps, n_vertices)
# segments: list of segment metadata dicts
```

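Before correlating predictions with measured fMRI, time courses are often z-scored per vertex. This is ordinary NumPy preprocessing, not part of the TribeModel API; the array below is a stand-in for model output:

```python
import numpy as np

rng = np.random.default_rng(1)
preds = rng.normal(loc=2.0, scale=3.0, size=(100, 20484))  # stand-in output

# Z-score each vertex's time course: zero mean, unit variance over time
z = (preds - preds.mean(axis=0)) / preds.std(axis=0)

print(z.shape)  # (100, 20484)
```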
Project Structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and inference helpers
├── eventstransforms.py  # Event transforms (word extraction, chunking)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization backends
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

Configuration — Defaults

Edit `tribev2/grids/defaults.py` or set environment variables:

```python
# tribev2/grids/defaults.py (key fields)
{
    "datapath": "/path/to/studies",      # override with DATAPATH env var
    "savepath": "/path/to/output",       # override with SAVEPATH env var
    "slurm_partition": "learnfair",      # override with SLURM_PARTITION env var
    "model": "FmriEncoder",
    "modalities": ["video", "audio", "text"],
    "surface": "fsaverage5",             # ~20k vertices
}
```

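The override behavior described in the comments can be reproduced in a few lines of Python. This is a sketch of the pattern, not the actual `defaults.py` implementation:

```python
import os

def resolve(key: str, env_var: str, defaults: dict) -> str:
    """Return the environment variable if set, else the configured default."""
    return os.environ.get(env_var, defaults[key])

defaults = {"datapath": "/path/to/studies", "savepath": "/path/to/output"}

os.environ["DATAPATH"] = "/data/studies"  # simulate an exported override

print(resolve("datapath", "DATAPATH", defaults))  # /data/studies
print(resolve("savepath", "SAVEPATH", defaults))  # /path/to/output
```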
Custom Experiment with PyTorch Lightning

```python
from tribev2.main import Data, TribeExperiment
from tribev2.pl_module import TribePLModule
import pytorch_lightning as pl

# Configure experiment
experiment = TribeExperiment(
    datapath="/path/to/studies",
    savepath="/path/to/output",
    modalities=["video", "audio", "text"],
)
data = Data(experiment)
module = TribePLModule(experiment)

trainer = pl.Trainer(
    max_epochs=50,
    accelerator="gpu",
    devices=4,
)
trainer.fit(module, data)
```

Working with fMRI Surfaces

```python
from tribev2.utils_fmri import project_to_fsaverage, get_roi_mask

# Project MNI coordinates to fsaverage5 surface
surface_data = project_to_fsaverage(mni_data, target="fsaverage5")

# Get a specific ROI mask (e.g., early visual cortex)
roi_mask = get_roi_mask(roi_name="V1", surface="fsaverage5")
v1_responses = preds[:, roi_mask]
print(v1_responses.shape)  # (n_timesteps, n_v1_vertices)
```

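Conceptually, volume-to-surface projection assigns each surface vertex a value from nearby voxels. The toy sketch below uses a bare nearest-neighbor lookup on random coordinates; the real `project_to_fsaverage` relies on proper anatomical registration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 50 voxel centers with values, and 200 surface vertex coordinates
voxel_xyz = rng.uniform(-50, 50, size=(50, 3))
voxel_vals = rng.normal(size=50)
vertex_xyz = rng.uniform(-50, 50, size=(200, 3))

# Nearest-neighbor lookup: each vertex takes the value of its closest voxel
d = np.linalg.norm(vertex_xyz[:, None, :] - voxel_xyz[None, :, :], axis=-1)
surface_data = voxel_vals[d.argmin(axis=1)]

print(surface_data.shape)  # (200,)
```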
Common Patterns

Batch prediction over multiple videos

```python
from tribev2 import TribeModel
import numpy as np

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

video_paths = ["video1.mp4", "video2.mp4", "video3.mp4"]
all_predictions = []

for vp in video_paths:
    df = model.get_events_dataframe(video_path=vp)
    preds, segments = model.predict(events=df)
    all_predictions.append(preds)

# all_predictions: list of (n_timesteps_i, n_vertices) arrays
```

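Because each video yields a different number of timesteps, the per-video arrays share only their vertex dimension and can be concatenated along time once collected. A sketch with stand-in arrays:

```python
import numpy as np

# Stand-ins for per-video outputs with different numbers of timesteps
all_predictions = [
    np.zeros((120, 20484)),
    np.zeros((95, 20484)),
    np.zeros((210, 20484)),
]

# The vertex count matches across videos, so concatenation along time is valid
stacked = np.concatenate(all_predictions, axis=0)
print(stacked.shape)  # (425, 20484)
```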
Extract predictions for a specific brain region

```python
from tribev2 import TribeModel
from tribev2.utils_fmri import get_roi_mask

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="video.mp4")
preds, segments = model.predict(events=df)

# Focus on auditory cortex
ac_mask = get_roi_mask("auditory_cortex", surface="fsaverage5")
auditory_responses = preds[:, ac_mask]  # (n_timesteps, n_ac_vertices)
```

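ROI extraction relies on NumPy boolean masking: a mask with one flag per vertex selects the matching columns. A self-contained illustration with a random mask standing in for a real ROI:

```python
import numpy as np

rng = np.random.default_rng(3)
preds = rng.normal(size=(8, 1000))  # stand-in: 8 timesteps, 1000 vertices
mask = rng.random(1000) < 0.1       # fake "ROI": roughly 10% of vertices

# Boolean indexing keeps all timesteps and only the ROI's columns
roi_responses = preds[:, mask]
print(roi_responses.shape)  # (8, number of True entries in mask)
```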
Access segment timing metadata

```python
preds, segments = model.predict(events=df)

for i, seg in enumerate(segments):
    print(f"Segment {i}: onset={seg['onset']:.2f}s, duration={seg['duration']:.2f}s")
    print(f"  Brain response shape: {preds[i].shape}")
```

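The segment dicts can be summarized with plain Python. The sketch below assumes only the `onset` and `duration` keys shown above, with made-up values:

```python
# Made-up segment metadata using the keys shown above
segments = [
    {"onset": 0.0, "duration": 10.0},
    {"onset": 10.0, "duration": 10.0},
    {"onset": 20.0, "duration": 7.5},
]

total = sum(seg["duration"] for seg in segments)
last_end = segments[-1]["onset"] + segments[-1]["duration"]
print(f"{len(segments)} segments, {total:.1f}s total, ends at {last_end:.1f}s")
```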
Troubleshooting

**LLaMA 3.2 access denied**
```bash
# Authenticate with a HuggingFace token that has read permissions:
huggingface-cli login
```

**CUDA out of memory during inference**
```python
# Use CPU for inference on smaller machines
import torch

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
model.to("cpu")
```

**Missing visualization dependencies**
```bash
# Installs the pyvista and nilearn backends
pip install -e ".[plotting]"
```

**Slurm training not submitting**
```bash
# Check that the env vars are set
echo $DATAPATH $SAVEPATH $SLURM_PARTITION
# Or edit tribev2/grids/defaults.py directly
```


**Video without audio track causes error**
```python
# Provide the audio separately, or use text-only mode
df = model.get_events_dataframe(
    video_path="silent_video.mp4",
    audio_path="separate_audio.wav",
)
```

Citation

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon
          and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```

Resources
