tribev2-brain-encoding

TRIBE v2 Brain Encoding Model

Skill by ara.so — Daily 2026 Skills collection
TRIBE v2 is Meta's multimodal foundation model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines LLaMA 3.2 (text), V-JEPA2 (video), and Wav2Vec-BERT (audio) encoders into a unified Transformer architecture that maps multimodal representations onto the cortical surface (fsaverage5, ~20k vertices).

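The input/output geometry can be sketched in plain NumPy. The block below is a toy illustration of the late-fusion idea (made-up feature dimensions and a random linear readout standing in for the Transformer), not the actual TRIBE v2 architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-modality features over 10 timesteps (dimensions are made up)
text_feats = rng.normal(size=(10, 64))
video_feats = rng.normal(size=(10, 128))
audio_feats = rng.normal(size=(10, 32))

# Late fusion: concatenate modality features at each timestep
fused = np.concatenate([text_feats, video_feats, audio_feats], axis=1)  # (10, 224)

# A random linear readout standing in for the Transformer head: features -> vertices
n_vertices = 20484  # fsaverage5: 10242 vertices per hemisphere
readout = rng.normal(size=(fused.shape[1], n_vertices))
preds = fused @ readout  # (n_timesteps, n_vertices), the shape TRIBE v2 outputs

print(preds.shape)
```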
Installation

```bash
# Inference only
pip install -e .

# With brain visualization (PyVista & Nilearn)
pip install -e ".[plotting]"

# Full training dependencies (PyTorch Lightning, W&B, etc.)
pip install -e ".[training]"
```

Quick Start — Inference

Load pretrained model and predict from video

```python
from tribev2 import TribeModel

# Load from HuggingFace (downloads weights to cache)
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# Build events dataframe from a video file
df = model.get_events_dataframe(video_path="path/to/video.mp4")

# Predict brain responses
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices) on fsaverage5
```

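Since `preds` is a plain NumPy array, it can be cached to disk and reloaded without rerunning the model. A minimal sketch using a stand-in array with the same layout:

```python
import os
import tempfile

import numpy as np

# Stand-in for model output: (n_timesteps, n_vertices) on fsaverage5
preds = np.zeros((5, 20484), dtype=np.float32)

# Save predictions next to other run artifacts and reload them later
out = os.path.join(tempfile.mkdtemp(), "preds.npy")
np.save(out, preds)

reloaded = np.load(out)
print(reloaded.shape)  # (5, 20484)
```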
Multimodal input — video + audio + text

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

# All modalities together (text is auto-converted to speech and transcribed)
df = model.get_events_dataframe(
    video_path="path/to/video.mp4",
    audio_path="path/to/audio.wav",  # optional, overrides video audio
    text_path="path/to/script.txt",  # optional, auto-timed
)
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```

Text-only prediction

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(text_path="path/to/narration.txt")
preds, segments = model.predict(events=df)
```

Brain Visualization

```python
from tribev2 import TribeModel
from tribev2.plotting import plot_brain_surface

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)

# Plot a single timepoint on the cortical surface
plot_brain_surface(preds[0], backend="nilearn")  # or backend="pyvista"
```

Training a Model from Scratch

1. Set environment variables

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_slurm_partition"
```

2. Authenticate with HuggingFace (required for LLaMA 3.2)

```bash
huggingface-cli login
# Paste a HuggingFace read token when prompted
```

3. Local test run

```bash
python -m tribev2.grids.test_run
```

4. Full grid search on Slurm

```bash
# Cortical surface model
python -m tribev2.grids.run_cortical

# Subcortical regions
python -m tribev2.grids.run_subcortical
```

Key API — TribeModel

```python
from tribev2 import TribeModel

# Load pretrained weights
model = TribeModel.from_pretrained(
    "facebook/tribev2",
    cache_folder="./cache",  # local cache for HuggingFace weights
)

# Build events dataframe (word-level timings, chunking, etc.)
df = model.get_events_dataframe(
    video_path=None,  # str path to .mp4
    audio_path=None,  # str path to .wav
    text_path=None,   # str path to .txt
)

# Run prediction
preds, segments = model.predict(events=df)
# preds: np.ndarray of shape (n_timesteps, n_vertices)
# segments: list of segment metadata dicts
```

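Before correlating predictions with measured fMRI, time courses are often z-scored per vertex. This is ordinary NumPy preprocessing, not part of the TribeModel API; the array below is a stand-in for model output:

```python
import numpy as np

rng = np.random.default_rng(1)
preds = rng.normal(loc=2.0, scale=3.0, size=(100, 20484))  # stand-in output

# Z-score each vertex's time course: zero mean, unit variance over time
z = (preds - preds.mean(axis=0)) / preds.std(axis=0)

print(z.shape)  # (100, 20484)
```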
Project Structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and inference helpers
├── eventstransforms.py  # Event transforms (word extraction, chunking)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization backends
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

Configuration — Defaults

Edit `tribev2/grids/defaults.py` or set environment variables:

```python
# tribev2/grids/defaults.py (key fields)
{
    "datapath": "/path/to/studies",      # override with DATAPATH env var
    "savepath": "/path/to/output",       # override with SAVEPATH env var
    "slurm_partition": "learnfair",      # override with SLURM_PARTITION env var
    "model": "FmriEncoder",
    "modalities": ["video", "audio", "text"],
    "surface": "fsaverage5",             # ~20k vertices
}
```

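The override behavior described in the comments can be reproduced in a few lines of Python. This is a sketch of the pattern, not the actual `defaults.py` implementation:

```python
import os

def resolve(key: str, env_var: str, defaults: dict) -> str:
    """Return the environment variable if set, else the configured default."""
    return os.environ.get(env_var, defaults[key])

defaults = {"datapath": "/path/to/studies", "savepath": "/path/to/output"}

os.environ["DATAPATH"] = "/data/studies"  # simulate an exported override

print(resolve("datapath", "DATAPATH", defaults))  # /data/studies
print(resolve("savepath", "SAVEPATH", defaults))  # /path/to/output
```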
Custom Experiment with PyTorch Lightning

```python
from tribev2.main import Data, TribeExperiment
from tribev2.pl_module import TribePLModule
import pytorch_lightning as pl

# Configure experiment
experiment = TribeExperiment(
    datapath="/path/to/studies",
    savepath="/path/to/output",
    modalities=["video", "audio", "text"],
)
data = Data(experiment)
module = TribePLModule(experiment)

trainer = pl.Trainer(
    max_epochs=50,
    accelerator="gpu",
    devices=4,
)
trainer.fit(module, data)
```

Working with fMRI Surfaces

```python
from tribev2.utils_fmri import project_to_fsaverage, get_roi_mask

# Project MNI coordinates to fsaverage5 surface
surface_data = project_to_fsaverage(mni_data, target="fsaverage5")

# Get a specific ROI mask (e.g., early visual cortex)
roi_mask = get_roi_mask(roi_name="V1", surface="fsaverage5")
v1_responses = preds[:, roi_mask]
print(v1_responses.shape)  # (n_timesteps, n_v1_vertices)
```

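Conceptually, volume-to-surface projection assigns each surface vertex a value from nearby voxels. The toy sketch below uses a bare nearest-neighbor lookup on random coordinates; the real `project_to_fsaverage` relies on proper anatomical registration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 50 voxel centers with values, and 200 surface vertex coordinates
voxel_xyz = rng.uniform(-50, 50, size=(50, 3))
voxel_vals = rng.normal(size=50)
vertex_xyz = rng.uniform(-50, 50, size=(200, 3))

# Nearest-neighbor lookup: each vertex takes the value of its closest voxel
d = np.linalg.norm(vertex_xyz[:, None, :] - voxel_xyz[None, :, :], axis=-1)
surface_data = voxel_vals[d.argmin(axis=1)]

print(surface_data.shape)  # (200,)
```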
Common Patterns

Batch prediction over multiple videos

```python
from tribev2 import TribeModel
import numpy as np

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

video_paths = ["video1.mp4", "video2.mp4", "video3.mp4"]
all_predictions = []

for vp in video_paths:
    df = model.get_events_dataframe(video_path=vp)
    preds, segments = model.predict(events=df)
    all_predictions.append(preds)

# all_predictions: list of (n_timesteps_i, n_vertices) arrays
```

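Because each video yields a different number of timesteps, the per-video arrays share only their vertex dimension and can be concatenated along time once collected. A sketch with stand-in arrays:

```python
import numpy as np

# Stand-ins for per-video outputs with different numbers of timesteps
all_predictions = [
    np.zeros((120, 20484)),
    np.zeros((95, 20484)),
    np.zeros((210, 20484)),
]

# The vertex count matches across videos, so concatenation along time is valid
stacked = np.concatenate(all_predictions, axis=0)
print(stacked.shape)  # (425, 20484)
```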
Extract predictions for a specific brain region

```python
from tribev2 import TribeModel
from tribev2.utils_fmri import get_roi_mask

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="video.mp4")
preds, segments = model.predict(events=df)

# Focus on auditory cortex
ac_mask = get_roi_mask("auditory_cortex", surface="fsaverage5")
auditory_responses = preds[:, ac_mask]  # (n_timesteps, n_ac_vertices)
```

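ROI extraction relies on NumPy boolean masking: a mask with one flag per vertex selects the matching columns. A self-contained illustration with a random mask standing in for a real ROI:

```python
import numpy as np

rng = np.random.default_rng(3)
preds = rng.normal(size=(8, 1000))  # stand-in: 8 timesteps, 1000 vertices
mask = rng.random(1000) < 0.1       # fake "ROI": roughly 10% of vertices

# Boolean indexing keeps all timesteps and only the ROI's columns
roi_responses = preds[:, mask]
print(roi_responses.shape)  # (8, number of True entries in mask)
```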
Access segment timing metadata

```python
preds, segments = model.predict(events=df)

for i, seg in enumerate(segments):
    print(f"Segment {i}: onset={seg['onset']:.2f}s, duration={seg['duration']:.2f}s")
    print(f"  Brain response shape: {preds[i].shape}")
```

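The segment dicts can be summarized with plain Python. The sketch below assumes only the `onset` and `duration` keys shown above, with made-up values:

```python
# Made-up segment metadata using the keys shown above
segments = [
    {"onset": 0.0, "duration": 10.0},
    {"onset": 10.0, "duration": 10.0},
    {"onset": 20.0, "duration": 7.5},
]

total = sum(seg["duration"] for seg in segments)
last_end = segments[-1]["onset"] + segments[-1]["duration"]
print(f"{len(segments)} segments, {total:.1f}s total, ends at {last_end:.1f}s")
```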
Troubleshooting

**LLaMA 3.2 access denied**
```bash
# Authenticate with a HuggingFace token that has read permissions:
huggingface-cli login
```

**CUDA out of memory during inference**
```python
# Use CPU for inference on smaller machines
import torch

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
model.to("cpu")
```

**Missing visualization dependencies**
```bash
# Installs the pyvista and nilearn backends
pip install -e ".[plotting]"
```

**Slurm training not submitting**
```bash
# Check that the env vars are set
echo $DATAPATH $SAVEPATH $SLURM_PARTITION
# Or edit tribev2/grids/defaults.py directly
```


**Video without audio track causes error**
```python
# Provide the audio separately, or use text-only mode
df = model.get_events_dataframe(
    video_path="silent_video.mp4",
    audio_path="separate_audio.wav",
)
```

Citation

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon
          and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```

Resources
