Found 2 Skills
Use TRIBE v2, Meta's multimodal foundation model for predicting fMRI brain responses to video, audio, and text stimuli
Provides image recognition for models that lack multimodal input (text-only models such as deepseek-v4-pro, GLM-5.1, and mimo-v2.5-pro). The skill triggers automatically when the main model cannot recognize images, when users send screenshots, design drafts, or UI screenshots for analysis, or when users say things like 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever users paste images but the current model does not support image input. Multiple images can be recognized at once, and configuring several image recognition models provides primary-backup fallback. It can also be triggered manually with the commands /skill:vision-support or /vision. Iron Rule: the models configured for this skill are used only for image content recognition and never participate in the main logical reasoning. Note: if the current model is itself multimodal (such as Claude Sonnet 4, GPT-4o, or Gemini, which can recognize images directly), do not use this skill; let the main model recognize the images directly.
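The primary-backup fallback described above can be pictured roughly as follows. This is only a minimal sketch, not the skill's actual implementation: the `Recognizer` type, `describe_images`, and the two stub recognizers are hypothetical names introduced for illustration, and a real setup would call whichever image recognition models are configured.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a recognizer takes raw image bytes and returns a text description.
Recognizer = Callable[[bytes], str]


def describe_images(images: Sequence[bytes], recognizers: Sequence[Recognizer]) -> List[str]:
    """Describe each image, trying the recognizers in order (primary first, then backups)."""
    if not recognizers:
        raise ValueError("at least one recognizer must be configured")
    descriptions: List[str] = []
    for index, image in enumerate(images):
        errors = []
        for recognizer in recognizers:
            try:
                descriptions.append(recognizer(image))
                break  # first success wins; later recognizers are not called
            except Exception as exc:  # any failure falls through to the next recognizer
                errors.append(exc)
        else:
            # The inner loop never hit `break`, so every recognizer failed for this image.
            raise RuntimeError(
                f"all {len(recognizers)} recognizers failed for image {index}: {errors}"
            )
    return descriptions


# Stub recognizers standing in for real model calls (assumptions for the example).
def flaky_primary(image: bytes) -> str:
    raise TimeoutError("primary unavailable")


def backup(image: bytes) -> str:
    return f"image of {len(image)} bytes"


print(describe_images([b"abc", b"hello"], [flaky_primary, backup]))
# prints ['image of 3 bytes', 'image of 5 bytes']
```

In this sketch the returned descriptions are plain text that would be handed to the main model, which matches the rule that the recognition models only describe image content and do not take part in reasoning.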