Search Results: modal

Found 293 Skills

AI & Machine Learning | aradotso/trending-skills

hy-world-2-0-3d-world-model

Expert skill for using HY-World 2.0, Tencent's multi-modal world model for reconstructing, generating, and simulating 3D worlds from text, images, and video.

🇺🇸 | English | Translated

AI & Machine Learning | xsir0/xsir-skills

google-gemini-media

Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".

🇺🇸 | English | Translated

19 scripts | Checked

Uncategorized | chrislemke/stoffy

metaphysics-ontology

Master metaphysics and ontology - the study of being, existence, and fundamental reality. Use for: existence, being, substance, identity, causation, modality, time, universals. Triggers: 'ontological', 'metaphysical', 'what exists', 'substance', 'essence', 'existence', 'being', 'identity', 'persistence', 'causation', 'modality', 'possible worlds', 'universals', 'particulars', 'properties', 'abstract objects', 'time', 'change', 'composition'.

🇺🇸 | English | Translated

AI & Machine Learning | pexoai/pexo-skills

seedance-2.0-prompter

Expert prompt engineering for Seedance 2.0. Use when the user wants to generate a video with multimodal assets (images, videos, audio) and needs the best possible prompt.

🇺🇸 | English | Translated

AI & Machine Learning | tanstack-skills/tanstack-...

tanstack-ai

Provider-agnostic, type-safe AI SDK for streaming, tool calling, structured output, and multimodal content.

🇺🇸 | English | Translated

AI & Machine Learning | leekkk2/transcendence-mem...

transcendence-memory

Use when connecting to a self-hosted memory backend, searching, storing, or managing memories, importing connection tokens, or troubleshooting retrieval issues. Use this skill whenever the user mentions memory search, RAG retrieval, embedding, memory storage, multimodal document upload, knowledge queries, or wants to connect to a memory service, even if they do not explicitly say "transcendence-memory".

🇺🇸 | English | Translated

1 script | Checked

AI & Machine Learning | penfick/skills

vision-support

Provides image recognition for non-multimodal models (text-only models such as deepseek-v4-pro, GLM-5.1, and mimo-v2.5-pro). The skill triggers automatically when the main model cannot recognize images, when users send screenshots, design drafts, or UI screenshots for analysis, or when users say 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever users paste images but the current model does not support image input. Supports recognizing multiple images at once, with primary-backup fallback achieved by configuring multiple image recognition models. It can also be triggered manually with the commands /skill:vision-support or /vision. Iron Rule: The models configured for this skill are used only for image content recognition and never participate in the main logical reasoning. Note: If the current model is itself multimodal (such as Claude Sonnet 4, GPT-4o, or Gemini, which can recognize images directly), do not use this skill; let the main model recognize the images directly.

🇨🇳 | Chinese | Translated

5 scripts | Attention

Tools & Utilities | vlm-run/skills

mm-cli-skill

Use the mm CLI to index, explore, query, and extract content from multimodal directories containing images, videos, PDFs, code, and other files. Triggers: exploring a directory's contents, listing/finding files by type or size, extracting text from PDFs, getting image metadata, searching across file contents, counting tokens, viewing directory trees, extracting PDF page mosaics, video keyframe extraction, 'what files are in this folder', 'find all images', 'show me the PDFs', 'how much storage do videos use', 'extract text from this PDF', 'search documents for X', 'analyze this directory', 'how many tokens', 'show the tree'.

🇺🇸 | English | Translated

AI & Machine Learning | zpankz/mcp-skillset

gemini

Execute Google Gemini CLI for large-context code analysis, multimodal reasoning, and repository-scale reviews. Also use for delegating tasks requiring 1M token context windows or Gemini-specific capabilities.

🇺🇸 | English | Translated

AI & Machine Learning | firebase/skills

firebase-ai-logic

Official skill for integrating Firebase AI Logic (Gemini API) into web applications. Covers setup, multimodal inference, structured output, and security.

🇺🇸 | English | Translated

AI & Machine Learning | tippyentertainment/skills

comfyui-retro-anime

Multi-modal ComfyUI skill that generates retro late-1990s / early-2000s anime-style characters, movie frames, images, sounds, and voices using a single, consistent Midjourney-style prompt template.

🇺🇸 | English | Translated

Backend Development | google-labs-code/stitch-s...

stitch-sdk-development

Develop the Stitch SDK. Covers the generation pipeline, dual modality (agent vs SDK), error handling, and Traffic Light (Red-Green-Yellow) implementation workflow. Use when adding features, fixing bugs, or understanding the architecture.

🇺🇸 | English | Translated