Found 293 Skills
Expert skill for using HY-World 2.0, Tencent's multi-modal world model for reconstructing, generating, and simulating 3D worlds from text, images, and video.
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
Master metaphysics and ontology - the study of being, existence, and fundamental reality. Use for: existence, being, substance, identity, causation, modality, time, universals. Triggers: 'ontological', 'metaphysical', 'what exists', 'substance', 'essence', 'existence', 'being', 'identity', 'persistence', 'causation', 'modality', 'possible worlds', 'universals', 'particulars', 'properties', 'abstract objects', 'time', 'change', 'composition'.
Expert prompt engineering for Seedance 2.0. Use when the user wants to generate a video with multimodal assets (images, videos, audio) and needs the best possible prompt.
Provider-agnostic, type-safe AI SDK for streaming, tool calling, structured output, and multimodal content.
Use when connecting to a self-hosted memory backend, searching, storing, or managing memories, importing connection tokens, or troubleshooting retrieval issues. Use this skill whenever the user mentions memory search, RAG retrieval, embedding, memory storage, multimodal document upload, knowledge queries, or wants to connect to a memory service, even if they do not explicitly say "transcendence-memory".
Provides image recognition for models without native image input (text-only models such as deepseek-v4-pro, GLM-5.1, mimo-v2.5-pro, etc.). This skill is triggered automatically when the main model cannot recognize images, when users send screenshots, design drafts, or UI screenshots for analysis, or when users say 'Look at this image', 'Analyze this screenshot', or 'What's wrong with this image'. It also applies whenever users paste images but the current model does not support image input. Supports recognizing multiple images at once, with primary-backup fallback when multiple image recognition models are configured. It can also be triggered manually with the commands /skill:vision-support or /vision. Iron Rule: The models configured for this skill are used only for image content recognition and never participate in the main logical reasoning. Note: If the current model is itself multimodal (such as Claude Sonnet 4, GPT-4o, Gemini, etc., which can recognize images directly), do not use this skill; let the main model process the image directly.
Use the mm CLI to index, explore, query, and extract content from multimodal directories containing images, videos, PDFs, code, and other files. Triggers: exploring a directory's contents, listing/finding files by type or size, extracting text from PDFs, getting image metadata, searching across file contents, counting tokens, viewing directory trees, extracting PDF page mosaics, video keyframe extraction, 'what files are in this folder', 'find all images', 'show me the PDFs', 'how much storage do videos use', 'extract text from this PDF', 'search documents for X', 'analyze this directory', 'how many tokens', 'show the tree'.
Execute Google Gemini CLI for large-context code analysis, multimodal reasoning, and repository-scale reviews. Also use for delegating tasks requiring 1M token context windows or Gemini-specific capabilities.
Official skill for integrating Firebase AI Logic (Gemini API) into web applications. Covers setup, multimodal inference, structured output, and security.
Multi-modal ComfyUI skill that generates retro late-1990s / early-2000s anime-style characters, movie frames, images, sounds, and voices using a single, consistent Midjourney-style prompt template.
Develop the Stitch SDK. Covers the generation pipeline, dual modality (agent vs SDK), error handling, and Traffic Light (Red-Green-Yellow) implementation workflow. Use when adding features, fixing bugs, or understanding the architecture.