Generate cinematic short-form video with ByteDance Seedance 2.0 Pro on RunComfy. Documents Seedance 2.0 Pro's strengths (multimodal references: up to 9 images, 3 videos, and 3 audio clips; synchronized in-pass audio with natural lip-sync; cinematic motion refinement), the 4–15s duration schema, and when to route to HappyHorse 1.0, Wan 2.7, or Kling instead. Calls `runcomfy run bytedance/seedance-v2/pro` through the local RunComfy CLI, as sketched below. Triggers on "seedance", "seedance 2", "seedance v2", "seedance pro", "bytedance video", or any explicit ask to generate video with this model.
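The only invocation the entry documents is the `runcomfy run` subcommand; below is a minimal sketch of shelling out to it from Python. The `--prompt` flag is a hypothetical placeholder, not a documented RunComfy CLI option.

```python
import subprocess

# Invoke the documented RunComfy subcommand for Seedance 2.0 Pro.
# NOTE: "--prompt" is illustrative only; check `runcomfy run --help`
# for the real parameter names.
result = subprocess.run(
    [
        "runcomfy", "run", "bytedance/seedance-v2/pro",
        "--prompt", "Slow dolly shot through a rain-soaked neon alley, 8s",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```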
Use this skill when building applications with Gemini models or the Gemini API, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or needing current model specifications. Covers SDK usage (google-genai for Python, @google/genai for JavaScript/TypeScript), model selection, and API capabilities.
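For orientation, a minimal text-generation call with the google-genai Python SDK named above; the model ID is an assumption, so substitute whichever current Gemini model fits.

```python
from google import genai

# The client reads GEMINI_API_KEY from the environment if no key is passed.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model ID
    contents="Explain function calling in one paragraph.",
)
print(response.text)
```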
Expert guidance for working with the Hugging Face Transformers library for NLP, computer vision, and multimodal AI tasks.
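A minimal sketch of the library's high-level `pipeline` API, which covers most of the NLP, vision, and multimodal tasks the entry mentions; the checkpoint names are defaults or illustrative picks, not prescribed by the skill.

```python
from transformers import pipeline

# pipeline() downloads a default checkpoint for the task on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes multimodal inference straightforward."))

# The same one-liner pattern covers vision tasks (checkpoint is illustrative).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
```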
Build production-ready LLM applications, advanced RAG systems, and intelligent agents. Implements vector search, multimodal AI, agent orchestration, and enterprise AI integrations. Use PROACTIVELY for LLM features, chatbots, AI agents, or AI-powered applications.
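The entry names no specific stack, so here is a library-agnostic sketch of the retrieval core of a RAG system: brute-force cosine similarity over document embeddings. A production system would swap in an embedding model and a vector database.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k documents most similar to the query."""
    # Normalize so dot products equal cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k]

# Toy corpus of random 8-dim "embeddings", purely for illustration.
docs = np.random.rand(100, 8)
query = np.random.rand(8)
print(cosine_top_k(query, docs))
```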
Integrate Gemini API with @google/genai SDK (NOT deprecated @google/generative-ai). Text generation, multimodal (images/video/audio/PDFs), function calling, thinking mode, streaming. 1M input tokens. Prevents 14 documented errors. Use when: Gemini integration, multimodal AI, reasoning with thinking mode. Troubleshoot: SDK deprecation, model not found, context window, function calling errors, streaming corruption, safety settings, rate limits.
Generate images, video, and speech, and transcribe audio, using Aliyun Bailian models.
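The entry does not name an SDK; Bailian models are commonly reached through Alibaba Cloud's DashScope Python SDK, so the sketch below assumes it. The model name is illustrative.

```python
import os
import dashscope
from dashscope import ImageSynthesis

# DashScope also picks up DASHSCOPE_API_KEY from the environment.
dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

rsp = ImageSynthesis.call(
    model="wanx-v1",  # illustrative; Bailian hosts several image models
    prompt="An ink-wash painting of mountains at dawn",
    n=1,
    size="1024*1024",
)
print(rsp.output)
```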
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
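This description matches BLIP-2, which ships in Transformers; assuming the public `Salesforce/blip2-opt-2.7b` checkpoint, a zero-shot VQA sketch looks like this (the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Placeholder URL; substitute any RGB image.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# Zero-shot visual question answering: pass a question alongside the image.
inputs = processor(
    images=image,
    text="Question: what animal is this? Answer:",
    return_tensors="pt",
)
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```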
This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.
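Since this entry also covers fine-tuning on custom datasets, here is a compact Trainer sketch; the checkpoint and dataset are illustrative stand-ins, and the slices keep the run short.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # illustrative dataset
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    # Small slices so the demo finishes quickly.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```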
Claude AI cookbooks: code examples, tutorials, and best practices for using the Claude API. Use when learning Claude API integration, building Claude-powered applications, or exploring Claude capabilities.
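A minimal call with the official `anthropic` Python SDK, the kind of example the cookbooks open with; the model ID is illustrative.

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Give three tips for prompt design."}],
)
print(message.content[0].text)
```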
AnyCap CLI: a capability runtime for AI agents. One CLI for image generation, image read, video analysis, audio analysis, music composition, text-to-speech, web search, web crawling, file download, static site hosting, and cloud file storage. Use when the agent needs to generate images; analyze images, video, or audio; produce audio or music; search or crawl the web; download remote files; deploy static sites; or store and share files. Also use when the agent needs to authenticate with AnyCap (login, API key, credentials), or when encountering errors from AnyCap, to submit feedback via `anycap feedback`. Trigger on mentions of AnyCap, multimodal capabilities, AI-generated media, page hosting, or drive storage.
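The only subcommand the entry documents verbatim is `anycap feedback`. In the sketch below, the `anycap image generate` subcommand and its flag are hypothetical placeholders for whatever the real capability verbs are; only the feedback call is taken from the description.

```python
import subprocess

# Hypothetical subcommand and flag, for illustration only;
# consult `anycap --help` for the real capability verbs.
gen = subprocess.run(
    ["anycap", "image", "generate", "--prompt", "a lighthouse at dusk"],
    capture_output=True,
    text=True,
)

if gen.returncode != 0:
    # The entry says to submit feedback on AnyCap errors via this command.
    subprocess.run(["anycap", "feedback"])
```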
On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via a FastAPI WebSocket server.
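As a shape reference only, here is a minimal FastAPI WebSocket endpoint; the route name and message handling are assumptions, not the project's actual code.

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

# Route name and message shape are illustrative; the real assistant
# streams audio and vision frames over a channel like this one.
@app.websocket("/ws")
async def assistant_ws(websocket: WebSocket):
    await websocket.accept()
    while True:
        text = await websocket.receive_text()
        # A real handler would run the local model and TTS here.
        await websocket.send_text(f"echo: {text}")
```

Run locally with, e.g., `uvicorn server:app` (module name assumed).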
This skill should be used when the user asks to "generate video prompts", "create Seedance prompts", or "write video descriptions"; mentions "Seedance", "seedance", "Jimeng", "Jimeng Platform", "video prompts", "video generation", "AI video", "short drama", "advertising video", or "video extension"; or discusses video prompt engineering, AI video generation, or Seedance 2.0 workflows.