Generate cinematic short-form video with ByteDance Seedance 2.0 Pro on RunComfy. Documents Seedance 2.0 Pro's strengths (multimodal references: up to 9 images, 3 videos, and 3 audio clips; synchronized in-pass audio with natural lip-sync; cinematic motion refinement), the 4–15s duration schema, and when to route to HappyHorse 1.0, Wan 2.7, or Kling instead. Calls `runcomfy run bytedance/seedance-v2/pro` through the local RunComfy CLI, as sketched below. Triggers on "seedance", "seedance 2", "seedance v2", "seedance pro", "bytedance video", or any explicit ask to generate video with this model.
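The only invocation the entry documents is the `runcomfy run` subcommand; below is a minimal sketch of shelling out to it from Python. The `--prompt` flag is a hypothetical placeholder, not a documented RunComfy CLI option.

```python
import subprocess

# Invoke the documented RunComfy subcommand for Seedance 2.0 Pro.
# NOTE: "--prompt" is illustrative only; check `runcomfy run --help`
# for the real parameter names.
result = subprocess.run(
    [
        "runcomfy", "run", "bytedance/seedance-v2/pro",
        "--prompt", "Slow dolly shot through a rain-soaked neon alley, 8s",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```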
Use this skill when building applications with Gemini models or the Gemini API, working with multimodal content (text, images, audio, video), implementing function calling, using structured outputs, or needing current model specifications. Covers SDK usage (google-genai for Python, @google/genai for JavaScript/TypeScript), model selection, and API capabilities.
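For orientation, a minimal text-generation call with the google-genai Python SDK named above; the model ID is an assumption, so substitute whichever current Gemini model fits.

```python
from google import genai

# The client reads GEMINI_API_KEY from the environment if no key is passed.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model ID
    contents="Explain function calling in one paragraph.",
)
print(response.text)
```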
Expert guidance for working with the Hugging Face Transformers library for NLP, computer vision, and multimodal AI tasks.
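A minimal sketch of the library's high-level `pipeline` API, which covers most of the NLP, vision, and multimodal tasks the entry mentions; the checkpoint names are defaults or illustrative picks, not prescribed by the skill.

```python
from transformers import pipeline

# pipeline() downloads a default checkpoint for the task on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes multimodal inference straightforward."))

# The same one-liner pattern covers vision tasks (checkpoint is illustrative).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
```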
Build production-ready LLM applications, advanced RAG systems, and intelligent agents. Implements vector search, multimodal AI, agent orchestration, and enterprise AI integrations. Use PROACTIVELY for LLM features, chatbots, AI agents, or AI-powered applications.
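The entry names no specific stack, so here is a library-agnostic sketch of the retrieval core of a RAG system: brute-force cosine similarity over document embeddings. A production system would swap in an embedding model and a vector database.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k documents most similar to the query."""
    # Normalize so dot products equal cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k]

# Toy corpus of random 8-dim "embeddings", purely for illustration.
docs = np.random.rand(100, 8)
query = np.random.rand(8)
print(cosine_top_k(query, docs))
```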
Integrate Gemini API with @google/genai SDK (NOT deprecated @google/generative-ai). Text generation, multimodal (images/video/audio/PDFs), function calling, thinking mode, streaming. 1M input tokens. Prevents 14 documented errors. Use when: Gemini integration, multimodal AI, reasoning with thinking mode. Troubleshoot: SDK deprecation, model not found, context window, function calling errors, streaming corruption, safety settings, rate limits.
Generate images, video, and speech, and transcribe audio, using Aliyun Bailian models.
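The entry does not name an SDK; Bailian models are commonly reached through Alibaba Cloud's DashScope Python SDK, so the sketch below assumes it. The model name is illustrative.

```python
import os
import dashscope
from dashscope import ImageSynthesis

# DashScope also picks up DASHSCOPE_API_KEY from the environment.
dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

rsp = ImageSynthesis.call(
    model="wanx-v1",  # illustrative; Bailian hosts several image models
    prompt="An ink-wash painting of mountains at dawn",
    n=1,
    size="1024*1024",
)
print(rsp.output)
```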
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
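This description matches BLIP-2, which ships in Transformers; assuming the public `Salesforce/blip2-opt-2.7b` checkpoint, a zero-shot VQA sketch looks like this (the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Placeholder URL; substitute any RGB image.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# Zero-shot visual question answering: pass a question alongside the image.
inputs = processor(
    images=image,
    text="Question: what animal is this? Answer:",
    return_tensors="pt",
)
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```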
This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.
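Since this entry also covers fine-tuning on custom datasets, here is a compact Trainer sketch; the checkpoint and dataset are illustrative stand-ins, and the slices keep the run short.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # illustrative dataset
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    # Small slices so the demo finishes quickly.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```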
Claude AI cookbooks: code examples, tutorials, and best practices for using the Claude API. Use when learning Claude API integration, building Claude-powered applications, or exploring Claude capabilities.
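A minimal call with the official `anthropic` Python SDK, the kind of example the cookbooks open with; the model ID is illustrative.

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Give three tips for prompt design."}],
)
print(message.content[0].text)
```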
AnyCap CLI: a capability runtime for AI agents. One CLI for image generation, image read, video analysis, audio analysis, music composition, text-to-speech, web search, web crawling, file download, static site hosting, and cloud file storage. Use when the agent needs to generate images; analyze images, video, or audio; produce audio or music; search or crawl the web; download remote files; deploy static sites; or store and share files. Also use when the agent needs to authenticate with AnyCap (login, API key, credentials), or when encountering errors from AnyCap, to submit feedback via `anycap feedback`. Trigger on mentions of AnyCap, multimodal capabilities, AI-generated media, page hosting, or drive storage.
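The only subcommand the entry documents verbatim is `anycap feedback`. In the sketch below, the `anycap image generate` subcommand and its flag are hypothetical placeholders for whatever the real capability verbs are; only the feedback call is taken from the description.

```python
import subprocess

# Hypothetical subcommand and flag, for illustration only;
# consult `anycap --help` for the real capability verbs.
gen = subprocess.run(
    ["anycap", "image", "generate", "--prompt", "a lighthouse at dusk"],
    capture_output=True,
    text=True,
)

if gen.returncode != 0:
    # The entry says to submit feedback on AnyCap errors via this command.
    subprocess.run(["anycap", "feedback"])
```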
On-device, real-time multimodal AI voice and vision assistant powered by Gemma 4 E2B and Kokoro TTS, running entirely locally via a FastAPI WebSocket server.
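As a shape reference only, here is a minimal FastAPI WebSocket endpoint; the route name and message handling are assumptions, not the project's actual code.

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

# Route name and message shape are illustrative; the real assistant
# streams audio and vision frames over a channel like this one.
@app.websocket("/ws")
async def assistant_ws(websocket: WebSocket):
    await websocket.accept()
    while True:
        text = await websocket.receive_text()
        # A real handler would run the local model and TTS here.
        await websocket.send_text(f"echo: {text}")
```

Run locally with, e.g., `uvicorn server:app` (module name assumed).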
This skill should be used when the user asks to "generate video prompts", "create Seedance prompts", or "write video descriptions"; mentions "Seedance", "seedance", "Jimeng", "Jimeng Platform", "video prompts", "video generation", "AI video", "short drama", "advertising video", or "video extension"; or discusses video prompt engineering, AI video generation, or Seedance 2.0 workflows.