Found 69 Skills
Generate and edit videos with Alibaba HappyHorse 1.0 models via inference.sh CLI. Models: HappyHorse T2V, I2V, R2V, Video Edit. Capabilities: text-to-video, image-to-video, reference-to-video, video editing with natural language, character preservation, 720P/1080P, up to 15 seconds. Use for: physically realistic video, video editing, character-consistent content, product demos, social media. Triggers: happyhorse, happy horse, alibaba video, happyhorse 1.0, dashscope video, alibaba happyhorse, video editing ai, ai video editor
Create and edit videos using Google's Veo 2 and Veo 3 models. Supports Text-to-Video, Image-to-Video, Inpainting, and Advanced Controls.
[QianWen] Generate videos using Wan models. Supports text-to-video, image-to-video, first+last frame, reference-based role-play, and video editing (VACE). TRIGGER when: user wants to create, generate, or edit video content, mentions video generation/animation/video clips/Wan models, or explicitly invokes this skill by name (e.g. use qianwen-video-generation). DO NOT TRIGGER when: user wants to generate images (use qianwen-image-generation), understand/analyze existing videos (use qianwen-vision), text-only tasks.
Control video generation requests before execution. Use this when the user asks for a simple clip, storyboard video, UGC video, podcast clip, reference video, talking-head, image-to-video, text-to-video, or research-handoff video and the skill must classify the request before handing it to video-request-architect and a runner such as seedance-submitter or video-batch-runner.
Comprehensive creation via Xiaoyunque's AI capabilities, supporting generation and editing of images/videos. Covered scenarios include: Generation (text-to-image, text-to-video, image-to-video, animation creation, draw xxx, create xxx clip), Editing & Revision (replace xxx with yyy, remove xxx, add xxx, change to xxx, adjust xxx, local modification, lens adjustment), Style Transfer (style migration, repainting, style change), video continuation, video/TVC/promotional video replication, short drama/short comic drama generation, music MV creation, product advertisement/demo video production, storyboard design, educational video/short video production. This skill should also be triggered when users mention Xiaoyunque, xyq, uploading reference images/videos, or checking generation progress. Key Judgment: This skill must be triggered whenever the user's request involves AI video creation, generation, editing, or revision, regardless of the wording (e.g., "draw a cat", "make a poster", "create a video", "help me revise this video", "help me replicate this video", "make an MV with this song", "generate a short drama with one sentence")
Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include: analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities, falling back to them only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Use this skill for AI video generation. Triggers include: "generate video", "create video", "make video", "animate", "text to video", "video from image", "video of", "animate image", "bring to life", "make it move", "add motion", "video with audio", "video with dialogue". Supports text-to-video, image-to-video, and video with dialogue/audio using Google Veo 3.1 (default) or OpenAI Sora.
Write better prompts for Kling 3.0 AI video generation. Use when the user wants to create, write, improve, or refine prompts — text-to-video, image-to-video, keyframes, multi-shot sequences, or dialogue scenes.
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Atlas Cloud API integration skill — quickly call 300+ AI image generation, video generation, and LLM models through a unified API. Use this skill when the user needs to integrate AI image generation (e.g., Flux, Seedream, DALL-E), AI video generation (e.g., Kling, Sora, Seedance), or call LLM APIs (OpenAI-compatible format) into their project. Applicable scenarios include: generating images, generating videos, calling large language models, using Atlas Cloud API, configuring ATLASCLOUD_API_KEY, querying available model lists, searching models by keyword, uploading local images/media files, one-step quick generation, image-to-video, text-to-image, text-to-video, AI content creation tool integration. Even if the user doesn't explicitly mention Atlas Cloud, this skill should be considered whenever AI media generation API integration development is involved.
Choose the right fal.ai endpoint for a given task. Modality-organized catalog of production endpoint defaults, text-to-image, image-to-image, text-to-video, image-to-video, and more. Use when the user has not named a specific model, or asks "which model for X", "best endpoint for Y", "what should I use for Z".
Expert guidance for Google Veo 3.1 video generation. Use when the user wants to (1) create text-to-video or image-to-video prompts, (2) optimize for cinematic quality and native audio syncing, (3) maintain character consistency via reference images, (4) structure multi-shot sequences with timestamp prompting, (5) use First/Last Frame interpolation, (6) select between standard and fast generation modes, or (7) troubleshoot physics, motion, or audio issues in generated video.