AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.
Use when generating videos with Alibaba Cloud Model Studio PixVerse models (`pixverse/pixverse-v5.6-t2v`, `pixverse/pixverse-v5.6-it2v`, `pixverse/pixverse-v5.6-kf2v`, `pixverse/pixverse-v5.6-r2v`), i.e., when building non-Wan text-to-video, first-frame image-to-video, keyframe-to-video, or multi-image reference-to-video workflows on Model Studio.
Generate videos from text and image prompts via Together AI. 15+ models including Veo 2/3, Sora 2, Kling 2.1, Hailuo 02, Seedance, PixVerse, Vidu. Supports text-to-video, image-to-video, keyframe control, and reference images. Use when users want to generate videos, create video content, animate images, or work with any video generation task.
[QwenCloud] Generate videos using Wan models. Supports text-to-video, image-to-video, first+last frame, reference-based role-play, and video editing (VACE). TRIGGER when: user wants to create, generate, or edit video content, mentions video generation/animation/video clips/Wan models, or explicitly invokes this skill by name (e.g. use qwencloud-video-generation). DO NOT TRIGGER when: user wants to generate images (use qwencloud-image-generation), understand/analyze existing videos (use qwencloud-vision), text-only tasks.
Generate and edit videos with Alibaba HappyHorse 1.0 models via inference.sh CLI. Models: HappyHorse T2V, I2V, R2V, Video Edit. Capabilities: text-to-video, image-to-video, reference-to-video, video editing with natural language, character preservation, 720P/1080P, up to 15 seconds. Use for: physically realistic video, video editing, character-consistent content, product demos, social media. Triggers: happyhorse, happy horse, alibaba video, happyhorse 1.0, dashscope video, alibaba happyhorse, video editing ai, ai video editor
Use this skill for AI video generation. Triggers include: "generate video", "create video", "make video", "animate", "text to video", "video from image", "video of", "animate image", "bring to life", "make it move", "add motion", "video with audio", "video with dialogue". Supports text-to-video, image-to-video, and video with dialogue/audio using Google Veo 3.1 (default) or OpenAI Sora.
Create and edit videos using Google's Veo 2 and Veo 3 models. Supports Text-to-Video, Image-to-Video, Inpainting, and Advanced Controls.
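For the Veo-based entries above, a minimal text-to-video sketch using the google-genai Python SDK's long-running video operation. The model ID, prompt, and polling interval are illustrative assumptions; check the skill's own docs for the exact model names it expects.

```python
# Minimal Veo text-to-video sketch (pip install google-genai).
# Model ID below is a placeholder assumption, not the skill's pinned value.
import time

from google import genai

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Start an async video generation job (returns a long-running operation).
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # placeholder Veo model ID
    prompt="A slow dolly shot over a foggy pine forest at dawn",
)

# Poll until the operation completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download and save the first generated clip.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("clip.mp4")
```

Image-to-video works the same way, passing a starting frame alongside the prompt; the operation-polling pattern is identical.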
MiniMax multimodal model skill — use MiniMax Multi-Modal models for speech, music, video, and image. Create voice, music, video, and images with MiniMax AI: TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract). Use when the user mentions MiniMax, multimodal generation, or wants speech/music/video/image AI, MiniMax APIs, or FFmpeg workflows alongside MiniMax outputs.
Alicloud OSS AI Content Awareness Skill. Use for enabling and querying OSS semantic search with AI-powered content understanding. Triggers: "OSS AI Content Awareness", "OSS semantic search", "OSS vector search", "search by text", "text-to-image search", "text-to-video search", "OSS MetaQuery", "OSS data index", "OSS AI内容感知", "OSS语义检索", "OSS向量检索", "以文搜图", "以文搜视频", "OSS数据索引"
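A rough sketch of what a semantic (text-to-image) MetaQuery might look like through the oss2 Python SDK's data-indexing support. The semantic-mode parameter names here are assumptions written from memory and should be verified against the current oss2 release and the OSS DoMetaQuery docs.

```python
# Hypothetical OSS semantic search sketch (pip install oss2).
# MetaQuery's semantic-mode arguments below are assumptions, not confirmed API.
import os

import oss2
from oss2.models import MetaQuery

auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-bucket")

# Natural-language query against the bucket's AI content index
# (requires the data index / AI content awareness feature to be enabled first).
result = bucket.do_bucket_meta_query(
    MetaQuery(
        mode="semantic",                    # assumed flag for AI content awareness
        simple_query="sunset over the sea", # free-text query ("以文搜图")
        media_types=["image"],
        max_results=10,
    )
)
for f in result.files:
    print(f.file_name)
```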
Atlas Cloud API integration skill — quickly call 300+ AI image generation, video generation, and LLM models through a unified API. Use this skill when the user needs to integrate AI image generation (e.g., Flux, Seedream, DALL-E), AI video generation (e.g., Kling, Sora, Seedance), or LLM APIs (OpenAI-compatible format) into their project. Applicable scenarios include: generating images, generating videos, calling large language models, using the Atlas Cloud API, configuring ATLASCLOUD_API_KEY, querying available model lists, searching models by keyword, uploading local images/media files, one-step quick generation, image-to-video, text-to-image, text-to-video, and AI content creation tool integration. Even if the user doesn't explicitly mention Atlas Cloud, consider this skill whenever the task involves integrating AI media generation APIs.
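Since this entry says LLM calls use an OpenAI-compatible format keyed by `ATLASCLOUD_API_KEY`, a minimal sketch is the standard openai SDK with an overridden base URL. The base URL and model ID below are hypothetical placeholders; only the env var name and the OpenAI-compatible format come from the entry itself.

```python
# Sketch of an OpenAI-compatible LLM call routed through Atlas Cloud.
# base_url and model are hypothetical; substitute the real values.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ATLASCLOUD_API_KEY"],
    base_url="https://api.atlascloud.example/v1",  # hypothetical endpoint
)

response = client.chat.completions.create(
    model="some-provider/some-model",  # hypothetical model ID
    messages=[{"role": "user", "content": "One-line summary of diffusion models?"}],
)
print(response.choices[0].message.content)
```

The same base-URL override pattern is what "OpenAI-compatible" generally implies: existing OpenAI client code keeps working with only the endpoint and key swapped.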
Comprehensive image and video creation via Xiaoyunque's AI capabilities, supporting both generation and editing. Covered scenarios include: Generation (text-to-image, text-to-video, image-to-video, animation creation, draw xxx, create xxx clip), Editing & Revision (replace xxx with yyy, remove xxx, add xxx, change to xxx, adjust xxx, local modification, lens adjustment), Style Transfer (style migration, repainting, style change), video continuation, video/TVC/promotional video replication, short drama/short comic drama generation, music MV creation, product advertisement/demo video production, storyboard design, and educational video/short video production. Also trigger this skill when users mention Xiaoyunque, xyq, uploading reference images/videos, or checking generation progress. Key rule: trigger this skill whenever the user's request involves AI video creation, generation, editing, or revision, regardless of wording (e.g., "draw a cat", "make a poster", "create a video", "help me revise this video", "help me replicate this video", "make an MV with this song", "generate a short drama from one sentence").
Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-step), (2) EMO talking-head, (3) AA/AnimateAnyone full-body animation (three-step), (4) T2I text-to-image (Wan 2.x, default wan2.2-t2i-flash), (5) I2V image-to-video (Wan 2.x, default wan2.7-i2v-flash, supports T2I→I2V pipeline), (6) Qwen TTS (auto model/voice by scene, default qwen3-tts-vd-realtime-2026-01-15), (7) LingMou digital-human template video with random template, public-template copy, and script confirmation. Trigger when the user needs talking-head, portrait, full-body animation, text-to-image, text-to-video, or speech synthesis.
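A sketch of the T2I→I2V pipeline named in capability (5), assuming the dashscope Python SDK's `ImageSynthesis` and `VideoSynthesis` helpers and the default model IDs listed above. The call shapes follow DashScope's published examples; exact parameter names may differ across SDK versions.

```python
# T2I -> I2V pipeline sketch (pip install dashscope); DASHSCOPE_API_KEY set.
# Model IDs are the defaults named in the entry above; call shapes assumed
# from DashScope's public examples.
from http import HTTPStatus

from dashscope import ImageSynthesis, VideoSynthesis

# Step 1: text-to-image with the default Wan T2I model.
t2i = ImageSynthesis.call(
    model="wan2.2-t2i-flash",
    prompt="A red paper lantern drifting over a night market",
    n=1,
)
assert t2i.status_code == HTTPStatus.OK, t2i.message
image_url = t2i.output.results[0].url

# Step 2: animate the generated frame with the default Wan I2V model.
i2v = VideoSynthesis.call(
    model="wan2.7-i2v-flash",
    prompt="Gentle camera push-in, lantern swaying in the wind",
    img_url=image_url,
)
assert i2v.status_code == HTTPStatus.OK, i2v.message
print(i2v.output.video_url)
```

Chaining the two calls this way is what gives the skill an effective text-to-video path even though capability (5) itself is image-to-video.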