Total: 50,708 skills. AI & Machine Learning: 8,496 skills.
Use when generating talking, singing, or presentation videos from a single character image and an audio track with the Alibaba Cloud Model Studio digital-human model `wan2.2-s2v`. Use when creating narrated avatar videos, singing portraits, or broadcast-style talking-head clips.
Use when transcribing non-real-time speech with the Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.
Use when generating videos with Alibaba Cloud Model Studio PixVerse models (`pixverse/pixverse-v5.6-t2v`, `pixverse/pixverse-v5.6-it2v`, `pixverse/pixverse-v5.6-kf2v`, `pixverse/pixverse-v5.6-r2v`). Use when building non-Wan text-to-video, first-frame image-to-video, keyframe-to-video, or multi-image reference-to-video workflows on Model Studio.
Design Pydantic models and LLM prompt templates for structured extraction pipelines. Use when creating, editing, or reviewing Pydantic models that serve as LLM output schemas, or when writing prompt templates that pair with those models. Triggers on: "pydantic model", "structured output", "extraction schema", "LLM output model", "schema design".
Vercel AI SDK expert guidance. Use when building AI-powered features such as chat interfaces, text generation, structured output, tool calling, agents, MCP integration, streaming, embeddings, reranking, or image generation, or when working with any LLM provider.
Persistent memory and context management for AI agents using OpenContext. Keeps context across sessions, repositories, and dates; stores conclusions; and provides document-search workflows.
Use when you need multi-agent orchestration for the OpenAI Codex CLI. Triggers on: omx, $plan, $ralph, $team, $autopilot, $deep-interview. v0.11.10: 30+ agents, 35+ workflow skills, tmux team runtime, sparkshell, explore, ralplan.
Generate voice messages using local Qwen3-TTS (offline, Apple Silicon). Convert text to speech with customizable voices, emotions, and speed. Use when the user asks for a voice reply, audio, or TTS.
AI image generation for paid ad creatives. Reads campaign-brief.md and brand-profile.json to produce platform-sized ad images using Gemini (default) or a configured provider. Requires GOOGLE_API_KEY, or ADS_IMAGE_PROVIDER plus the matching API key. Triggers on: "generate ads", "create images", "make ad creatives", "generate visuals", "create ad images", "generate campaign images", "make the images", "generate from brief".
Design and enforce AI-friendly verification for a GRACE project. Use when modules need stronger automated tests, traceable logs, execution-trace checks, or verification that is robust enough for autonomous and multi-agent workflows.
Use when you are unsure of a fact, your knowledge may be outdated, or the question is contested. State your level of confidence explicitly instead of presenting uncertain claims as fact.
Research the latest ComfyUI models, techniques, and community discoveries. Monitors YouTube channels, GitHub repositories, and Hugging Face. Updates reference files with timestamped findings and flags stale information. Invoke with `/research comfyui`, or let it run automatically at session start to check for stale information.