Loading...
Loading...
Found 1,612 Skills
ASR (Automatic Speech Recognition) — enhanced speech-to-text built on Doubao large model, with audio preprocessing, denoising, and extended analysis capabilities. Async API. Choose this skill when: - Input is a video file (mp4/mov/mkv) — auto-extracts audio track - Audio needs denoising before recognition - File exceeds 512MB or 5 hours (no size limit) - Audio source is a TOS internal path (tos://bucket/key) - Need structured JSON output with timestamped utterances and metadata - Need speaker diarization, emotion/gender detection, speech rate, or sensitive word filtering Supports 99 languages, multiple formats (wav/mp3/m4a/aac/flac/ogg/mp4/mov/mkv), and auto language detection.
AI demos and GPU compute with Gradio Spaces and Hugging Face Spaces ZeroGPU. Use when writing or reviewing code that uses `@spaces.GPU`, configuring `python_version` or `requirements.txt` for a ZeroGPU Space, or handling ZeroGPU-specific code constraints — pickle-based process isolation, `gr.State` semantics across the worker boundary, no `torch.compile` (use AoTI instead), CUDA wheel-only builds (no `nvcc` at build or runtime), large vs xlarge sizing, and dynamic duration callables. Make sure to use this skill whenever the user mentions ZeroGPU, `@spaces.GPU`, or the `spaces` Python package, or hits ZeroGPU-specific code errors like `PicklingError` across the worker boundary, `illegal duration`, or `flash-attn` wheel-build failures — even when the user does not explicitly ask for ZeroGPU coding guidance. Trigger on `import spaces` or `@spaces.GPU` in code.
Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.
Pythonic idioms, PEP 8 standards, type hints, and best practices for building robust, efficient, and maintainable Python applications.
Audit Web Audio API code for sound synthesis best practices. Use when reviewing procedural audio, implementing UI sounds, or checking audio parameter quality. Outputs file:line findings.
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Use this skill for AI music generation. Triggers include: "generate music", "create a song", "make music", "compose", "create a beat", "generate audio", "make a soundtrack", "create a jingle", "instrumental music", "background music", "lo-fi beats", "electronic music" Supports vocals (Suno/Udio) and instrumental-only (Google Lyria).
Use local FunASR service to transcribe audio or video files into timestamped Markdown files, supporting common formats such as mp4, mov, mp3, wav, m4a, etc. This skill should be used when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.
Generate images and videos using x402-protected AI models at StableStudio. USE FOR: - Generating images from text prompts - Generating videos from text or images - Editing images with AI - Creating visual content TRIGGERS: - "generate image", "create image", "make a picture" - "generate video", "create video", "make a video" - "edit image", "modify image" - "stablestudio", "nano-banana", "sora", "veo" ALWAYS use `npx agentcash fetch` or `npx agentcash fetch-auth` for stablestudio.dev endpoints.
OmniStudio Integration Procedure creation and validation with 110-point scoring. Use when building server-side process orchestrations that combine Data Mapper actions, Apex Remote Actions, HTTP callouts, and conditional logic. TRIGGER when: user creates Integration Procedures, adds Data Mapper steps, configures Remote Actions, or reviews existing IP configurations. DO NOT TRIGGER when: building OmniScripts (use sf-omniscript), creating Data Mappers directly (use sf-datamapper), or analyzing cross-component dependencies (use sf-omnistudio-analyze).
Use when implementing speech-to-text, audio transcription, real-time streaming STT, audio intelligence features, or voice AI using AssemblyAI APIs or SDKs. Use when user mentions AssemblyAI, voice agents, transcription, speaker diarization, PII redaction of audio, LLM Gateway for audio understanding, or applying LLMs to transcripts. Also use when building voice agents with LiveKit or Pipecat that need speech-to-text, or when the user is working with any audio/video processing pipeline that could benefit from transcription, even if they don't mention AssemblyAI by name.
Use when working with Nuxt Content v3 - provides collections (local/remote/API sources), queryCollection API, MDC rendering, database configuration, NuxtStudio integration, hooks, i18n patterns, and LLMs integration