Found 1,612 Skills
Connect to the PAXS AI platform to create meetings, upload recordings, and generate transcriptions and meeting notes. Use this skill when a user wants to transcribe audio, create meeting notes, or interact with the PAXS platform.
Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or OpenAI-compatible endpoint). Extracts audio from video, sends to configurable ASR endpoint, outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, do speech-to-text, or mentions ASR, Qwen ASR, 转录, 语音转文字, 录音转文字, or has a meeting recording, lecture, interview, or screen recording to transcribe.
Use when generating talking, singing, or presentation videos from a single character image and audio with Alibaba Cloud Model Studio digital-human model `wan2.2-s2v`. Use when creating narrated avatar videos, singing portraits, or broadcast-style talking-head clips.
Minimal multimodal embedding smoke test for Model Studio VL embedding models.
Use when transcribing non-realtime speech with Alibaba Cloud Model Studio Qwen ASR models (`qwen3-asr-flash`, `qwen-audio-asr`, `qwen3-asr-flash-filetrans`). Use when converting recorded audio files to text, generating transcripts with timestamps, or documenting DashScope/OpenAI-compatible ASR request and response fields.
Use when user wants to generate music, songs, or audio tracks. Triggers on phrases like "generate a song", "make music", "create a track", "写首歌", "生成音乐", "来一首歌", "帮我做首歌", "纯音乐", "cover", "唱一首", or any request involving music creation, song writing, lyrics generation, or audio production. Also triggers when user provides lyrics and wants them turned into a song, or describes a mood/scene and wants background music. Even casual requests like "给我来点音乐" or "I want a chill beat" should trigger this skill. Do NOT use for music playback of existing files, music theory questions, or music recommendation without generation.
Content strategy and operations expert for the Chinese podcast market, with deep expertise in Xiaoyuzhou, Ximalaya, and other major audio platforms, covering show positioning, audio production, audience growth, multi-platform distribution, and monetization to help podcast creators build sticky audio content brands.
Generate spoken audio from text using OpenAI's API with built-in voices. Useful for narrated explainers, lecture audio, and quick voiceover tracks.
Generate podcast clip visualization video prompts for Seedance 2.0 on Higgsfield. Use for podcast clip videos, audio-to-visual content, audiogram alternatives, podcast highlight reels, interview clip visuals, or any video that transforms audio content into engaging visual format. Triggers on podcast, audio clip, audiogram, interview clip, sound bite, audio visual, podcast video, episode highlight, podcast clip.
Deepgram API reference for speech-to-text, text-to-speech, voice agents, audio intelligence, and account management. Use whenever building with Deepgram APIs — REST or WebSocket. Covers authentication, all endpoints, query parameters, request/response schemas, and WebSocket message formats. Reference files are organized by domain: listen (STT), speak (TTS), agent (voice agents), read (text/audio intelligence), models, projects, auth, and self-hosted.
Reviews Go code for idiomatic patterns, error handling, concurrency safety, and common mistakes. Use when reviewing .go files, checking error handling, goroutine usage, or interface design.