Found 4 Skills
Vox single-entry voice orchestration skill. Handles environment guarding, CLI installation, on-demand model download, ASR transcription, voice cloning, pipeline execution, and task troubleshooting through natural language. Use when the user describes only the goal without providing specific commands.
Transcribe audio and video files to text using a remote ASR service (Qwen3-ASR or an OpenAI-compatible endpoint). Extracts audio from video, sends it to a configurable ASR endpoint, and outputs clean text. Use when the user wants to transcribe recordings, convert audio/video to text, or do speech-to-text; mentions ASR, Qwen ASR, 转录 (transcribe), 语音转文字 (speech to text), or 录音转文字 (recording to text); or has a meeting recording, lecture, interview, or screen recording to transcribe.
Create short-video subtitles with Luma / 拾光 / 拾光工具. Use ASR, segmentation, styling, and burn-in as composable steps; keep editorial decisions in the agent instructions.
Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills: /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video, /wjs-burning-subtitles. Use when the user asks for full localization in one go ("帮我把这个西班牙语视频做成中文字幕+配音" (turn this Spanish video into Chinese subtitles plus dubbing), "translate and dub this video", "做完整的本地化" (do the full localization)). For any individual step (just transcribe, just translate, just dub, just burn), invoke the sub-skill directly; it is faster and the boundary is cleaner.