Found 103 Skills
Build backend AI with Vercel AI SDK v6 stable. Covers Output API (replaces generateObject/streamObject), speech synthesis, transcription, embeddings, MCP tools with security guidance. Includes v4→v5 migration and 15 error solutions with workarounds. Use when: implementing AI SDK v5/v6, migrating versions, troubleshooting AI_APICallError, Workers startup issues, Output API errors, Gemini caching issues, Anthropic tool errors, MCP tools, or stream resumption failures.
Expert skill for implementing speech-to-text with Faster Whisper. Covers audio processing, transcription optimization, privacy protection, and secure handling of voice data for JARVIS voice assistant.
Receive and verify ElevenLabs webhooks. Use when setting up ElevenLabs webhook handlers, debugging signature verification, or handling call transcription events.
Receive and verify Deepgram webhooks (callbacks). Use when setting up Deepgram webhook handlers, processing transcription callbacks, or handling asynchronous transcription results.
Refine speech transcripts (interviews, speeches, podcasts, meetings) into more readable article paragraphs. Trigger this skill when users mention terms like "subtitle refinement", "transcript polish", "subtitle polishing", "organize video subtitles into articles", or "interview text organization"; when they are processing interview records, optimizing transcription text, or organizing speech-to-text output; or when they need to turn long dialogue or speech texts into readable articles. It suits transcripts of solo speeches or multi-person conversations, retains the original sentences and wording, and avoids high-level summarization. This skill should also be triggered when users only say "help me organize this text" and attach obviously colloquial text.
ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe
Use when the user asks to extract, get, or fetch YouTube video transcripts, subtitles, or captions. Writes video details and the transcription into a structured markdown file.
Clean and reconstruct raw auto-generated captions (Zoom, YouTube, Teams, Google Meet, Otter.ai, etc.) into readable, coherent transcripts. Use when the user provides raw caption files (.txt, .vtt, .srt), meeting transcripts with timestamps and speaker tags, or asks to clean up/refine a transcript. Handles: timestamp removal, speaker tag normalization, filler word removal, broken sentence reconstruction, transcription error correction, paragraph formation. Preserves every piece of substantive content while removing noise. Trigger phrases: 'clean this transcript', 'refine captions', 'fix this transcript', 'process Zoom captions', 'clean up meeting notes'.
Spoken video transcription and slip-of-the-tongue recognition. Generates review drafts and deletion task checklists. Trigger phrases: edit spoken video, process video, recognize slip-of-the-tongue
Z.ai API integration for building applications with GLM models. Use when working with Z.ai/ZhipuAI APIs for: (1) Chat completions with GLM-4.7/4.6/4.5 models, (2) Vision/multimodal tasks with GLM-4.6V, (3) Image generation with GLM-Image or CogView-4, (4) Video generation with CogVideoX-3 or Vidu models, (5) Audio transcription with GLM-ASR-2512, (6) Function calling and tool use, (7) Web search integration, (8) Translation, slide/poster generation agents. Triggers: Z.ai, ZhipuAI, GLM, BigModel, Zhipu, CogVideoX, CogView, Vidu.
Video & Podcast Digest — send a video/podcast link, get full transcript + structured summary. Supports YouTube, Bilibili, X/Twitter video, Xiaoyuzhou, Apple Podcasts, and direct audio/video links. Uses yt-dlp for subtitles and Groq Whisper for transcription.
Supercharged video tools for downloading, speed control, scrubbing, transcription, clipping, conversion, and more.