trx
Transcribe audio/video using trx CLI and post-process results with agent corrections. Use when: (1) user wants to transcribe a video or audio file, (2) user shares a YouTube/Twitter/Instagram URL for transcription, (3) user says "transcribe", "subtitles", "srt", "transcript", (4) user wants to fix/clean up a whisper transcription, (5) user asks to extract text from a video.
Source: crafter-station/trx
NPX Install
npx skill4agent add crafter-station/trx trx
trx -- Agent-First Transcription CLI
Install the skill:
npx skills add crafter-station/trx -g
Prerequisites
Check setup: `trx doctor --output json`. If dependencies are missing, run `trx init`.
Install the CLI:
bun add -g @crafter/trx
Workflow
1. Dry-run first (always)
bash
trx transcribe <input> --dry-run --output json
Validates input, checks dependencies, and shows the execution plan without running it.
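The dry-run-first rule can be sketched as a small shell wrapper. This assumes `trx` exits with a non-zero status when the dry run finds a problem, which is not stated here and should be checked before relying on it. `transcribe_checked` is a hypothetical helper name, not part of trx.

```bash
# Hypothetical helper: run the dry run first, transcribe only if it passes.
# Assumption: trx signals a failed dry run with a non-zero exit status.
transcribe_checked() {
  local input="$1"
  if ! trx transcribe "$input" --dry-run --output json >/dev/null; then
    echo "dry run failed for: $input" >&2
    return 1
  fi
  trx transcribe "$input" --output json
}
```

Usage: `transcribe_checked ./recording.mp4 > result.json`.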
2. Transcribe
For URLs (YouTube, Twitter, Instagram, etc.):
bash
trx transcribe "https://youtube.com/watch?v=..." --output json
For local files:
bash
trx transcribe ./recording.mp4 --output json
Agent-optimized (text only, saves tokens):
bash
trx transcribe <input> --fields text --output json
3. Post-process (fix whisper mistakes)
After transcription, read the `.txt` output and apply corrections. Read whisper-fixes.md for common patterns.
Correction checklist:
- Punctuation: Whisper drops periods at paragraph boundaries and misplaces commas. Fix sentence boundaries.
- Accents (Spanish): Whisper often drops diacritics. Restore: como -> como/cómo, esta -> esta/está, mas -> mas/más.
- Technical terms: Whisper misspells domain-specific words. Ask user for a glossary or infer from context.
- Repeated phrases: Whisper sometimes stutters on word boundaries. Remove exact consecutive duplicates.
- Speaker attribution: If user provides speaker names, insert `[Speaker Name]:` markers.
- Filler words: Remove "um", "uh", "este", "o sea" if user wants clean output.
- Timestamp alignment: If editing `.srt`, preserve the timestamp structure. Only modify text between timestamps.
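Two of the checklist items are mechanical enough to sketch in shell. `dedupe_words` and `strip_fillers_srt` are hypothetical helper names, not part of trx. The first collapses exact consecutive duplicate words on each line. The second drops the single-word fillers "um" and "uh" from SRT text lines while passing cue numbers, timestamp lines, and blank lines through unchanged. Both are simplifications: legitimate repeats such as "had had" are collapsed too, matching is case-sensitive in `dedupe_words`, and multi-word fillers like "o sea" are not handled.

```bash
# Collapse exact consecutive duplicate words on each line (case-sensitive).
dedupe_words() {
  awk '{
    out = ""; prev = ""
    for (i = 1; i <= NF; i++) {
      if ($i != prev) out = out (out == "" ? "" : " ") $i
      prev = $i
    }
    print out
  }'
}

# Remove "um" and "uh" from SRT text lines only.
# Cue numbers, timestamp lines (containing "-->"), and blank lines pass through.
strip_fillers_srt() {
  awk '
    /^[0-9]+$/ || /-->/ || NF == 0 { print; next }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        gsub(/[,.]/, "", w)              # ignore trailing comma or period
        if (w == "um" || w == "uh") continue
        out = out (out == "" ? "" : " ") $i
      }
      if (out == "") out = $0            # never turn a text line into a blank line
      print out
    }
  '
}
```

Usage: `strip_fillers_srt < talk.srt | dedupe_words > talk.clean.srt`. Review the result by hand afterwards, since neither helper understands context.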
4. Schema introspection
bash
trx schema transcribe
trx schema init
Commands
| Command | Example |
|---|---|
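| `trx init` | `trx init` |
| `trx doctor` | `trx doctor --output json` |
| `trx transcribe` | `trx transcribe ./recording.mp4 --output json` |
| `trx schema` | `trx schema transcribe` |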
Shorthand
`trx <input>` is shorthand for `trx transcribe <input>`.
Output format
- `--output json`: Machine-readable (default when piped)
- `--output table`: Human-readable with progress (default when TTY)
- `--fields text`: Only return transcript text (saves tokens)
- `--fields metadata`: Only return metadata (language, model)
- `--dry-run`: Validate without executing
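The piped-versus-TTY default can be illustrated with the standard `[ -t 1 ]` test, which succeeds only when standard output is a terminal. This is a sketch of the general technique, not trx's actual implementation, and `default_output_format` is a hypothetical name.

```bash
# Sketch: pick an output format the way a CLI might when no --output is given.
# [ -t 1 ] is true only when file descriptor 1 (stdout) is attached to a terminal.
default_output_format() {
  if [ -t 1 ]; then
    echo table   # interactive terminal: human-readable
  else
    echo json    # pipe or redirect: machine-readable
  fi
}
```

Inside a command substitution or a pipeline the function prints `json`, because stdout is no longer a terminal. The commands in this document pass `--output json` explicitly, which avoids depending on that detection.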
Flags reference
| Flag | Description | Default |
|---|---|---|
| `--language` | ISO 639-1 language code | |
| `--model` | Override model: tiny, base, small, medium, large | from config |
| | Output directory | |
| | Skip yt-dlp (local files only) | false |
| | Skip ffmpeg audio cleaning | false |
| | Raw JSON input | - |
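A minimal sketch of adding optional flags only when a value is supplied. It uses `--language` and `--model`, the two flag names that appear in this document, and assumes each takes its value as the next argument (as in `--model tiny`). `trx_with_flags` is a hypothetical wrapper name.

```bash
# Hypothetical wrapper: trx_with_flags <input> [language] [model]
# Optional flags are appended only when a value is given.
trx_with_flags() {
  local input="$1" language="${2:-}" model="${3:-}"
  set -- transcribe "$input" --output json
  if [ -n "$language" ]; then set -- "$@" --language "$language"; fi
  if [ -n "$model" ]; then set -- "$@" --model "$model"; fi
  trx "$@"
}
```

Usage: `trx_with_flags ./recording.mp4 es small` runs `trx transcribe ./recording.mp4 --output json --language es --model small`. Building the argument list with `set --` keeps paths containing spaces intact.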
Edge cases
- yt-dlp extension mismatch: yt-dlp sometimes outputs `.mp4.webm` instead of `.mp4`. The CLI handles this by scanning for the downloaded file by prefix.
- Large files (>1hr): Whisper processes in segments. Works but is slow on CPU. Consider `--model tiny` for speed.
- No GPU: whisper-cli uses CPU by default. Acceptable for tiny/base/small models.
- Auto-detect language: When `--language auto`, Whisper detects the language from the first 30 seconds. For multilingual content, specify the primary language.
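The first two edge cases can be sketched in shell. Both helper names are hypothetical. The prefix scan illustrates the general idea of locating a download whose extension differs from the expected one; it is not trx's actual code. The duration helper expects a length in seconds that the caller already knows and applies the one-hour threshold from the list above.

```bash
# Print the first existing file whose name starts with "<prefix>.",
# whatever extension follows (for example video.mp4.webm for prefix "video").
find_download() {
  local prefix="$1" candidate
  for candidate in "$prefix".*; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Print "--model tiny" for inputs longer than one hour (3600 s), else nothing.
model_flag_for_duration() {
  local seconds="$1"
  if [ "$seconds" -gt 3600 ]; then
    echo "--model tiny"
  fi
}
```

Usage: `trx transcribe "$(find_download ./downloads/video)" $(model_flag_for_duration 5400) --output json`. The second substitution is left unquoted on purpose so that it expands to two arguments or to none.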