trx

Original🇺🇸 English
Translated

Transcribe audio/video using trx CLI and post-process results with agent corrections. Use when: (1) user wants to transcribe a video or audio file, (2) user shares a YouTube/Twitter/Instagram URL for transcription, (3) user says "transcribe", "subtitles", "srt", "transcript", (4) user wants to fix/clean up a whisper transcription, (5) user asks to extract text from a video.

3installs
Added on

NPX Install

npx skill4agent add crafter-station/trx trx

Tags

Translated version includes tags in frontmatter

trx -- Agent-First Transcription CLI

Install:
npx skills add crafter-station/trx -g

Prerequisites

Check setup:
trx doctor --output json
. If dependencies missing, run
trx init
.
Install:
bun add -g @crafter/trx

Workflow

1. Dry-run first (always)

bash
trx transcribe <input> --dry-run --output json
Validates input, checks dependencies, shows execution plan without running.

2. Transcribe

For URLs (YouTube, Twitter, Instagram, etc.):
bash
trx transcribe "https://youtube.com/watch?v=..." --output json
For local files:
bash
trx transcribe ./recording.mp4 --output json
Agent-optimized (text only, saves tokens):
bash
trx transcribe <input> --fields text --output json

3. Post-process (fix whisper mistakes)

After transcription, read the
.txt
output and apply corrections. Read whisper-fixes.md for common patterns.
Correction checklist:
  1. Punctuation: Whisper drops periods at paragraph boundaries and misplaces commas. Fix sentence boundaries.
  2. Accents (Spanish): Whisper often drops diacritics. Restore: como -> como/cmo, esta -> esta/est, mas -> mas/ms.
  3. Technical terms: Whisper misspells domain-specific words. Ask user for a glossary or infer from context.
  4. Repeated phrases: Whisper sometimes stutters on word boundaries. Remove exact consecutive duplicates.
  5. Speaker attribution: If user provides speaker names, insert
    [Speaker Name]:
    markers.
  6. Filler words: Remove "um", "uh", "este", "o sea" if user wants clean output.
  7. Timestamp alignment: If editing
    .srt
    , preserve the timestamp structure. Only modify text between timestamps.

4. Schema introspection

bash
trx schema transcribe
trx schema init

Commands

CommandExample
init
trx init --model small
transcribe
trx transcribe <url-or-file> --output json
doctor
trx doctor --output json
schema
trx schema transcribe

Shorthand

trx <input>
is equivalent to
trx transcribe <input>
.

Output format

  • --output json
    : Machine-readable (default when piped)
  • --output table
    : Human-readable with progress (default when TTY)
  • --fields text
    : Only return transcript text (saves tokens)
  • --fields metadata
    : Only return metadata (language, model)
  • --dry-run
    : Validate without executing

Flags reference

FlagDescriptionDefault
--language <code>
ISO 639-1 language code
auto
(from config)
--model <size>
Override model: tiny, base, small, medium, largefrom config
--output-dir <dir>
Output directory
.
(cwd)
--no-download
Skip yt-dlp (local files only)false
--no-clean
Skip ffmpeg audio cleaningfalse
--json <payload>
Raw JSON input-

Edge cases

  • yt-dlp extension mismatch: yt-dlp sometimes outputs
    .mp4.webm
    instead of
    .mp4
    . The CLI handles this by scanning for the downloaded file by prefix.
  • Large files (>1hr): Whisper processes in segments. Works but is slow on CPU. Consider
    --model tiny
    for speed.
  • No GPU: whisper-cli uses CPU by default. Acceptable for tiny/base/small models.
  • Auto-detect language: When
    --language auto
    , Whisper detects the language from the first 30 seconds. For multilingual content, specify the primary language.