trx

Original：🇺🇸 English

Translated

Transcribe audio/video using trx CLI and post-process results with agent corrections. Use when: (1) user wants to transcribe a video or audio file, (2) user shares a YouTube/Twitter/Instagram URL for transcription, (3) user says "transcribe", "subtitles", "srt", "transcript", (4) user wants to fix/clean up a whisper transcription, (5) user asks to extract text from a video.

8installs

Sourcecrafter-station/trx

Added on2026-04-05

NPX Install

npx skill4agent add crafter-station/trx trx

SKILL.md Content

View Translation Comparison →

trx -- Agent-First Transcription CLI

Install:

npx skills add crafter-station/trx -g

Prerequisites

Check setup:

trx doctor --output json

. If dependencies missing, run

trx init

.

Install:

bun add -g @crafter/trx

Workflow

1. Dry-run first (always)

bash

trx transcribe <input> --dry-run --output json

Validates input, checks dependencies, shows execution plan without running.

2. Transcribe

For URLs (YouTube, Twitter, Instagram, etc.):

bash

trx transcribe "https://youtube.com/watch?v=..." --output json

For local files:

bash

trx transcribe ./recording.mp4 --output json

Agent-optimized (text only, saves tokens):

bash

trx transcribe <input> --fields text --output json

3. Post-process (fix whisper mistakes)

After transcription, read the

.txt

output and apply corrections. Read whisper-fixes.md for common patterns.

Correction checklist:

Punctuation: Whisper drops periods at paragraph boundaries and misplaces commas. Fix sentence boundaries.
Accents (Spanish): Whisper often drops diacritics. Restore: como -> como/cmo, esta -> esta/est, mas -> mas/ms.
Technical terms: Whisper misspells domain-specific words. Ask user for a glossary or infer from context.
Repeated phrases: Whisper sometimes stutters on word boundaries. Remove exact consecutive duplicates.
Speaker attribution: If user provides speaker names, insert
```
[Speaker Name]:
```
markers.
Filler words: Remove "um", "uh", "este", "o sea" if user wants clean output.
Timestamp alignment: If editing
```
.srt
```
, preserve the timestamp structure. Only modify text between timestamps.

4. Schema introspection

bash

trx schema transcribe
trx schema init

Commands

Command	Example
`init`	`trx init --model small`
`transcribe`	`trx transcribe <url-or-file> --output json`
`doctor`	`trx doctor --output json`
`schema`	`trx schema transcribe`

Shorthand

trx <input>

is equivalent to

trx transcribe <input>

.

Output format

```
--output json
```
: Machine-readable (default when piped)
```
--output table
```
: Human-readable with progress (default when TTY)
```
--fields text
```
: Only return transcript text (saves tokens)
```
--fields metadata
```
: Only return metadata (language, model)
```
--dry-run
```
: Validate without executing

Flags reference

Flag	Description	Default
`--language <code>`	ISO 639-1 language code	`auto` (from config)
`--model <size>`	Override model: tiny, base, small, medium, large	from config
`--output-dir <dir>`	Output directory	`.` (cwd)
`--no-download`	Skip yt-dlp (local files only)	false
`--no-clean`	Skip ffmpeg audio cleaning	false
`--json <payload>`	Raw JSON input	-

Edge cases

yt-dlp extension mismatch: yt-dlp sometimes outputs
```
.mp4.webm
```
instead of
```
.mp4
```
. The CLI handles this by scanning for the downloaded file by prefix.
Large files (>1hr): Whisper processes in segments. Works but is slow on CPU. Consider
```
--model tiny
```
for speed.
No GPU: whisper-cli uses CPU by default. Acceptable for tiny/base/small models.
Auto-detect language: When
```
--language auto
```
, Whisper detects the language from the first 30 seconds. For multilingual content, specify the primary language.

trx

NPX Install

Tags

SKILL.md Content

trx -- Agent-First Transcription CLI

Prerequisites

Workflow

1. Dry-run first (always)

2. Transcribe

3. Post-process (fix whisper mistakes)

4. Schema introspection

Commands

Shorthand

Output format

Flags reference

Edge cases