# document-to-narration

Convert written documents to narrated video scripts with TTS audio and word-level timing. Use when preparing essays, blog posts, or articles for video narration. Outputs scene files, audio, and VTT captions with precise word timestamps. Keywords: narration, voiceover, TTS, scenes, audio, timing, video script, spoken.
## Installation

```bash
npx skill4agent add jwynia/agent-skills document-to-narration
```

Requires a local TTS model at `tts/model/` (see Setup below).

## Recommended Pipeline (Single-Pass Narration)

```
Document (.md)
↓ [agent interprets scene breaks]
Scene .txt files (01-scene-name.txt, 02-scene-name.txt, ...)
↓ [TTS via narrate-full.py - SINGLE PASS]
full-narration.wav (one consistent audio file)
↓ [Whisper via transcribe-full.py]
full-narration.json + full-narration.vtt (word-level timing)
↓ [extract-scene-boundaries.py]
Scene timing boundaries for video composition
```

### Per-Scene Pipeline (Not Recommended)

```
Scene .txt files
↓ [TTS via narrate-scenes.py - MULTIPLE PASSES]
Scene .wav files (volume may vary between scenes)
↓ [concatenate]
Combined audio (may have clipping at boundaries)
```

**Warning:** Per-scene TTS generates audio with different volume levels and pacing. When concatenated, this causes audible jumps and clipping. Use the full-narration pipeline instead.
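If you do concatenate per-scene clips anyway, peak-normalizing each file first reduces (but does not eliminate) the level jumps; pacing differences remain. A stdlib-only sketch, assuming 16-bit mono PCM WAV output from the TTS step (`peak_normalize` is a hypothetical helper, not part of this skill's scripts):

```python
import struct
import wave

def peak_normalize(in_path: str, out_path: str, target: float = 0.9) -> None:
    """Scale 16-bit PCM samples so the loudest sample hits `target` of full scale."""
    with wave.open(in_path, "rb") as w:
        params = w.getparams()
        assert params.sampwidth == 2, "sketch assumes 16-bit PCM"
        raw = w.readframes(params.nframes)
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max(1, max(abs(s) for s in samples))  # avoid divide-by-zero on silence
    gain = target * 32767 / peak
    scaled = [max(-32768, min(32767, int(s * gain))) for s in samples]
    with wave.open(out_path, "wb") as w:
        w.setparams(params)  # nframes is corrected automatically on close
        w.writeframes(struct.pack(f"<{len(scaled)}h", *scaled))
```

Run this over each scene `.wav` before concatenating; it only matches peak levels, not perceived loudness or pacing.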
## Quick Start (Recommended Pipeline)

```bash
cd .claude/skills/document-to-narration
source tts/.venv/bin/activate
# 1. Split document into scenes (manual or scripted)
deno run --allow-read --allow-write scripts/split-to-scenes.ts input.md --output ./output/
# 2. Generate single audio file
python scripts/narrate-full.py ./output/scenes/
# 3. Transcribe with word-level timestamps
python scripts/transcribe-full.py ./output/full-narration.wav
# 4. Extract scene boundaries for video timing
python scripts/extract-scene-boundaries.py ./output/scenes/ ./output/full-narration.json --typescript
```

### Per-Scene Variant (Not Recommended)

```bash
# 1. Split document into scenes
deno run --allow-read --allow-write scripts/split-to-scenes.ts input.md --output ./output/
# 2. Generate audio per scene (may have volume inconsistencies)
source tts/.venv/bin/activate
python scripts/narrate-scenes.py ./output/scenes/
# 3. Transcribe (DEPRECATED: transcribe-scenes.ts requires whisper-cpp)
# Use transcribe-full.py instead after concatenating audio
```

## Setup

### Python TTS Environment

```bash
cd .claude/skills/document-to-narration/tts
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### TTS Model

Place the model files under `tts/model/`:

```
tts/model/
├── config.json
├── generation_config.json
├── model.safetensors # Main model weights
├── tokenizer_config.json
├── vocab.json
├── merges.txt
└── speech_tokenizer/
    └── ...
```

### whisper-cpp (Only for the Deprecated transcribe-scenes.ts)

```typescript
import { installWhisperCpp, downloadWhisperModel } from '@remotion/install-whisper-cpp';
await installWhisperCpp({ to: './whisper-cpp', version: '1.5.5' });
await downloadWhisperModel({ model: 'medium', folder: './whisper-cpp' });
```

## Full Pipeline (One Command)

```bash
deno run -A scripts/full-pipeline.ts /path/to/essay.md --output ./output/essay-name/
```

Output layout:

```
output/essay-name/
├── scenes/
│ ├── 01-opening-hook.txt # Scene script
│ ├── 01-opening-hook.wav # Generated audio
│ ├── 01-opening-hook.vtt # Word-level captions
│ ├── 02-core-argument.txt
│ ├── 02-core-argument.wav
│ ├── 02-core-argument.vtt
│ └── ...
└── manifest.json               # Complete timing data
```

## Script Reference

### split-to-scenes.ts

```bash
deno run --allow-read --allow-write scripts/split-to-scenes.ts input.md --output ./output/
deno run --allow-read --allow-write scripts/split-to-scenes.ts input.md --output ./output/ --adapt
deno run --allow-read scripts/split-to-scenes.ts input.md --dry-run
```

Flags: `--output` (destination directory), `--adapt` (rewrite written constructions for speech), `--dry-run` (preview without writing files). Produces numbered `.txt` scene files and `manifest.json`.

### narrate-full.py

```bash
python scripts/narrate-full.py ./output/scenes/
python scripts/narrate-full.py ./output/scenes/ --force
python scripts/narrate-full.py ./output/scenes/ --speaker jwynia
python scripts/narrate-full.py ./output/scenes/ --output ./custom/path/audio.wav
```

Flags: `--force` (regenerate existing audio), `--speaker` (voice selection), `--output` (custom output path). By default the audio is written as `../full-narration.wav` relative to the scenes directory, i.e. `full-narration.wav` in the output directory.

### narrate-scenes.py (Not Recommended)

```bash
python scripts/narrate-scenes.py ./output/scenes/
python scripts/narrate-scenes.py ./output/scenes/ --force
python scripts/narrate-scenes.py ./output/scenes/ --speaker jwynia
```

Flags: `--force`, `--speaker`. Generates one `.wav` per scene `.txt` file.

### transcribe-full.py

```bash
python scripts/transcribe-full.py ./output/full-narration.wav
python scripts/transcribe-full.py ./output/full-narration.wav --model large-v3
python scripts/transcribe-full.py ./output/full-narration.wav --output-dir ./captions/
```

Flags: `--model` (Whisper model size), `--output-dir`. Produces `.vtt` and `.json` files. Requires the `openai-whisper` package:

```bash
pip install openai-whisper
```

### extract-scene-boundaries.py

```bash
# Human-readable table
python scripts/extract-scene-boundaries.py ./output/scenes/ ./output/full-narration.json
# JSON output
python scripts/extract-scene-boundaries.py ./output/scenes/ ./output/full-narration.json --json
# TypeScript for Video.tsx
python scripts/extract-scene-boundaries.py ./output/scenes/ ./output/full-narration.json --typescript
```

Flags: `--json` (machine-readable output), `--typescript` (TypeScript for `Video.tsx`). Default output is a human-readable table.

### transcribe-scenes.ts (Deprecated)

Deprecated: requires the whisper-cpp binary, which may not be installed. Use `transcribe-full.py` instead.
```bash
deno run --allow-read --allow-write --allow-run scripts/transcribe-scenes.ts ./output/scenes/
```

Produces one `.vtt` file per scene.

### full-pipeline.ts

```bash
deno run -A scripts/full-pipeline.ts input.md --output ./output/project-name/
```

Flags: `--output`, `--adapt`, `--skip-tts`, `--skip-transcribe`.

## manifest.json Format

```json
{
"source": "appliance-vs-trade-tool-draft.md",
"created_at": "2024-01-15T10:30:00Z",
"total_scenes": 9,
"total_duration_seconds": 420,
"scenes": [
{
"number": 1,
"slug": "popcorn-opening",
"word_count": 185,
"audio_duration_seconds": 55.2,
"files": {
"text": "scenes/01-popcorn-opening.txt",
"audio": "scenes/01-popcorn-opening.wav",
"captions": "scenes/01-popcorn-opening.vtt"
},
"captions": [
{ "text": "Two", "startMs": 0, "endMs": 180, "confidence": 0.98 },
{ "text": "people", "startMs": 180, "endMs": 450, "confidence": 0.97 }
]
}
]
}
```

## VTT Output (Word-Level)

```
WEBVTT
00:00.000 --> 00:00.180
Two
00:00.180 --> 00:00.450
people
00:00.450 --> 00:00.720
walk
00:00.720 --> 00:01.100
into
```

## Written-to-Spoken Adaptation (`--adapt`)

| Written | Spoken |
|---|---|
| Parenthetical asides | Em-dash or separate sentence |
| "e.g." | "for example" |
| "i.e." | "that is" |
| Long nested clauses | Split into multiple sentences |
| Semicolons | Periods |
| Emphasis markup | Context-appropriate stress |
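The substitutions above can be approximated with a simple text pass; an illustrative sketch only (this is not the actual `--adapt` implementation, which also has to handle casing, nested clauses, and context):

```python
import re

# Illustrative substitutions; the real --adapt logic is more involved.
SPOKEN_SUBS = [
    (r"\be\.g\.,?\s*", "for example, "),  # "e.g." -> "for example"
    (r"\bi\.e\.,?\s*", "that is, "),      # "i.e." -> "that is"
    (r";\s+", ". "),                      # semicolons -> sentence breaks
]

def adapt_for_speech(text: str) -> str:
    """Apply written-to-spoken substitutions in order."""
    for pattern, replacement in SPOKEN_SUBS:
        text = re.sub(pattern, replacement, text)
    return text
```

A real pass would also re-capitalize new sentence starts and split long nested clauses, which plain regexes handle poorly.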
## Video Composition (Remotion)

```tsx
import { Audio, Sequence, staticFile } from 'remotion';
import manifest from './output/manifest.json';
// Use scene durations for Sequence timing. `accumulatedFrames` and `fps`
// are assumed to be tracked by the surrounding component.
{manifest.scenes.map((scene) => (
  <Sequence
    key={scene.slug}
    from={accumulatedFrames}
    durationInFrames={Math.round(scene.audio_duration_seconds * fps)}
  >
    <Audio src={staticFile(scene.files.audio)} />
    <CaptionRenderer captions={scene.captions} /> {/* your caption component */}
  </Sequence>
))}
```

## Troubleshooting

If a transcriber requires 16 kHz mono input (whisper-cpp does), resample with ffmpeg:

```bash
ffmpeg -i input.wav -ar 16000 -ac 1 output_16khz.wav
```
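The per-scene frame offsets used in the Remotion snippet can be precomputed from `manifest.json`; a hedged sketch (field names follow the manifest example above; `scene_start_frames` is a hypothetical helper, not one of this skill's scripts):

```python
import json

def scene_start_frames(manifest_path: str, fps: int = 30) -> dict[str, int]:
    """Map each scene slug to its start frame by accumulating audio durations."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    starts: dict[str, int] = {}
    frame = 0
    for scene in manifest["scenes"]:
        starts[scene["slug"]] = frame
        frame += round(scene["audio_duration_seconds"] * fps)
    return starts
```

Emitting this table once (or using the `--typescript` output of `extract-scene-boundaries.py`) avoids recomputing offsets inside the render loop.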