Train custom TTS voices for Piper (ONNX format) using fine-tuning or from-scratch approaches. Use when creating new synthetic voices, fine-tuning existing Piper checkpoints, preparing audio datasets for TTS training, or deploying voice models to devices like Raspberry Pi or Home Assistant. Covers dataset preparation, Whisper-based validation, training configuration, and ONNX export.
Install the skill:

    npx skill4agent add sammcj/agentic-coding piper-tts-training

## Audio preparation

Piper expects 22050 Hz WAV input. Resample recordings with sox; the `-v 0.95` flag scales the amplitude slightly below full scale to avoid clipping:

    sox -v 0.95 input.wav -r 22050 -t wav output.wav

## Whisper-based validation

Transcribe each sample with Whisper and compare the result against the expected text. Comparing phonemes rather than raw strings avoids false mismatches caused by spelling and punctuation differences:

    import whisper
    from piper_phonemize import phonemize_text

    model = whisper.load_model("base")

    def validate_sample(audio_path, expected_text):
        result = model.transcribe(audio_path)
        transcribed = result["text"].strip()
        # Compare phonemically to handle spelling/punctuation differences
        expected_phonemes = phonemize_text(expected_text, "en-gb")
        transcribed_phonemes = phonemize_text(transcribed, "en-gb")
        return expected_phonemes == transcribed_phonemes

## Dataset layout

    dataset/
    ├── metadata.csv
    └── wavs/
        ├── sample_0001.wav
        ├── sample_0002.wav
        └── ...

Each line of `metadata.csv` uses the format `{id}|{text}`:

    sample_0001|The quick brown fox jumps over the lazy dog.
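Before preprocessing, it is worth cross-checking `metadata.csv` against `wavs/`, since a missing or orphaned file otherwise fails late in the pipeline. A minimal stand-alone sketch (the `check_dataset` helper is not part of the skill; it assumes the layout above):

```python
from pathlib import Path

def check_dataset(root="dataset"):
    """Cross-check metadata.csv IDs against wavs/*.wav files.

    Returns (ids_missing_a_wav, wavs_missing_a_metadata_line)."""
    root = Path(root)
    ids = set()
    for line in (root / "metadata.csv").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        sample_id, _, text = line.partition("|")
        if not text.strip():
            print(f"empty text for id: {sample_id}")
        ids.add(sample_id)
    wavs = {p.stem for p in (root / "wavs").glob("*.wav")}
    return sorted(ids - wavs), sorted(wavs - ids)
```

Run it once before `piper_train.preprocess`; both returned lists should be empty.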
    sample_0002|Pack my box with five dozen liquor jugs.

## Preprocessing

Preprocess the dataset for training. The `--language` code (here `en-gb`) must match an espeak-ng voice:

    python3 -m piper_train.preprocess \
      --language en-gb \
      --input-dir dataset/ \
      --output-dir piper_training_dir/ \
      --dataset-format ljspeech

## Training

    python3 -m piper_train \
      --dataset-dir piper_training_dir/ \
      --accelerator gpu \
      --devices 1 \
      --batch-size 12 \
      --max_epochs 3000 \
      --resume_from_checkpoint ljspeech-2000.ckpt \
      --checkpoint-epochs 100 \
      --quality high \
      --precision 32

Notes:

- Reduce `--batch-size` if you run out of GPU memory.
- `--resume_from_checkpoint` fine-tunes from an existing Piper checkpoint instead of training from scratch. Note the underscores: it is a PyTorch Lightning argument, unlike the hyphenated Piper flags.
- `--precision 32` keeps training in full 32-bit precision.
- For small datasets, add `--validation-split 0.0 --num-test-examples 0` so every sample is used for training.
- Watch `loss_disc_all` in the training logs to gauge progress.

## ONNX export

    python3 -m piper_train.export_onnx checkpoint.ckpt output.onnx.unoptimized
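Before committing to a long training run, `validate_sample` can be applied to the whole dataset. A sketch of a batch runner (the `validate_dataset` helper is hypothetical, not part of the skill); the validate callback is injected so the metadata-parsing logic can be tested without Whisper installed:

```python
from pathlib import Path

def validate_dataset(root, validate):
    """Run a validate(wav_path, expected_text) -> bool callback over every
    metadata.csv entry; return the sample IDs that failed."""
    root = Path(root)
    failed = []
    for line in (root / "metadata.csv").read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        sample_id, _, text = line.partition("|")
        wav = root / "wavs" / f"{sample_id}.wav"
        if not validate(wav, text):
            failed.append(sample_id)
    return failed
```

With the Whisper-based function from above: `validate_dataset("dataset", validate_sample)`. Failed IDs usually indicate a mis-transcribed recording or a metadata line that no longer matches the audio.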
    onnxsim output.onnx.unoptimized output.onnx

Piper loads the voice configuration from a JSON file next to the model named `output.onnx.json`; copy `config.json` from the training directory to that path.

## Accent and spelling

For an Australian English (`en-au`) voice, this skill phonemises with `en-gb` and relies on the corpus itself carrying Australian spelling. `scripts/convert_spelling.py` converts common American patterns:

| American | Australian/UK |
|---|---|
| -ize | -ise |
| -or | -our |
| -er | -re |
| -og | -ogue |
| -ense | -ence |
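The suffix patterns above cannot be applied blindly ("her" is not "hre"), so a converter needs a word list. A minimal sketch of the idea with a small hypothetical mapping dictionary; the skill's `scripts/convert_spelling.py` is the real tool:

```python
import re

# Hypothetical sample mapping; a real converter needs a full dictionary.
AU_SPELLINGS = {
    "color": "colour",
    "organize": "organise",
    "center": "centre",
    "dialog": "dialogue",
    "defense": "defence",
}

def to_australian(text):
    """Replace whole American-spelled words, preserving leading capitals."""
    def repl(match):
        word = match.group(0)
        au = AU_SPELLINGS.get(word.lower())
        if au is None:
            return word
        return au.capitalize() if word[0].isupper() else au
    return re.sub(r"[A-Za-z]+", repl, text)
```

Matching whole words (rather than substituting suffixes) is what keeps unrelated words like "her" or "frog" untouched.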
Avoid the bare `language="en"` code; use an explicit variant such as `en-gb` wherever a language parameter is accepted, so the accent is predictable.

## Requirements

    pytorch-lightning==1.9.3
    torch<2.6.0
    piper-phonemize
    onnxruntime-gpu
    onnxsim

## Troubleshooting

| Issue | Solution |
|---|---|
| CUDA OOM | Reduce batch-size (try 8 or 4) |
| Checkpoint won't load | Check pytorch-lightning version matches checkpoint |
| Garbled output | Insufficient training epochs or dataset too small |
| Wrong accent | Check espeak-ng language code and corpus spelling |
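Several of these issues trace back to the dataset itself, so a quick audit of sample rate, channel count, and total duration helps rule that out. A stdlib-only sketch (the `audit_wavs` helper is hypothetical; mono input is an assumption, 22050 Hz matches the sox command above):

```python
import wave
from pathlib import Path

def audit_wavs(wav_dir, expected_rate=22050):
    """Return (total_seconds, problem_list) for all WAV files in a directory."""
    total = 0.0
    problems = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            total += wf.getnframes() / rate
            if rate != expected_rate:
                problems.append(f"{path.name}: {rate} Hz")
            if wf.getnchannels() != 1:
                problems.append(f"{path.name}: not mono")
    return total, problems
```

A very small total duration is consistent with the "garbled output" row above; any entry in the problem list points back to the audio-preparation step.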