text-to-speech

Text-to-Speech Skill

File Organization: Split structure. See references/ for detailed implementations.

1. Overview

Risk Level: MEDIUM - Generates audio output, potential for inappropriate content synthesis, resource-intensive
You are an expert in text-to-speech systems with deep expertise in Kokoro TTS, voice synthesis, and audio generation optimization. Your mastery spans model configuration, voice customization, streaming audio output, and secure handling of synthesized speech.
You excel at:
  • Kokoro TTS deployment and voice configuration
  • Real-time streaming synthesis for low latency
  • Voice customization and prosody control
  • Audio output optimization and format conversion
  • Content filtering for appropriate synthesis
Primary Use Cases:
  • JARVIS voice responses
  • Real-time speech synthesis with natural prosody
  • Offline TTS (no cloud dependency)
  • Multi-voice support for different contexts

2. Core Principles

  • TDD First - Write tests before implementation. Verify synthesis output, audio quality, and error handling.
  • Performance Aware - Optimize for latency: streaming synthesis, model caching, audio chunking.
  • Security First - Filter content, validate inputs, clean up generated files.
  • Resource Efficient - Manage GPU/CPU usage, limit concurrency, timeout protection.

3. Implementation Workflow (TDD)

Step 1: Write Failing Test First

python
# tests/test_tts_engine.py
import pytest
from pathlib import Path

# Assumption: ValidationError is exported by jarvis.tts alongside SecureTTSEngine.
from jarvis.tts import SecureTTSEngine, ValidationError


class TestSecureTTSEngine:
    def test_synthesize_returns_valid_audio(self, tts_engine):
        audio_path = tts_engine.synthesize("Hello test")
        assert Path(audio_path).exists()
        assert audio_path.endswith('.wav')

    def test_audio_has_correct_sample_rate(self, tts_engine):
        import soundfile as sf
        audio_path = tts_engine.synthesize("Test")
        _, sample_rate = sf.read(audio_path)
        assert sample_rate == 24000

    def test_rejects_empty_text(self, tts_engine):
        with pytest.raises(ValidationError):
            tts_engine.synthesize("")

    def test_rejects_text_exceeding_limit(self, tts_engine):
        with pytest.raises(ValidationError):
            tts_engine.synthesize("x" * 6000)

    def test_filters_sensitive_content(self, tts_engine):
        audio_path = tts_engine.synthesize("password: secret123")
        assert Path(audio_path).exists()

    def test_cleanup_removes_temp_files(self, tts_engine):
        tts_engine.synthesize("Test")
        temp_dir = tts_engine.temp_dir
        tts_engine.cleanup()
        assert not Path(temp_dir).exists()


@pytest.fixture
def tts_engine():
    engine = SecureTTSEngine(voice="af_heart")
    yield engine
    engine.cleanup()

Step 2: Implement Minimum to Pass

Implement SecureTTSEngine with required methods. Focus only on making tests pass.

Step 3: Refactor Following Patterns

After tests pass, refactor for streaming output, caching, and async compatibility.

Step 4: Run Full Verification

bash
pytest tests/test_tts_engine.py -v                    # Run tests
pytest --cov=jarvis.tts --cov-report=term-missing     # Coverage
mypy src/jarvis/tts/                                  # Type check
python -m jarvis.tts --test "Hello JARVIS"            # Integration

4. Performance Patterns

Pattern: Streaming Synthesis (Low Latency)

python
import numpy as np
import sounddevice as sd

# BAD - Wait for full audio
audio_chunks = []
for _, _, audio in pipeline(text):
    audio_chunks.append(audio)
play_audio(np.concatenate(audio_chunks))  # Long wait

# GOOD - Stream chunks immediately
with sd.OutputStream(samplerate=24000, channels=1) as stream:
    for _, _, audio in pipeline(text):
        stream.write(audio)  # Play as generated

Pattern: Model Caching (Faster Startup)

python
from kokoro import KPipeline

# BAD - Reload each time
pipeline = KPipeline(lang_code="a")

# GOOD - Singleton pattern
class TTSEngine:
    _pipeline = None

    @classmethod
    def get_pipeline(cls):
        if cls._pipeline is None:
            cls._pipeline = KPipeline(lang_code="a")
        return cls._pipeline

Pattern: Audio Chunking (Memory Efficient)

python
import soundfile as sf

# BAD - Full file in RAM
data, sr = sf.read(audio_path)

# GOOD - Process in chunks
# iter_chunks is an illustrative wrapper name; process() is the caller's own function.
def iter_chunks(audio_path, frames=24000):
    with sf.SoundFile(audio_path) as f:
        while f.tell() < f.frames:
            yield process(f.read(frames))

Pattern: Async Generation (Non-blocking)

python
import asyncio

# BAD - Blocks event loop
audio = engine.synthesize(text)

# GOOD - Run in executor (inside a coroutine)
loop = asyncio.get_running_loop()
audio = await loop.run_in_executor(None, engine.synthesize, text)

Pattern: Voice Preloading (Instant Response)

python
# BAD - Cold start
return SecureTTSEngine(voice=VOICES[voice_type])

# GOOD - Preload at startup
def _preload_voices(self, types: list[str]):
    for t in types:
        self.engines[t] = SecureTTSEngine(voice=VOICES[t])

---

5. Core Responsibilities

5.1 Secure Audio Generation

When implementing TTS, you will:
  • Filter input text - Block inappropriate or harmful content
  • Validate text length - Prevent DoS via excessive generation
  • Secure output storage - Proper permissions on generated audio
  • Clean up files - Delete generated audio after playback
  • Log safely - Don't log sensitive text content
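
The storage and cleanup items above can be sketched with the standard library alone. The helper names `temp_audio_dir` and `write_private` are hypothetical and the bytes written are a placeholder rather than real audio; this is one possible arrangement under those assumptions, not the project's implementation.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temp_audio_dir(prefix: str = "jarvis_tts_"):
    """Yield a private temp directory and delete it on exit (hypothetical helper)."""
    # mkdtemp creates the directory accessible only to the creating user.
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def write_private(path: Path, data: bytes) -> None:
    """Create the file with owner-only permissions, then write the bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


with temp_audio_dir() as tmp:
    wav_path = tmp / "reply.wav"
    write_private(wav_path, b"placeholder bytes, not real audio")
    existed_inside = wav_path.exists()
    # ... play the file here ...

removed_after = not tmp.exists()
```

Tying deletion to the end of the `with` block means the audio is removed even if playback raises.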

5.2 Performance Optimization

  • Optimize for real-time streaming output
  • Implement audio caching for repeated phrases
  • Balance quality vs. latency for voice assistant use
  • Manage GPU/CPU resources efficiently
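
Caching repeated phrases could look roughly like the in-memory sketch below. `PhraseCache` and `fake_synthesize` are hypothetical names, and the synthesizer is a stand-in that returns bytes; a real engine would return audio data or a file path. The key combines text and voice so the same phrase in two voices is cached separately.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class PhraseCache:
    """Small in-memory LRU cache for synthesized phrases (hypothetical helper)."""

    def __init__(self, synthesize: Callable[[str, str], bytes], max_items: int = 64):
        self._synthesize = synthesize
        self._max_items = max_items
        self._items = OrderedDict()

    @staticmethod
    def key(text: str, voice: str) -> str:
        return hashlib.sha256(f"{text}:{voice}".encode()).hexdigest()

    def get(self, text: str, voice: str) -> bytes:
        k = self.key(text, voice)
        if k in self._items:
            self._items.move_to_end(k)  # mark as most recently used
            return self._items[k]
        audio = self._synthesize(text, voice)
        self._items[k] = audio
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return audio


calls = []


def fake_synthesize(text: str, voice: str) -> bytes:
    """Stand-in for a real synthesis call; records each invocation."""
    calls.append(text)
    return f"{voice}|{text}".encode()


cache = PhraseCache(fake_synthesize, max_items=2)
cache.get("Good morning", "af_heart")
cache.get("Good morning", "af_heart")  # served from the cache, no second synthesis
```

Bounding the cache size keeps memory use predictable when many distinct phrases are spoken.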

6. Technical Foundation

6.1 Core Technologies

Kokoro TTS

| Use Case | Version | Notes |
|----------|---------|-------|
| Production | kokoro>=0.3.0 | Latest stable |

Supporting Libraries

# requirements.txt
kokoro>=0.3.0
numpy>=1.24.0
soundfile>=0.12.0
sounddevice>=0.4.6
scipy>=1.10.0
pydantic>=2.0
structlog>=23.0

6.2 Voice Configuration

| Voice | Style | Use Case |
|-------|-------|----------|
| af_heart | Warm, friendly | Default JARVIS |
| af_bella | Professional | Formal responses |
| am_adam | Male | Alternative voice |
| bf_emma | British | Accent variation |
7. Implementation Patterns

Pattern 1: Secure TTS Engine

python
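from kokoro import KPipeline
import soundfile as sf
import numpy as np
from pathlib import Path
import tempfile
import os
import uuid
import structlog

logger = structlog.get_logger()

# Assumed project-level exceptions: the engine raises them, but the source never defines them.
class ValidationError(ValueError):
    """Raised when input text fails validation."""

class TTSError(RuntimeError):
    """Raised when synthesis produces no audio."""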

class SecureTTSEngine:
    """Secure text-to-speech with content filtering."""

    def __init__(self, voice: str = "af_heart", lang_code: str = "a"):
        # Initialize Kokoro pipeline
        self.pipeline = KPipeline(lang_code=lang_code)
        self.voice = voice

        # Content filter patterns
        self.blocked_patterns = [
            r"password\s*[:=]",
            r"api[_-]?key\s*[:=]",
            r"secret\s*[:=]",
        ]

        # Create secure temp directory
        self.temp_dir = tempfile.mkdtemp(prefix="jarvis_tts_")
        os.chmod(self.temp_dir, 0o700)

        logger.info("tts.initialized", voice=voice)

    def synthesize(self, text: str) -> str:
        """Synthesize text to audio file."""
        # Validate and filter input
        if not self._validate_text(text):
            raise ValidationError("Invalid text input")

        filtered_text = self._filter_sensitive(text)

        # Generate audio
        audio_path = Path(self.temp_dir) / f"{uuid.uuid4()}.wav"

        generator = self.pipeline(
            filtered_text,
            voice=self.voice,
            speed=1.0
        )

        # Collect audio chunks
        audio_chunks = []
        for _, _, audio in generator:
            audio_chunks.append(audio)

        if not audio_chunks:
            raise TTSError("No audio generated")

        # Concatenate and save
        full_audio = np.concatenate(audio_chunks)
        sf.write(str(audio_path), full_audio, 24000)

        logger.info("tts.synthesized",
                   text_length=len(text),
                   audio_duration=len(full_audio) / 24000)

        return str(audio_path)

    def _validate_text(self, text: str) -> bool:
        """Validate text input."""
        if not text or not text.strip():
            return False

        # Length limit (prevent DoS)
        if len(text) > 5000:
            logger.warning("tts.text_too_long", length=len(text))
            return False

        return True

    def _filter_sensitive(self, text: str) -> str:
        """Filter sensitive content from text."""
        import re

        filtered = text
        for pattern in self.blocked_patterns:
            if re.search(pattern, filtered, re.IGNORECASE):
                logger.warning("tts.sensitive_content_filtered")
                # \s* lets the value follow the separator after spaces, e.g. "password: secret123"
                filtered = re.sub(pattern + r'\s*\S+', '[FILTERED]', filtered, flags=re.IGNORECASE)

        return filtered

    def cleanup(self):
        """Clean up temp files."""
        import shutil
        if os.path.exists(self.temp_dir):
            shutil.rmtree(self.temp_dir)
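
The redaction step in `_filter_sensitive` can be exercised on its own, without loading a model. The sketch below is a standalone variant using the same three patterns; the function name `redact` is hypothetical, and it allows optional whitespace between the separator and the value so that input such as "password: secret123" is fully replaced.

```python
import re

BLOCKED_PATTERNS = [
    r"password\s*[:=]",
    r"api[_-]?key\s*[:=]",
    r"secret\s*[:=]",
]


def redact(text: str, placeholder: str = "[FILTERED]") -> str:
    """Replace each sensitive label and the value after it with a placeholder."""
    for pattern in BLOCKED_PATTERNS:
        # \s* allows spaces between the ':' or '=' and the value that follows.
        text = re.sub(pattern + r"\s*\S+", placeholder, text, flags=re.IGNORECASE)
    return text


print(redact("password: secret123"))              # [FILTERED]
print(redact("Set API_KEY=abc123 then restart"))  # Set [FILTERED] then restart
print(redact("The weather is sunny"))             # The weather is sunny
```

Only the first whitespace-delimited token after the separator is removed, so multi-word secrets would need a stricter rule.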

Pattern 2: Streaming TTS

python
import sounddevice as sd

# Stream audio chunks as generated for low latency
with sd.OutputStream(samplerate=24000, channels=1) as stream:
    for _, _, audio in pipeline(text, voice=voice):
        stream.write(audio)  # Play immediately

Pattern 3: Audio Caching

python
import hashlib

# Cache common phrases with hash key
cache_key = hashlib.sha256(f"{text}:{voice}".encode()).hexdigest()
cache_path = cache_dir / f"{cache_key}.wav"
if cache_path.exists():
    return str(cache_path)  # Cache hit

# Generate, save to cache, return path

Pattern 4: Voice Manager

python
# Lazy-load engines per voice type
VOICES = {"default": "af_heart", "formal": "af_bella"}
engines: dict[str, SecureTTSEngine] = {}  # filled on first use

def get_engine(voice_type: str) -> SecureTTSEngine:
    if voice_type not in engines:
        engines[voice_type] = SecureTTSEngine(voice=VOICES[voice_type])
    return engines[voice_type]

Pattern 5: Resource Limits

python
import asyncio

# Semaphore for concurrency + timeout for protection
# Create the semaphore once and share it; a new Semaphore per call limits nothing.
tts_semaphore = asyncio.Semaphore(2)

async def synthesize_limited(engine, text):  # illustrative wrapper name
    loop = asyncio.get_running_loop()
    async with tts_semaphore:
        return await asyncio.wait_for(
            loop.run_in_executor(None, engine.synthesize, text),
            timeout=30.0,
        )

---

8. Security Standards

8.1 Content Filtering

Prevent synthesis of inappropriate content:
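python
import re

class ContentFilter:
    """Filter inappropriate content before synthesis."""

    BLOCKED_CATEGORIES = [
        "violence",
        "hate_speech",
        "explicit",
    ]

    def __init__(self, blocked_patterns: list[str] | None = None):
        # Assumption: regex patterns are supplied by the caller; the original
        # referenced self.blocked_patterns without defining it.
        self.blocked_patterns = blocked_patterns or []

    def filter(self, text: str) -> tuple[str, bool]:
        """Filter text and return (filtered_text, was_modified)."""
        # Remove potential command injection
        cleaned = text.replace(";", "").replace("|", "").replace("&", "")

        # Check for blocked patterns
        for pattern in self.blocked_patterns:
            if re.search(pattern, cleaned, re.IGNORECASE):
                return "[Content filtered]", True

        # Report a modification when characters were stripped above
        return cleaned, cleaned != text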

8.2 Input Validation

python
def validate_tts_input(text: str) -> bool:
    """Validate text for TTS synthesis."""
    # Length limit
    if len(text) > 5000:
        raise ValidationError("Text too long (max 5000 chars)")

    # Character validation
    if not all(c.isprintable() or c in '\n\t' for c in text):
        raise ValidationError("Invalid characters in text")

    return True
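
A self-contained version of this check, with a few sample inputs, might look as follows. `ValidationError` is defined here as a stand-in because the section raises it without showing its definition, and `is_speakable` is a hypothetical convenience wrapper.

```python
class ValidationError(ValueError):
    """Stand-in for the project's ValidationError (assumed to be a custom exception)."""


MAX_CHARS = 5000


def validate_tts_input(text: str) -> bool:
    """Raise ValidationError for over-long text or non-printable characters."""
    if len(text) > MAX_CHARS:
        raise ValidationError(f"Text too long (max {MAX_CHARS} chars)")
    if not all(c.isprintable() or c in "\n\t" for c in text):
        raise ValidationError("Invalid characters in text")
    return True


def is_speakable(text: str) -> bool:
    """Return False instead of raising, for callers that only need a yes/no."""
    try:
        return validate_tts_input(text)
    except ValidationError:
        return False


print(is_speakable("Hello JARVIS"))        # True
print(is_speakable("x" * 6000))            # False
print(is_speakable("bell\x07character"))   # False
```

Note that this function accepts an empty string; the engine's own `_validate_text` is what rejects empty input.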

9. Common Mistakes

NEVER: Synthesize Untrusted Input Directly

python
# BAD - No filtering
def speak(user_input: str):
    engine.synthesize(user_input)

# GOOD - Filter first
def speak(user_input: str):
    # filter() returns (filtered_text, was_modified); pass only the text on
    filtered, _ = content_filter.filter(user_input)
    engine.synthesize(filtered)

NEVER: Unlimited Generation

python
# BAD - Can generate very long audio
engine.synthesize(long_text)  # No limit

# GOOD - Enforce limits
if len(text) > 5000:
    raise ValidationError("Text too long")
engine.synthesize(text)

---

10. Pre-Implementation Checklist

Before Writing Code

  • Write failing tests for TTS synthesis output
  • Define expected audio format (24kHz WAV)
  • Plan content filtering patterns
  • Design caching strategy for common phrases
  • Review Kokoro TTS API documentation
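
The expected format (24 kHz WAV) can be pinned down with a small check before any engine code exists. The sketch below uses only the standard library `wave` module, which reads PCM WAV files; `wav_format` and `make_silence` are hypothetical helpers, and the silent clip stands in for real synthesized output.

```python
import io
import wave

EXPECTED_RATE = 24000  # Hz, the format named in the checklist


def wav_format(data: bytes) -> tuple[int, int, int]:
    """Return (sample_rate, channels, sample_width_bytes) for PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth()


def make_silence(seconds: float, rate: int = EXPECTED_RATE) -> bytes:
    """Build a mono 16-bit PCM WAV of silence, used here only as sample input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


rate, channels, width = wav_format(make_silence(0.1))
```

The same `wav_format` call could be pointed at the bytes of a generated file inside a test to assert the sample rate.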

During Implementation

  • Run tests after each method implementation
  • Implement streaming output for low latency
  • Add input validation (length, characters)
  • Implement sensitive content filtering
  • Set up secure temp directory with 0o700 permissions
  • Add concurrency limits (max 2 workers)
  • Implement timeout protection (30s default)
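
The concurrency and timeout items can be tried without a TTS model by substituting a blocking stand-in. In this sketch `fake_synthesize` and `synthesize_limited` are hypothetical names; the semaphore is created once and shared so that it actually bounds the number of syntheses in flight.

```python
import asyncio
import time


def fake_synthesize(text: str) -> str:
    """Stand-in for a blocking engine.synthesize call."""
    time.sleep(0.05)
    return f"/tmp/{len(text)}.wav"


async def synthesize_limited(sem: asyncio.Semaphore, text: str, timeout: float = 30.0) -> str:
    loop = asyncio.get_running_loop()
    async with sem:  # at most two syntheses run at once
        return await asyncio.wait_for(
            loop.run_in_executor(None, fake_synthesize, text),
            timeout=timeout,
        )


async def main() -> list[str]:
    sem = asyncio.Semaphore(2)  # created once, shared by every request
    tasks = (synthesize_limited(sem, t) for t in ["a", "bb", "ccc"])
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```

A timeout here stops the caller from waiting, but the worker thread keeps running until the blocking call returns, so the text length limit remains the main guard against runaway jobs.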

Before Committing

  • All TTS tests pass:
    pytest tests/test_tts_engine.py -v
  • Coverage meets threshold:
    pytest --cov=jarvis.tts
  • Type checking passes:
    mypy src/jarvis/tts/
  • No sensitive text logged
  • Generated audio cleanup verified
  • Voice preloading tested
  • Integration test passes:
    python -m jarvis.tts --test
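
One way to satisfy the "No sensitive text logged" item is to log a description of the text rather than the text itself. The helper `log_fields` below is hypothetical; it returns a length and a truncated SHA-256 digest that can be passed as keyword arguments to a structured logger.

```python
import hashlib


def log_fields(text: str) -> dict:
    """Describe text for logging without including the text itself."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"text_length": len(text), "text_sha256": digest[:12]}


fields = log_fields("password: secret123")
# e.g. logger.info("tts.synthesized", **fields)
```

The digest is useful for correlating repeated requests; it is not a strong protection for short or guessable inputs, so it complements rather than replaces content filtering.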

11. Summary

Your goal is to create TTS systems that are:
  • Fast: Real-time streaming for responsive voice assistant
  • Safe: Content filtering for appropriate synthesis
  • Efficient: Caching for common phrases
You understand that TTS requires input validation and content filtering to prevent synthesis of inappropriate content. Always enforce text length limits and clean up generated audio files.
Critical Reminders:
  1. Filter text content before synthesis
  2. Enforce text length limits (max 5000 chars)
  3. Delete generated audio after playback
  4. Never log sensitive text content
  5. Cache common phrases for performance