multimedia-backend-integrator

Multimedia Backend Integrator

Reference guide for adding new media generation backends to MassGen's unified `generate_media` tool.

Architecture Overview

```
_base.py          -- Registration: API keys, default models, priority lists
_selector.py      -- Auto-selection logic: picks best backend by key + priority
_image.py         -- Image backends: OpenAI, Google (Gemini/Imagen), Grok, OpenRouter
_video.py         -- Video backends: Grok, Google Veo, OpenAI Sora
_audio.py         -- Audio backends: ElevenLabs, OpenAI TTS
generate_media.py -- Entry point: routing, validation, batch mode, image-to-image
```
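To show how the registration tables feed auto-selection, here is a minimal sketch of the "pick best backend by key + priority" rule. It returns an explicitly requested backend unchanged. Otherwise it returns the first backend in the media type's priority list whose API key is set. The table contents, the `existing_backend`/`new_backend` names, the env var names, and the optional `env` parameter (added only to make the logic easy to test) are hypothetical. The real `_selector.py` may apply additional rules.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from enum import Enum


class MediaType(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# Hypothetical stand-ins for the tables registered in _base.py.
BACKEND_API_KEYS: dict[str, list[str]] = {
    "existing_backend": ["EXISTING_BACKEND_API_KEY"],
    "new_backend": ["NEW_BACKEND_API_KEY"],
}
BACKEND_PRIORITY: dict[MediaType, list[str]] = {
    MediaType.IMAGE: ["existing_backend", "new_backend"],
    MediaType.VIDEO: ["existing_backend"],
}


def has_api_key(backend: str, env: Mapping[str, str] | None = None) -> bool:
    """True if any env var registered for the backend is set to a non-empty value."""
    env = os.environ if env is None else env
    return any(env.get(var) for var in BACKEND_API_KEYS.get(backend, []))


def select_backend(
    media_type: MediaType,
    requested: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str | None:
    """Honor an explicit request; otherwise return the first keyed backend by priority."""
    if requested is not None:
        return requested
    for backend in BACKEND_PRIORITY.get(media_type, []):
        if has_api_key(backend, env):
            return backend
    return None
```

Consider `select_backend(MediaType.VIDEO, env={"NEW_BACKEND_API_KEY": "k"})`. It returns `None` because `new_backend` is absent from the video priority list. Asking for `"new_backend"` explicitly still works. This is the "works when specified, never auto-selected" failure described under Common Pitfalls.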

Complete Checklist: Adding a New Backend

1. Registration (`_base.py`)

  • Add to `BACKEND_API_KEYS`: map the backend name to its env var(s)
  • Add to `DEFAULT_MODELS`: map the backend name to `{MediaType: model_name}` for each supported media type
  • Add to `BACKEND_PRIORITY`: insert the backend at the correct position for each media type
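The three registration steps above might look like the following for a hypothetical `new_backend` that supports images and video. The sketch adds a small consistency check that catches a backend registered with models but left out of a priority list. The dict shapes, the `MediaType` enum, the model IDs, and the env var name are assumptions for illustration; mirror the existing entries in `_base.py` rather than this sketch.

```python
from __future__ import annotations

from enum import Enum


class MediaType(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# 1. Env var(s) holding the backend's API key (hypothetical name).
BACKEND_API_KEYS: dict[str, list[str]] = {
    "new_backend": ["NEW_BACKEND_API_KEY"],
}

# 2. Default model per supported media type (hypothetical model IDs).
DEFAULT_MODELS: dict[str, dict[MediaType, str]] = {
    "new_backend": {
        MediaType.IMAGE: "new-image-model",
        MediaType.VIDEO: "new-video-model",
    },
}

# 3. Auto-selection order per media type; earlier entries win.
BACKEND_PRIORITY: dict[MediaType, list[str]] = {
    MediaType.IMAGE: ["existing_backend", "new_backend"],
    MediaType.VIDEO: ["new_backend"],
    MediaType.AUDIO: ["existing_backend"],
}


def registration_problems(backend: str) -> list[str]:
    """List gaps between the three tables for one backend (empty means consistent)."""
    problems = []
    if not BACKEND_API_KEYS.get(backend):
        problems.append("no API key env var registered")
    models = DEFAULT_MODELS.get(backend, {})
    if not models:
        problems.append("no default models registered")
    # Every media type with a default model should also appear in that type's priority list.
    for media_type in models:
        if backend not in BACKEND_PRIORITY.get(media_type, []):
            problems.append(f"missing from BACKEND_PRIORITY[{media_type.name}]")
    return problems
```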

2. Implementation (`_image.py` / `_video.py` / `_audio.py`)

  • Add the SDK `import` at the top of the module
  • Implement `_generate_{media}_{backend}(config) -> GenerationResult`
  • Check the API key first; return an error result if it is missing
  • Create the SDK client with the API key
  • Map `config.*` fields to SDK parameters
  • Handle continuation, if applicable (see Continuation Store Patterns)
  • Write the output bytes to `config.output_path`
  • Return a `GenerationResult` with metadata
  • Wrap the body in try/except and log errors

3. Dispatcher Update

  • Add an `elif backend == "new_backend":` branch to the media type's `generate_{media}()` function
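Below is a minimal sketch of the dispatcher branch. It assumes each `generate_{media}()` function routes on a backend string and awaits the matching private implementation. The backend functions are trivial stubs that return strings so the routing can be shown in isolation. `existing_backend` and the raised `ValueError` are hypothetical; the real dispatcher may return an error result instead.

```python
from __future__ import annotations

import asyncio


async def _generate_image_existing_backend(prompt: str) -> str:
    return f"existing_backend:{prompt}"  # stub for an already-supported backend


async def _generate_image_new_backend(prompt: str) -> str:
    return f"new_backend:{prompt}"  # stub for the newly added backend


async def generate_image(prompt: str, backend: str) -> str:
    if backend == "existing_backend":
        return await _generate_image_existing_backend(prompt)
    elif backend == "new_backend":  # the branch each new backend needs
        return await _generate_image_new_backend(prompt)
    raise ValueError(f"Unsupported image backend: {backend!r}")
```

In this sketch, a registered and selectable backend that lacks its branch falls through to the unsupported-backend error.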

4. Image-to-Image Support (`generate_media.py`)

  • Add the backend name to the `selected_backend not in (...)` check in `_generate_single_with_input_images`
  • Add a fallback `elif has_api_key("new_backend"):` to the auto-selection chain
  • Update the error message to mention the new backend and its env var
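The three image-to-image changes (allowlist, fallback chain, error message) might look roughly like this. Only the `selected_backend not in (...)` check and the `elif has_api_key(...)` chain come from the checklist. The allowlist contents, the backend and env var names, the `env` parameter, and the error text are placeholders.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

BACKEND_API_KEYS = {
    "existing_backend": ["EXISTING_BACKEND_API_KEY"],
    "new_backend": ["NEW_BACKEND_API_KEY"],
}
IMAGE_TO_IMAGE_BACKENDS = ("existing_backend", "new_backend")  # hypothetical allowlist


def has_api_key(backend: str, env: Mapping[str, str] | None = None) -> bool:
    env = os.environ if env is None else env
    return any(env.get(var) for var in BACKEND_API_KEYS.get(backend, []))


def pick_image_to_image_backend(
    selected_backend: str | None, env: Mapping[str, str] | None = None
) -> str:
    if selected_backend is not None:
        # Allowlist gate: a backend missing here is rejected even if it is otherwise registered.
        if selected_backend not in IMAGE_TO_IMAGE_BACKENDS:
            raise ValueError(f"{selected_backend} does not support image-to-image")
        return selected_backend
    # Auto-selection fallback chain: the first backend with a key wins.
    if has_api_key("existing_backend", env):
        return "existing_backend"
    elif has_api_key("new_backend", env):
        return "new_backend"
    # The error message names every supported backend's env var.
    raise ValueError(
        "Image-to-image needs an API key: set EXISTING_BACKEND_API_KEY or NEW_BACKEND_API_KEY"
    )
```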

5. Documentation

  • `TOOL.md`: add the env var to the frontmatter, and add the backend to the tables and keywords
  • `generate_media.py` docstring: update the `backend_type` list and the Supported Backends section

6. Tests

  • Backend registration tests (API keys, default models, priority order)
  • Auto-selection tests (with only this backend's key, with multiple keys)
  • SDK call verification (correct params passed through)
  • Output file written correctly
  • Continuation flow (if applicable)
  • Error handling (missing key, API errors)
  • Parameter mapping (aspect_ratio, size, duration)
  • Update existing tests that assert priority list length/contents
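The following example covers the priority-list and auto-selection tests. It uses the standard library's `unittest` so it runs without extra dependencies; the reference test files may use pytest instead. The selector and tables are inlined, hypothetical stand-ins for `_base.py` and `_selector.py`. Patching `os.environ` with `clear=True` stops keys in the developer's shell from leaking into selection results.

```python
from __future__ import annotations

import os
import unittest
from unittest import mock

# Inlined stand-ins for the real registration tables (hypothetical contents).
BACKEND_API_KEYS = {
    "existing_backend": ["EXISTING_BACKEND_API_KEY"],
    "new_backend": ["NEW_BACKEND_API_KEY"],
}
IMAGE_PRIORITY = ["existing_backend", "new_backend"]


def select_image_backend() -> str | None:
    for backend in IMAGE_PRIORITY:
        if any(os.environ.get(var) for var in BACKEND_API_KEYS[backend]):
            return backend
    return None


class NewBackendSelectionTests(unittest.TestCase):
    def test_priority_contents(self):
        # Assertions like this must be updated whenever a backend joins the list.
        self.assertEqual(IMAGE_PRIORITY, ["existing_backend", "new_backend"])

    @mock.patch.dict(os.environ, {"NEW_BACKEND_API_KEY": "k"}, clear=True)
    def test_only_new_key_selects_new_backend(self):
        self.assertEqual(select_image_backend(), "new_backend")

    @mock.patch.dict(
        os.environ, {"NEW_BACKEND_API_KEY": "k", "EXISTING_BACKEND_API_KEY": "k"}, clear=True
    )
    def test_higher_priority_backend_wins(self):
        self.assertEqual(select_image_backend(), "existing_backend")

    @mock.patch.dict(os.environ, {}, clear=True)
    def test_no_keys_selects_nothing(self):
        self.assertIsNone(select_image_backend())
```

Run the tests with `python -m unittest` against the module that contains them.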

Continuation Store Patterns

Each backend that supports iterative editing needs a continuation mechanism:

| Backend | Store Type | Key Format | What's Stored | How Continuation Works |
|---|---|---|---|---|
| OpenAI | Stateless (server-side) | `response.id` | Nothing locally | Pass `previous_response_id` to the next call |
| Gemini | `_GeminiChatStore` (in-memory) | `gemini_chat_{uuid12}` | `(client, chat)` tuples | Reuse the chat object for `send_message()`; the client is kept alive so its HTTP connection is not garbage-collected |
| Grok | `_GrokImageStore` (in-memory) | `grok_img_{uuid12}` | Base64 strings | Pass the stored base64 as an `image_url` data URI |
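The Grok row of the table can be sketched as follows. The previous image is kept as base64 under a `grok_img_{uuid12}` key. On the next edit it becomes a `data:` URI for the `image_url` field. The plain dict store, the `image/png` default MIME type, and the shape of the returned request dict are assumptions. Real code should use a bounded store such as the template in the next section.

```python
from __future__ import annotations

import base64
import uuid

# Plain dict for brevity; production code should bound its size (see the store template).
_grok_images: dict[str, str] = {}


def save_image(image_bytes: bytes) -> str:
    """Store the image as base64 and return a continuation ID."""
    store_id = f"grok_img_{uuid.uuid4().hex[:12]}"
    _grok_images[store_id] = base64.b64encode(image_bytes).decode("ascii")
    return store_id


def build_edit_request(continuation_id: str, prompt: str, mime_type: str = "image/png") -> dict:
    """Turn a stored image back into a data URI for the next edit call."""
    b64 = _grok_images.get(continuation_id)
    if b64 is None:
        raise KeyError(f"Unknown or expired continuation ID: {continuation_id}")
    return {"prompt": prompt, "image_url": f"data:{mime_type};base64,{b64}"}
```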

Store Pattern Template

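```python
from __future__ import annotations

import uuid
from collections import OrderedDict
from typing import Any


class _NewBackendStore:
    """Bounded in-memory store for continuation state, keyed by a short random ID."""

    def __init__(self, max_items: int = 50):
        self._store: OrderedDict[str, Any] = OrderedDict()
        self._max = max_items

    def save(self, data: Any) -> str:
        # Replace "prefix" with a backend-specific tag, e.g. "grok_img".
        store_id = f"prefix_{uuid.uuid4().hex[:12]}"
        if len(self._store) >= self._max:
            self._store.popitem(last=False)  # Evict the least recently used entry
        self._store[store_id] = data
        return store_id

    def get(self, store_id: str) -> Any | None:
        if store_id not in self._store:
            return None
        self._store.move_to_end(store_id)  # Mark as recently used so eviction is truly LRU
        return self._store[store_id]


_store = _NewBackendStore()
```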

Common Pitfalls

  1. Missing from priority list: the backend works when explicitly specified but is never auto-selected.
  2. Sync vs async: some SDKs are sync-only; wrap their calls in `asyncio.to_thread()` if needed.
  3. Ephemeral URLs: some APIs return temporary URLs; always prefer base64 output, or download the file immediately.
  4. Falsy duration: `duration or default` treats `0` as falsy; use `if duration is not None` instead.
  5. Existing test breakage: adding a backend to a priority list changes auto-selection; update existing tests that clear env vars.
  6. Image-to-image gating: `_generate_single_with_input_images` has its own backend allowlist, so the new backend must be added there too (step 4).
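Pitfalls 2 and 4 can be shown in a few lines. `slow_sync_sdk_call` is a hypothetical stand-in for a blocking SDK method, and `DEFAULT_DURATION` is a placeholder value.

```python
from __future__ import annotations

import asyncio
import time

DEFAULT_DURATION = 8  # hypothetical default, in seconds


def slow_sync_sdk_call(prompt: str, duration: int) -> str:
    time.sleep(0.01)  # blocking work that would stall the event loop if awaited code called it directly
    return f"{prompt}:{duration}s"


def resolve_duration(duration: int | None) -> int:
    # Wrong: `duration or DEFAULT_DURATION` silently turns an explicit 0 into 8.
    return duration if duration is not None else DEFAULT_DURATION


async def generate(prompt: str, duration: int | None = None) -> str:
    # Run the sync SDK call in a worker thread so other coroutines keep running.
    return await asyncio.to_thread(slow_sync_sdk_call, prompt, resolve_duration(duration))
```

Whether a duration of `0` should then be rejected is a validation decision. The point is that the caller's value reaches that check unchanged.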

Reference Files

| File | Purpose |
|---|---|
| `massgen/tool/_multimodal_tools/generation/_base.py` | API keys, default models, priorities |
| `massgen/tool/_multimodal_tools/generation/_selector.py` | Backend auto-selection logic |
| `massgen/tool/_multimodal_tools/generation/_image.py` | Image generation backends |
| `massgen/tool/_multimodal_tools/generation/_video.py` | Video generation backends |
| `massgen/tool/_multimodal_tools/generation/_audio.py` | Audio generation backends |
| `massgen/tool/_multimodal_tools/generation/generate_media.py` | Entry point and routing |
| `massgen/tool/_multimodal_tools/TOOL.md` | User-facing documentation |
| `massgen/tests/test_grok_multimedia_generation.py` | Reference: Grok backend tests |
| `massgen/tests/test_grok_multimedia_backend_selection.py` | Reference: Grok selection tests |
| `massgen/tests/test_multimodal_image_backend_selection.py` | Reference: image selection tests |