MassGen Develops MassGen

This skill provides guidance for using MassGen to develop and improve itself. Choose the appropriate workflow based on what you're testing.

Two Workflows


  1. Automation Mode - Test backend functionality, coordination logic, agent responses
  2. Visual Evaluation - Test terminal display, colors, layout, UX


Workflow 1: Automation Mode


Use this to test functionality without visual inspection. Ideal for programmatic testing.

Running MassGen with Automation


Run MassGen in the background (the exact mechanism depends on your tooling):
```bash
uv run massgen --automation --config massgen/configs/basic/multi/two_agents_gemini.yaml "What is 2+2?"
```
For MassGen agents: use the `start_background_shell` MCP tool. For Claude Code: use the Bash tool's `run_in_background` parameter.
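If neither tool is available, the same launch can be sketched with plain `subprocess` (the helper names here are illustrative; the CLI invocation is the one shown above):

```python
import subprocess

def build_automation_command(config_path, question):
    """Assemble the automation-mode invocation shown above."""
    return ["uv", "run", "massgen", "--automation", "--config", config_path, question]

def launch_massgen(config_path, question, log_path="massgen_run.log"):
    """Start MassGen in the background; stdout is captured to log_path
    so the LOG_DIR line can be parsed from it later."""
    log = open(log_path, "w")
    return subprocess.Popen(build_automation_command(config_path, question),
                            stdout=log, stderr=subprocess.STDOUT)
```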

Why Automation Mode


| Feature | Benefit |
|---------|---------|
| Clean output | ~10 parseable lines vs 3,000+ ANSI codes |
| LOG_DIR printed | First line shows the log directory path |
| status.json | Real-time monitoring file |
| Exit codes | 0=success, 1=config, 2=execution, 3=timeout, 4=interrupted |
| Workspace isolation | Safe parallel execution |
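The exit codes can be handled programmatically; a minimal sketch (the mapping itself comes from the table above):

```python
# Meaning of MassGen automation-mode exit codes (from the table above).
EXIT_CODES = {
    0: "success",
    1: "config error",
    2: "execution error",
    3: "timeout",
    4: "interrupted",
}

def describe_exit(code):
    """Map a process return code to a human-readable outcome."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")
```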

Expected Output


```
LOG_DIR: .massgen/massgen_logs/log_20251120_143022_123456
STATUS: .massgen/massgen_logs/log_20251120_143022_123456/status.json

🤖 Multi-Agent Mode
Agents: gemini-2.5-pro1, gemini-2.5-pro2
Question: What is 2+2?

============================================================
QUESTION: What is 2+2?
[Coordination in progress - monitor status.json for real-time updates]

WINNER: gemini-2.5-pro1
DURATION: 33.4s
ANSWER_PREVIEW: The answer is 4.

COMPLETED: 2 agents, 35.2s total
```
Parse `LOG_DIR` from the first line to find the log directory.
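A minimal parser for that first line (assuming the `LOG_DIR:` prefix shown in the sample output):

```python
def parse_log_dir(output):
    """Return the path after the first 'LOG_DIR:' line, or None if absent."""
    for line in output.splitlines():
        if line.startswith("LOG_DIR:"):
            return line.split(":", 1)[1].strip()
    return None
```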

Monitoring Progress


Read the `status.json` file (updated every 2 seconds):
```bash
cat .massgen/massgen_logs/log_20251120_143022_123456/status.json
```
Key fields (`winner` is `null` while running and an agent id when done):
```json
{
  "coordination": {
    "completion_percentage": 65,
    "phase": "enforcement"
  },
  "results": {
    "winner": null
  },
  "agents": {
    "agent_a": {
      "status": "streaming",
      "error": null
    }
  }
}
```
Agent status values: `waiting`, `streaming`, `answered`, `voted`, `completed`, `error`
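Completion can be detected from the `results.winner` field (null while running, an agent id when done); a small helper sketch:

```python
import json
from pathlib import Path

def read_winner(status_path):
    """Return the winning agent id, or None while coordination is still running."""
    data = json.loads(Path(status_path).read_text())
    return data.get("results", {}).get("winner")
```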

Reading Results


After completion (exit code 0):
```bash
# Read final answer
cat [log_dir]/final/[winner]/answer.txt
```

Timing Expectations


  • Standard tasks: 2-10 minutes
  • Complex/meta tasks: 10-30 minutes
  • Check if stuck: read `status.json` - if `completion_percentage` increases, it's working
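The stuck-check can compare `completion_percentage` between two reads of `status.json` (field path per the Key fields shown earlier):

```python
import json
from pathlib import Path

def progress_delta(status_path, last_pct):
    """Return (is_progressing, current_pct) given the last observed percentage."""
    data = json.loads(Path(status_path).read_text())
    pct = data.get("coordination", {}).get("completion_percentage", 0)
    return pct > last_pct, pct
```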

Advanced: Multiple Background Monitors


You can create multiple background monitoring tasks that run independently alongside the main MassGen process. Each monitor can track different aspects and write to separate log files for later inspection.

Approach


Create small Python scripts that run in background shells. Each script:
  • Monitors a specific aspect (tokens, errors, progress, coordination, etc.)
  • Writes timestamped data to its own log file
  • Runs in a loop with `sleep()` intervals
  • Can be checked anytime without blocking the main task

Example Monitor Scripts


Token Usage Monitor (`token_monitor.py`):
```python
import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])  # Pass LOG_DIR as argument
while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        with open("token_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            log.write(f"Tokens: {data.get('total_tokens_used', 0)}\n")
            log.write(f"Cost: ${data.get('total_cost', 0):.4f}\n\n")
    time.sleep(5)
```

Error Monitor (`error_monitor.py`):
```python
import time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])
while True:
    if log_dir.exists():
        with open("error_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            errors = []
            for logfile in log_dir.glob("*.log"):
                with open(logfile) as f:
                    for line in f:
                        if any(x in line.lower() for x in ['error', 'warning', 'failed']):
                            errors.append(line.strip())
            log.write('\n'.join(errors[-5:]) + "\n" if errors else "No errors\n")
            log.write("\n")
    time.sleep(5)
```

Progress Monitor (`progress_monitor.py`):
```python
import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])
while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        with open("progress_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            # completion_percentage lives under "coordination" in status.json
            progress = data.get('coordination', {}).get('completion_percentage', 0)
            # 'streaming' per the documented agent status values
            streaming = sum(1 for a in data.get('agents', {}).values()
                            if a.get('status') == 'streaming')
            log.write(f"Progress: {progress}% Streaming agents: {streaming}\n\n")
    time.sleep(5)
```

Coordination Monitor (`coordination_monitor.py`):
```python
import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])
while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        coord = data.get('coordination', {})
        with open("coordination_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            log.write(f"Phase: {coord.get('phase', 'unknown')}\n")
            log.write(f"Round: {coord.get('round', 0)}\n")
            log.write(f"Total answers: {coord.get('total_answers', 0)}\n\n")
    time.sleep(5)
```

Workflow


  1. Launch the main task and parse `LOG_DIR` from its output
  2. Create monitor scripts as needed (write the Python files)
  3. Launch monitors in background shells: `python3 token_monitor.py [LOG_DIR] &`
  4. Check monitor logs anytime by reading the `.log` files
  5. When complete, kill the monitor processes and analyze the logs
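Steps 3 and 5 can be wrapped in small helpers with `subprocess` (a sketch; the monitor script names are the examples above):

```python
import subprocess
import sys

def launch_monitors(log_dir, scripts, interpreter=sys.executable):
    """Start each monitor script in a background process, passing LOG_DIR
    (equivalent to `python3 token_monitor.py [LOG_DIR] &`)."""
    return [subprocess.Popen([interpreter, script, str(log_dir)]) for script in scripts]

def stop_monitors(procs):
    """Terminate monitor processes once the main task completes."""
    for proc in procs:
        proc.terminate()
        proc.wait(timeout=10)
```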

Custom Monitors


Create monitors for any metric you want to track:
  • Model-specific performance metrics
  • Memory/context usage patterns
  • Real-time cost accumulation
  • Answer quality trends
  • Agent coordination patterns
  • Specific error categories
Benefits:
  • Non-blocking inspection of specific metrics on demand
  • Historical data captured for post-run analysis
  • Independent monitoring streams for different aspects
  • Easy to add new monitors without modifying configs

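Most of these cases fit one generic template: poll a single dotted field out of `status.json` and append it, timestamped, to a log file (a sketch; which fields exist depends on your status.json):

```python
import json
import time
from pathlib import Path

def monitor_field(log_dir, dotted_key, out_file, interval=5, max_iters=None):
    """Append timestamped values of one status.json field (e.g.
    'coordination.phase') to out_file every `interval` seconds."""
    status = Path(log_dir) / "status.json"
    iters = 0
    while max_iters is None or iters < max_iters:
        if status.exists():
            value = json.loads(status.read_text())
            # Walk the dotted path, e.g. "coordination.phase"
            for part in dotted_key.split("."):
                value = value.get(part) if isinstance(value, dict) else None
            with open(out_file, "a") as log:
                log.write(f"{time.strftime('%H:%M:%S')} {dotted_key}={value}\n")
        time.sleep(interval)
        iters += 1
```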

Workflow 2: Visual Evaluation


Use this to analyze and improve MassGen's terminal display quality. Requires tools from `custom_tools/_multimodal_tools/`.
Important: this workflow records the rich terminal display, so the actual recording does NOT use `--automation` mode. However, you should ALWAYS pre-test with `--automation` first.

Prerequisites


You should have these tools available in your workspace:
  • `run_massgen_with_recording` - records terminal sessions as video
  • `understand_video` - analyzes video frames with GPT-4.1 vision

Step 0: Pre-Test with Automation (REQUIRED)


Before recording the video, verify the config works and API keys are valid:
```bash
# Start with --automation to verify everything works
uv run massgen --automation --config [config_path] "[question]"
```

**Wait 30-60 seconds** (enough to verify API keys, config parsing, and tool initialization), then kill the process.

**Why this is critical:**
- Detects config errors before wasting recording time
- Validates that API keys are present and working
- Ensures tools initialize correctly
- Prevents recording a broken session

**If the automation test fails**, fix the issues before proceeding to recording.

Step 1: Record a MassGen Session


After the automation pre-test succeeds, record the visual session:
```python
from custom_tools._multimodal_tools.run_massgen_with_recording import run_massgen_with_recording

result = await run_massgen_with_recording(
    config_path="massgen/configs/basic/multi/two_agents_gemini.yaml",
    question="What is 2+2?",
    output_format="mp4",  # ALWAYS use mp4 for maximum compatibility
    timeout_seconds=120,
    width=1920,
    height=1080
)
```
Format recommendation: always use `"mp4"` for maximum compatibility. GIF and WebM are supported, but MP4 is preferred.
The recording captures the rich terminal display with colors, status indicators, and coordination visualization (WITHOUT the `--automation` flag).

Step 2: Analyze the Recording


Use `understand_video` to analyze the MP4 recording. Call it at least once; call it several times to analyze different aspects:
```python
from custom_tools._multimodal_tools.understand_video import understand_video

# Overall UX evaluation
ux_eval = await understand_video(
    video_path=result["video_path"],  # The MP4 file from Step 1
    prompt="Evaluate the overall terminal display quality, clarity, and usability",
    num_frames=12
)

# Focused on coordination
coordination_eval = await understand_video(
    video_path=result["video_path"],
    prompt="How clearly does the display show agent coordination phases and voting?",
    num_frames=8
)

# Status indicators
status_eval = await understand_video(
    video_path=result["video_path"],
    prompt="Are status indicators (streaming, answered, voted) clear and visually distinct?",
    num_frames=8
)
```

**Key points:**
- The recording tool saves the video to the workspace - use that path for analysis
- You can call `understand_video` multiple times on the same video with different prompts
- Each call focuses on a specific aspect (UX, coordination, status, colors, etc.)

Evaluation Criteria


When analyzing terminal displays, assess:
  1. Visual Clarity - Contrast, colors, font rendering, ANSI handling, spacing
  2. Information Organization - Layout, content density, streaming display, scroll handling
  3. Status Indicators - Agent states, progress tracking, phase transitions, winner selection
  4. User Experience - Real-time feedback, error visibility, cognitive load, information hierarchy

Output Format Recommendations


Default to MP4 - maximum compatibility and quality.

| Format | Use Case | Notes |
|--------|----------|-------|
| MP4 | Default - use for everything | Best quality, universally supported, ideal for detailed analysis |
| GIF | Smaller file size, easy embedding | Lower quality, larger files than expected; avoid unless size-constrained |
| WebM | Modern web publishing | Good quality, not universally supported |

Rule of thumb: use MP4 unless you have a specific reason not to.

Frame Count Guidelines


| Frames | Use Case |
|--------|----------|
| 4-8 | Quick evaluation |
| 8-12 | Standard evaluation |
| 12-16+ | Detailed analysis |


Which Configs to Test


Model Selection Guidelines


Default to mid-tier models when generating configs or running experiments. These provide the best balance of cost, speed, and capability for development and testing.
CRITICAL: Always check model recency based on TODAY'S DATE. Models older than 6-12 months should be considered outdated.

How to Select Models


**Step 1: Read backend files to check release dates**
```bash
# Check Gemini models and their release dates
grep -A 5 "model.*2." massgen/backend/gemini.py

# Check OpenAI models and their release dates
grep -A 5 "model.*gpt" massgen/backend/openai.py

# Check Claude models and their release dates
grep -A 5 "model.*claude" massgen/backend/claude.py
```

**Step 2: Check token costs**
```bash
cat docs/source/reference/token_budget.rst | grep -A 3 "gemini\|gpt\|claude"
```

**Step 3: Compare release dates against today's date**
  • Calculate months since release: (today's year-month) - (model release year-month)
  • If > 12 months: model is outdated
  • If 6-12 months: model is aging; prefer newer if available
  • If < 6 months: model is current
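The month arithmetic in Step 3 as a small helper (thresholds taken from the rules above):

```python
from datetime import date

def months_since(release_year, release_month, today=None):
    """Whole months elapsed since a model's release month."""
    today = today or date.today()
    return (today.year - release_year) * 12 + (today.month - release_month)

def recency(months):
    """Classify per the thresholds above: >12 outdated, 6-12 aging, <6 current."""
    if months > 12:
        return "outdated"
    if months >= 6:
        return "aging"
    return "current"
```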

Model Selection Examples


✅ GOOD (recent, mid-tier patterns):
  • Gemini: `gemini-2.5-pro`, `gemini-2.5-flash` (2.x series, 2025)
  • OpenAI: `gpt-5-mini`, `gpt-4o-mini` (GPT-5 generation)
  • Claude: `claude-sonnet-4-*` (4.x series, 2025)
⚠️ BAD (outdated patterns - check dates!):
  • `gpt-4o` (2024 release - likely >12 months old)
  • `gpt-4-turbo` (2023-2024 era)
  • `gemini-1.5-pro` (1.x series deprecated by 2.x)
  • `claude-3.5-sonnet` (3.x series when 4.x exists)
Selection criteria:
  • Recency: released within the last 6-12 months (ALWAYS check backend files for dates)
  • Mid-range pricing: not top-tier (expensive) or bottom-tier (cheap)
  • General availability: stable release, not experimental/preview/alpha
  • Version numbers: higher major versions are newer (gemini-2.x > gemini-1.x, gpt-5 > gpt-4, claude-4 > claude-3)
When to deviate:
  • Premium models: testing model ceiling capabilities (e.g., `gpt-5`, `claude-opus-4`, `gemini-3-pro`)
  • Budget models: cost optimization experiments (e.g., `gpt-5-mini`, `gemini-2.5-flash`)
  • Legacy testing: validating backwards compatibility with older models

Generating a Config (Agent-Friendly)


Use `--generate-config` for programmatic config generation:
```bash
# WORKFLOW:
# 1. Read backend file to find recent mid-tier models
# 2. Verify release date (< 12 months old)
# 3. Check pricing tier (mid-range)
# 4. Use model in --config-model flag

# Example: Generate 2-agent config
# (model name is an example - always verify it is current!)
massgen --generate-config ./test_config.yaml \
    --config-backend gemini \
    --config-model gemini-2.5-pro \
    --config-agents 2 \
    --config-docker

# With context path
massgen --generate-config ./test_config.yaml \
    --config-backend openai \
    --config-model gpt-5-mini \
    --config-context-path /path/to/project
```

**IMPORTANT**: The model names shown above are EXAMPLES. Always check backend files for current models based on today's date.

This creates a full-featured config with code-based tools, skills, and task planning enabled.

Testing Specific Features


Modify the generated config to enable/disable features:

Code execution:
```yaml
agents:
  - backend:
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"
```

Custom tools:
```yaml
agents:
  - backend:
      enable_code_based_tools: true
      auto_discover_custom_tools: true
```

Different models per agent:
```yaml
agents:
  - backend: {type: "gemini", model: "gemini-2.5-pro"}
  - backend: {type: "openai", model: "gpt-5-mini"}
```

Common parameters: `enable_code_based_tools`, `enable_mcp_command_line`, `command_line_execution_mode`, `auto_discover_custom_tools`, `timeout_settings`


Docker Considerations


Automatic Docker Detection


MassGen automatically detects when it is running inside a Docker container. If a config has `command_line_execution_mode: "docker"`, MassGen will:
  1. Detect the container environment (via `/.dockerenv`)
  2. Automatically switch to `"local"` execution mode
  3. Log: "Already running inside Docker container - switching to local execution mode"
Why this works: the outer container already provides isolation, so running "locally" within that container is safe and sandboxed.
No manual configuration needed - configs with Docker mode just work when run inside containers.
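The detection rule can be mirrored in a few lines (a sketch of the behavior described above, not MassGen's actual implementation):

```python
from pathlib import Path

def inside_docker():
    """True when the /.dockerenv marker file exists."""
    return Path("/.dockerenv").exists()

def effective_execution_mode(configured):
    """Fall back to "local" when a "docker" config runs inside a container."""
    if configured == "docker" and inside_docker():
        return "local"
    return configured
```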

Tradeoffs


When auto-switching to local execution:
  • ✅ Still sandboxed from host
  • ✅ All features work (VHS, MassGen, tools are in container)
  • ⚠️ No per-execution isolation between tool calls
  • ⚠️ State persists within container session


Reference Files


  • Status file docs: `docs/source/reference/status_file.rst`
  • Terminal evaluation docs: `docs/source/user_guide/terminal_evaluation.rst`
  • Example configs: `massgen/configs/basic/`, `massgen/configs/meta/`
  • Recording tool: `massgen/tool/_multimodal_tools/run_massgen_with_recording.py`
  • Video analysis tool: `massgen/tool/_multimodal_tools/understand_video.py`