Guide for using MassGen to develop and improve itself. This skill should be used when agents need to run MassGen experiments programmatically (using automation mode) OR analyze terminal UI/UX quality (using visual evaluation tools). These are mutually exclusive workflows for different improvement goals.
npx skill4agent add massgen/massgen massgen-develops-massgen

uv run massgen --automation --config massgen/configs/basic/multi/two_agents_gemini.yaml "What is 2+2?"

start_background_shell, run_in_background

| Feature | Benefit |
|---|---|
| Clean output | ~10 parseable lines vs 3,000+ ANSI codes |
| LOG_DIR printed | First line shows log directory path |
| status.json | Real-time monitoring file |
| Exit codes | 0=success, 1=config, 2=execution, 3=timeout, 4=interrupted |
| Workspace isolation | Safe parallel execution |
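
The table above can be turned into a small helper for scripts that launch MassGen in automation mode. The following is a minimal sketch, not part of MassGen itself: the names `EXIT_CODES`, `parse_log_dir`, and `describe_exit_code` are hypothetical, and the `LOG_DIR: <path>` line format is assumed from the sample output in this guide.

```python
import re
from typing import Optional

# Exit codes as listed in the feature table (0=success ... 4=interrupted).
EXIT_CODES = {
    0: "success",
    1: "config",
    2: "execution",
    3: "timeout",
    4: "interrupted",
}


def parse_log_dir(output: str) -> Optional[str]:
    """Return the path from the first 'LOG_DIR: <path>' line, or None if absent."""
    match = re.search(r"^LOG_DIR:\s*(\S+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


def describe_exit_code(code: int) -> str:
    """Map a process exit code to the label used in the table."""
    return EXIT_CODES.get(code, f"unknown ({code})")
```

A caller would typically capture stdout from the `uv run massgen --automation ...` process (for example with `subprocess.run(..., capture_output=True, text=True)`), pass it to `parse_log_dir`, and pass the return code to `describe_exit_code`.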
LOG_DIR: .massgen/massgen_logs/log_20251120_143022_123456
STATUS: .massgen/massgen_logs/log_20251120_143022_123456/status.json
🤖 Multi-Agent Mode
Agents: gemini-2.5-pro1, gemini-2.5-pro2
Question: What is 2+2?
============================================================
QUESTION: What is 2+2?
[Coordination in progress - monitor status.json for real-time updates]
WINNER: gemini-2.5-pro1
DURATION: 33.4s
ANSWER_PREVIEW: The answer is 4.
COMPLETED: 2 agents, 35.2s total

LOG_DIR

cat .massgen/massgen_logs/log_20251120_143022_123456/status.json

{
  "coordination": {
    "completion_percentage": 65,
    "phase": "enforcement"
  },
  "results": {
    "winner": null  // null = running, "agent_id" = done
  },
  "agents": {
    "agent_a": {
      "status": "streaming",
      "error": null
    }
  }
}

waiting, streaming, answered, voted, completed, error

# Read final answer
cat [log_dir]/final/[winner]/answer.txt

completion_percentage, sleep()

token_monitor.py

import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])  # Pass LOG_DIR as argument

while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        with open("token_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            log.write(f"Tokens: {data.get('total_tokens_used', 0)}\n")
            log.write(f"Cost: ${data.get('total_cost', 0):.4f}\n\n")
    time.sleep(5)

error_monitor.py

import time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])

while True:
    if log_dir.exists():
        with open("error_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            errors = []
            # Scan every *.log file in the log directory for problem keywords
            for logfile in log_dir.glob("*.log"):
                with open(logfile) as f:
                    for line in f:
                        if any(x in line.lower() for x in ['error', 'warning', 'failed']):
                            errors.append(line.strip())
            # Report only the five most recent matches
            log.write('\n'.join(errors[-5:]) if errors else "No errors\n")
            log.write("\n")
    time.sleep(5)

progress_monitor.py

import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])

while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        with open("progress_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            # completion_percentage sits under "coordination" in the sample status.json
            progress = data.get('coordination', {}).get('completion_percentage', 0)
            # "active" is not one of the listed status values; count streaming agents instead
            active = sum(1 for a in data.get('agents', {}).values()
                         if a.get('status') == 'streaming')
            log.write(f"Progress: {progress}% Active agents: {active}\n\n")
    time.sleep(5)

coordination_monitor.py

import json, time, sys
from pathlib import Path

log_dir = Path(sys.argv[1])

while True:
    if (log_dir / "status.json").exists():
        with open(log_dir / "status.json") as f:
            data = json.load(f)
        coord = data.get('coordination', {})
        with open("coordination_monitor.log", "a") as log:
            log.write(f"=== {time.strftime('%H:%M:%S')} ===\n")
            log.write(f"Phase: {coord.get('phase', 'unknown')}\n")
            log.write(f"Round: {coord.get('round', 0)}\n")
            log.write(f"Total answers: {coord.get('total_answers', 0)}\n\n")
    time.sleep(5)

python3 token_monitor.py [LOG_DIR] &

custom_tools/_multimodal_tools/, --automation, run_massgen_with_recording, understand_video

# Start with --automation to verify everything works
uv run massgen --automation --config [config_path] "[question]"

from custom_tools._multimodal_tools.run_massgen_with_recording import run_massgen_with_recording

# Call from inside an async function (or a REPL that supports top-level await)
result = await run_massgen_with_recording(
    config_path="massgen/configs/basic/multi/two_agents_gemini.yaml",
    question="What is 2+2?",
    output_format="mp4",  # ALWAYS use mp4 for maximum compatibility
    timeout_seconds=120,
    width=1920,
    height=1080
)

"mp4"

understand_video

from custom_tools._multimodal_tools.understand_video import understand_video

# Overall UX evaluation
ux_eval = await understand_video(
    video_path=result["video_path"],  # The MP4 file from Step 1
    prompt="Evaluate the overall terminal display quality, clarity, and usability",
    num_frames=12
)

# Focused on coordination
coordination_eval = await understand_video(
    video_path=result["video_path"],
    prompt="How clearly does the display show agent coordination phases and voting?",
    num_frames=8
)

# Status indicators
status_eval = await understand_video(
    video_path=result["video_path"],
    prompt="Are status indicators (streaming, answered, voted) clear and visually distinct?",
    num_frames=8
)

understand_video

| Format | Use Case | Notes |
|---|---|---|
| MP4 | Default - use for everything | Best quality, universally supported, ideal for detailed analysis |
| GIF | Smaller file size, easy embedding | Lower quality, larger files than expected, avoid unless size-constrained |
| WebM | Modern web publishing | Good quality, not universally supported |

| Frames | Use Case |
|---|---|
| 4-8 | Quick evaluation |
| 8-12 | Standard evaluation |
| 12-16+ | Detailed analysis |
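
The frame-count guidance in the table above can be encoded as a lookup so that evaluation scripts pick `num_frames` consistently. This is only an illustrative sketch: `FRAME_RANGES` and `pick_num_frames` are hypothetical names, and choosing the midpoint of each range is an assumption, not something the tools require.

```python
# (low, high) frame counts per evaluation depth, taken from the table above.
# The "detailed" row is listed as 12-16+, so 16 is treated here as a soft upper bound.
FRAME_RANGES = {
    "quick": (4, 8),
    "standard": (8, 12),
    "detailed": (12, 16),
}


def pick_num_frames(depth: str = "standard") -> int:
    """Return the midpoint of the frame range for the given evaluation depth."""
    low, high = FRAME_RANGES[depth]
    return (low + high) // 2
```

The result can be passed as the `num_frames` argument of `understand_video`, for example `num_frames=pick_num_frames("detailed")`.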
# Check Gemini models and their release dates
grep -A 5 "model.*2\." massgen/backend/gemini.py
# Check OpenAI models and their release dates
grep -A 5 "model.*gpt" massgen/backend/openai.py
# Check Claude models and their release dates
grep -A 5 "model.*claude" massgen/backend/claude.py

cat docs/source/reference/token_budget.rst | grep -A 3 "gemini\|gpt\|claude"

gemini-2.5-pro, gemini-2.5-flash, gpt-5-mini, gpt-4o-mini, claude-sonnet-4-*, gpt-4o, gpt-4-turbo, gemini-1.5-pro, claude-3.5-sonnet, gpt-5, claude-opus-4, gemini-3-pro, gpt-5-mini, gemini-2.5-flash

--generate-config

# WORKFLOW:
# 1. Read backend file to find recent mid-tier models
# 2. Verify release date (< 12 months old)
# 3. Check pricing tier (mid-range)
# 4. Use model in --config-model flag

# Example: Generate 2-agent config
# (gemini-2.5-pro is an example - always verify this is current!)
massgen --generate-config ./test_config.yaml \
  --config-backend gemini \
  --config-model gemini-2.5-pro \
  --config-agents 2 \
  --config-docker

# With context path
# (gpt-5-mini is an example - always verify this is current!)
massgen --generate-config ./test_config.yaml \
  --config-backend openai \
  --config-model gpt-5-mini \
  --config-context-path /path/to/project

agents:
  - backend:
      enable_mcp_command_line: true
      command_line_execution_mode: "docker"

agents:
  - backend:
      enable_code_based_tools: true
      auto_discover_custom_tools: true

agents:
  - backend: {type: "gemini", model: "gemini-2.5-pro"}
  - backend: {type: "openai", model: "gpt-5-mini"}

enable_code_based_tools, enable_mcp_command_line, command_line_execution_mode, auto_discover_custom_tools, timeout_settings

command_line_execution_mode: "docker", /.dockerenv, "local"

docs/source/reference/status_file.rst
docs/source/user_guide/terminal_evaluation.rst
massgen/configs/basic/
massgen/configs/meta/
massgen/tool/_multimodal_tools/run_massgen_with_recording.py
massgen/tool/_multimodal_tools/understand_video.py
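
For the automation workflow, the `results.winner` field in `status.json` (null while running, an agent id when done) and the `[log_dir]/final/[winner]/answer.txt` path can be combined into a wait-and-read helper. The sketch below assumes only those two details from this guide; `read_winner` and `wait_for_answer` are hypothetical names, and the default poll interval and timeout are arbitrary choices.

```python
import json
import time
from pathlib import Path
from typing import Optional


def read_winner(log_dir: Path) -> Optional[str]:
    """Return results.winner from status.json, or None if missing or still running."""
    try:
        data = json.loads((log_dir / "status.json").read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # File not created yet, or caught mid-write: treat as "still running".
        return None
    if not isinstance(data, dict):
        return None
    return (data.get("results") or {}).get("winner")


def wait_for_answer(
    log_dir: Path,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Optional[str]:
    """Poll until a winner is recorded and its answer.txt exists; None on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        winner = read_winner(log_dir)
        if winner:
            answer_path = log_dir / "final" / winner / "answer.txt"
            if answer_path.exists():
                return answer_path.read_text()
        time.sleep(poll_seconds)
    return None
```

Using a deadline rather than an unbounded `while True` loop keeps a stalled run from blocking the calling agent indefinitely.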