```
EVAL -> ANALYZE -> RESEARCH -> IMPROVE -> RE-EVAL -> DECIDE -> (repeat)
```

```
User: "Run the self-improving loop on the mini-framework agent for 3 iterations"
Skill: Executes 3 iterations of EVAL->ANALYZE->RESEARCH->IMPROVE->RE-EVAL->DECIDE
       Reports per-iteration scores, net improvement, and commits/reverts.
```
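The loop above can be sketched as a plain Python function. This is a hypothetical skeleton, not the code in `runner.py`: the phase callables (`evaluate`, `analyze`, `research`, `improve`, `revert`, `decide`) and the `run_loop`/`LoopResult` names are assumptions for illustration. It also reuses the last accepted scores as the next baseline instead of re-running EVAL at the top of each iteration; the real runner may re-evaluate.

```python
from dataclasses import dataclass, field
from typing import Callable

Scores = dict[str, float]  # level name -> score (assumed shape)


@dataclass
class LoopResult:
    # (iteration, "commit" | "revert", scores measured in RE-EVAL)
    history: list[tuple[int, str, Scores]] = field(default_factory=list)
    final_scores: Scores = field(default_factory=dict)


def run_loop(evaluate: Callable[[], Scores],
             analyze: Callable[[Scores], list[str]],
             research: Callable[[list[str]], list[str]],
             improve: Callable[[list[str]], None],
             revert: Callable[[], None],
             decide: Callable[[Scores, Scores], bool],
             max_iterations: int = 3) -> LoopResult:
    """Hypothetical skeleton of EVAL -> ANALYZE -> RESEARCH -> IMPROVE -> RE-EVAL -> DECIDE."""
    result = LoopResult()
    baseline = evaluate()                    # EVAL
    for i in range(1, max_iterations + 1):
        failures = analyze(baseline)         # ANALYZE
        changes = research(failures)         # RESEARCH
        improve(changes)                     # IMPROVE
        candidate = evaluate()               # RE-EVAL
        if decide(baseline, candidate):      # DECIDE: keep the change
            result.history.append((i, "commit", candidate))
            baseline = candidate
        else:                                # DECIDE: undo the change
            revert()
            result.history.append((i, "revert", candidate))
    result.final_scores = baseline
    return result
```

Injecting the phases as callables keeps the control flow testable with stubs, independent of any SDK or evaluation harness.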
**Source:** `src/amplihack/eval/self_improve/runner.py`
```bash
python -m amplihack.eval.progressive_test_suite \
  --agent-name <agent_name> \
  --output-dir <output_dir>/iteration_N/eval \
  --levels L1 L2 L3 L4 L5 L6
```
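When driving the EVAL step from Python, the same command can be assembled per iteration as an argument list. A minimal sketch: `build_eval_command` is a hypothetical helper that only reproduces the flags and the `<output_dir>/iteration_N/eval` layout of the command above; it does not run anything.

```python
import sys
from pathlib import Path


def build_eval_command(agent_name: str, output_dir: str, iteration: int,
                       levels: list[str]) -> list[str]:
    """Build the EVAL-step argv for one iteration (hypothetical helper)."""
    # Per-iteration results go to <output_dir>/iteration_N/eval
    eval_dir = Path(output_dir) / f"iteration_{iteration}" / "eval"
    return [
        sys.executable, "-m", "amplihack.eval.progressive_test_suite",
        "--agent-name", agent_name,
        "--output-dir", str(eval_dir),
        "--levels", *levels,
    ]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with agent names or paths that contain spaces.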
`error_analyzer.py`:

```python
from amplihack.eval.self_improve import analyze_eval_results

# level_results: the per-level results produced by the EVAL step
analyses = analyze_eval_results(level_results, score_threshold=0.6)
```
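How `analyze_eval_results` applies `score_threshold` is not documented here. One plausible reading is that levels scoring below it are flagged for error analysis. The sketch below illustrates only that filtering idea with a hypothetical `levels_below_threshold` helper; it assumes scores normalized to [0, 1] and is not the actual `error_analyzer.py` logic.

```python
def levels_below_threshold(level_scores: dict[str, float],
                           score_threshold: float = 0.6) -> list[str]:
    """Return levels scoring below the threshold, worst first (hypothetical helper).

    Assumes scores are normalized to [0, 1]; a level exactly at the
    threshold is treated as passing.
    """
    failing = [lvl for lvl, s in level_scores.items() if s < score_threshold]
    # Sort ascending by score so the weakest level is analyzed first
    return sorted(failing, key=lambda lvl: level_scores[lvl])
```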
RESEARCH output: `research_decisions.json`

| Parameter | Default | Description |
|---|---|---|
| `sdk_type` | | Which SDK to use: `mini`, `claude`, `copilot`, or `microsoft` |
| `max_iterations` | | Maximum number of improvement iterations |
| `improvement_threshold` | | Minimum % improvement required to commit a change |
| `regression_tolerance` | | Maximum % regression allowed on any level |
| `levels` | | Which levels to evaluate |
| `output_dir` | | Results directory |
| `dry_run` | | Evaluate only; don't apply changes |
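Together, `improvement_threshold` and `regression_tolerance` describe a DECIDE rule: commit only when the overall gain clears the first and no single level drops by more than the second. A minimal sketch under stated assumptions: `decide` is a hypothetical helper, scores are percentages keyed by level, and "improvement" is taken to be the change in mean score in percentage points (the source does not say whether it is absolute or relative).

```python
from statistics import mean


def decide(baseline: dict[str, float], candidate: dict[str, float],
           improvement_threshold: float = 2.0,
           regression_tolerance: float = 5.0) -> str:
    """Return "commit" or "revert" for one iteration (hypothetical helper).

    Assumes scores are percentages (0-100) and both thresholds are
    percentage points.
    """
    levels = baseline.keys() & candidate.keys()
    if not levels:
        raise ValueError("no common levels to compare")
    # Net improvement: change in the mean score over the shared levels
    net = mean(candidate[lvl] for lvl in levels) - mean(baseline[lvl] for lvl in levels)
    # Worst single-level drop (a positive value is a regression)
    worst_drop = max(baseline[lvl] - candidate[lvl] for lvl in levels)
    if net >= improvement_threshold and worst_drop <= regression_tolerance:
        return "commit"
    return "revert"
```

Gating on the worst single level, not just the average, keeps a large drop on one level from being hidden by gains elsewhere.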
```python
from amplihack.eval.self_improve import run_self_improvement, RunnerConfig

config = RunnerConfig(
    sdk_type="mini",
    max_iterations=3,
    improvement_threshold=2.0,
    regression_tolerance=5.0,
    levels=["L1", "L2", "L3", "L4", "L5", "L6"],
    output_dir="./eval_results/self_improve",
    dry_run=False,
)

result = run_self_improvement(config)
print(f"Total improvement: {result.total_improvement:+.1f}%")
print(f"Final scores: {result.final_scores}")
```
```
User: "Run a 4-way benchmark comparing all SDK implementations"
Skill: Runs eval suite on mini, claude, copilot, microsoft
       Generates comparison table with scores, LOC, and coverage.
```
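The comparison table in the benchmark example can be rendered from per-SDK results with a few lines of Python. A minimal sketch: `comparison_table` is a hypothetical helper, and the input shape (SDK name mapped to a dict of metrics such as score, LOC, and coverage) is an assumption, not the skill's actual output format.

```python
def comparison_table(rows: dict[str, dict[str, object]],
                     columns: list[str]) -> str:
    """Render per-SDK benchmark results as a Markdown table (hypothetical helper)."""
    header = "| SDK | " + " | ".join(columns) + " |"
    divider = "|---|" + "---|" * len(columns)
    lines = [header, divider]
    for sdk, metrics in rows.items():
        # Missing metrics render as empty cells
        cells = [str(metrics.get(col, "")) for col in columns]
        lines.append(f"| {sdk} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```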
- `src/amplihack/eval/self_improve/runner.py`
- `src/amplihack/eval/self_improve/error_analyzer.py`
- `src/amplihack/eval/progressive_test_suite.py`
- `src/amplihack/agents/goal_seeking/sdk_adapters/`
- `src/amplihack/eval/metacognition_grader.py`
- `src/amplihack/eval/teaching_session.py`