dse-loop
DSE Loop: Autonomous Design Space Exploration
Autonomously explore a design space: run → analyze → pick next parameters → repeat, until the objective is met or timeout is reached. Designed for computer architecture and EDA problems.
Context: $ARGUMENTS
Safety Rules — READ FIRST
NEVER do any of the following:
- `sudo` anything
- `rm -rf`, `rm -r`, or any recursive deletion
- `rm` any file you did not create in this session
- Overwrite existing source files without reading them first
- `git push`, `git reset --hard`, or any destructive git operation
- Kill processes you did not start
If a step requires any of the above, STOP and report to the user.
Constants (override via $ARGUMENTS)
| Constant | Default | Description |
|---|---|---|
| `timeout` | 2h | Total wall-clock budget. Stop exploring after this. |
| `max_iterations` | 50 | Hard cap on number of design points evaluated. |
| `patience` | 10 | Stop early if no improvement for this many consecutive iterations. |
| | minimize | Whether to minimize or maximize the objective metric. |
Override inline:
/dse-loop "task desc — timeout: 4h, max_iterations: 100, patience: 15"
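
A minimal sketch of how such inline overrides could be merged with the defaults. The helper names `parse_overrides` and `duration_to_seconds` and the `key: value` regex are illustrative assumptions, not part of the skill; the keys and default values come from the constants table and the override example.

```python
import re

# Defaults from the constants table; key names follow the inline override example.
DEFAULTS = {"timeout": "2h", "max_iterations": 50, "patience": 10}

def parse_overrides(arguments: str) -> dict:
    """Return DEFAULTS updated with any `key: value` overrides in the task string."""
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        # A number with an optional unit suffix, e.g. "100" or "4h".
        pattern = rf"\b{key}\s*:\s*(\d+(?:\.\d+)?[a-zA-Z]*)"
        match = re.search(pattern, arguments, flags=re.IGNORECASE)
        if match:
            raw = match.group(1)
            # Keep the type of the default: ints stay ints, durations stay strings.
            config[key] = int(raw) if isinstance(default, int) else raw
    return config

def duration_to_seconds(text: str) -> int:
    """Convert a duration such as '4h', '90m', or '30s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return int(float(text[:-1]) * units[text[-1].lower()])
```

Matching is case-insensitive, so a trailing `Timeout: 3h` in a task description is picked up the same way as `timeout: 4h`.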
Typical Use Cases
| Problem | Program | Parameters | Objective |
|---|---|---|---|
| Microarch DSE | gem5 simulation | cache size, assoc, pipeline width, ROB size, branch predictor | maximize IPC or minimize area×delay |
| Synthesis tuning | yosys/DC script | optimization passes, target freq, effort level | minimize area at timing closure |
| RTL parameterization | verilator sim | data width, FIFO depth, pipeline stages, buffer sizes | meet throughput target at min area |
| Compiler flags | gcc/llvm build + benchmark | -O levels, unroll factor, vectorization, scheduling | minimize runtime or code size |
| Placement/routing | openroad/innovus | utilization, aspect ratio, layer config | minimize wirelength / timing |
| Formal verification | abc/sby | bound depth, engine, timeout per property | maximize coverage in time budget |
| Memory subsystem | cacti / ramulator | bank count, row buffer policy, scheduling | optimize bandwidth/energy |
Workflow
Phase 0: Parse Task & Setup
1. Parse $ARGUMENTS to extract:
   - Program: what to run (command, script, or Makefile target)
   - Parameter space: which knobs to tune and their ranges/options (may be incomplete — see step 2)
   - Objective metric: what to optimize (and how to extract it from output)
   - Constraints: hard limits that must not be violated (e.g., timing must close)
   - Timeout: wall-clock budget
   - Success criteria: when is the result "good enough" to stop early?

2. Infer missing parameter ranges — If the user provides parameter names but NOT ranges/options, you MUST infer them before exploring:

   a. Read the source code — search for the parameter names in the codebase:
      - Look for argparse/click definitions, config files, Makefile variables, module parameters, `#define`, SystemVerilog `parameter`/`localparam`, etc.
      - Extract defaults, types, and any comments hinting at valid values

   b. Apply domain knowledge to set reasonable ranges:

      | Parameter type | Inference strategy |
      |---|---|
      | Cache/memory sizes | Powers of 2, typically 1KB–16MB |
      | Associativity | Powers of 2: 1, 2, 4, 8, 16 |
      | Pipeline width / issue width | Small integers: 1, 2, 4, 8 |
      | Buffer/queue/FIFO depth | Powers of 2: 4, 8, 16, 32, 64 |
      | Clock period / frequency | Based on technology node; try ±50% from default |
      | Bound depth (BMC/formal) | Geometric: 5, 10, 20, 50, 100 |
      | Timeout values | Geometric: 10s, 30s, 60s, 120s, 300s |
      | Boolean/enum flags | Enumerate all options found in source |
      | Continuous (learning rate, threshold) | Log-scale sweep: 5 points spanning 2 orders of magnitude around default |
      | Integer counts (threads, cores) | Linear: from 1 to hardware max |

   c. Start conservative — begin with 3-5 values per parameter. Expand range later if the best result is at a boundary.

   d. Log inferred ranges — write the inferred parameter space to `dse_results/inferred_params.md` so the user can review:

      ```markdown
      # Inferred Parameter Space
      | Parameter | Source | Default | Inferred Range | Reasoning |
      |-----------|--------|---------|---------------|-----------|
      | CACHE_SIZE | config.py:42 | 32768 | [8192, 16384, 32768, 65536, 131072] | powers of 2, two doublings either side of default |
      | ASSOC | config.py:43 | 4 | [1, 2, 4, 8] | standard associativities |
      | BMC_DEPTH | run_bmc.py:15 | 10 | [5, 10, 20, 50] | geometric, common BMC depths |
      ```

   e. Boundary expansion — during the search, if the best result is at the min or max of a range, automatically extend that range by one step in that direction (but log the extension).

3. Read the project to understand:
   - How to run the program
   - Where results are produced (stdout, log files, reports)
   - How to parse the objective metric from output
   - Current/baseline configuration (if any)

4. Create working directory `dse_results/` in project root:
   - `dse_results/dse_log.csv` — one row per design point
   - `dse_results/DSE_REPORT.md` — final report
   - `dse_results/DSE_STATE.json` — state for recovery
   - `dse_results/inferred_params.md` — inferred parameter space (if ranges were not provided)
   - `dse_results/configs/` — config files for each run
   - `dse_results/outputs/` — raw output for each run

5. Write a metric extraction script `dse_results/parse_result.py` (or similar) that takes a run's output and returns the objective metric as a number. Test it on a baseline run first.

6. Run baseline (iteration 0): run the program with default/current parameters. Record the baseline metric. This is the point to beat.
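
The range-inference rules in step 2 can be sketched roughly as follows. The function names `power_of_two_range`, `log_sweep`, and `expand_boundary` are hypothetical helpers introduced for illustration, and the sketch assumes doubling-based ranges for the boundary expansion.

```python
def power_of_two_range(default: int, steps: int = 2) -> list[int]:
    """Candidates `steps` doublings either side of a power-of-two default."""
    low = max(1, default >> steps)
    return [low << k for k in range(2 * steps + 1)]

def log_sweep(default: float, points: int = 5, decades: float = 2.0) -> list[float]:
    """Log-spaced values spanning `decades` orders of magnitude around the default."""
    half = decades / 2
    return [default * 10 ** (-half + decades * i / (points - 1)) for i in range(points)]

def expand_boundary(values: list[int], best: int) -> list[int]:
    """Extend a doubling range by one step if `best` sits on its edge."""
    ordered = sorted(values)
    if best == ordered[0] and ordered[0] > 1:
        return [ordered[0] // 2] + ordered   # grow downward, never below 1
    if best == ordered[-1]:
        return ordered + [ordered[-1] * 2]   # grow upward
    return ordered
```

With the default of 32768 from the example table, `power_of_two_range` reproduces the five CACHE_SIZE candidates listed there. Any extension returned by `expand_boundary` should still be logged, as step 2e requires.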
Phase 1: Initial Exploration
Goal: Quickly survey the space to understand which parameters matter most.
Strategy: Latin Hypercube Sampling or structured sweep of key parameters.
- Pick 5-10 diverse design points that span the parameter ranges
- Run them: in parallel via background processes if the runs are independent, otherwise sequentially
- Record all results in `dse_log.csv`:
  ```
  iteration,param1,param2,...,metric,constraint_met,timestamp,notes
  0,default,default,...,baseline_val,yes,2026-03-13T10:00:00,baseline
  1,val1a,val2a,...,result1,yes,2026-03-13T10:05:00,initial sweep
  ...
  ```
- Analyze: which parameters have the most impact on the objective?
- Narrow the search to the most sensitive parameters
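
A minimal pure-Python sketch of Latin Hypercube Sampling over discrete candidate lists, as one way to pick the initial points. The function name `latin_hypercube` and the `space` dictionary layout are assumptions made for this example.

```python
import random

def latin_hypercube(space: dict[str, list], n_points: int, seed: int = 0) -> list[dict]:
    """Draw n_points configurations so each parameter's range is covered evenly."""
    rng = random.Random(seed)
    columns = {}
    for name, values in space.items():
        # One sample per stratum of [0, 1), then shuffle the strata across points.
        strata = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(strata)
        # Map each sample in [0, 1) onto an index into the candidate list.
        columns[name] = [values[min(int(u * len(values)), len(values) - 1)] for u in strata]
    return [{name: columns[name][i] for name in space} for i in range(n_points)]
```

Because the candidate lists are discrete, two sampled points can coincide; duplicates should be dropped before running so that no configuration is evaluated twice.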
Phase 2: Directed Search
Goal: Converge toward the optimum by making informed choices.
Strategy: Adaptive — pick the approach that fits the problem:
- Few parameters (≤3): Fine-grained grid search around the best region from Phase 1
- Many parameters (>3): Coordinate descent — optimize one parameter at a time, holding others at current best
- Binary/categorical params: Enumerate promising combinations
- Continuous params: Binary search or golden section between best neighbors
- Multi-objective: Track Pareto frontier, explore along the front
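
As an illustration of the coordinate descent option, here is a minimal sketch. It assumes a caller-supplied `evaluate(config)` function that runs the program and returns the metric; that function, the `space` layout, and `max_sweeps` are hypothetical names for this example.

```python
from typing import Callable

def coordinate_descent(space: dict[str, list], start: dict,
                       evaluate: Callable[[dict], float],
                       minimize: bool = True, max_sweeps: int = 3):
    """Optimize one parameter at a time, holding the others at the current best."""
    cache = {}

    def score(config: dict) -> float:
        key = tuple(sorted(config.items()))
        if key not in cache:              # never evaluate the same point twice
            cache[key] = evaluate(config)
        return cache[key]

    best = dict(start)
    best_metric = score(best)
    for _ in range(max_sweeps):
        improved = False
        for name, values in space.items():
            for value in values:
                candidate = {**best, name: value}
                metric = score(candidate)
                better = metric < best_metric if minimize else metric > best_metric
                if better:
                    best, best_metric, improved = candidate, metric, True
        if not improved:                  # a full sweep without gain: converged
            break
    return best, best_metric
```

The sketch ignores constraints and the wall-clock budget; in the loop described here, each call to `evaluate` would also be logged and checked against the stopping conditions.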
For each iteration:

1. Select next design point based on results so far:
   - Look at the trend: which direction improves the metric?
   - Avoid re-running configurations already evaluated
   - Balance exploration (untested regions) vs exploitation (near current best)

2. Modify parameters: edit config file, command-line args, or source constants

3. Run the program: execute and capture output

4. Parse results: extract the objective metric and check constraints

5. Log to `dse_log.csv`: append the new row

6. Check stopping conditions:
   - Timeout reached? → stop
   - Max iterations reached? → stop
   - Patience exhausted (no improvement in N iterations)? → stop
   - Success criteria met (metric is "good enough")? → stop
   - Constraint violation pattern detected? → adjust search bounds

7. Update `DSE_STATE.json`:

   ```json
   {
     "iteration": 15,
     "status": "in_progress",
     "best_metric": 1.23,
     "best_params": {"cache_size": 32768, "assoc": 4, "pipeline_width": 2},
     "total_iterations": 15,
     "start_time": "2026-03-13T10:00:00",
     "timeout": "2h",
     "patience_counter": 3
   }
   ```

8. Decide next step → back to step 1
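
Steps 6 and 7 could be sketched as below, using the field names from the `DSE_STATE.json` example. The helpers `stop_reason` and `record_result` are hypothetical, and the sketch assumes the timeout is given in whole or fractional hours (such as "2h").

```python
from datetime import datetime, timedelta
from typing import Optional

def stop_reason(state: dict, now: datetime, max_iterations: int = 50,
                patience: int = 10, success: bool = False) -> Optional[str]:
    """Return the first stopping condition that holds, or None to keep going."""
    hours = float(state["timeout"].rstrip("h"))  # assumes an hours-only budget
    deadline = datetime.fromisoformat(state["start_time"]) + timedelta(hours=hours)
    if now >= deadline:
        return "timeout"
    if state["iteration"] >= max_iterations:
        return "max_iterations"
    if state["patience_counter"] >= patience:
        return "patience"
    if success:
        return "success_criteria_met"
    return None

def record_result(state: dict, metric: float, params: dict,
                  constraint_met: bool = True, minimize: bool = True) -> dict:
    """Return the state after one more run; patience resets only on improvement."""
    best = state.get("best_metric")
    # A point that violates constraints is never an improvement.
    improved = constraint_met and (
        best is None or (metric < best if minimize else metric > best))
    updated = dict(state, iteration=state["iteration"] + 1,
                   total_iterations=state["total_iterations"] + 1)
    if improved:
        updated.update(best_metric=metric, best_params=params, patience_counter=0)
    else:
        updated["patience_counter"] = state["patience_counter"] + 1
    return updated
```

The returned strings match the stopping reasons listed in the report template, so the value can be written into the report directly.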
Phase 3: Refinement (if time allows)
If the search converged and there's still time budget:
- Local perturbation: try ±1 step on each parameter from the best point
- Sensitivity analysis: which parameters can be relaxed without hurting the metric?
- Constraint boundary: if a constraint is nearly binding, explore near-feasible points
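
The local perturbation step can be sketched as a neighbor generator. The name `neighbors` and the assumption that each parameter's candidates are stored as an ordered list are illustrative.

```python
def neighbors(space: dict[str, list], best: dict) -> list[dict]:
    """Configurations that differ from `best` by one step in exactly one parameter."""
    result = []
    for name, values in space.items():
        index = values.index(best[name])
        for step in (-1, 1):
            j = index + step
            if 0 <= j < len(values):      # skip steps that fall off the range
                result.append({**best, name: values[j]})
    return result
```

Neighbors that were already evaluated during the directed search should be filtered out before running, which leaves only the untested points around the optimum.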
Phase 4: Report
Write `dse_results/DSE_REPORT.md`:
Design Space Exploration Report
Task: [description]
Date: [start] → [end]
Total iterations: N
Wall-clock time: X hours Y minutes
Objective
- Metric: [what was optimized]
- Direction: minimize / maximize
- Baseline: [value]
- Best found: [value] ([improvement]% better than baseline)
Best Configuration
| Parameter | Baseline | Best |
|---|---|---|
| param1 | default | best_val |
| param2 | default | best_val |
| ... | ... | ... |
Search Trajectory
| Iteration | param1 | param2 | ... | Metric | Notes |
|---|---|---|---|---|---|
| 0 (baseline) | ... | ... | ... | ... | baseline |
| 1 | ... | ... | ... | ... | initial sweep |
| ... | ... | ... | ... | ... | ... |
| N (best) | ... | ... | ... | ... | ★ best |
Parameter Sensitivity
- param1: [high/medium/low impact] — [brief explanation]
- param2: [high/medium/low impact] — [brief explanation]
Pareto Frontier (if multi-objective)
[Table or description of non-dominated points]
Stopping Reason
[timeout / max_iterations / patience / success_criteria_met]
Recommendations
- [actionable insights from the exploration]
- [which parameters matter most]
- [suggested follow-up explorations]
Also generate a summary plot if matplotlib is available:
- Convergence curve (metric vs iteration)
- Parameter sensitivity bar chart
- Pareto frontier scatter (if multi-objective)
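
A rough sketch of the data behind these plots, with an optional matplotlib convergence plot. The helper names are hypothetical, and `pareto_front` assumes every objective is minimized (negate any objective that should be maximized).

```python
from itertools import accumulate

def running_best(metrics: list[float], minimize: bool = True) -> list[float]:
    """Best metric seen up to and including each iteration."""
    return list(accumulate(metrics, min if minimize else max))

def pareto_front(points: list[tuple]) -> list[tuple]:
    """Non-dominated points, assuming every objective is minimized."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(q_i <= p_i for q_i, p_i in zip(q, p)) for q in points)
        if not dominated:
            front.append(p)
    return front

def plot_convergence(metrics: list[float], path: str, minimize: bool = True) -> bool:
    """Save a convergence curve to `path`; return False if matplotlib is missing."""
    try:
        import matplotlib
        matplotlib.use("Agg")             # headless backend, no display needed
        import matplotlib.pyplot as plt
    except ImportError:
        return False
    iterations = range(len(metrics))
    fig, ax = plt.subplots()
    ax.plot(iterations, metrics, "o", label="evaluated")
    ax.step(iterations, running_best(metrics, minimize), where="post", label="best so far")
    ax.set_xlabel("iteration")
    ax.set_ylabel("metric")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return True
```

Returning `False` instead of raising keeps the report step from failing on machines without matplotlib.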
State Recovery
If the context window compacts mid-run, the loop recovers from `DSE_STATE.json` + `dse_log.csv`:
- Read `DSE_STATE.json` for current iteration, best params, patience counter
- Read `dse_log.csv` for full history
- Resume from next iteration
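
A minimal sketch of that recovery, assuming the state and log layouts shown earlier. The function name `recover` is hypothetical, and treating the log as authoritative when it is ahead of the state file is a design choice of this example, in line with the rule that the log is the ground truth.

```python
import csv
import json
from pathlib import Path

def recover(results_dir: str = "dse_results"):
    """Reload state and history, and work out which iteration to run next."""
    root = Path(results_dir)
    state = json.loads((root / "DSE_STATE.json").read_text())
    with (root / "dse_log.csv").open(newline="") as handle:
        history = list(csv.DictReader(handle))
    # If a run was logged but the state file was not updated, trust the log.
    last_logged = max((int(row["iteration"]) for row in history), default=-1)
    next_iteration = max(state["iteration"], last_logged) + 1
    return state, history, next_iteration
```

The returned `history` rows are dictionaries keyed by the CSV header, so they can be reused to check which configurations have already been evaluated.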
Key Rules
- Work AUTONOMOUSLY — do not ask the user for permission at each iteration
- Every run must be logged — even failed runs, constraint violations, errors. The log is the ground truth.
- Never re-run an identical configuration — check `dse_log.csv` before each run
- Respect the timeout — check elapsed time before starting a new iteration. If the next run is likely to exceed the timeout, stop and report.
- Parse metrics programmatically — write a parsing script, don't eyeball logs
- Keep raw outputs — save each run's full output in `dse_results/outputs/iter_N/`
- Constraint violations are not improvements — a design point that violates constraints is never "best", regardless of the metric
- If a run crashes, log the error, skip that point, and continue with the next
- If the same crash repeats 3 times with different configs, stop and report the issue
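
Two of these rules, the duplicate check and the timeout check, lend themselves to small guards. This is a sketch under assumptions: `already_evaluated` and `fits_in_budget` are hypothetical helpers, `history` is the list of rows read from `dse_log.csv`, and the budget check pessimistically assumes the next run takes as long as the slowest run so far.

```python
def already_evaluated(config: dict, history: list[dict]) -> bool:
    """True if a logged row has the same value for every parameter in `config`."""
    wanted = {name: str(value) for name, value in config.items()}  # CSV cells are strings
    return any(all(row.get(n) == v for n, v in wanted.items()) for row in history)

def fits_in_budget(elapsed_s: float, timeout_s: float, run_durations: list[float]) -> bool:
    """True if one more run is expected to finish before the timeout."""
    if not run_durations:
        return elapsed_s < timeout_s
    expected = max(run_durations)  # pessimistic: as slow as the slowest run so far
    return elapsed_s + expected <= timeout_s
```

Both checks would run before each new iteration: skip the point if `already_evaluated` is true, and stop and report if `fits_in_budget` is false.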
Example Invocations
Minimal — just name the parameters, let the agent figure out ranges
/dse-loop "Run gem5 mcf benchmark. Tune: L1D_SIZE, L2_SIZE, ROB_ENTRIES. Objective: maximize IPC. Timeout: 3h"

Partial — some ranges given, some not
/dse-loop "Run make synth. Tune: CLOCK_PERIOD [5ns, 4ns, 3ns, 2ns], FLATTEN, ABC_SCRIPT. Objective: minimize area at timing closure. Timeout: 1h"

Fully specified — explicit ranges for everything
/dse-loop "Simulate processor with FIFO_DEPTH [4,8,16,32], ISSUE_WIDTH [1,2,4], PREFETCH [on,off]. Run: make sim. Objective: max throughput/area. Timeout: 2h"

Real-world: PDAG-SFA formal verification tuning
/dse-loop "Run python run_bmc.py. Tune: BMC_DEPTH, ENGINE, TIMEOUT_PER_PROP. Objective: maximize properties proved. Timeout: 2h"