# ln-811-performance-profiler
Paths: File paths (`shared/`, `references/`, `../ln-*`) are relative to the skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for the repo root.
Type: L3 Worker | Category: 8XX Optimization

Runtime profiler that executes the optimization target, measures multiple metrics (CPU, memory, I/O, time), instruments code for per-function breakdown, and produces a standardized performance map from real data.

## Overview

| Aspect | Details |
|---|---|
| Input | Problem statement: target (file/endpoint/pipeline) + observed metric |
| Output | Performance map (multi-metric, per-function), suspicion stack, bottleneck classification |
| Pattern | Discover test → Baseline run → Static analysis → Deep profile → Performance map → Report |

## Workflow

Phases: Test Discovery → Baseline Run → Static Analysis → Deep Profile → Performance Map → Report

### Phase 0: Test Discovery/Creation

MANDATORY READ: Load `shared/references/ci_tool_detection.md` for test framework detection.
MANDATORY READ: Load `shared/references/benchmark_generation.md` for auto-generating benchmarks when none exist.

Find or create commands that exercise the optimization target. Two outputs: `test_command` (profiling/measurement) and `e2e_test_command` (functional safety gate).

#### Step 1: Discover test_command

| Priority | Method | Action |
|---|---|---|
| 1 | User-provided | User specifies test command or API endpoint |
| 2 | Discover existing E2E test | Grep test files for target entry point (stop at first match) |
| 3 | Create test script | Generate per `shared/references/benchmark_generation.md` to `.optimization/{slug}/profile_test.sh` |

E2E discovery protocol (stop at first match):

| Priority | Method | How |
|---|---|---|
| 1 | Route-based search | Grep e2e/integration test files for entry point route |
| 2 | Function-based search | Grep for entry point function name |
| 3 | Module-based search | Grep for import of entry point module |

Test creation (if no existing test found):

| Target Type | Generated Script |
|---|---|
| API endpoint | `curl -w "%{time_total}" -o /dev/null -s {endpoint}` |
| Function | Stack-specific benchmark per `shared/references/benchmark_generation.md` |
| Pipeline | Full pipeline invocation with test input |

#### Step 2: Discover e2e_test_command

If `test_command` came from E2E discovery (Step 1, priority 2): `e2e_test_command = test_command`.
Otherwise, run the E2E discovery protocol again (same 3-priority table) to find a separate functional safety test.
If none is found: `e2e_test_command = null`, and log:
`WARNING: No e2e test covers {entry_point}. Full test suite serves as functional gate.`

#### Output

| Field | Description |
|---|---|
| `test_command` | Command for profiling/measurement |
| `e2e_test_command` | Command for functional safety gate (may equal `test_command`, or null) |
| `e2e_test_source` | Discovery method: user / route / function / module / none |

### Phase 1: Baseline Run (Multi-Metric)

Run `test_command` with system-level profiling. Capture simultaneously:

| Metric | How to Capture | When |
|---|---|---|
| Wall time | `time` wrapper or test harness | Always |
| CPU time (user+sys) | `/usr/bin/time -v` or language profiler | Always |
| Memory peak (RSS) | `/usr/bin/time -v` (Max RSS) or `tracemalloc` / `process.memoryUsage()` | Always |
| I/O bytes | `/usr/bin/time -v` or structured logs | If I/O suspected |
| HTTP round-trips | Count from structured logs or application metrics | If network I/O in call graph |
| GPU utilization | `nvidia-smi --query-gpu` | Only if CUDA/GPU detected in stack |
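The always-on metrics in the table can be captured in one pass. Below is a minimal Python sketch (POSIX-only, since it uses `resource`); a real run would additionally parse `/usr/bin/time -v` output for I/O bytes, and the `run_with_metrics` name is illustrative, not part of the skill's contract:

```python
import resource
import subprocess
import time

def run_with_metrics(cmd):
    """Run one baseline pass of `cmd`; capture wall time, CPU time, child peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall_ms = (time.perf_counter() - start) * 1000
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_ms = ((after.ru_utime - before.ru_utime)
              + (after.ru_stime - before.ru_stime)) * 1000
    # ru_maxrss is KiB on Linux but bytes on macOS; normalize before reporting
    return {"wall_time_ms": wall_ms, "cpu_time_ms": cpu_ms, "max_rss": after.ru_maxrss}
```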

#### Baseline Protocol

| Parameter | Value |
|---|---|
| Runs | 3 |
| Metric | Median |
| Warm-up | 1 discarded run |
| Output | `baseline` (multi-metric snapshot) |
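The protocol above (1 discarded warm-up, then median of 3) can be sketched for a single metric like this; `measure_median` is a hypothetical helper, not a name the skill defines:

```python
import statistics
import time

def measure_median(fn, runs=3, warmup=1):
    """Baseline protocol: discard warm-up runs, then take the median of `runs` timings."""
    for _ in range(warmup):
        fn()  # warm-up result discarded (caches, JIT, connection setup)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)  # milliseconds
```

The median (rather than mean) keeps a single noisy run from skewing the baseline.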

### Phase 2: Static Analysis → Instrumentation Points

MANDATORY READ: Load `bottleneck_classification.md`.

Trace the call chain from code and build a suspicion stack. Purpose: guide WHERE to instrument in Phase 3.

Step 1: Trace Call Chain

步骤1:追踪调用链

Starting from entry point, trace depth-first (max depth 5). At each step, READ the full function body.
Cross-service tracing: If
service_topology
is available from coordinator and a step makes an HTTP/gRPC call to another service whose code is accessible:
SituationAction
HTTP call to service with code in submodule/monorepoFollow into that service's handler: resolve route → trace handler code (depth resets to 0 for the new service)
HTTP call to service without accessible codeClassify as External, record latency estimate
gRPC/message queue to known serviceSame as HTTP — follow into handler if code accessible
Record
service: "{service_name}"
on each step to track which service owns it. The performance_map
steps
tree can span multiple services.
Depth-First Rule: If code of the called service is accessible — ALWAYS profile INSIDE. NEVER classify an accessible service as "External/slow" without profiling its internals. "Slow" is a symptom, not a diagnosis.
5 Whys for each bottleneck: Before reporting a bottleneck, chain "why?" until you reach config/architecture level:
  1. "What is slow?" → alignment service (5.9s) 2. "Why?" → 6 pairs × ~1s each 3. "Why ~1s per pair?" → O(n²) mwmf computation 4. "Why O(n²)?" → library default, not production config 5. "Why default?" →
    matching_methods
    not configured → root cause = config
从入口点开始,采用深度优先方式追踪(最大深度5层)。每一步都需完整阅读函数体代码。
跨服务追踪: 如果从协调器处获取到
service_topology
,且某一步骤调用了另一个代码可访问的服务(HTTP/gRPC调用):
场景操作
调用代码位于子模块/单体仓库中的服务的HTTP请求追踪至该服务的处理器:解析路由 → 追踪处理器代码(新服务的追踪深度重置为0)
调用代码不可访问的服务的HTTP请求归类为外部服务,记录延迟估算值
调用已知服务的gRPC/消息队列请求与HTTP场景相同——若代码可访问则追踪至处理器
在每一步记录
service: "{service_name}"
以追踪该步骤所属的服务。性能图谱的
steps
树可跨多个服务。
深度优先规则: 若被调用服务的代码可访问,必须对其内部进行性能剖析。未对内部进行剖析时,绝不能将可访问服务归类为“外部/缓慢”。“缓慢”是症状,而非诊断结果。
针对每个瓶颈的5Why分析法: 在报告瓶颈前,连续追问“为什么?”直至找到配置/架构层面的根因:
  1. “什么模块慢?” → 对齐服务(5.9秒) 2. “为什么慢?” → 6组数据 × 每组约1秒 3. “每组为什么约1秒?” → O(n²)复杂度的mwmf计算 4. “为什么是O(n²)复杂度?” → 库默认配置,而非生产环境配置 5. “为什么使用默认配置?” →
    matching_methods
    未配置 → 根因=配置问题
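The depth-capped trace can be illustrated with a toy call graph. This sketch assumes an acyclic `{caller: [callees]}` adjacency dict standing in for real source; the skill actually reads function bodies rather than a prebuilt graph, and `trace_call_chain` is a hypothetical name:

```python
def trace_call_chain(call_graph, entry, max_depth=5):
    """Depth-first walk over a call graph, capped at max_depth levels."""
    chain = []

    def visit(fn, depth):
        chain.append({"function": fn, "depth": depth})
        if depth >= max_depth:
            return  # stop at depth 5 and note truncation, per the error-handling table
        for callee in call_graph.get(fn, []):
            visit(callee, depth + 1)

    visit(entry, 0)
    return chain
```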

#### Step 2: Classify & Suspicion Scan

For each step, classify by type (CPU, I/O-DB, I/O-Network, I/O-File, Architecture, External, Cache) and scan for performance concerns.

Suspicion checklist (minimum, not limitation):

| Category | What to Look For |
|---|---|
| Connection management | Client created per-request? Missing pooling? Missing reuse? |
| Data flow | Data read multiple times? Over-fetching? Unnecessary transforms? |
| Async patterns | Sync I/O in async context? Sequential awaits without data dependency? |
| Resource lifecycle | Unclosed connections? Temp files? Memory accumulation in loop? |
| Configuration | Hardcoded timeouts? Default pool sizes? Missing batch size config? |
| Redundant work | Same validation at multiple layers? Same data loaded twice? |
| Architecture | N+1 in loop? Batch API unused? Cache infra unused? Sequential-when-parallel? |
| (open) | Anything else spotted; the checklist does not limit findings |

#### Step 2b: Suspicion Deduplication

MANDATORY READ: Load `shared/references/output_normalization.md`.

After generating suspicions across all call chain steps, normalize and deduplicate per §1-§2:

- Normalize suspicion descriptions (replace specific values with placeholders)
- Group identical suspicions across different steps and merge them into a single entry with `affected_steps: [list]`
- Example: "Missing connection pooling" found in steps 1.1, 1.2, 1.3 → one suspicion with `affected_steps: ["1.1", "1.2", "1.3"]`
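A minimal sketch of the normalize-then-group step, assuming only numeric values need placeholder substitution (the real `output_normalization.md` rules may cover more cases):

```python
import re
from collections import defaultdict

def normalize(description):
    """Replace concrete numbers with a placeholder so equivalent findings match."""
    return re.sub(r"\d+(?:\.\d+)?", "{N}", description)

def dedupe_suspicions(suspicions):
    """Merge identical suspicions across steps into one entry with affected_steps."""
    groups = defaultdict(list)
    for s in suspicions:
        groups[normalize(s["description"])].append(s["step"])
    return [{"description": desc, "affected_steps": steps}
            for desc, steps in groups.items()]
```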

#### Step 3: Verify & Map to Instrumentation Points

FOR each suspicion:
  1. VERIFY: follow code to confirm or dismiss
  2. VERDICT: CONFIRMED → map to instrumentation point | DISMISSED → log reason
  3. For each CONFIRMED suspicion, identify:
     - function to wrap with timing
     - I/O call to count
     - memory allocation to track

Profiler Selection (per stack)

性能剖析工具选择(按技术栈)

StackNon-invasive profilerInvasive (if non-invasive insufficient)
Python
py-spy
,
cProfile
time.perf_counter()
decorators
Node.js
clinic
,
--prof
console.time()
wrappers
Go
pprof
(built-in)
Usually not needed
.NET
dotnet-trace
Stopwatch
wrappers
Rust
cargo flamegraph
std::time::Instant
Stack detection: per
shared/references/ci_tool_detection.md
.

技术栈非侵入式剖析工具侵入式工具(当非侵入式工具不足时)
Python
py-spy
,
cProfile
time.perf_counter()
decorators
Node.js
clinic
,
--prof
console.time()
wrappers
Go
pprof
(built-in)
通常无需使用
.NET
dotnet-trace
Stopwatch
wrappers
Rust
cargo flamegraph
std::time::Instant
技术栈检测: 依据
shared/references/ci_tool_detection.md
执行。

### Phase 3: Deep Profile

#### Profiler Hierarchy (escalate as needed)

| Level | Tool Examples | What It Shows | When to Use |
|---|---|---|---|
| 1 | `py-spy`, `cProfile`, `pprof`, `dotnet-trace` | Function-level hotspots | Always (first pass) |
| 2 | `line_profiler`, per-line timing | Line-level timing in hotspot function | Hotspot function found but cause unclear |
| 3 | `tracemalloc`, `memory_profiler` | Per-line memory allocation | Memory metrics abnormal in baseline |

#### Step 1: Non-Invasive Profiling (preferred)

Run `test_command` with a Level 1 profiler to get a per-function breakdown without code changes.
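For a Python target, one way to run the Level 1 pass in-process is via `cProfile`/`pstats` (py-spy instead attaches to a running process from outside). A sketch, with `profile_top` as an illustrative helper name:

```python
import cProfile
import io
import pstats

def profile_top(fn, limit=10):
    """Level 1 pass: profile a callable, return top cumulative-time functions as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```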

#### Step 2: Escalation Decision

After the Level 1 profiler run, evaluate the result against the suspicion stack from Phase 2:

| Profiler Result | Action |
|---|---|
| Hotspot function identified, time breakdown confirms suspicions | DONE: proceed to Phase 4 |
| Hotspot identified but internal cause unclear (CPU vs I/O inside one function) | Escalate to Level 2 (line-level timing) |
| Memory baseline abnormal (peak or delta) | Escalate to Level 3 (memory profiler) |
| Multiple suspicions unresolved (profiler granularity insufficient) | Go to Step 3 (targeted instrumentation) |
| Profiler unavailable or overhead > 20% of wall time | Go to Step 3 (targeted instrumentation) |

#### Step 3: Targeted Instrumentation (proactive)

Add timing/logging along the call stack at the instrumentation points identified in Phase 2 Step 3:

1. FOR each CONFIRMED suspicion without measured data:
   - Add a timing wrapper around the target function/I/O call
   - Add a counter for I/O round-trips if network/DB suspected
   - (cross-service: instrument in the correct service's codebase)
2. Re-run test_command (3 runs, median)
3. Collect per-function measurements from logs
4. Record the list of instrumented files (may span multiple services)

| Instrumentation Type | When | Example |
|---|---|---|
| Timing wrapper | Always for unresolved suspicions | `time.perf_counter()` around function call |
| I/O call counter | Network or DB bottleneck suspected | Count HTTP requests, DB queries in loop |
| Memory snapshot | Memory accumulation suspected | `tracemalloc.get_traced_memory()` before/after |

KEEP instrumentation in place. The executor reuses it for post-optimization per-function comparison, then cleans up after the strike. Report `instrumented_files` in the output.

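A `time.perf_counter()` timing wrapper of the kind named in the table could look like this for Python; the `PROFILE step=...` log format is illustrative, not a contract the skill defines:

```python
import functools
import logging
import time

log = logging.getLogger("profiler")

def timed(step_id):
    """Timing wrapper for one instrumentation point; logs wall time per call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("PROFILE step=%s fn=%s wall_ms=%.1f",
                         step_id, fn.__name__, elapsed_ms)
        return wrapper
    return decorate
```

The `try/finally` ensures the measurement is logged even when the wrapped call raises, so failed runs still contribute timing data.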
### Phase 4: Build Performance Map

Standardized format — feeds into `.optimization/{slug}/context.md` for downstream consumption.

```yaml
performance_map:
  test_command: "uv run pytest tests/e2e/test_example.py -s"
  baseline:
    wall_time_ms: 7280
    cpu_time_ms: 850
    memory_peak_mb: 256
    memory_delta_mb: 45
    io_read_bytes: 1200000
    io_write_bytes: 500000
    http_round_trips: 13
  steps:                          # service field present only in multi-service topology
    - id: "1"
      function: "process_job"
      location: "app/services/job_processor.py:45"
      service: "api"             # optional — which service owns this step
      wall_time_ms: 7200
      time_share_pct: 99
      type: "function_call"
      children:
        - id: "1.1"
          function: "translate_binary"
          wall_time_ms: 7100
          type: "function_call"
          children:
            - id: "1.1.1"
              function: "tikal_extract"
              service: "tikal"   # cross-service: code traced into submodule
              wall_time_ms: 2800
              type: "http_call"
              http_round_trips: 1
            - id: "1.1.2"
              function: "mt_translate"
              service: "mt-engine"
              wall_time_ms: 3500
              type: "http_call"
              http_round_trips: 13
  bottleneck_classification: "I/O-Network"
  bottleneck_detail: "13 sequential HTTP calls to MT service (3500ms)"
  top_bottlenecks:
    - step: "1.1.2", type: "I/O-Network", share: 48%
    - step: "1.1.1", type: "I/O-Network", share: 38%
```

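The `top_bottlenecks` shares follow directly from the map: each leaf's wall time divided by the baseline total. A sketch of that computation over the `steps` tree (it reuses each leaf's `type` field, whereas the real map reports the bottleneck classification per step):

```python
def top_bottlenecks(steps, total_ms, limit=3):
    """Rank leaf steps of the performance map by share of total wall time."""
    leaves = []

    def walk(nodes):
        for node in nodes:
            if node.get("children"):
                walk(node["children"])  # only leaves carry directly attributable time
            else:
                leaves.append(node)

    walk(steps)
    leaves.sort(key=lambda n: n["wall_time_ms"], reverse=True)
    return [{"step": n["id"], "type": n["type"],
             "share": round(100 * n["wall_time_ms"] / total_ms)}
            for n in leaves[:limit]]
```

On the example map, 3500/7280 rounds to the 48% share and 2800/7280 to the 38% share shown above.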

### Phase 5: Report

#### Report Structure

```yaml
profile_result:
  entry_point_info:
    type: <string>                     # "api_endpoint" | "function" | "pipeline"
    location: <string>                 # file:line
    route: <string|null>               # API route (if endpoint)
    function: <string>                 # Entry point function name
  performance_map: <object>            # Full map from Phase 4
  bottleneck_classification: <string>  # Primary bottleneck type
  bottleneck_detail: <string>          # Human-readable description
  top_bottlenecks:
    - step, type, share, description
  optimization_hints:                  # CONFIRMED suspicions only (Phase 2)
    - hint with evidence
  suspicion_stack:                     # Full audit trail (confirmed + dismissed)
    - category: <string>
      location: <string>
      description: <string>
      verdict: <string>               # "confirmed" | "dismissed"
      evidence: <string>
      verification_note: <string>
  e2e_test:
    command: <string|null>             # E2E safety test command (from Phase 0)
    source: <string>                   # user / route / function / module / none
  instrumented_files: [<string>]       # Files with active instrumentation (empty if non-invasive only)
  wrong_tool_indicators: []            # Empty = proceed, non-empty = exit
```

## Wrong Tool Indicators

| Indicator | Condition |
|---|---|
| `external_service_no_alternative` | 90%+ measured time in external service, no batch/cache/parallel path |
| `within_industry_norm` | Measured time within expected range for operation type |
| `infrastructure_bound` | Bottleneck is hardware (measured via system metrics) |
| `already_optimized` | Code already uses best patterns (confirmed by suspicion scan) |
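Of these, only the external-share threshold is purely arithmetic; the others require judgment or extra data (norms tables, system metrics, the suspicion scan). A narrow sketch of that one check, with `evaluate_wrong_tool` as an illustrative name, and note it does not verify the "no batch/cache/parallel path" half of the condition:

```python
def evaluate_wrong_tool(top_bottlenecks):
    """Flag the exit condition that can be computed from measured shares alone."""
    indicators = []
    external_share = sum(b["share"] for b in top_bottlenecks
                         if b["type"] == "External")
    if external_share >= 90:
        # Still requires confirming there is no batch/cache/parallel alternative
        indicators.append("external_service_no_alternative")
    return indicators
```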

## Error Handling

| Error | Recovery |
|---|---|
| Cannot resolve entry point | Block: "file/function not found at {path}" |
| Test command fails on unmodified code | Block: "test fails before profiling — fix test first" |
| Profiler not available for stack | Fall back to invasive instrumentation (Phase 3 Step 3) |
| Instrumentation breaks tests | Revert immediately: `git checkout -- .` |
| Call chain too deep (> 5 levels) | Stop at depth 5, note truncation |
| Cannot classify step type | Default to "Unknown", use measured time |
| No I/O detected (pure CPU) | Classify as CPU, focus on algorithm profiling |

## References

- bottleneck_classification.md — classification taxonomy
- latency_estimation.md — latency heuristics (fallback for static-only mode)
- shared/references/ci_tool_detection.md — stack/tool detection
- shared/references/benchmark_generation.md — benchmark templates per stack

## Definition of Done

- Test command discovered or created for optimization target
- E2E safety test discovered (or documented as unavailable)
- Baseline measured: wall time, CPU, memory (3 runs, median)
- Call graph traced and function bodies read
- Suspicion stack built: each suspicion verified and mapped to instrumentation point
- Deep profile completed (non-invasive preferred, invasive if needed)
- Instrumented files reported (cleanup deferred to executor)
- Performance map built in standardized format (real measurements)
- Top 3 bottlenecks identified from measured data
- Wrong tool indicators evaluated from real metrics
- optimization_hints contain only CONFIRMED suspicions with measurement evidence
- Report returned to coordinator

Version: 3.0.0 | Last Updated: 2026-03-15