# ln-811-performance-profiler
Paths: File paths (`shared/`, `references/`, `../ln-*`) are relative to the skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for the repo root.
Type: L3 Worker | Category: 8XX Optimization

Runtime profiler that executes the optimization target, measures multiple metrics (CPU, memory, I/O, time), instruments code for per-function breakdown, and produces a standardized performance map from real data.

## Overview

| Aspect | Details |
|---|---|
| Input | Problem statement: target (file/endpoint/pipeline) + observed metric |
| Output | Performance map (multi-metric, per-function), suspicion stack, bottleneck classification |
| Pattern | Discover test → Baseline run → Static analysis → Deep profile → Performance map → Report |

## Workflow

Phases: Test Discovery → Baseline Run → Static Analysis → Deep Profile → Performance Map → Report

### Phase 0: Test Discovery/Creation

MANDATORY READ: Load `shared/references/ci_tool_detection.md` for test framework detection.
MANDATORY READ: Load `shared/references/benchmark_generation.md` for auto-generating benchmarks when none exist.

Find or create commands that exercise the optimization target. Two outputs: `test_command` (profiling/measurement) and `e2e_test_command` (functional safety gate).

#### Step 1: Discover test_command

| Priority | Method | Action |
|---|---|---|
| 1 | User-provided | User specifies test command or API endpoint |
| 2 | Discover existing E2E test | Grep test files for target entry point (stop at first match) |
| 3 | Create test script | Generate per `shared/references/benchmark_generation.md` to `.optimization/{slug}/profile_test.sh` |

E2E discovery protocol (stop at first match):

| Priority | Method | How |
|---|---|---|
| 1 | Route-based search | Grep e2e/integration test files for entry point route |
| 2 | Function-based search | Grep for entry point function name |
| 3 | Module-based search | Grep for import of entry point module |

Test creation (if no existing test found):

| Target Type | Generated Script |
|---|---|
| API endpoint | `curl -w "%{time_total}" -o /dev/null -s {endpoint}` |
| Function | Stack-specific benchmark per `shared/references/benchmark_generation.md` |
| Pipeline | Full pipeline invocation with test input |

#### Step 2: Discover e2e_test_command

If `test_command` came from E2E discovery (Step 1, priority 2): `e2e_test_command = test_command`.
Otherwise, run the E2E discovery protocol again (same 3-priority table) to find a separate functional safety test.
If none is found: `e2e_test_command = null`, and log:
`WARNING: No e2e test covers {entry_point}. Full test suite serves as functional gate.`

#### Output

| Field | Description |
|---|---|
| `test_command` | Command for profiling/measurement |
| `e2e_test_command` | Command for functional safety gate (may equal `test_command`, or null) |
| `e2e_test_source` | Discovery method: user / route / function / module / none |

### Phase 1: Baseline Run (Multi-Metric)

Run `test_command` with system-level profiling. Capture simultaneously:

| Metric | How to Capture | When |
|---|---|---|
| Wall time | `time` wrapper or test harness | Always |
| CPU time (user+sys) | `/usr/bin/time -v` or language profiler | Always |
| Memory peak (RSS) | `/usr/bin/time -v` (Max RSS) or `tracemalloc` / `process.memoryUsage()` | Always |
| I/O bytes | `/usr/bin/time -v` or structured logs | If I/O suspected |
| HTTP round-trips | Count from structured logs or application metrics | If network I/O in call graph |
| GPU utilization | `nvidia-smi --query-gpu` | Only if CUDA/GPU detected in stack |
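The always-on metrics in the table can be captured in one pass. Below is a minimal Python sketch (POSIX-only, since it uses `resource`); a real run would additionally parse `/usr/bin/time -v` output for I/O bytes, and the `run_with_metrics` name is illustrative, not part of the skill's contract:

```python
import resource
import subprocess
import time

def run_with_metrics(cmd):
    """Run one baseline pass of `cmd`; capture wall time, CPU time, child peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall_ms = (time.perf_counter() - start) * 1000
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_ms = ((after.ru_utime - before.ru_utime)
              + (after.ru_stime - before.ru_stime)) * 1000
    # ru_maxrss is KiB on Linux but bytes on macOS; normalize before reporting
    return {"wall_time_ms": wall_ms, "cpu_time_ms": cpu_ms, "max_rss": after.ru_maxrss}
```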

#### Baseline Protocol

| Parameter | Value |
|---|---|
| Runs | 3 |
| Metric | Median |
| Warm-up | 1 discarded run |
| Output | `baseline` (multi-metric snapshot) |
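The protocol above (1 discarded warm-up, then median of 3) can be sketched for a single metric like this; `measure_median` is a hypothetical helper, not a name the skill defines:

```python
import statistics
import time

def measure_median(fn, runs=3, warmup=1):
    """Baseline protocol: discard warm-up runs, then take the median of `runs` timings."""
    for _ in range(warmup):
        fn()  # warm-up result discarded (caches, JIT, connection setup)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)  # milliseconds
```

The median (rather than mean) keeps a single noisy run from skewing the baseline.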

### Phase 2: Static Analysis → Instrumentation Points

MANDATORY READ: Load `bottleneck_classification.md`.

Trace the call chain from code and build a suspicion stack. Purpose: guide WHERE to instrument in Phase 3.

Step 1: Trace Call Chain

步骤1:追踪调用链

Starting from entry point, trace depth-first (max depth 5). At each step, READ the full function body.
Cross-service tracing: If
service_topology
is available from coordinator and a step makes an HTTP/gRPC call to another service whose code is accessible:
SituationAction
HTTP call to service with code in submodule/monorepoFollow into that service's handler: resolve route → trace handler code (depth resets to 0 for the new service)
HTTP call to service without accessible codeClassify as External, record latency estimate
gRPC/message queue to known serviceSame as HTTP — follow into handler if code accessible
Record
service: "{service_name}"
on each step to track which service owns it. The performance_map
steps
tree can span multiple services.
Depth-First Rule: If code of the called service is accessible — ALWAYS profile INSIDE. NEVER classify an accessible service as "External/slow" without profiling its internals. "Slow" is a symptom, not a diagnosis.
5 Whys for each bottleneck: Before reporting a bottleneck, chain "why?" until you reach config/architecture level:
  1. "What is slow?" → alignment service (5.9s) 2. "Why?" → 6 pairs × ~1s each 3. "Why ~1s per pair?" → O(n²) mwmf computation 4. "Why O(n²)?" → library default, not production config 5. "Why default?" →
    matching_methods
    not configured → root cause = config
从入口点开始,采用深度优先方式追踪(最大深度5层)。每一步都需完整阅读函数体代码。
跨服务追踪: 如果从协调器处获取到
service_topology
,且某一步骤调用了另一个代码可访问的服务(HTTP/gRPC调用):
场景操作
调用代码位于子模块/单体仓库中的服务的HTTP请求追踪至该服务的处理器:解析路由 → 追踪处理器代码(新服务的追踪深度重置为0)
调用代码不可访问的服务的HTTP请求归类为外部服务,记录延迟估算值
调用已知服务的gRPC/消息队列请求与HTTP场景相同——若代码可访问则追踪至处理器
在每一步记录
service: "{service_name}"
以追踪该步骤所属的服务。性能图谱的
steps
树可跨多个服务。
深度优先规则: 若被调用服务的代码可访问,必须对其内部进行性能剖析。未对内部进行剖析时,绝不能将可访问服务归类为“外部/缓慢”。“缓慢”是症状,而非诊断结果。
针对每个瓶颈的5Why分析法: 在报告瓶颈前,连续追问“为什么?”直至找到配置/架构层面的根因:
  1. “什么模块慢?” → 对齐服务(5.9秒) 2. “为什么慢?” → 6组数据 × 每组约1秒 3. “每组为什么约1秒?” → O(n²)复杂度的mwmf计算 4. “为什么是O(n²)复杂度?” → 库默认配置,而非生产环境配置 5. “为什么使用默认配置?” →
    matching_methods
    未配置 → 根因=配置问题
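The depth-capped trace can be illustrated with a toy call graph. This sketch assumes an acyclic `{caller: [callees]}` adjacency dict standing in for real source; the skill actually reads function bodies rather than a prebuilt graph, and `trace_call_chain` is a hypothetical name:

```python
def trace_call_chain(call_graph, entry, max_depth=5):
    """Depth-first walk over a call graph, capped at max_depth levels."""
    chain = []

    def visit(fn, depth):
        chain.append({"function": fn, "depth": depth})
        if depth >= max_depth:
            return  # stop at depth 5 and note truncation, per the error-handling table
        for callee in call_graph.get(fn, []):
            visit(callee, depth + 1)

    visit(entry, 0)
    return chain
```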

#### Step 2: Classify & Suspicion Scan

For each step, classify by type (CPU, I/O-DB, I/O-Network, I/O-File, Architecture, External, Cache) and scan for performance concerns.

Suspicion checklist (minimum, not limitation):

| Category | What to Look For |
|---|---|
| Connection management | Client created per-request? Missing pooling? Missing reuse? |
| Data flow | Data read multiple times? Over-fetching? Unnecessary transforms? |
| Async patterns | Sync I/O in async context? Sequential awaits without data dependency? |
| Resource lifecycle | Unclosed connections? Temp files? Memory accumulation in loop? |
| Configuration | Hardcoded timeouts? Default pool sizes? Missing batch size config? |
| Redundant work | Same validation at multiple layers? Same data loaded twice? |
| Architecture | N+1 in loop? Batch API unused? Cache infra unused? Sequential-when-parallel? |
| (open) | Anything else spotted; the checklist does not limit findings |

#### Step 2b: Suspicion Deduplication

MANDATORY READ: Load `shared/references/output_normalization.md`.

After generating suspicions across all call chain steps, normalize and deduplicate per §1-§2:

- Normalize suspicion descriptions (replace specific values with placeholders)
- Group identical suspicions across different steps and merge them into a single entry with `affected_steps: [list]`
- Example: "Missing connection pooling" found in steps 1.1, 1.2, 1.3 → one suspicion with `affected_steps: ["1.1", "1.2", "1.3"]`
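A minimal sketch of the normalize-then-group step, assuming only numeric values need placeholder substitution (the real `output_normalization.md` rules may cover more cases):

```python
import re
from collections import defaultdict

def normalize(description):
    """Replace concrete numbers with a placeholder so equivalent findings match."""
    return re.sub(r"\d+(?:\.\d+)?", "{N}", description)

def dedupe_suspicions(suspicions):
    """Merge identical suspicions across steps into one entry with affected_steps."""
    groups = defaultdict(list)
    for s in suspicions:
        groups[normalize(s["description"])].append(s["step"])
    return [{"description": desc, "affected_steps": steps}
            for desc, steps in groups.items()]
```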

#### Step 3: Verify & Map to Instrumentation Points

FOR each suspicion:
  1. VERIFY: follow code to confirm or dismiss
  2. VERDICT: CONFIRMED → map to instrumentation point | DISMISSED → log reason
  3. For each CONFIRMED suspicion, identify:
     - function to wrap with timing
     - I/O call to count
     - memory allocation to track

Profiler Selection (per stack)

性能剖析工具选择(按技术栈)

StackNon-invasive profilerInvasive (if non-invasive insufficient)
Python
py-spy
,
cProfile
time.perf_counter()
decorators
Node.js
clinic
,
--prof
console.time()
wrappers
Go
pprof
(built-in)
Usually not needed
.NET
dotnet-trace
Stopwatch
wrappers
Rust
cargo flamegraph
std::time::Instant
Stack detection: per
shared/references/ci_tool_detection.md
.

技术栈非侵入式剖析工具侵入式工具(当非侵入式工具不足时)
Python
py-spy
,
cProfile
time.perf_counter()
decorators
Node.js
clinic
,
--prof
console.time()
wrappers
Go
pprof
(built-in)
通常无需使用
.NET
dotnet-trace
Stopwatch
wrappers
Rust
cargo flamegraph
std::time::Instant
技术栈检测: 依据
shared/references/ci_tool_detection.md
执行。

### Phase 3: Deep Profile

#### Profiler Hierarchy (escalate as needed)

| Level | Tool Examples | What It Shows | When to Use |
|---|---|---|---|
| 1 | `py-spy`, `cProfile`, `pprof`, `dotnet-trace` | Function-level hotspots | Always (first pass) |
| 2 | `line_profiler`, per-line timing | Line-level timing in hotspot function | Hotspot function found but cause unclear |
| 3 | `tracemalloc`, `memory_profiler` | Per-line memory allocation | Memory metrics abnormal in baseline |

#### Step 1: Non-Invasive Profiling (preferred)

Run `test_command` with a Level 1 profiler to get a per-function breakdown without code changes.
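For a Python target, one way to run the Level 1 pass in-process is via `cProfile`/`pstats` (py-spy instead attaches to a running process from outside). A sketch, with `profile_top` as an illustrative helper name:

```python
import cProfile
import io
import pstats

def profile_top(fn, limit=10):
    """Level 1 pass: profile a callable, return top cumulative-time functions as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```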

#### Step 2: Escalation Decision

After the Level 1 profiler run, evaluate the result against the suspicion stack from Phase 2:

| Profiler Result | Action |
|---|---|
| Hotspot function identified, time breakdown confirms suspicions | DONE: proceed to Phase 4 |
| Hotspot identified but internal cause unclear (CPU vs I/O inside one function) | Escalate to Level 2 (line-level timing) |
| Memory baseline abnormal (peak or delta) | Escalate to Level 3 (memory profiler) |
| Multiple suspicions unresolved (profiler granularity insufficient) | Go to Step 3 (targeted instrumentation) |
| Profiler unavailable or overhead > 20% of wall time | Go to Step 3 (targeted instrumentation) |

#### Step 3: Targeted Instrumentation (proactive)

Add timing/logging along the call stack at the instrumentation points identified in Phase 2 Step 3:

1. FOR each CONFIRMED suspicion without measured data:
   - Add a timing wrapper around the target function/I/O call
   - Add a counter for I/O round-trips if network/DB suspected
   - (cross-service: instrument in the correct service's codebase)
2. Re-run test_command (3 runs, median)
3. Collect per-function measurements from logs
4. Record the list of instrumented files (may span multiple services)

| Instrumentation Type | When | Example |
|---|---|---|
| Timing wrapper | Always for unresolved suspicions | `time.perf_counter()` around function call |
| I/O call counter | Network or DB bottleneck suspected | Count HTTP requests, DB queries in loop |
| Memory snapshot | Memory accumulation suspected | `tracemalloc.get_traced_memory()` before/after |

KEEP instrumentation in place. The executor reuses it for post-optimization per-function comparison, then cleans up after the strike. Report `instrumented_files` in the output.

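A `time.perf_counter()` timing wrapper of the kind named in the table could look like this for Python; the `PROFILE step=...` log format is illustrative, not a contract the skill defines:

```python
import functools
import logging
import time

log = logging.getLogger("profiler")

def timed(step_id):
    """Timing wrapper for one instrumentation point; logs wall time per call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("PROFILE step=%s fn=%s wall_ms=%.1f",
                         step_id, fn.__name__, elapsed_ms)
        return wrapper
    return decorate
```

The `try/finally` ensures the measurement is logged even when the wrapped call raises, so failed runs still contribute timing data.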
### Phase 4: Build Performance Map

Standardized format — feeds into `.optimization/{slug}/context.md` for downstream consumption.

```yaml
performance_map:
  test_command: "uv run pytest tests/e2e/test_example.py -s"
  baseline:
    wall_time_ms: 7280
    cpu_time_ms: 850
    memory_peak_mb: 256
    memory_delta_mb: 45
    io_read_bytes: 1200000
    io_write_bytes: 500000
    http_round_trips: 13
  steps:                          # service field present only in multi-service topology
    - id: "1"
      function: "process_job"
      location: "app/services/job_processor.py:45"
      service: "api"             # optional — which service owns this step
      wall_time_ms: 7200
      time_share_pct: 99
      type: "function_call"
      children:
        - id: "1.1"
          function: "translate_binary"
          wall_time_ms: 7100
          type: "function_call"
          children:
            - id: "1.1.1"
              function: "tikal_extract"
              service: "tikal"   # cross-service: code traced into submodule
              wall_time_ms: 2800
              type: "http_call"
              http_round_trips: 1
            - id: "1.1.2"
              function: "mt_translate"
              service: "mt-engine"
              wall_time_ms: 3500
              type: "http_call"
              http_round_trips: 13
  bottleneck_classification: "I/O-Network"
  bottleneck_detail: "13 sequential HTTP calls to MT service (3500ms)"
  top_bottlenecks:
    - step: "1.1.2", type: "I/O-Network", share: 48%
    - step: "1.1.1", type: "I/O-Network", share: 38%
```

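The `top_bottlenecks` shares follow directly from the map: each leaf's wall time divided by the baseline total. A sketch of that computation over the `steps` tree (it reuses each leaf's `type` field, whereas the real map reports the bottleneck classification per step):

```python
def top_bottlenecks(steps, total_ms, limit=3):
    """Rank leaf steps of the performance map by share of total wall time."""
    leaves = []

    def walk(nodes):
        for node in nodes:
            if node.get("children"):
                walk(node["children"])  # only leaves carry directly attributable time
            else:
                leaves.append(node)

    walk(steps)
    leaves.sort(key=lambda n: n["wall_time_ms"], reverse=True)
    return [{"step": n["id"], "type": n["type"],
             "share": round(100 * n["wall_time_ms"] / total_ms)}
            for n in leaves[:limit]]
```

On the example map, 3500/7280 rounds to the 48% share and 2800/7280 to the 38% share shown above.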

### Phase 5: Report

#### Report Structure

```yaml
profile_result:
  entry_point_info:
    type: <string>                     # "api_endpoint" | "function" | "pipeline"
    location: <string>                 # file:line
    route: <string|null>               # API route (if endpoint)
    function: <string>                 # Entry point function name
  performance_map: <object>            # Full map from Phase 4
  bottleneck_classification: <string>  # Primary bottleneck type
  bottleneck_detail: <string>          # Human-readable description
  top_bottlenecks:
    - step, type, share, description
  optimization_hints:                  # CONFIRMED suspicions only (Phase 2)
    - hint with evidence
  suspicion_stack:                     # Full audit trail (confirmed + dismissed)
    - category: <string>
      location: <string>
      description: <string>
      verdict: <string>               # "confirmed" | "dismissed"
      evidence: <string>
      verification_note: <string>
  e2e_test:
    command: <string|null>             # E2E safety test command (from Phase 0)
    source: <string>                   # user / route / function / module / none
  instrumented_files: [<string>]       # Files with active instrumentation (empty if non-invasive only)
  wrong_tool_indicators: []            # Empty = proceed, non-empty = exit
```

## Wrong Tool Indicators

| Indicator | Condition |
|---|---|
| `external_service_no_alternative` | 90%+ measured time in external service, no batch/cache/parallel path |
| `within_industry_norm` | Measured time within expected range for operation type |
| `infrastructure_bound` | Bottleneck is hardware (measured via system metrics) |
| `already_optimized` | Code already uses best patterns (confirmed by suspicion scan) |
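Of these, only the external-share threshold is purely arithmetic; the others require judgment or extra data (norms tables, system metrics, the suspicion scan). A narrow sketch of that one check, with `evaluate_wrong_tool` as an illustrative name, and note it does not verify the "no batch/cache/parallel path" half of the condition:

```python
def evaluate_wrong_tool(top_bottlenecks):
    """Flag the exit condition that can be computed from measured shares alone."""
    indicators = []
    external_share = sum(b["share"] for b in top_bottlenecks
                         if b["type"] == "External")
    if external_share >= 90:
        # Still requires confirming there is no batch/cache/parallel alternative
        indicators.append("external_service_no_alternative")
    return indicators
```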

## Error Handling

| Error | Recovery |
|---|---|
| Cannot resolve entry point | Block: "file/function not found at {path}" |
| Test command fails on unmodified code | Block: "test fails before profiling — fix test first" |
| Profiler not available for stack | Fall back to invasive instrumentation (Phase 3 Step 3) |
| Instrumentation breaks tests | Revert immediately: `git checkout -- .` |
| Call chain too deep (> 5 levels) | Stop at depth 5, note truncation |
| Cannot classify step type | Default to "Unknown", use measured time |
| No I/O detected (pure CPU) | Classify as CPU, focus on algorithm profiling |

## References

- bottleneck_classification.md — classification taxonomy
- latency_estimation.md — latency heuristics (fallback for static-only mode)
- shared/references/ci_tool_detection.md — stack/tool detection
- shared/references/benchmark_generation.md — benchmark templates per stack

## Definition of Done

- Test command discovered or created for optimization target
- E2E safety test discovered (or documented as unavailable)
- Baseline measured: wall time, CPU, memory (3 runs, median)
- Call graph traced and function bodies read
- Suspicion stack built: each suspicion verified and mapped to instrumentation point
- Deep profile completed (non-invasive preferred, invasive if needed)
- Instrumented files reported (cleanup deferred to executor)
- Performance map built in standardized format (real measurements)
- Top 3 bottlenecks identified from measured data
- Wrong tool indicators evaluated from real metrics
- optimization_hints contain only CONFIRMED suspicions with measurement evidence
- Report returned to coordinator

Version: 3.0.0 | Last Updated: 2026-03-15