dotnet-trace-collect

.NET Trace Collect

This skill helps developers diagnose production performance issues by recommending the right diagnostic tools for their environment, guiding data collection, and suggesting analysis approaches. It does not analyze code for anti-patterns or perform the analysis itself.

When to Use

- A developer needs to investigate a production performance issue (high CPU, memory leak, slow requests, excessive GC, networking errors, etc.)
- Choosing the right diagnostic tool for a specific runtime, OS, or deployment topology
- Setting up and running diagnostic tool commands for data collection
- Understanding trade-offs between available tools (e.g., PerfView vs `dotnet-trace`)
- Collecting diagnostics from containerized or Kubernetes workloads

When Not to Use

- Reviewing source code for performance anti-patterns (use a code review skill instead)
- Benchmarking during development (e.g., BenchmarkDotNet setup)
- Analyzing collected trace or dump files (this skill recommends tools for analysis but does not perform it)

Inputs

| Input | Required | Description |
|---|---|---|
| Symptom | Yes | What the developer is observing (high CPU, memory growth, slow requests, hangs, excessive GC, HTTP 5xx errors, networking timeouts, connection failures, assembly loading failures, etc.) |
| Runtime | Yes | .NET Framework or modern .NET (and version, especially whether .NET 10+) |
| OS | Yes | Windows or Linux |
| Deployment | Yes | Non-container, container, or Kubernetes |
| Admin privileges | Recommended | Whether the developer has admin/root access on the target machine |
| Repro characteristics | Recommended | Whether the issue is easy to reproduce or requires a long time to manifest |

Workflow

Step 1: Understand the environment

Determine or ask the developer to clarify:

- Symptom: what they are observing (high CPU, memory leak, slow requests, hangs, excessive GC, HTTP 5xx errors, networking timeouts, connection failures, assembly loading failures, etc.)
- Runtime: .NET Framework or modern .NET? If modern .NET, which version? (Especially whether .NET 10 or later.)
- OS: Windows or Linux?
- Deployment: running directly on the host, in a container, or in Kubernetes?
- Admin privileges: do they have admin/root access on the target machine or container?
- Repro characteristics: does the issue reproduce quickly, or does it take a long time to manifest?
- Workload context: determine, or ask the user, whether you are running in the context of the workload (i.e., on the same machine as, or connected to, the environment where the issue is occurring). If so, you can run diagnostic commands directly on their behalf. If not, provide the commands as guidance for the user to run themselves.

Use this information to select the right tool in Step 2.

Step 2: Recommend diagnostic tools

Select tools based on the environment using the priority rules below. Once a tool is selected, load the corresponding reference file for detailed command-line usage.

Tool reference lookup

| Environment | Reference file(s) |
|---|---|
| Windows + modern .NET + admin | `references/perfview.md` |
| Windows + modern .NET, no admin | `references/dotnet-trace-collect.md` |
| Windows + .NET Framework | `references/perfview.md` |
| Linux + .NET 10+ + root | `references/dotnet-trace-collect-linux.md` |
| Linux + pre-.NET 10 | `references/dotnet-trace-collect.md` |
| Linux + native stacks needed | `references/perfcollect.md` |
| Container/K8s (console access) | `references/dotnet-trace-collect.md` (or `references/dotnet-trace-collect-linux.md` for .NET 10+ with root) |
| Container/K8s (no console) | `references/dotnet-monitor.md` |

Quick decision matrix (first-pass triage)

| Environment | Preferred tool | Fallback / notes |
|---|---|---|
| Windows + modern .NET + admin | PerfView | If admin is unavailable, use `dotnet-trace` |
| Windows + .NET Framework + admin | PerfView | Without admin, there is no trace fallback; for hangs/memory leaks, provide dump commands directly (`procdump -ma` or Task Manager) because `dump-collect` does not support .NET Framework |
| Linux + .NET 10+ + root | `dotnet-trace collect-linux` | Use `dotnet-trace` if root or kernel prerequisites are not met |
| Linux + pre-.NET 10 | `dotnet-trace` | Add `perfcollect` when native stacks are needed (requires root) |
| Linux container/Kubernetes | Console tools if in workload context; `dotnet-monitor` if no console access | See the Linux Container / Kubernetes section for details |
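The quick decision matrix is a lookup over a few facts about the environment. The Python sketch below encodes its rows for illustration only. `Environment`, `recommend_tool`, and the returned strings are names invented here. The sketch does not model Windows containers, and it does not check the Linux kernel >= 6.4 prerequisite for `collect-linux`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of the target; field names are illustrative."""
    os: str                        # "windows" or "linux"
    framework: bool                # True for .NET Framework, False for modern .NET
    major_version: Optional[int]   # modern .NET major version (e.g. 8, 10); None for .NET Framework
    privileged: bool               # admin on Windows, root on Linux
    containerized: bool            # container or Kubernetes workload
    console_access: bool = True    # can commands be run inside the workload?
    native_stacks: bool = False    # are native call stacks required?


def recommend_tool(env: Environment) -> str:
    """First-pass triage following the quick decision matrix."""
    if env.os == "windows":
        if env.framework:
            # No trace fallback without admin: only dumps (procdump -ma or Task Manager).
            return "PerfView" if env.privileged else "procdump -ma (dumps only)"
        return "PerfView" if env.privileged else "dotnet-trace"
    if env.os != "linux":
        raise ValueError(f"unsupported OS: {env.os!r}")
    if env.framework:
        raise ValueError(".NET Framework does not run on Linux")
    if env.containerized and not env.console_access:
        return "dotnet-monitor"
    if env.privileged and (env.major_version or 0) >= 10:
        # Still subject to the kernel >= 6.4 prerequisite (not checked here).
        return "dotnet-trace collect-linux"
    if env.privileged and env.native_stacks:
        return "dotnet-trace + perfcollect"
    return "dotnet-trace"
```

For example, consider a Linux container on .NET 8 with root and console access that needs native stacks. The sketch maps it to `"dotnet-trace + perfcollect"`.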

Windows (non-container, modern .NET)

- PerfView (preferred): produces richer ETW-based data; requires admin privileges. For slow requests, add `/ThreadTime` to capture thread-level wait and block detail.
- `dotnet-trace`: fallback when admin privileges are not available.
- For long-running repros: use PerfView with a `/StopOn` trigger that fires on the symptom you want to capture (e.g., `/StopOnPerfCounter`, `/StopOnGCEvent`, `/StopOnException`) and a circular buffer (`/CircularMB` + `/BufferSizeMB`). Critical: the stop trigger must fire on the interesting event, not on the recovery. The circular buffer continuously overwrites old data. If you trigger on recovery, the interesting behavior may already have been overwritten by the time collection stops. Only add `/StartOn` if the start event is known to precede the stop event. For slow requests, do not include a stop trigger by default; let the user design one based on their specific scenario.
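The long-repro recipe can be assembled as an argument list. The Python sketch below is hypothetical: the function name, the default sizes (taken from the slow-request command used elsewhere in this skill), and the `repro.etl` file name are assumptions. It also assumes PerfView's `/AcceptEULA` and `/NoGui` switches for unattended use. The trigger string is passed through unchanged, because its exact syntax belongs in `references/perfview.md`.

```python
from typing import List, Optional


def perfview_long_repro_args(
    stop_trigger: Optional[str] = None,
    circular_mb: int = 2048,
    buffer_size_mb: int = 1024,
    thread_time: bool = False,
    data_file: str = "repro.etl",
) -> List[str]:
    """Build a PerfView argv for a long repro using a circular buffer."""
    if circular_mb <= 0 or buffer_size_mb <= 0:
        raise ValueError("buffer sizes must be positive")
    args = ["PerfView.exe", "/AcceptEULA", "/NoGui"]
    if thread_time:
        args.append("/ThreadTime")  # thread-level wait/block detail
    # Old data is overwritten continuously, so the trigger must fire on the
    # symptom itself, not on the recovery that follows it.
    args += [f"/CircularMB:{circular_mb}", f"/BufferSizeMB:{buffer_size_mb}"]
    if stop_trigger:
        args.append(stop_trigger)
    args += ["collect", data_file]
    return args
```

From an elevated prompt, you might run `subprocess.run(perfview_long_repro_args("/StopOnPerfCounter:Processor:% Processor Time:_Total>90"), check=True)` to stop once total CPU crosses 90%. The counter-expression format is an assumption, so confirm it with `PerfView /?`. Passing a list rather than one shell string avoids quoting problems with spaces in the trigger.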

Windows containers

- PerfView: most Windows containers (including Kubernetes on Windows) use process isolation by default. Collect from the host with `/EnableEventsInContainers`. After collection, you have two options:
  - Analyze locally while the container is still running: PerfView can reach into the live container to resolve symbols, so you can open the trace immediately on the host machine.
  - Analyze off-machine: before the container shuts down, copy the `.etl.zip` into the container and run `PerfViewCollect merge /ImageIDsOnly` inside it to embed symbol information. Then copy the merged trace out. Without this merge step, symbols for binaries inside the container cannot be resolved on other machines.
- For the less common Hyper-V containers, collect inside the container directly. See `references/perfview.md` for detailed commands.
- `dotnet-monitor`, `dotnet-trace`: run inside the container if the tools are installed in the image. For dumps, invoke the `dump-collect` skill.

Windows (.NET Framework)

- PerfView: the primary diagnostic tool for .NET Framework on Windows. Requires admin.
- The same trigger guidance applies to long repros: use `/StopOn` triggers that fire on the symptom (e.g., `/StopOnPerfCounter`, `/StopOnGCEvent`, `/StopOnException`) with `/CircularMB` + `/BufferSizeMB`.
- Without admin: PerfView requires admin, and there are no alternative trace tools for .NET Framework. Process dumps can still be captured without admin. Provide dump commands directly (e.g., `procdump -ma <PID>` or Task Manager), because the `dump-collect` skill does not support .NET Framework. Dumps can help diagnose hangs and memory leaks. However, high CPU, slow requests, and excessive GC cannot be investigated on .NET Framework without admin access. Advise the user to obtain admin privileges.

Linux (non-container, .NET 10+)

- `dotnet-trace collect-linux` (preferred): uses `perf_events` for richer traces, including native call stacks and kernel events. Captures machine-wide by default (no PID required). Requires root and kernel >= 6.4.
- `dotnet-trace`: fallback when root privileges are not available or kernel requirements are not met. Managed stacks only.
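The choice between the two commands depends on three prerequisites: .NET 10+, root, and kernel >= 6.4. The Python sketch below checks them, with hypothetical function names. It parses the kernel release into a `(major, minor)` tuple, because comparing version strings would rank "6.10" below "6.4". It does not check whether the installed `dotnet-trace` version supports `collect-linux`, and it does not check whether a security policy blocks `perf_events`.

```python
import os
import platform
import re
from typing import List, Optional, Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a kernel release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def can_use_collect_linux(dotnet_major: int, release: str, is_root: bool) -> bool:
    """collect-linux needs .NET 10+, root, and a 6.4 or newer kernel."""
    return dotnet_major >= 10 and is_root and kernel_version(release) >= (6, 4)


def linux_trace_command(
    dotnet_major: int,
    pid: int,
    release: Optional[str] = None,
    is_root: Optional[bool] = None,
) -> List[str]:
    """Prefer collect-linux (machine-wide); fall back to per-process dotnet-trace."""
    if release is None:
        release = platform.release()   # same value as `uname -r`
    if is_root is None:
        is_root = os.geteuid() == 0    # Unix-only API
    if can_use_collect_linux(dotnet_major, release, is_root):
        return ["dotnet-trace", "collect-linux"]   # no PID: machine-wide by default
    return ["dotnet-trace", "collect", "-p", str(pid)]
```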

Linux (non-container, pre-.NET 10)

- `dotnet-trace` (preferred): managed trace collection; no admin required.
- `perfcollect`: when native call stacks are needed (requires admin/root).
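A typical `perfcollect` session has two parts: four setup-and-collect steps, and one environment setting on the target app. The Python sketch below assumes the script is fetched from `https://aka.ms/perfcollect` and stopped with Ctrl+C. The function names are hypothetical. The perf-map variable must be set before the app starts, because it cannot be applied to a running process.

```python
from typing import Dict, List, Mapping

PERFCOLLECT_URL = "https://aka.ms/perfcollect"


def perfcollect_steps(trace_name: str) -> List[List[str]]:
    """Download perfcollect, install its prerequisites, and start a session."""
    return [
        ["curl", "-OL", PERFCOLLECT_URL],
        ["chmod", "+x", "perfcollect"],
        ["sudo", "./perfcollect", "install"],
        # Runs until stopped with Ctrl+C.
        ["sudo", "./perfcollect", "collect", trace_name],
    ]


def app_environment(base: Mapping[str, str]) -> Dict[str, str]:
    """Environment for the *target app* so perf can resolve JIT-compiled frames."""
    env = dict(base)
    env["DOTNET_PerfMapEnabled"] = "1"
    env["COMPlus_PerfMapEnabled"] = "1"  # older runtimes read the COMPlus_ prefix
    return env
```

In use, you would start the app with `subprocess.Popen(["dotnet", "MyApp.dll"], env=app_environment(os.environ))`, where `MyApp.dll` is a placeholder. Then run each entry of `perfcollect_steps("slowtrace")` with `subprocess.run(..., check=True)`.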

Linux Container / Kubernetes

If running in the context of the workload (i.e., you have console access to the container), prefer console-based tools. They are easier to set up than `dotnet-monitor`, which requires authentication configuration and sidecar deployment:

- `dotnet-trace collect-linux` (.NET 10+ with root): produces the richest traces, including native call stacks and kernel events.
- `dotnet-trace`: run inside the container if the tool is installed in the image. For dumps, invoke the `dump-collect` skill.
- `perfcollect`: run inside the container when native stacks are needed on pre-.NET 10 (requires `SYS_ADMIN` / `--privileged`).

If not running in the workload context (no console access), or if `dotnet-monitor` is already deployed:

- `dotnet-monitor`: designed for containers; runs as a sidecar. No tools are needed in the app container. This is the easiest option when console access is not available.
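With console access in Kubernetes, the flow is: verify the PID, collect for a fixed duration, then copy the file out. The Python sketch below uses `kubectl exec` and `kubectl cp`, and the pod, container, and namespace names are hypothetical. It assumes `dotnet-trace` is on the container's `PATH`; a global-tool install may need its full path. It also assumes the image contains `tar`, which `kubectl cp` relies on. `--duration` uses the `dd:hh:mm:ss` format.

```python
from typing import List, Optional


def kubectl_trace_steps(
    pod: str,
    container: str,
    pid: int,
    duration: str = "00:00:00:30",            # dd:hh:mm:ss
    remote_path: str = "/tmp/trace.nettrace",
    local_path: str = "trace.nettrace",
    namespace: Optional[str] = None,
) -> List[List[str]]:
    """Console-based collection inside a running container, then copy the trace out."""
    ns = ["-n", namespace] if namespace else []
    exec_prefix = ["kubectl", "exec", *ns, pod, "-c", container, "--"]
    return [
        # 1. Verify the PID first, even if the app is expected to be PID 1.
        exec_prefix + ["dotnet-trace", "ps"],
        # 2. Collect for a fixed duration and write to a known path.
        exec_prefix + ["dotnet-trace", "collect", "-p", str(pid),
                       "--duration", duration, "-o", remote_path],
        # 3. Copy the artifact off the pod.
        ["kubectl", "cp", *ns, "-c", container, f"{pod}:{remote_path}", local_path],
    ]
```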

Memory dumps

When dumps are needed (memory leaks, hangs), do not provide dump collection commands directly for modern .NET; invoke the `dump-collect` skill instead. The `dump-collect` skill only supports modern .NET (.NET Core 3.0+). For .NET Framework, provide dump collection guidance directly (e.g., `procdump -ma <PID>` or Task Manager). This skill focuses on trace collection only.

Memory leaks

- Capture two dumps while memory is increasing (e.g., one early and one after significant growth). Invoke the `dump-collect` skill for dump collection; do not provide dump commands directly. Diff the dumps in PerfView to see which objects have increased. This is the most effective way to identify what is leaking.
- Without admin privileges: two process dumps can show what is growing on the heap but may not be enough to identify the root cause. If the dumps are not sufficient, reproduce the issue in an environment where admin privileges are available to collect richer data (traces).
- Modern .NET on Linux (pre-.NET 10): recommend two dump captures (invoke the `dump-collect` skill) for a heap diff, plus a `dotnet-trace` capture while memory is growing (for allocation tracking). No trigger is needed; capture during the growth period. Together, the two give the best picture.
- Modern .NET 10+ on Linux with root: recommend two dump captures (invoke the `dump-collect` skill) for a heap diff, plus a `dotnet-trace collect-linux` capture while memory is growing (richer data, including native stacks). No trigger is needed.
- .NET Framework: recommend two dumps plus a PerfView trace while memory is growing, to see what is being allocated. The `dump-collect` skill does not support .NET Framework, so provide dump commands directly (e.g., `procdump -ma <PID>`, or right-click → Create Dump File in Task Manager). No trigger is needed; just capture the trace during the growth period. Do not wait for an `OutOfMemoryException`.
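For .NET Framework, the two growth dumps can come from a single ProcDump invocation. The Python sketch below assumes ProcDump's `-n` (number of dumps) and `-s` (seconds between dumps) options; check them with `procdump -?` on your version. The function name, PID, and output folder are hypothetical.

```python
from typing import List


def procdump_growth_dumps(pid: int, seconds_apart: int, out_dir: str) -> List[str]:
    """Two full dumps of a .NET Framework process, spaced to straddle memory growth."""
    if seconds_apart <= 0:
        raise ValueError("seconds_apart must be positive")
    return [
        "procdump.exe",
        "-accepteula",
        "-ma",                      # full memory dump (needed for heap analysis)
        "-n", "2",                  # number of dumps to write
        "-s", str(seconds_apart),   # seconds between dumps
        str(pid),
        out_dir,                    # folder for the .dmp files
    ]
```

`procdump_growth_dumps(4242, 600, r"C:\dumps")` asks for two full dumps ten minutes apart. If memory grows irregularly, take the second dump by hand after significant growth instead.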

Excessive GC

Excessive GC requires a trace to analyze GC events, pause times, and allocation patterns; a dump is not sufficient.

- Windows (PerfView): use `PerfView collect /GCCollectOnly` to capture GC events.
- Linux (dotnet-trace): use `dotnet-trace collect -p <PID> --profile gc-verbose`.
- Linux .NET 10+ with root: use `dotnet-trace collect-linux --profile gc-verbose` for richer data with native stacks.
- Containers: `dotnet-monitor` can capture GC traces via its REST API (`/trace?profile=gc-verbose`).
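The per-platform GC commands above can be kept in one helper so that the right one is picked from the environment. The Python sketch below is illustrative. The `target` labels, the helper name, and the output file names are assumptions, and the PerfView switches `/AcceptEULA` and `/NoGui` are added for unattended runs.

```python
from typing import List, Optional


def gc_trace_args(target: str, pid: Optional[int] = None) -> List[str]:
    """Command for an excessive-GC investigation on the given (hypothetical) target label."""
    if target == "windows":
        return ["PerfView.exe", "/AcceptEULA", "/NoGui", "/GCCollectOnly", "collect", "gc.etl"]
    if target == "linux":
        if pid is None:
            raise ValueError("dotnet-trace collect needs a verified PID")
        return ["dotnet-trace", "collect", "-p", str(pid),
                "--profile", "gc-verbose", "-o", "gc.nettrace"]
    if target == "linux-net10-root":
        # Machine-wide by default, so no PID is passed.
        return ["dotnet-trace", "collect-linux", "--profile", "gc-verbose"]
    raise ValueError(f"unknown target: {target!r}")
```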

Slow Requests

Slow requests require a thread-time trace to see where threads are spending time: waiting on locks, I/O, external calls, etc. Use larger buffers, since thread-time traces generate more data. For ASP.NET Core applications, also enable the `Microsoft.AspNetCore.Hosting` and `Microsoft-AspNetCore-Server-Kestrel` providers to get server-side request lifecycle timing (when requests arrive and how long they take to process).

- Windows (PerfView): use `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048`. The `/ThreadTime` argument adds thread-level wait and block detail. For ASP.NET Core, add the hosting and Kestrel providers: `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048 /Providers:*Microsoft.AspNetCore.Hosting,*Microsoft-AspNetCore-Server-Kestrel`. Do not include a stop trigger by default; let the user design one based on their specific scenario.
- Linux (dotnet-trace): `dotnet-trace` captures thread-time data by default, so no special arguments are needed: `dotnet-trace collect -p <PID>`. For ASP.NET Core, add the hosting and Kestrel providers. Specifying `--providers` overrides the default profile, so restate it with `--profile`: `dotnet-trace collect -p <PID> --profile dotnet-common,dotnet-sampled-thread-time --providers Microsoft.AspNetCore.Hosting,Microsoft-AspNetCore-Server-Kestrel`.
- Linux .NET 10+ with root: use `dotnet-trace collect-linux --profile thread-time` for richer data with native stacks. For ASP.NET Core, add `--providers Microsoft.AspNetCore.Hosting,Microsoft-AspNetCore-Server-Kestrel`.
- Containers: `dotnet-monitor` can capture traces via its REST API (`/trace?pid=<PID>&durationSeconds=30`).
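When there is no console access, a script can call the `dotnet-monitor` `/trace` endpoint quoted above. The Python sketch below assumes several things:

- `dotnet-monitor` is reachable at `https://localhost:52323`, for example through `kubectl port-forward`. Adjust this to your `--urls` setting; a self-signed certificate may need a custom SSL context.
- Authentication uses a bearer token.
- Only the `pid` and `durationSeconds` query parameters from the text are used.

The function names are hypothetical.

```python
import shutil
import urllib.parse
import urllib.request
from typing import Optional

DEFAULT_MONITOR = "https://localhost:52323"  # assumed; match your dotnet-monitor --urls


def monitor_trace_url(pid: int, duration_seconds: int = 30,
                      base_url: str = DEFAULT_MONITOR) -> str:
    """Build the /trace URL for a fixed-duration capture of one process."""
    query = urllib.parse.urlencode({"pid": pid, "durationSeconds": duration_seconds})
    return f"{base_url.rstrip('/')}/trace?{query}"


def download_trace(url: str, token: Optional[str], out_path: str = "trace.nettrace") -> None:
    """Stream the trace to disk; the request stays open for the whole capture."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")  # keep auth enabled
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```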

Hangs

Start with a trace to understand what threads are doing. Use the appropriate trace tool for the environment: PerfView with `/ThreadTime` on Windows, `dotnet-trace` on Linux, or `dotnet-trace collect-linux --profile thread-time` on .NET 10+ Linux with root. The trace can reveal:

- Livelocks: threads appear busy (spinning) but the application makes no forward progress.
- Thread starvation: the ThreadPool is exhausted and queued work items are not being processed. This can look like a deadlock but has a different root cause.
- Whether there is any forward progress at all: if some threads are making progress, the issue may be a bottleneck rather than a true hang.

If the trace does not explain the hang, the issue may be a true deadlock (threads waiting on each other in a cycle). In this case, invoke the `dump-collect` skill to collect a process dump; do not provide dump commands directly.

Analyze the dump with a debugger to inspect thread stacks and identify the lock cycle:

- Windows: Visual Studio, or WinDbg with the SOS debugger extension.
- Linux: `lldb` with the SOS debugger extension.

Networking Issues

Networking issues (HTTP 5xx errors from downstream services, request timeouts, connection failures, DNS resolution failures, TLS handshake failures, connection pool exhaustion) require both a thread-time trace and networking event providers. The thread-time trace shows where threads are blocked (slow downstream calls, thread starvation). The networking events show the request lifecycle: which requests failed, what status codes came back, how long DNS resolution and TLS handshakes took, and how long requests waited for a connection from the pool.

For .NET Framework, `PerfView /ThreadTime` already collects the relevant networking events (from the `System.Net` ETW provider); no additional providers are needed.

For modern .NET, you must explicitly enable the `System.Net.*` EventSource providers:

| Provider | What it covers |
|---|---|
| `System.Net.Http` | HttpClient/SocketsHttpHandler: request lifecycle, HTTP status codes, connection pool |
| `System.Net.NameResolution` | DNS lookups (start/stop, duration) |
| `System.Net.Security` | TLS/SSL handshakes (SslStream) |
| `System.Net.Sockets` | Low-level socket connect/disconnect |

Key events from `System.Net.Http`:

- `RequestStart` (scheme, host, port, path)
- `RequestStop` (statusCode; `-1` if no response was received)
- `RequestFailed` (exception message for timeouts, connection refused, etc.)
- `RequestLeftQueue` (time spent waiting for a connection from the pool; long waits indicate connection pool exhaustion)
- `ConnectionEstablished`
- `ConnectionClosed`

Collect a thread-time trace with networking providers enabled (modern .NET only; .NET Framework needs only `PerfView /ThreadTime`):

- Windows (PerfView): use `PerfView /ThreadTime collect /BufferSizeMB:1024 /CircularMB:2048 /Providers:*System.Net.Http,*System.Net.NameResolution,*System.Net.Security,*System.Net.Sockets`. For .NET Framework, omit the `/Providers` flag; `/ThreadTime` already includes the networking events.
- Linux (dotnet-trace): `dotnet-trace` captures thread-time data by default. Specifying `--providers` overrides those defaults, so you must also include `--profile`: `dotnet-trace collect -p <PID> --profile dotnet-common,dotnet-sampled-thread-time --providers System.Net.Http,System.Net.NameResolution,System.Net.Security,System.Net.Sockets`.
- Linux .NET 10+ with root: use `dotnet-trace collect-linux --profile dotnet-common,cpu-sampling,thread-time --providers System.Net.Http,System.Net.NameResolution,System.Net.Security,System.Net.Sockets`.
- Containers: `dotnet-monitor` can capture traces with custom providers via its REST API.
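The provider lists above differ only in formatting between `dotnet-trace` and PerfView. PerfView prefixes EventSource names with `*`, while `dotnet-trace` uses the bare names. The Python sketch below builds both from one list. The helper names and `network.etl` are hypothetical. The `http_only` switch reflects the pitfall guidance: start with `System.Net.Http` alone when the issue is clearly HTTP-level.

```python
from typing import List

NETWORKING_PROVIDERS = [
    "System.Net.Http",
    "System.Net.NameResolution",
    "System.Net.Security",
    "System.Net.Sockets",
]


def networking_trace_args(pid: int, http_only: bool = False) -> List[str]:
    """dotnet-trace argv: thread-time profile plus System.Net.* providers."""
    providers = NETWORKING_PROVIDERS[:1] if http_only else NETWORKING_PROVIDERS
    return [
        "dotnet-trace", "collect", "-p", str(pid),
        # --providers replaces the default profile, so restate it explicitly.
        "--profile", "dotnet-common,dotnet-sampled-thread-time",
        "--providers", ",".join(providers),
    ]


def perfview_networking_args(net_framework: bool) -> List[str]:
    """PerfView argv; .NET Framework needs only /ThreadTime."""
    args = ["PerfView.exe", "/AcceptEULA", "/NoGui", "/ThreadTime",
            "/BufferSizeMB:1024", "/CircularMB:2048"]
    if not net_framework:
        # '*' tells PerfView to derive the provider GUID from the EventSource name.
        args.append("/Providers:" + ",".join("*" + p for p in NETWORKING_PROVIDERS))
    return args + ["collect", "network.etl"]
```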

Assembly Loading Issues

For modern .NET, assembly loading issues (`FileNotFoundException`, `FileLoadException`, `ReflectionTypeLoadException`, version conflicts, duplicate assembly loads across AssemblyLoadContexts) require collecting assembly loader binder events. These come from the `Microsoft-Windows-DotNETRuntime` provider with the AssemblyLoader keyword (`0x4`). They trace every step of the runtime's assembly resolution algorithm: which paths were probed, which AssemblyLoadContext handled the load, whether the load succeeded or failed, and why. For .NET Framework, the same provider and keyword work for ETW-based collection. In addition, the Fusion Log Viewer (`fuslogvw.exe`) can diagnose assembly binding failures without requiring a trace.

The provider specification is `Microsoft-Windows-DotNETRuntime:0x4:4` (provider name, AssemblyLoader keyword, Informational verbosity).

- Windows (PerfView): a default PerfView trace already includes binder events; simply run `PerfView collect` with no extra providers. For a smaller trace file, use `PerfView collect /ClrEvents:Default-Profile`. This removes the most verbose default events while keeping the events needed to diagnose assembly loading issues.
- Linux / cross-platform (dotnet-trace): use `dotnet-trace collect --clrevents assemblyloader -- <path-to-built-exe>` to launch and trace the process, or `dotnet-trace collect --clrevents assemblyloader -p <PID>` to attach to a running process.
- Linux .NET 10+ with root: use `dotnet-trace collect-linux --clrevents assemblyloader`.
- Containers: `dotnet-monitor` can capture traces with the loader provider via its REST API.

For short-lived processes that fail on startup (common with assembly loading issues), prefer the `dotnet-trace` launch form (`-- <path-to-built-exe>`) over attaching by PID, since the process may exit before you can attach.
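The launch-versus-attach choice can be made explicit in a helper that accepts exactly one of a command line or a PID. The Python sketch below is illustrative. The function name and the example executable path are hypothetical, and the flags are the `--clrevents assemblyloader`, `--`, and `-p` forms quoted above.

```python
from typing import List, Optional, Sequence


def assembly_load_trace_args(pid: Optional[int] = None,
                             launch: Optional[Sequence[str]] = None) -> List[str]:
    """Trace loader/binder events by launching the app or attaching to a PID."""
    if (pid is None) == (launch is None):
        raise ValueError("pass exactly one of pid or launch")
    base = ["dotnet-trace", "collect", "--clrevents", "assemblyloader"]
    if launch is not None:
        # Launch form: tracing starts before the first assembly load, so a
        # process that fails during startup is still captured.
        return base + ["--", *launch]
    return base + ["-p", str(pid)]
```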

Explain the trade-offs when recommending a tool. For example:

- PerfView gives richer data but needs admin; it runs on Windows, including Windows containers.
- `dotnet-trace` works cross-platform without admin but captures less system-level detail.
- `perfcollect` captures native call stacks but needs admin/root.
- `dotnet-monitor` is the best option for containers/K8s when console access is not available, but it requires sidecar deployment and authentication configuration.

Step 3: Guide data collection

Provide the specific commands for the recommended tool. Load the appropriate reference file from the tool reference lookup table for detailed command-line examples.

Key guidance to include:

- Installation: how to install the tool if it is not already available (e.g., `dotnet tool install -g dotnet-trace`). When recommending multiple tools, provide installation and usage instructions for each one; do not mention a tool without showing how to install and use it.
- PID discovery (required before any `-p <PID>` command): verify the target process first (for example, `dotnet-trace ps`, `curl <monitor-endpoint>/processes`, or `ps` inside a container). Even if the app is expected to be PID 1 in a container, verify before collecting.
- Collection command: the exact command to run, including relevant providers, output format, and duration.
- Container considerations:
  - Collecting from inside the container: ensure the tool is installed in the image, or use `kubectl cp` to copy it in.
  - Collecting from outside the container: use `dotnet-monitor` as a sidecar with a shared diagnostic port (a Unix domain socket in `/tmp`).
  - Kubernetes: run `dotnet-monitor` as a sidecar container, or use `kubectl debug` for ephemeral debug containers.
- Long-running repros (Windows/PerfView): show how to use trigger arguments and circular buffer settings.
- Output location: where the collected file will be saved and how to copy it off the target for analysis.
- Artifact handoff checklist: when handing traces to someone else for analysis, include the runtime version, OS/kernel, container image tag or build SHA, PID/process name, UTC collection start/end timestamps, exact command used, and final artifact path.
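The handoff checklist is easy to lose when it lives only in a chat message. One option is to write it next to the artifact as a small JSON file. The Python sketch below uses hypothetical field names and a hypothetical function name. It reads the OS and kernel from the machine it runs on, so it assumes the script runs on the target; otherwise pass those values in yourself.

```python
import json
import platform
from datetime import datetime, timezone
from typing import Any, Dict


def utc_now() -> str:
    """UTC timestamp such as 2024-01-01T00:00:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def write_handoff_manifest(path: str, *, runtime_version: str, build: str,
                           pid: int, process_name: str, started_utc: str,
                           ended_utc: str, command: str,
                           artifact_path: str) -> Dict[str, Any]:
    """Record the artifact handoff checklist as JSON and return it."""
    manifest = {
        "runtime_version": runtime_version,
        "os": platform.system(),
        "kernel": platform.release(),       # same as `uname -r` on Linux
        "build": build,                     # container image tag or build SHA
        "pid": pid,
        "process_name": process_name,
        "collection_start_utc": started_utc,
        "collection_end_utc": ended_utc,
        "command": command,
        "artifact_path": artifact_path,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```

Call `utc_now()` immediately before and after the collection command, and pass both values in.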

Step 4: Recommend analysis approach

After data is collected, recommend the appropriate tool for analysis. Do not perform the analysis — just point the developer to the right tool and documentation.

| Collected data | Analysis tool | Notes |
|---|---|---|
| `.nettrace` file | PerfView (Windows), Speedscope (web) | PerfView gives the richest view on Windows |
| `.etl` / `.etl.zip` file | PerfView | ETW traces from PerfView or perfcollect |
| `perf.data.nl` from perfcollect | PerfView (Windows) | Copy the file to a Windows machine and open it with PerfView |
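To open a `.nettrace` file in Speedscope, it must first be converted with `dotnet-trace convert`. The Python sketch below only builds that command, using a hypothetical helper name and example file name. It leaves the output name to the tool's default. The analysis itself is left to Speedscope.

```python
from pathlib import Path
from typing import List


def speedscope_convert_args(nettrace: str) -> List[str]:
    """argv for converting a .nettrace file into Speedscope format."""
    if Path(nettrace).suffix != ".nettrace":
        raise ValueError("expected a .nettrace input file")
    return ["dotnet-trace", "convert", "--format", "Speedscope", nettrace]
```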

Validation

- The recommended tool is compatible with the developer's runtime, OS, and deployment topology
- The collection command runs without errors
- The output file is generated in the expected location
- The developer knows which analysis tool to use for the collected data

Common Pitfalls

| Pitfall | Solution |
|---|---|
| Using `dotnet-trace` on .NET Framework | `dotnet-trace` only works with modern .NET (.NET Core 3.0+). Use PerfView for .NET Framework. |
| PerfView without admin privileges | PerfView requires admin for ETW tracing. On modern .NET, fall back to `dotnet-trace` if admin is not available. |
| `perfcollect` in a container without `SYS_ADMIN` | Containers drop `SYS_ADMIN` by default. Run with `--privileged` or add the `SYS_ADMIN` capability, or fall back to `dotnet-trace`. |
| Huge trace files from long repros | On Windows, use PerfView `/StopOn` triggers that fire on the symptom you want to capture (e.g., `/StopOnPerfCounter`, `/StopOnGCEvent`, `/StopOnException`) with `/CircularMB` and `/BufferSizeMB`. Never trigger on recovery: the circular buffer continuously overwrites old data, so the interesting behavior may be lost by the time collection stops. |
| Diagnostic port not accessible in a container | Mount `/tmp` as a shared volume between the app container and the `dotnet-monitor` sidecar for the diagnostic Unix domain socket. |
| Forgetting to install tools in the container image | Add `dotnet tool install` to your Dockerfile, or use `dotnet-monitor` as a sidecar to avoid modifying the app image. |
| Exposing `dotnet-monitor` with `--no-auth` in production | Keep auth enabled, bind to localhost, and use `kubectl port-forward` for access. Use `--no-auth` only for short-lived, isolated debugging. |
| Collecting only a CPU/thread-time trace for networking issues | CPU and thread-time traces alone do not show HTTP status codes, DNS timing, or connection pool behavior. Add the networking providers (`System.Net.Http`, `System.Net.NameResolution`, `System.Net.Security`, `System.Net.Sockets`) alongside the thread-time trace. |
| Enabling all networking providers when only one is needed | Each networking provider adds overhead. If the issue is clearly HTTP-level (5xx status codes), `System.Net.Http` alone may be sufficient. Add the DNS, TLS, and socket providers when the root cause is unclear. |
