doris-profile-reader

Doris Profile Reader

Purpose

Use this skill to identify the real bottleneck in an Apache Doris query runtime profile. The core rule is to separate active work from dependency, queue, and backpressure waits before naming an operator as expensive. When the plan contains joins, also separate the immediate runtime bottleneck from the plan-shape cause, especially bad join order and runtime-filter direction.
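The core rule can be sketched in Python. This is a minimal illustration, not part of the skill's scripts. The wait-counter names come from the interpretation rules below. Three things are assumptions made for the example: the catch-all "Wait"/"Blocked" substring heuristic, the hypothetical `OperatorStats` type, and dividing a merged `sum` timer by an instance count.

```python
import re
from dataclasses import dataclass

# Wait-like counters named in the interpretation rules: dependency, data-arrival,
# queue, memory, or backpressure signals rather than operator compute cost.
WAIT_EXACT = {
    "WaitForDependencyTime",
    "WaitForRpcBufferQueue",
    "WaitForBroadcastBuffer",
    "PendingFinishDependency",
    "ScannerWorkerWaitTime",  # scanner scheduling / thread-pool wait, not scan CPU
}
# Families written as WaitForDependency[...]Time and WaitForData0 .. WaitForDataN.
WAIT_PATTERNS = (
    re.compile(r"^WaitForDependency.*Time$"),
    re.compile(r"^WaitForData\d+$"),
)


def is_wait_counter(name: str) -> bool:
    """True if the counter measures waiting rather than active work."""
    if name in WAIT_EXACT or any(p.match(name) for p in WAIT_PATTERNS):
        return True
    # Assumption: treat any other "Wait"/"Blocked" counter as a wait too;
    # confirm against counter-semantics.md before relying on this.
    return "Wait" in name or "Blocked" in name


def split_counters(counters: dict[str, int]) -> tuple[dict[str, int], dict[str, int]]:
    """Split one operator's counters into (active, wait) groups."""
    active = {k: v for k, v in counters.items() if not is_wait_counter(k)}
    waits = {k: v for k, v in counters.items() if is_wait_counter(k)}
    return active, waits


@dataclass
class OperatorStats:
    name: str
    exec_time_sum_ns: int  # merged-profile "sum" of ExecTime over all instances
    instances: int  # number of parallel instances that contributed to the sum

    def per_instance_ns(self) -> float:
        # A sum timer accumulates across instances and may exceed wall-clock time.
        return self.exec_time_sum_ns / max(self.instances, 1)


def rank_by_active_time(ops: list[OperatorStats]) -> list[str]:
    """Rank by per-instance ExecTime only; wait counters never enter the ranking."""
    ranked = sorted(ops, key=lambda op: op.per_instance_ns(), reverse=True)
    return [op.name for op in ranked]
```

A real ranking would also weigh custom timers, rows/bytes, memory/spill, and skew. This sketch uses `ExecTime` alone so that the wait/active split stays visible.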

Required Reading Order

Read `references/reading-workflow.md` for the analysis workflow and output contract.

Read `references/counter-semantics.md` for counter meanings and priority, especially wait counters.

Read `references/operator-guide.md` for the relevant operator family.

Read `references/join-order-diagnosis.md` when the profile or plan has multiple joins, a large hash or nested-loop build, a large scan that a join might have pruned, paired fast/slow plans, hints or reordered SQL, or a request about join shape or reordering.

Read `references/runtime-filters.md` when a profile or plan includes `RuntimeFilterInfo`, `RF... <-`, `RF... ->`, `JRFs`, `WaitForRuntimeFilter`, or `AcquireRuntimeFilter`.

Use `references/source-profile-inventory.md` as the source-backed operator/counter inventory. If a counter or operator does not appear in the narrative docs, do not ignore it; look it up in this inventory and classify it by the rules in `counter-semantics.md`.
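The routing above can be expressed as a small helper that decides which reference files to open for a given profile or plan text. This is a minimal sketch under stated assumptions. Counting `JOIN` tokens stands in for "multiple joins" and ignores the other join-order triggers. The plan-text layout `RF000[type] <- expr` is also assumed: the bracketed filter type is optional, and several filters may be comma-separated on one line. `pick_references` and `rf_directions` are hypothetical names. Following the semantics in the rules below, the second helper records `<-` as the source (build) side and `->` as the target scan.

```python
import re

ALWAYS_READ = [
    "references/reading-workflow.md",
    "references/counter-semantics.md",
    "references/operator-guide.md",
]
JOIN_DOC = "references/join-order-diagnosis.md"
RF_DOC = "references/runtime-filters.md"
# The inventory is a lookup for unknown counters, so it is not a default read here.

# Runtime-filter markers from the list above; "RF... <-" / "RF... ->" are matched by RF_EDGE.
RF_KEYWORDS = ("RuntimeFilterInfo", "JRFs", "WaitForRuntimeFilter", "AcquireRuntimeFilter")

# Assumed plan-text shape: "RF000[bloom] <- expr" or "RF000 -> expr", several per line.
RF_EDGE = re.compile(r"\b(RF\d+)(?:\[[^\]]*\])?[ \t]*(<-|->)[ \t]*([^,\n]+)")

# Assumption: two or more "JOIN" tokens approximate "the plan has multiple joins".
JOIN_TOKEN = re.compile(r"\bJOIN\b", re.IGNORECASE)


def pick_references(text: str) -> list[str]:
    """Return the reference docs to read for a profile or plan text."""
    docs = list(ALWAYS_READ)
    if len(JOIN_TOKEN.findall(text)) >= 2:
        docs.append(JOIN_DOC)
    if RF_EDGE.search(text) or any(k in text for k in RF_KEYWORDS):
        docs.append(RF_DOC)
    return docs


def rf_directions(plan_text: str) -> dict[str, dict[str, list[str]]]:
    """Map each RF id to its source (build side, "<-") and target (scan, "->") exprs."""
    edges: dict[str, dict[str, list[str]]] = {}
    for rf_id, arrow, expr in RF_EDGE.findall(plan_text):
        side = "source" if arrow == "<-" else "target"
        edges.setdefault(rf_id, {"source": [], "target": []})[side].append(expr.strip())
    return edges
```

An RF id that has a target but no source (or the reverse) in the parsed output is a hint that the plan text was truncated or that the assumed layout does not match your Doris version.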

Non-Negotiable Interpretation Rules

Do not treat `WaitForDependencyTime`, `WaitForDependency[...]Time`, `WaitForData0`, `WaitForDataN`, `WaitForRpcBufferQueue`, `WaitForBroadcastBuffer`, `PendingFinishDependency`, or pipeline blocked/wait counters as direct operator compute cost. They are dependency, data-arrival, queue, memory, or backpressure signals.

Do not rank operators by the largest wait-like counter alone. Rank first by `ExecTime` and other active timers, direct custom timers, rows/bytes, memory/spill, and skew.

Treat `sum` timers in a merged profile as accumulated across parallel instances. Such a timer can exceed the query's elapsed time and still be normal when many scanners, drivers, or fragments run in parallel.

For scans, prioritize `RowsRead`, `ScanRows`, `ScanBytes`, `ScannerCpuTime`, `ScannerGetBlockTime`, `ScannerWorkerWaitTime`, I/O and decompression timers, predicate/lazy-read timers, and row-filter counters. `ScannerWorkerWaitTime` is important, but it indicates scanner scheduling/thread-pool wait rather than scan CPU.

For runtime filters, distinguish the source/build side from the target/probe scan side. In plan text, `RFxxx <- expr` is produced by the build side, and `RFxxx -> expr` is applied at a target scan. In profiles, `RuntimeFilterInfo` and the scan-side wait/filter counters show whether the RF helped or only caused waiting.

For joins, always identify the build side and the probe/target side before judging the order. A scan can be the immediate active bottleneck while the root cause is still bad join order, if a selective join or RF source is scheduled too late to prune that scan.

Do not require a paired fast profile before naming a likely bad join order. A single slow profile can be enough when it proves any of the following: large wasted build/scan work before an empty probe or result; an RF source side that had to scan massively before it could emit an empty or tiny filter; or a huge intermediate result later eliminated by a highly selective or contradictory join predicate.

Do not treat "the current RF made other scans wait and then skip" as proof that the join order is good. If producing that empty/tiny RF required the only expensive scan, the choice of RF source and target is itself the join-order question.

When a large build/source side is paid for before an empty/tiny probe, preserved side, semi-join key set, or contradictory join can eliminate the result, call the build/probe order likely bad unless the profile proves that the ordering is semantically forced and no earlier pruning or short-circuit is possible.

If a strong single-profile join-order pattern matches, do not hedge with "suspicious", "close to likely bad", or "not proven". Use `likely bad` when a better legal order still needs validation, and reserve `not proven` for the exact alternate shape, not for the join-order diagnosis itself.

When a join query has plan-shape evidence, the answer must explicitly judge the join order, build/probe assignment, and RF direction as good, suspicious, or bad. Do not replace that judgment with a vague "predicate issue" or "plan shape issue".

A long `InitTime`, `OpenTime`, `CloseTime`, or profile total can matter, but only after confirming that it is not accumulated across many instances and not dominated by a known wait/dependency branch.
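The single-profile join-order rule can be reduced to a check over a few row counts. This is a hypothetical sketch. `JoinEvidence`, its fields, and both thresholds are illustrative assumptions rather than Doris counters. The function returns `None` whenever the strong pattern does not match, which leaves the good/suspicious/bad judgment to the full workflow in `references/join-order-diagnosis.md`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinEvidence:
    build_rows_scanned: int  # rows the build / RF-source side read before it could finish
    probe_rows: int  # rows that reached the probe side
    output_rows: int  # rows the join emitted
    # Set only when the profile proves the order is forced and no earlier pruning is possible.
    order_semantically_forced: bool = False


def judge_join_order(
    ev: JoinEvidence,
    large_rows: int = 10_000_000,  # hypothetical "large build" threshold
    tiny_rows: int = 0,  # hypothetical "tiny" threshold; 0 means only empty counts
) -> Optional[str]:
    """Return "likely bad" for the strong single-profile pattern, else None."""
    paid_large_build = ev.build_rows_scanned >= large_rows
    # Either the probe side or the join result ended up empty/tiny.
    eliminated = min(ev.probe_rows, ev.output_rows) <= tiny_rows
    if paid_large_build and eliminated and not ev.order_semantically_forced:
        # Strong pattern: say "likely bad", not "suspicious" or "not proven".
        return "likely bad"
    return None  # no verdict from this check alone
```

The thresholds are deliberately parameters: what counts as "large" or "tiny" depends on the cluster and data, and the rules above only fix the shape of the argument, not the numbers.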

Standard Answer Shape

When explaining a profile, answer in this order:

`Conclusion`: one or two sentences naming the likely bottleneck and, when joins are involved, whether the plan shape or join order is likely the cause.

`Evidence`: profile counters with operator names and values, each classified as active work, data volume, wait/backpressure, memory/spill, or runtime-filter evidence.

`Reasoning`: how the evidence maps to the execution plan, which side is build/probe or RF source/target, why misleading counters are discounted, and whether the join order is reasonable.

`Next checks`: the smallest set of additional profile, log, or code checks needed if the conclusion is still uncertain.

Always preserve uncertainty: when the profile lacks enough detail, label claims explicitly as "proven", "likely", or "not proven".
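To keep answers in this shape, the four parts and the certainty labels can be captured in a small structure. This is a hypothetical Python sketch: `Evidence`, `ProfileAnswer`, and the rendered layout are illustrative. The evidence kinds and the "proven"/"likely"/"not proven" labels come from the text above.

```python
from dataclasses import dataclass, field

# Evidence kinds and certainty labels taken from the answer shape above.
EVIDENCE_KINDS = {"active work", "data volume", "wait/backpressure", "memory/spill", "runtime-filter"}
CERTAINTY = {"proven", "likely", "not proven"}


@dataclass
class Evidence:
    operator: str
    counter: str
    value: str
    kind: str  # one of EVIDENCE_KINDS

    def __post_init__(self) -> None:
        if self.kind not in EVIDENCE_KINDS:
            raise ValueError(f"unknown evidence kind: {self.kind!r}")


@dataclass
class ProfileAnswer:
    conclusion: str
    certainty: str  # one of CERTAINTY
    evidence: list[Evidence] = field(default_factory=list)
    reasoning: str = ""
    next_checks: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Emit Conclusion, Evidence, Reasoning, Next checks in that order."""
        if self.certainty not in CERTAINTY:
            raise ValueError(f"certainty must be one of {sorted(CERTAINTY)}")
        lines = [f"Conclusion ({self.certainty}): {self.conclusion}", "Evidence:"]
        lines += [f"- {e.operator} {e.counter} = {e.value} [{e.kind}]" for e in self.evidence]
        lines.append(f"Reasoning: {self.reasoning}")
        lines.append("Next checks:")
        lines += [f"- {check}" for check in self.next_checks]
        return "\n".join(lines)
```

Rejecting unknown kinds and labels at construction time forces every evidence line to state whether it is active work or a wait, which is the distinction the interpretation rules insist on.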

Scripts

`scripts/extract_source_profile_inventory.py`: scans the Doris source for factory-created operators and counter/info registrations.

Scripts are evidence-generation helpers. They do not replace the reading workflow.
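For orientation only, the counter half of such an inventory can be approximated with a regular expression over the C++ sources. This is a hypothetical sketch, not the contents of `scripts/extract_source_profile_inventory.py`. It assumes counters are registered through macros shaped like `ADD_TIMER(profile, "Name")` or `ADD_COUNTER(profile, "Name", unit)`, and it does not attempt the factory-created-operator half.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: counters are registered via RuntimeProfile-style macros such as
# ADD_TIMER(profile, "Name"), ADD_COUNTER(profile, "Name", unit), ADD_CHILD_TIMER(...),
# and *_WITH_LEVEL variants. The first argument is skipped; the quoted name is captured.
COUNTER_MACRO = re.compile(
    r'\b(ADD_(?:CHILD_)?(?:TIMER|COUNTER)(?:_WITH_LEVEL)?)\s*\(\s*[^,]+,\s*"([^"]+)"'
)


def scan_counter_registrations(source: str) -> list[tuple[str, str]]:
    """Return (macro, counter name) pairs found in one C++ source string."""
    return COUNTER_MACRO.findall(source)


def build_inventory(root: Path) -> dict[str, set[str]]:
    """Map each counter name to the source files (relative to root) that register it."""
    inventory: dict[str, set[str]] = defaultdict(set)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in {".cpp", ".cc", ".h"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for _macro, name in scan_counter_registrations(text):
                inventory[name].add(str(path.relative_to(root)))
    return dict(inventory)
```

A regex scan like this misses names built at runtime (for example from string concatenation), which is one reason the narrative docs and the source-backed inventory should be read together rather than trusting either alone.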