reduce-unoptimized-query-oracle

Reduce Unoptimized Query Oracle Test Failure

Reduce an unoptimized-query-oracle test failure log to the simplest possible reproduction case.
The unoptimized-query-oracle roachtest runs a series of random SQL statements to create a random dataset, and then executes a random "Query of Interest" twice, with different optimization settings. If the two executions return different results, it indicates a bug in CockroachDB.

When to Use

Use this skill when:
  • You have a test failure from the unoptimized-query-oracle roachtest.
  • You need to find the minimal SQL to reproduce the test failure.

Step 1: Locate artifacts

Ask the user where the artifacts directory is.
Find the relevant files in the artifacts directory:
  • Test parameters:
    params.log
    (the parameters from the roachtest)
  • Test log:
    test.log
    (the log from the roachtest)
  • Failure log:
    failure*.log
    (the failure log from the roachtest)
  • Full SQL log:
    unoptimized-query-oracle*.log
    (the SQL statements that led to failure)
  • Query of interest log:
    unoptimized-query-oracle*.failure.log
    (containing the query of interest and possibly more information about the failure)
  • Cockroach log:
    logs/1.unredacted/cockroach.log
    or
    logs/unredacted/cockroach.log
    (contains the git commit)
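The lookup above can be sketched as a small shell helper. This is a minimal sketch that assumes the file names and glob patterns listed above; the function name list_artifacts is hypothetical and not part of roachtest.

```bash
# list_artifacts DIR: report which of the expected artifact files exist in DIR.
# The patterns mirror the list above; the function name is illustrative only.
# Note that 'unoptimized-query-oracle*.log' also matches the *.failure.log file.
list_artifacts() {
  local dir=$1 pattern match found
  for pattern in 'params.log' 'test.log' 'failure*.log' \
      'unoptimized-query-oracle*.log' \
      'logs/1.unredacted/cockroach.log' 'logs/unredacted/cockroach.log'; do
    found=no
    # $pattern is left unquoted so the shell expands the glob inside DIR.
    for match in "$dir"/$pattern; do
      if [ -e "$match" ]; then
        echo "found:   $match"
        found=yes
      fi
    done
    [ "$found" = yes ] || echo "missing: $pattern"
  done
}
```

Running list_artifacts on the artifacts directory gives a quick view of which logs are available before moving on to Step 2.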

Step 2: Determine test configuration

Determine the git commit from cockroach.log:
bash
grep "binary: CockroachDB" cockroach.log
Look for the commit hash in the version string (e.g., cb94db961b8f55e3473f279d98ae90f0eeb0adcb).
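If the hash needs to be captured in a variable, the grep above can be extended as follows. This is a sketch that assumes the commit appears on that line as a full 40-character lowercase hex SHA-1, as in the example; extract_commit is a hypothetical helper name.

```bash
# extract_commit LOGFILE: print the first 40-character hex string found on the
# "binary: CockroachDB" line. Assumes the commit is logged as a full SHA-1.
extract_commit() {
  grep "binary: CockroachDB" "$1" | grep -oE '[0-9a-f]{40}' | head -n 1
}
```

For example, commit=$(extract_commit cockroach.log) stores the hash for the git checkout in Step 3. If the output is empty, read the line by hand instead.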
Determine if runtime assertions are enabled by checking for:
  • "runtimeAssertionsBuild": "true" in params.log
  • or Runtime assertions enabled in test.log
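The two checks above can be combined into one helper that also prints the matching build command from Step 3. This is a sketch only: the function names are hypothetical, and the exact whitespace inside params.log is an assumption, so the pattern tolerates spaces around the colon.

```bash
# runtime_assertions_enabled PARAMS_LOG TEST_LOG: succeed (exit 0) if either
# marker described above is present in the corresponding file.
runtime_assertions_enabled() {
  grep -qE '"runtimeAssertionsBuild"[[:space:]]*:[[:space:]]*"true"' "$1" 2>/dev/null ||
    grep -q 'Runtime assertions enabled' "$2" 2>/dev/null
}

# build_command PARAMS_LOG TEST_LOG: print the Step 3 build command to use.
build_command() {
  if runtime_assertions_enabled "$1" "$2"; then
    echo "./dev build short -- --crdb_test"
  else
    echo "./dev build short"
  fi
}
```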
Determine if metamorphic settings apply by looking for:
  • lines like these in params.log:
    "metamorphicBufferedSender": "true",
    "metamorphicWriteBuffering": "true",
  • or lines like these in test.log:
    metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
    metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'
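Because Step 4 needs these settings as SQL statements, the test.log lines can be converted mechanically. The following sketch assumes the log lines have exactly the shape shown above and that the values are booleans; metamorphic_to_sql is a hypothetical helper name, and other value types may need quoting.

```bash
# metamorphic_to_sql TEST_LOG: turn each line of the form
#   metamorphically setting "<name>" to '<value>'
# into a SET CLUSTER SETTING statement. Lines that do not match are ignored.
metamorphic_to_sql() {
  sed -nE "s/.*metamorphically setting \"([^\"]+)\" to '([^']*)'.*/SET CLUSTER SETTING \1 = \2;/p" "$1"
}
```

Review the generated statements before adding them to the SQL log, since only the two settings above are confirmed examples.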
Determine environment variables from the beginning of cockroach.log:
bash
grep -A10 "using local environment variables:" cockroach.log
Important environment variables include:
  • COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATAL
  • COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING
  • COCKROACH_RANDOM_SEED
  • COCKROACH_TESTING_FORCE_RELEASE_BRANCH
There might be other important environment variables, so it is best to collect all of them.
Determine if this is a multi-region test or single-region test by checking:
  • the test name (e.g., seed-multi-region in test.log indicates multi-region)
  • or the presence of \connect lines in the full SQL log
If both of these are missing, it's a single-region test.
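The multi-region check can be scripted in the same way. This sketch treats the seed-multi-region test name as the only name-based signal, which is an assumption based on the single example above; is_multi_region is a hypothetical helper name.

```bash
# is_multi_region TEST_LOG SQL_LOG: succeed if either signal described above is
# present: the test name in test.log, or a \connect line in the full SQL log.
is_multi_region() {
  grep -q 'seed-multi-region' "$1" 2>/dev/null ||
    grep -q '^\\connect' "$2" 2>/dev/null
}
```

The result decides whether --nodes=9 is passed in Step 5 and whether -multi-region is passed to the reduce tool in Step 6.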

Step 3: Check Out and Build

For a normal build use:
bash
git checkout <commit-hash>
./dev build short
If runtime assertions were enabled, use a test build instead:
bash
git checkout <commit-hash>
./dev build short -- --crdb_test
Note: Only build libgeos if the reproduction uses geospatial functions (BOX2D, geometry, geography, etc.):
bash
./dev build libgeos

Step 4: Prepare the Full SQL Log File

First, check that the following statements are at the top of the full SQL log file. If they are not, add them:
sql
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
If metamorphic settings were used, also add them to the top of the full SQL log file:
sql
SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;
Create an appropriate directory either in the artifacts directory or in the repository root for holding temp files.
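Adding a statement only when it is missing can be sketched as below. The helper name prepend_if_missing and the 10-line definition of "the top" are assumptions made for illustration; the match is an exact line comparison, so a differently spaced copy of the same statement would not be detected.

```bash
# prepend_if_missing FILE STATEMENT: add STATEMENT as the first line of FILE
# unless one of the first 10 lines is already exactly STATEMENT.
prepend_if_missing() {
  local file=$1 stmt=$2 tmp
  if head -n 10 "$file" | grep -qxF -- "$stmt"; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Build the new contents in a temp file, then overwrite FILE in place.
  { printf '%s\n' "$stmt"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Because each call prepends, the statements are added in reverse order: first "SET sql_safe_updates = false;", then "SET statement_timeout='1m0s';". Work on a copy of the full SQL log so the original artifact stays untouched.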

Step 5: Initial Reproduction

Determine the correct demo command based on test type:
  • Multi-region test: Use --nodes=9
  • Single-region test: Omit the --nodes option
Use a command like this to try reproducing the test failure from the full SQL log file. This command could take up to 20 minutes to finish.
bash
<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>
Check that the output reproduces the test failure described in the failure log. There are many possible failure modes. Look for one of the following, which should match the failure log:
  1. Different results between the two executions of the "Query of Interest" (which is the randomly generated SELECT statement repeated twice near the end of the log, wrapped in various SET and RESET statements). These different results could take the form of different result sets, or could also be an error in one case and no error in the other case. This is an "oracle" failure.
  2. Or, an internal error or an assertion failure. Note the error message for the reduce step.
  3. Or, a panic. Note the error message for the reduce step.
  4. Or, a timeout. Note the statement that timed out.

Troubleshooting

IMPORTANT: Many failures are nondeterministic, especially for multi-region tests. If no failure happens on the first run, try up to 10 times before concluding it doesn't reproduce.
It can be helpful at this point to compare the output with the failure*.log, which should show the failure from the original test run.
If the failure still does not reproduce after 10 attempts, pause here and report to the user that the failure cannot be reproduced, and show the command that was tried. The user might have additional instructions.
If it looks like it reproduces, it's time to move on to the next step.
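The "try up to 10 times" rule can be wrapped in a generic retry helper. This is a sketch: retry_until_match is a hypothetical name, and it only covers failures that show up as a recognizable pattern in the output (internal errors, assertion failures, panics), not oracle failures.

```bash
# retry_until_match MAX PATTERN CMD...: run CMD up to MAX times and stop at the
# first run whose combined stdout/stderr matches PATTERN (an extended regex).
# Returns 0 on the first match, 1 if no run matched.
retry_until_match() {
  local max=$1 pattern=$2 i=1
  shift 2
  while [ "$i" -le "$max" ]; do
    if "$@" 2>&1 | grep -qE -- "$pattern"; then
      echo "attempt $i: reproduced"
      return 0
    fi
    echo "attempt $i: no match"
    i=$((i + 1))
  done
  return 1
}
```

A call would take the form retry_until_match 10 "<error-regex>" followed by the demo command from this step. Environment variables from Step 2 need to be exported beforehand, since a leading VAR=value prefix is not a command word.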

Step 6: Use the Reduce Tool

Build the reduce tool:
bash
./dev build reduce

Prepare the Full SQL Log File again

For multi-region tests, remove \connect lines (they cause syntax errors in the reduce tool):
bash
grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>

Run Reduce

IMPORTANT: The reduce tool must be run from the cockroach repository root directory, because it looks for ./cockroach in the current directory.
Use the -multi-region option for multi-region tests, or omit it for single-region tests.
For "oracle" failures (different results):
bash
./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
The -unoptimized-query-oracle option checks whether the two executions of the "Query of Interest" produce the same results.
For internal errors/assertion failures/panics:
bash
./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
Use a distinctive part of the error message as the -contains regex (e.g., "nil LeafTxnInputState").
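Error messages often contain characters such as parentheses, brackets, or dots that are regex metacharacters. A hedged sketch for escaping them before passing the text to -contains follows. The helper name escape_regex is hypothetical, the escaped set covers common metacharacters only, and it assumes the reduce tool's regex dialect accepts backslash-escaped punctuation.

```bash
# escape_regex TEXT: backslash-escape common regex metacharacters so that TEXT
# is matched literally. The bracket expression lists: ] [ \ . * ^ $ + ? ( ) { } |
escape_regex() {
  printf '%s\n' "$1" | sed -e 's/[][\.*^$+?(){}|]/\\&/g'
}
```

A message without metacharacters, such as nil LeafTxnInputState, passes through unchanged.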
The reduce tool might take up to an hour to run.

Extract the Reduced SQL

The reduce tool outputs progress lines followed by the final SQL. Extract just the SQL:
bash
grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql
IMPORTANT: Immediately save a backup of the reduce output before manual simplification:
bash
cp reduced.sql reduced_original.sql
This provides a recovery point if the working file gets corrupted during simplification.
If the reduce tool fails to reproduce, pause here and report this to the user. They might have additional instructions. Occasionally we have to modify the reduce tool itself, if the test failure is not reproducing.

Step 7: Create Test Script and Determine Reproduction Rate

IMPORTANT: Many bugs are nondeterministic. Before manual simplification, create a reusable test script and determine the reproduction rate.
Create a small test script (adjust as needed):
bash
cat > test_repro.sh << 'EOF'
#!/bin/bash
# Test if reduced_v2.sql reproduces the error (exits on first success, up to 10 attempts)
for i in {1..10}; do
  if ./cockroach demo --multitenant=false --nodes=9 --insecure \
    --set=errexit=false --no-example-database --format=tsv \
    -f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
    echo "Run $i: REPRODUCED"
    exit 0
  else
    echo "Run $i: no error"
  fi
done
echo "FAILED"
exit 1
EOF
chmod +x test_repro.sh
For "oracle" failures, instead of checking for an error pattern, the test script
probably needs to isolate and diff the results of the two executions of the
"Query of Interest".

Run the test script to determine the reproduction rate. It's not always 100%.

This rate determines how many attempts you need when testing simplifications:
  • 100% rate: Single attempt sufficient
  • 50% rate: 2-3 attempts usually sufficient
  • 10% rate: Need ~10 attempts to be confident
  • <5% rate: May need 20+ attempts
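The attempt counts above are rules of thumb. If each run is assumed to be independent with reproduction rate p, the chance that n attempts all miss the bug is (1 - p)^n, which can be computed with a small helper. The name miss_probability and the independence assumption are illustrative and do not come from the roachtest tooling.

```bash
# miss_probability RATE ATTEMPTS: probability that ATTEMPTS independent runs all
# fail to reproduce a bug whose per-run reproduction rate is RATE (0 to 1).
miss_probability() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}
```

For instance, miss_probability 0.5 3 prints 0.125, and miss_probability 0.1 10 prints 0.349. At a 10% rate, ten attempts still miss roughly a third of the time, so a simplification that appears to break the repro at low rates deserves extra attempts before being discarded.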

Note that in some cases, the following settings might need to be added back to
the reduced file to get a repro:
sql
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
If the reduced SQL fails to reproduce after 10 attempts, pause here and report this to the user. They might have additional instructions.

Step 8: Manual Simplification

Now iteratively simplify the SQL while maintaining reproduction.
CRITICAL: For nondeterministic failures, you MUST test each simplification with enough attempts based on the repro rate. A single failed attempt does NOT mean the simplification broke the repro - it may just be nondeterminism.

Workflow for Each Simplification

  1. Copy reduced.sql to reduced_v2.sql
  2. Make ONE small change to reduced_v2.sql
  3. Run ./test_repro.sh (which tests reduced_v2.sql)
  4. If it reproduces: Copy reduced_v2.sql to reduced.sql, continue simplifying
  5. If it doesn't reproduce after enough attempts: Discard reduced_v2.sql, try a different change (i.e. backtrack).
This workflow avoids needing to restore files - you always keep the last working version in reduced.sql.
IMPORTANT: Run copy, edit, and test as separate bash commands (not chained with &&). This reduces the number of permission checks.

What to Try Removing (in rough order)

  1. Query projections and aggregations - Simplify SELECT list to just essential columns
  2. Query predicates - Simplify WHERE clause
  3. Indexes - Try removing secondary indexes
  4. Query joins - Remove or simplify joins
  5. Columns from CREATE TABLE - Remove columns not referenced in the failing query
  6. Weird characters - Remove or replace non-ASCII characters from names and data
  7. Other SQL simplifications
For "oracle" failures, when editing the Query of Interest, be sure to edit BOTH copies of the Query of Interest so that they are identical. Otherwise it won't be an apples-to-apples comparison when diffing the result sets.
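When diffing the result sets of the two executions, row order can differ even if the rows themselves are the same, for example when the query has no ORDER BY. A minimal sketch of an order-insensitive comparison follows. It assumes the two result sets have already been saved to separate files, one row per line; how to split the demo output into those files depends on the output format and is not shown. The name results_differ is hypothetical.

```bash
# results_differ FILE_A FILE_B: succeed (exit 0) if the two files differ as
# multisets of lines. Both files are sorted first so that row order is ignored.
results_differ() {
  [ "$(sort "$1")" != "$(sort "$2")" ]
}
```

If the Query of Interest does specify a total ordering, a plain diff of the unsorted files is the stricter check.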

Common Required Elements

These often cannot be removed:
  • Optimizer random seed: SET testing_optimizer_random_seed = <value> - this specific value often cannot be changed, as it determines which optimizer rules are disabled
  • Optimizer rule probability: SET testing_optimizer_disable_rule_probability - affects query plan selection
  • Specific RESET/SET sequences for optimizer settings, such as distsql and vectorize
  • Certain indexes (affect query plans)
  • Multi-node setup (--nodes=9) for distributed query bugs (though try single-node first - it may work and is simpler)
  • CREATE STATISTICS statements (affect query planning)

Backtracking

If a change breaks reproduction:
  1. Discard reduced_v2.sql (don't copy it to reduced.sql)
  2. Verify reduced.sql still reproduces. If it doesn't, this means the repro is nondeterministic. (It might have started out nondeterministic, or might have become nondeterministic over the course of simplification.) Try reproducing it 10 times and note the new repro rate. Use the new repro rate to adjust the number of repro attempts during each simplification step going forward.
  3. Try a DIFFERENT simplification
Never continue simplifying from a broken state.
If you get stuck (i.e. cannot reproduce again after backtracking), stop and report to the user with the exact command you were trying.

Step 9: Final Verification and Output

After about 20 minutes of simplification, or if there are no more simplifications after backtracking a few times, it's time to stop.
  1. Run reproduction 10+ times to confirm stability and determine final repro rate
  2. Document the minimal reproduction steps
  3. Note which elements were required vs optional

Output

The final output should include two files that can be shown to the user:
  1. reduced.sql - The minimal SQL script that reproduces the bug
  2. bisect_run.sh - A script for use with git bisect run
Write the output in such a way that it could be copied and pasted into a terminal.

Example Output Format

(The commands in this output should be edited to match what was necessary to reproduce.)
bash
## Minimal Reproduction

### reduced.sql

cat > reduced.sql << 'EOF'
CREATE TABLE t ();

SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;

SELECT ...;
EOF

### bisect_run.sh

cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect run script
# Exit codes: 0=good (bug not present), 1=bad (bug present), 125=skip (build failed)

REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"

cd "$REPO_DIR" || exit 125

echo "=== Testing commit $(git rev-parse --short HEAD) ==="

# Build (use --crdb_test if runtime assertions were enabled in the original test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
  echo "BUILD FAILED - skipping"
  exit 125
fi

# Test for bug (try 3 times for flaky bugs)
for i in {1..3}; do
  if ./cockroach demo --multitenant=false --insecure \
    --set=errexit=false --no-example-database --format=tsv \
    -f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
    echo "BUG PRESENT - marking as BAD"
    exit 1
  fi
done

echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh

### Command to reproduce

git checkout <commit-hash>
./bisect_run.sh

### Command to bisect

git bisect start ...
git bisect run ./bisect_run.sh

### Failure

<paste stacktrace or relevant failure details here>

### Repro rate: ~X% (may need multiple attempts)

After showing this output, ask the user if they want to try reproducing the bug on master branch.

Optional Step 10: Check if Bug is Fixed on Master

Before bisecting, check whether the bug has already been fixed on master.
bash
git stash  # if needed
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql
Run this a few times to account for flakiness. Note whether the bug reproduces on master or not.

Optional Step 11: Bisect

If the user wants to find the commit that introduced or fixed the bug, use git bisect.

If the Bug is Already Fixed on Master

Bisect to find the fix commit (the first commit where the bug no longer reproduces). Use custom terms since the "good" commit (master) is newer than the "bad" commit:
bash
git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists>   # e.g., the original failing commit
git bisect fixed master                        # master is fixed

git bisect run ./bisect_run.sh

# When done
git bisect reset

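Note: The --first-parent option follows only the first parent of each merge commit, which keeps the bisect on the main branch and avoids detours into feature branches. git bisect run treats exit code 0 as the old term and exit codes 1-127 (except 125) as the new term. With --term-old=broken and --term-new=fixed, the script must therefore return 0 when the bug IS present (broken) and 1 when the bug is NOT present (fixed), which is the inverse of the exit codes used by bisect_run.sh in Step 9.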
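Since git bisect run maps exit code 0 to the old term (here: broken) and 1-127 except 125 to the new term (here: fixed), a script that reports 1 for "bug present" needs its result flipped when searching for a fix. One way to do this without editing bisect_run.sh is a small wrapper. The sketch below is illustrative, and the name invert_bisect_status is hypothetical.

```bash
# invert_bisect_status CODE: convert a "0 = bug absent, 1 = bug present" exit
# code into the convention needed when old = broken and new = fixed.
# 125 (skip) and any other code are passed through unchanged.
invert_bisect_status() {
  case $1 in
    0) return 1 ;;      # bug absent  -> new term (fixed)
    1) return 0 ;;      # bug present -> old term (broken)
    *) return "$1" ;;   # 125 = skip, anything else is left as-is
  esac
}
```

A wrapper script for the fix-finding bisect could run ./bisect_run.sh, then call invert_bisect_status with $? and exit with the resulting status.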

If the Bug Still Exists on Master

Bisect to find the regression commit (the first commit where the bug was introduced):
bash
git bisect start --first-parent
git bisect good <known-good-commit>   # e.g., a previous release tag
git bisect bad master                  # master has the bug

git bisect run ./bisect_run.sh

# When done
git bisect reset

The bisect will identify the commit that introduced or fixed the bug.

Finding a Good Commit

If you don't know a good commit (where the bug doesn't exist), you can jump back in time to find one.
bash
# Find a commit from ~6 months ago on the main branch
git rev-list --first-parent -1 --before="6 months ago" HEAD

Test whether the bug exists at that commit. If not, use it as the good commit
for bisect. If the bug still exists, try going back further in time, but don't
go back further than 1 year.

If a known good commit can't be found within 1 year, stop and report this to the user.