reduce-unoptimized-query-oracle
Reduce Unoptimized Query Oracle Test Failure
Reduce an unoptimized-query-oracle test failure log to the simplest possible
reproduction case.
The unoptimized-query-oracle roachtest runs a series of random SQL statements to
create a random dataset, and then executes a random "Query of Interest" twice,
with different optimization settings. If the two executions return different
results, it indicates a bug in CockroachDB.
When to Use
Use this skill when:
- You have a test failure from the unoptimized-query-oracle roachtest.
- You need to find the minimal SQL to reproduce the test failure.
Step 1: Locate artifacts
Ask the user where the artifacts directory is.
Find the relevant files in the artifacts directory:
- Test parameters: `params.log` (the parameters from the roachtest)
- Test log: `test.log` (the log from the roachtest)
- Failure log: `failure*.log` (the failure log from the roachtest)
- Full SQL log: `unoptimized-query-oracle*.log` (the SQL statements that led to failure)
- Query of interest log: `unoptimized-query-oracle*.failure.log` (containing the query of interest and possibly more information about the failure)
- Cockroach log: `logs/1.unredacted/cockroach.log` or `logs/unredacted/cockroach.log` (contains the git commit)
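The file list above can be checked in one pass with `find`. This is a minimal sketch, not part of roachtest: `find_artifacts` is a hypothetical helper name, and the patterns simply mirror the file names listed in this step.

```bash
# find_artifacts DIR
# Hypothetical helper: print every file under DIR whose name matches one of
# the artifact names from Step 1, one path per line, sorted.
find_artifacts() {
  local dir="$1"
  # 'unoptimized-query-oracle*.log' also matches the *.failure.log file.
  find "$dir" -type f \( \
      -name 'params.log' -o \
      -name 'test.log' -o \
      -name 'failure*.log' -o \
      -name 'unoptimized-query-oracle*.log' -o \
      -name 'cockroach.log' \
    \) | sort
}
```

For example, `find_artifacts /path/to/artifacts` prints the candidate files; any name from the list that is missing from the output should be raised with the user before continuing.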
Step 2: Determine test configuration
Determine the git commit from `cockroach.log`:
bash
grep "binary: CockroachDB" cockroach.log
Look for the commit hash in the version string (e.g., `cb94db961b8f55e3473f279d98ae90f0eeb0adcb`).
Determine if runtime assertions are enabled by checking for:
- `"runtimeAssertionsBuild": "true"` in `params.log`
- or `Runtime assertions enabled` in `test.log`
Determine if metamorphic settings apply by looking for:
- lines like these in `params.log`:
  "metamorphicBufferedSender": "true",
  "metamorphicWriteBuffering": "true",
- or lines like these in `test.log`:
  metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
  metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'
Determine environment variables from the beginning of `cockroach.log`:
bash
grep -A10 "using local environment variables:" cockroach.log
Important environment variables include:
- `COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATAL`
- `COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING`
- `COCKROACH_RANDOM_SEED`
- `COCKROACH_TESTING_FORCE_RELEASE_BRANCH`
But there might be more important environment variables, so best to get all of them.
Determine if this is a multi-region test or single-region test by checking:
- the test name in `test.log` (e.g., `seed-multi-region` indicates multi-region)
- or the presence of `\connect` lines in the full SQL log
If both of these are missing, it's a single-region test.
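The runtime-assertion and multi-region checks above can be wrapped in two small predicates. This is a sketch under stated assumptions: the function names are hypothetical, and `seed-multi-region` is only the example test name given above, so the pattern may need adjusting for other test names.

```bash
# runtime_assertions_enabled DIR
# Succeeds if either marker from Step 2 is found in DIR/params.log or
# DIR/test.log. grep -s keeps a missing file from printing an error.
runtime_assertions_enabled() {
  local dir="$1"
  grep -qs '"runtimeAssertionsBuild": "true"' "$dir/params.log" ||
    grep -qs 'Runtime assertions enabled' "$dir/test.log"
}

# is_multi_region TEST_LOG SQL_LOG
# Succeeds if the test log mentions the example multi-region test name, or
# the full SQL log contains a line starting with \connect.
is_multi_region() {
  local test_log="$1" sql_log="$2"
  grep -qs 'seed-multi-region' "$test_log" ||
    grep -qs '^\\connect' "$sql_log"
}
```

Both functions return an exit status only, so they can drive later choices, for example `if is_multi_region test.log full.sql; then nodes_flag="--nodes=9"; fi`.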
Step 3: Check Out and Build
For a normal build use:
bash
git checkout <commit-hash>
./dev build short
If runtime assertions were enabled, use a test build instead:
bash
git checkout <commit-hash>
./dev build short -- --crdb_test
Note: Only build libgeos if the reproduction uses geospatial functions (BOX2D,
geometry, geography, etc.):
bash
./dev build libgeos
Step 4: Prepare the Full SQL Log File
First, check that the following statements are at the top of the full SQL log
file. If they are not, add them:
sql
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
If metamorphic settings were used, also add them to the top of the full SQL log
file:
sql
SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;
Create an appropriate directory either in the artifacts directory or in the
repository root for holding temp files.
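The check-and-add step for the two session settings can be scripted so that it is safe to run more than once. This is a minimal sketch: `ensure_top_settings` is a hypothetical helper, and treating "the top" as the first 20 lines is an assumption, not something the roachtest defines.

```bash
# ensure_top_settings FILE
# Prepend the two SET statements from Step 4 to FILE unless each already
# appears (as an exact string) within the first 20 lines.
ensure_top_settings() {
  local sql_log="$1" tmp stmt
  tmp="$(mktemp)"
  for stmt in "SET statement_timeout='1m0s';" "SET sql_safe_updates = false;"; do
    # Assumption: "at the top" means within the first 20 lines of the file.
    head -n 20 "$sql_log" | grep -qF -- "$stmt" || echo "$stmt"
  done > "$tmp"
  cat "$sql_log" >> "$tmp"
  mv "$tmp" "$sql_log"
}
```

The match is literal, so a differently spaced variant of a statement would be prepended again, which is harmless. The metamorphic `SET CLUSTER SETTING` lines are not handled here and still need to be added by hand when they apply.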
Step 5: Initial Reproduction
Determine the correct demo command based on test type:
- Multi-region test: Use `--nodes=9`
- Single-region test: Omit the `--nodes` option
Use a command like this to try reproducing the test failure from the full SQL
log file. This command could take up to 20 minutes to finish.
bash
<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>
Check that the output reproduces the test failure described in the failure
log. There are many possible failure modes. Look for one of the following,
which should match the failure log:
- Different results between the two executions of the "Query of Interest" (which is the randomly generated SELECT statement repeated twice near the end of the log, wrapped in various SET and RESET statements). These different results could take the form of different result sets, or could also be an error in one case and no error in the other case. This is an "oracle" failure.
- Or, an `internal error` or assertion failure. Note the error message for the reduce step.
- Or, a panic. Note the error message for the reduce step.
- Or, a timeout. Note the statement that timed out.
Troubleshooting
IMPORTANT: Many failures are nondeterministic, especially for multi-region
tests. If no failure happens on the first run, try up to 10 times before
concluding it doesn't reproduce.
It can be helpful at this point to compare the output with `failure*.log`,
which should show the failure from the original test run.
If the initial run fails to reproduce after 10 attempts, pause here and report to
the user that the failure cannot be reproduced, and show the command that was
tried. The user might have additional instructions.
If it looks like it reproduces, it's time to move on to the next step.
Step 6: Use the Reduce Tool
Build the reduce tool:
bash
./dev build reduce
Prepare the Full SQL Log File again
For multi-region tests, remove `\connect` lines (they cause syntax errors in the
`reduce` tool):
bash
grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>
Run Reduce
IMPORTANT: The reduce tool must be run from the cockroach repository root
directory, because it looks for `./cockroach` in the current directory.
Use the `-multi-region` option for multi-region tests, or omit it for
single-region tests.
For "oracle" failures (different results):
bash
./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
The `-unoptimized-query-oracle` option checks whether the two executions of the
"Query of Interest" produce the same results.
For internal errors/assertion failures/panics:
bash
./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
Use a distinctive part of the error message as the `-contains` regex (e.g.,
`"nil LeafTxnInputState"`).
The reduce tool might take up to an hour to run.
Extract the Reduced SQL
The reduce tool outputs progress lines followed by the final SQL. Extract just the SQL:
bash
grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql
IMPORTANT: Immediately save a backup of the reduce output before manual simplification:
bash
cp reduced.sql reduced_original.sql
This provides a recovery point if the working file gets corrupted during simplification.
If the reduce tool fails to reproduce, pause here and report this to the
user. They might have additional instructions. Occasionally we have to modify
the reduce tool itself, if the test failure is not reproducing.
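The `grep -A1000` extraction above silently truncates output longer than 1000 lines. An alternative sketch with `awk` prints everything after the first line that starts with `reduction: `, with no length cap. `extract_reduced_sql` is a hypothetical helper name; it relies only on the `reduction: ` prefix already used in the command above.

```bash
# extract_reduced_sql FILE
# Print every line that follows the first line starting with "reduction: ".
# The marker line itself is not printed.
extract_reduced_sql() {
  awk 'found { print } /^reduction: / { found = 1 }' "$1"
}
```

Usage would be `extract_reduced_sql reduce-output.log > reduced.sql`, followed by the same backup copy to `reduced_original.sql`.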
Step 7: Create Test Script and Determine Reproduction Rate
IMPORTANT: Many bugs are nondeterministic. Before manual simplification,
create a reusable test script and determine the reproduction rate.
Create a small test script (adjust as needed):
bash
cat > test_repro.sh << 'EOF'
#!/bin/bash
# Test if reduced_v2.sql reproduces the error (exits on first success, up to 10 attempts)
for i in {1..10}; do
  if ./cockroach demo --multitenant=false --nodes=9 --insecure \
    --set=errexit=false --no-example-database --format=tsv \
    -f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
    echo "Run $i: REPRODUCED"
    exit 0
  else
    echo "Run $i: no error"
  fi
done
echo "FAILED"
# Non-zero exit so callers can tell that no attempt reproduced
exit 1
EOF
chmod +x test_repro.sh
For "oracle" failures, instead of checking for an error pattern, the test script
probably needs to isolate and diff the results of the two executions of the
"Query of Interest".
Run the test script to determine the reproduction rate. It's not always 100%.
This rate determines how many attempts you need when testing simplifications:
- 100% rate: Single attempt sufficient
- 50% rate: 2-3 attempts usually sufficient
- 10% rate: Need ~10 attempts to be confident
- <5% rate: May need 20+ attempts
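The reproduction rate itself can be measured by counting matching runs over a fixed number of attempts. This is a minimal sketch, assuming the failure shows up as a pattern in the command's combined output; `measure_repro_rate` is a hypothetical helper, not part of the cockroach tooling.

```bash
# measure_repro_rate RUNS PATTERN CMD...
# Run CMD RUNS times and report how many runs printed PATTERN,
# as "hits/runs (percent%)" with the percentage rounded down.
measure_repro_rate() {
  local runs="$1" pattern="$2" hits=0 i=1
  shift 2
  while [ "$i" -le "$runs" ]; do
    # stderr is merged so error messages and panics are matched too.
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      hits=$((hits + 1))
    fi
    i=$((i + 1))
  done
  echo "$hits/$runs ($((100 * hits / runs))%)"
}
```

For example, `measure_repro_rate 10 "<error-pattern>" ./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql` reports the rate over ten runs. Unlike `test_repro.sh`, it does not stop at the first success, so it takes the full ten runs.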
Note that in some cases, the following settings might need to be added back to
the reduced file to get a repro:
```sql
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
```
If the reduced SQL fails to reproduce after 10 attempts, pause here and report
this to the user. They might have additional instructions.
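For "oracle" failures, the comparison of the two "Query of Interest" executions can be done on saved result files. The sketch below assumes that each execution's rows have already been split into their own file and that row order is not significant; both are assumptions, and `results_differ` and the file names are hypothetical.

```bash
# results_differ FILE_A FILE_B
# Succeeds (exit 0) when the two result files contain different rows,
# ignoring row order. Fails (exit 1) when they hold the same rows.
results_differ() {
  # Assumption: the query has no meaningful ORDER BY, so rows are sorted
  # before comparing.
  [ "$(sort "$1")" != "$(sort "$2")" ]
}
```

In a test script this would replace the `grep -q "<error-pattern>"` check, for example `if results_differ first.tsv second.tsv; then echo "REPRODUCED"; fi`. If one execution returns an error and the other does not, the files also differ, which matches the oracle failure described in Step 5.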
Step 8: Manual Simplification
Now iteratively simplify the SQL while maintaining reproduction.
CRITICAL: For nondeterministic failures, you MUST test each simplification
with enough attempts based on the repro rate. A single failed attempt does NOT
mean the simplification broke the repro - it may just be nondeterminism.
Workflow for Each Simplification
- Copy `reduced.sql` to `reduced_v2.sql`
- Make ONE small change to `reduced_v2.sql`
- Run `./test_repro.sh` (which tests `reduced_v2.sql`)
- If it reproduces: Copy `reduced_v2.sql` to `reduced.sql`, continue simplifying
- If it doesn't reproduce after enough attempts: Discard `reduced_v2.sql`, try a different change (i.e. backtrack).
This workflow avoids needing to restore files - you always keep the last working
version in `reduced.sql`.
IMPORTANT: Run copy, edit, and test as separate bash commands (not chained with `&&`).
This reduces the number of permission checks.
What to Try Removing (in rough order)
- Query projections and aggregations - Simplify SELECT list to just essential columns
- Query predicates - Simplify WHERE clause
- Indexes - Try removing secondary indexes
- Query joins - Remove joins or reduce the number of joined tables
- Columns from CREATE TABLE - Remove columns not referenced in the failing query
- Weird characters - Remove or replace non-ASCII characters from names and data
- Other SQL simplifications
For "oracle" failures, when editing the Query of Interest, be sure to edit
BOTH copies of the Query of Interest so that they are identical. Otherwise
it won't be an apples-to-apples comparison when diffing the result sets.
Common Required Elements
These often cannot be removed:
- Optimizer random seed: `SET testing_optimizer_random_seed = <value>` - this specific value often cannot be changed, as it determines which optimizer rules are disabled
- Optimizer rule probability: `SET testing_optimizer_disable_rule_probability` - affects query plan selection
- Specific RESET/SET sequences for optimizer settings, such as distsql and vectorize
- Certain indexes (affect query plans)
- Multi-node setup (`--nodes=9`) for distributed query bugs (though try single-node first - it may work and is simpler)
- `CREATE STATISTICS` statements (affect query planning)
Backtracking
If a change breaks reproduction:
- Discard `reduced_v2.sql` (don't copy it to `reduced.sql`)
- Verify `reduced.sql` still reproduces. If it doesn't, this means the repro is nondeterministic. (It might have started out nondeterministic, or might have become nondeterministic over the course of simplification.) Try reproducing it 10 times and note the new repro rate. Use the new repro rate to adjust the number of repro attempts during each simplification step going forward.
- Try a DIFFERENT simplification
Never continue simplifying from a broken state.
If you get stuck (i.e. cannot reproduce again after backtracking), stop and
report to the user with the exact command you were trying.
Step 9: Final Verification and Output
After about 20 minutes of simplification, or if there are no more
simplifications after backtracking a few times, it's time to stop.
- Run reproduction 10+ times to confirm stability and determine final repro rate
- Document the minimal reproduction steps
- Note which elements were required vs optional
Output
The final output should include two files that can be shown to the user:
- reduced.sql - The minimal SQL script that reproduces the bug
- bisect_run.sh - A script for use with `git bisect run`
Write the output in such a way that it could be copied and pasted into a
terminal.
Example Output Format
(The commands in this output should be edited to match what was necessary to
reproduce.)
Minimal Reproduction
reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();
SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;
SELECT ...;
EOF
bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect run script
# Exit codes: 0=good (bug not present), 1=bad (bug present), 125=skip (build failed)
REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"
cd "$REPO_DIR" || exit 125
echo "=== Testing commit $(git rev-parse --short HEAD) ==="
# Build (use --crdb_test if runtime assertions were enabled in the original test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
  echo "BUILD FAILED - skipping"
  exit 125
fi
# Test for bug (try 3 times for flaky bugs)
for i in {1..3}; do
  if ./cockroach demo --multitenant=false --insecure \
    --set=errexit=false --no-example-database --format=tsv \
    -f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
    echo "BUG PRESENT - marking as BAD"
    exit 1
  fi
done
echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh
Command to reproduce
git checkout <commit-hash>
./bisect_run.sh
Command to bisect
git bisect start ...
git bisect run ./bisect_run.sh
Failure
<paste stacktrace or relevant failure details here>
Repro rate: ~X% (may need multiple attempts)
**After showing this output, ask the user if they want to try reproducing the
bug on master branch.**
Optional Step 10: Check if Bug is Fixed on Master
Before bisecting, check whether the bug has already been fixed on master.
bash
git stash # if needed
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql
Run this a few times to account for flakiness. Note whether the bug reproduces
on master or not.
Optional Step 11: Bisect
If the user wants to find the commit that introduced or fixed the bug, use
`git bisect`.
If the Bug is Already Fixed on Master
Bisect to find the fix commit (the first commit where the bug no longer
reproduces). Use custom terms since the "good" commit (master) is newer than the
"bad" commit:
bash
git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists> # e.g., the original failing commit
git bisect fixed master # master is fixed
git bisect run ./bisect_run.sh
# When done
git bisect reset
**Note:** The `--first-parent` option follows only merge commits on the main
branch, avoiding detours into feature branches. `git bisect run` treats exit
code 0 as the old term and exit codes 1-127 (except 125) as the new term. With
`--term-old=broken --term-new=fixed`, the script must therefore return 0 when
the bug IS present (broken) and 1 when the bug is NOT present (fixed). This is
the reverse of the exit codes in bisect_run.sh, which exits 1 when the bug is
present, so its exit codes need to be swapped for this case.
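When bisecting for a fix with `--term-old=broken --term-new=fixed`, `git bisect run` reads exit code 0 as the old term (broken) and 1 as the new term (fixed), so a script that exits 1 when the bug is present needs its result swapped. A minimal sketch of such a wrapper follows; `run_inverted` is a hypothetical helper, and it assumes the wrapped script uses the 0/1/125 exit codes documented in bisect_run.sh.

```bash
# run_inverted CMD...
# Run CMD and swap exit codes 0 and 1. Any other code (125 = skip, or a
# code above 127 that aborts the bisect) is passed through unchanged.
run_inverted() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0) return 1 ;;       # bug not present -> new term ("fixed")
    1) return 0 ;;       # bug present     -> old term ("broken")
    *) return "$rc" ;;   # skip / abort codes stay as they are
  esac
}
```

Because `git bisect run` needs an executable rather than a shell function, this would typically live in a small wrapper script (for example a hypothetical `bisect_run_fixed.sh`) whose last lines are `run_inverted ./bisect_run.sh` and `exit $?`, and that wrapper is what gets passed to `git bisect run`.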
If the Bug Still Exists on Master
Bisect to find the regression commit (the first commit where the bug was
introduced):
bash
git bisect start --first-parent
git bisect good <known-good-commit> # e.g., a previous release tag
git bisect bad master # master has the bug
git bisect run ./bisect_run.sh
# When done
git bisect reset
The bisect will identify the commit that introduced or fixed the bug.
Finding a Good Commit
If you don't know a good commit (where the bug doesn't exist), you can jump back
in time to find one.
bash
# Find a commit from ~6 months ago on the main branch
git rev-list --first-parent -1 --before="6 months ago" HEAD
Test whether the bug exists at that commit. If not, use it as the good commit
for bisect. If the bug still exists, try going back further in time, but don't
go back further than 1 year.
**If a known good commit can't be found within 1 year, stop and report this to
the user.**