ai-regression-testing
AI Regression Testing
Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
When to Activate
- AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
- A bug was found and fixed — need to prevent re-introduction
- Project has a sandbox/mock mode that can be leveraged for DB-free testing
- Running `/bug-check` or similar review commands after code changes
- Multiple code paths exist (sandbox vs production, feature flags, etc.)
The Core Problem
When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
```
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
```

Real-world example (observed in production):

```
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)

Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue

Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)

Fix 4: Test caught it instantly on first run
```

The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.
Sandbox-Mode API Testing
Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
Setup (Vitest + Next.js App Router)
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
```

```typescript
// __tests__/setup.ts
// Force sandbox mode: no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
```
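The setup file only sets environment variables; route code still needs some way to read them. Later examples call `isSandboxMode()`, but its implementation is not shown, so the following is a minimal sketch under assumptions: the `Env` type, the explicit `env` parameter (used here so the function can be exercised without touching `process.env`), and the fallback on an empty Supabase URL are all illustrative and may not match the real project.

```typescript
// Hypothetical helper: the real isSandboxMode() is not shown in the source.
type Env = Record<string, string | undefined>;

function isSandboxMode(env: Env): boolean {
  // An explicit flag always wins.
  if (env.SANDBOX_MODE === "true") return true;
  // Assumption: a missing or empty Supabase URL means there is no real backend.
  return !env.NEXT_PUBLIC_SUPABASE_URL;
}

// Mirrors the values written by __tests__/setup.ts.
const testEnv: Env = {
  SANDBOX_MODE: "true",
  NEXT_PUBLIC_SUPABASE_URL: "",
  NEXT_PUBLIC_SUPABASE_ANON_KEY: "",
};

console.log(isSandboxMode(testEnv));
```

In application code the same function would typically be called with `process.env`; passing the environment in keeps the check easy to test.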
Test Helper for Next.js API Routes
```typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

// Build a NextRequest that can be passed directly to a route handler.
export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  // Accept either an absolute URL or a path relative to the dev server.
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };

  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }

  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };

  if (body) {
    init.body = JSON.stringify(body);
    // init.headers references reqHeaders, so this header is included too.
    reqHeaders["content-type"] = "application/json";
  }

  return new NextRequest(fullUrl, init);
}

// Unwrap a route handler response into status + parsed JSON.
export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}
```
Writing Regression Tests
The key principle: write tests for bugs that were found, not for code that works.
```typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract: what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings", // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);

    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test: this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);

    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
```
Testing Sandbox/Production Parity
The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
```typescript
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);

    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
```
Integrating Tests into Bug-Check Workflow
Custom Command Definition
```markdown
<!-- .claude/commands/bug-check.md -->
# Bug Check

## Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:

    npm run test     # Vitest test suite
    npm run build    # TypeScript type check + build

- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass

## Step 2: Code Review (AI review)

- Sandbox / production path consistency
- API response shape matches frontend expectations
- SELECT clause completeness
- Error handling with rollback
- Optimistic update race conditions

## Step 3: For each bug fixed, propose a regression test
```

The Workflow
```
User: "Run a bug check" (or "/bug-check")
  │
  ├─ Step 1: npm run test
  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │   └─ PASS → Continue
  │
  ├─ Step 2: npm run build
  │   ├─ FAIL → Type error found mechanically
  │   └─ PASS → Continue
  │
  ├─ Step 3: AI code review (with known blind spots in mind)
  │   └─ Findings reported
  │
  └─ Step 4: For each fix, write a regression test
      └─ Next bug-check catches it if the fix breaks
```
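The ordering in this workflow matters: mechanical checks run first, and the AI review is only reached if they pass. A minimal sketch of that gating logic follows. The names `Gate`, `GateReport`, and `runGates` are hypothetical, and the checks are stubbed with fixed results rather than running real commands.

```typescript
// Hypothetical names throughout; the checks below are stubs, not real commands.
interface Gate {
  name: string;
  run: () => boolean; // true means the gate passed
}

interface GateReport {
  passed: string[];
  failedAt: string | null;
}

// Run gates in order and stop at the first failure, so the subjective
// review step never runs on top of a mechanically detected failure.
function runGates(gates: Gate[]): GateReport {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!gate.run()) {
      return { passed, failedAt: gate.name };
    }
    passed.push(gate.name);
  }
  return { passed, failedAt: null };
}

// Stubbed scenario: tests pass, the build fails, the AI review is skipped.
const report = runGates([
  { name: "npm run test", run: () => true },
  { name: "npm run build", run: () => false },
  { name: "AI code review", run: () => true },
]);

console.log(report);
```

In a real script each `run` could shell out, for example with Node's `child_process.spawnSync`, and treat an exit status of 0 as a pass.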
Common AI Regression Patterns
Pattern 1: Sandbox/Production Path Mismatch
Frequency: Most common (observed in 3 out of 4 regressions)
```typescript
// FAIL: AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } }; // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// PASS: Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
```

Test to catch it:

```typescript
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);

  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
```
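One way to make this mismatch harder to introduce is to route both paths through a single function that decides the response shape. The sketch below illustrates that idea only; `ProfileRow`, `ProfileResponse`, `toProfileResponse`, `SANDBOX_ROW`, and `getProfile` are hypothetical names, and the real route handler is not shown in the source.

```typescript
// Hypothetical types and names; the real route code is not shown in the source.
type NotificationSettings = Record<string, boolean>;

interface ProfileRow {
  id: string;
  email: string;
  name: string;
  notification_settings?: NotificationSettings | null;
}

interface ProfileResponse {
  data: {
    id: string;
    email: string;
    name: string;
    notification_settings: NotificationSettings | null;
  };
}

// The only place that decides which fields leave the API.
function toProfileResponse(row: ProfileRow): ProfileResponse {
  return {
    data: {
      id: row.id,
      email: row.email,
      name: row.name,
      notification_settings: row.notification_settings ?? null,
    },
  };
}

// Assumed sandbox fixture that predates the notification_settings field.
const SANDBOX_ROW: ProfileRow = {
  id: "user-001",
  email: "sandbox@example.com",
  name: "Sandbox User",
};

// Both paths differ only in where the row comes from, not in the shape returned.
function getProfile(sandbox: boolean, loadRow: () => ProfileRow): ProfileResponse {
  const row = sandbox ? SANDBOX_ROW : loadRow();
  return toProfileResponse(row);
}
```

With this structure, adding a field means editing `toProfileResponse` once, and the compiler flags any path that returns a different shape. The contract test above is still worth keeping, since it checks the real handler rather than the helper.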
Pattern 2: SELECT Clause Omission
Frequency: Common with Supabase/Prisma when adding new columns
```typescript
// FAIL: New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name") // notification_settings not here
  .single();

return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// PASS: Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
```
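If `SELECT *` is not desirable, an alternative is to keep one column list and derive both the select string and the contract check from it. This is a sketch under assumptions: `PROFILE_COLUMNS`, `PROFILE_SELECT`, and `missingColumns` are hypothetical names, and the column set is taken from the snippet above rather than from a real schema.

```typescript
// Hypothetical single source of truth for the columns the profile API needs.
const PROFILE_COLUMNS = ["id", "email", "name", "notification_settings"] as const;
type ProfileColumn = (typeof PROFILE_COLUMNS)[number];

// Comma-separated string in the same format the snippet above passes to .select().
const PROFILE_SELECT: string = PROFILE_COLUMNS.join(", ");

// Report which required columns are absent from a returned row.
function missingColumns(row: Record<string, unknown>): ProfileColumn[] {
  return PROFILE_COLUMNS.filter((column) => !(column in row));
}

// A row fetched with the old, incomplete SELECT.
const staleRow = { id: "u1", email: "a@example.com", name: "A" };
console.log(PROFILE_SELECT);
console.log(missingColumns(staleRow));
```

The query would then read `.select(PROFILE_SELECT)`, and a regression test can assert that `missingColumns(json.data)` is empty. One trade-off to be aware of: passing a computed string instead of a literal may weaken the row types the client can infer.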
Pattern 3: Error State Leakage
Frequency: Moderate — when adding error handling to existing components
```typescript
// FAIL: Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// PASS: Clear related state on error
catch (err) {
  setReservations([]); // Clear stale data
  setError("Failed to load");
}
```
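State cleanup like this is easier to assert when the transition lives in a pure function instead of inside a component. The following sketch assumes a reducer-style design; `ReservationsState`, `ReservationsAction`, and `reservationsReducer` are hypothetical names, and the source component may be organized differently.

```typescript
// Hypothetical state shape; reservations are simplified to strings.
interface ReservationsState {
  reservations: string[];
  error: string | null;
}

type ReservationsAction =
  | { type: "loaded"; reservations: string[] }
  | { type: "failed"; message: string };

// Pure transition function: a failure always clears previously loaded data,
// so stale reservations from another tab cannot remain on screen.
function reservationsReducer(
  state: ReservationsState,
  action: ReservationsAction,
): ReservationsState {
  switch (action.type) {
    case "loaded":
      return { reservations: action.reservations, error: null };
    case "failed":
      return { reservations: [], error: action.message };
    default:
      return state;
  }
}

const loaded = reservationsReducer(
  { reservations: [], error: null },
  { type: "loaded", reservations: ["r1", "r2"] },
);
const afterError = reservationsReducer(loaded, { type: "failed", message: "Failed to load" });
console.log(afterError);
```

A regression test for this pattern then reduces to asserting that `reservations` is empty after a `failed` action, with no rendering involved.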
Pattern 4: Optimistic Update Without Proper Rollback
```typescript
// FAIL: No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// PASS: Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems); // Rollback
    alert("Failed to delete item");
  }
};
```
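To assert the rollback without rendering a component, the same logic can be written with the state setter and the API call passed in as parameters. This is a sketch, not the source's implementation: `removeWithRollback`, `Item`, and the injected `deleteItem` function are hypothetical, and the fake API below always fails so that the rollback path runs.

```typescript
// Hypothetical item shape and helper; dependencies are injected so the
// rollback path can be exercised without a browser or a real API.
interface Item {
  id: string;
}

async function removeWithRollback(
  items: Item[],
  id: string,
  setItems: (next: Item[]) => void,
  deleteItem: (id: string) => Promise<{ ok: boolean }>,
): Promise<boolean> {
  const prevItems = [...items]; // snapshot taken before the optimistic update
  setItems(items.filter((item) => item.id !== id));
  try {
    const res = await deleteItem(id);
    if (!res.ok) throw new Error("API error");
    return true;
  } catch {
    setItems(prevItems); // restore the snapshot on any failure
    return false;
  }
}

// Usage with a fake API that always fails.
let uiItems: Item[] = [{ id: "a" }, { id: "b" }];
const failingDelete = async (_id: string) => ({ ok: false });

removeWithRollback(uiItems, "a", (next) => { uiItems = next; }, failingDelete)
  .then((succeeded) => console.log(succeeded, uiItems.length));
```

A regression test can then check that the list length is unchanged after a failed delete, which is the "state restored on API failure" assertion this pattern calls for.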
Strategy: Test Where Bugs Were Found
Don't aim for 100% coverage. Instead:
```
Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)
```

Why this works with AI development:
- AI tends to make the same category of mistake repeatedly
- Bugs cluster in complex areas (auth, multi-path logic, state management)
- Once tested, that exact regression is caught on the next test run instead of reaching production again
- Test count grows organically with bug fixes — no wasted effort
Quick Reference
| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | High |
| SELECT clause omission | Assert all required fields in response | High |
| Error state leakage | Assert state cleanup on error | Medium |
| Missing rollback | Assert state restored on API failure | Medium |
| Type cast masking null | Assert field is not undefined | Medium |
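The last row of the table, a type cast masking a missing value, has no example elsewhere in this document, so here is one possible illustration. It assumes a `NotificationSettings` shape and two hypothetical helpers, `unsafeSettings` and `readSettings`; neither appears in the source.

```typescript
// Assumed shape, for illustration only.
interface NotificationSettings {
  email: boolean;
  push: boolean;
}

// A cast satisfies the compiler but performs no runtime check:
// if the column was never selected, this silently returns undefined.
function unsafeSettings(row: Record<string, unknown>): NotificationSettings {
  return row.notification_settings as NotificationSettings;
}

// A guarded read distinguishes "column missing" from "value is null".
function readSettings(row: Record<string, unknown>): NotificationSettings | null {
  if (!("notification_settings" in row)) {
    throw new Error("notification_settings missing from row");
  }
  const value = row.notification_settings;
  if (value === null) return null;
  if (typeof value !== "object") {
    throw new Error("notification_settings has an unexpected type");
  }
  return value as NotificationSettings;
}

const rowWithoutColumn: Record<string, unknown> = { id: "u1" };
console.log(unsafeSettings(rowWithoutColumn)); // the cast hides the missing column
```

This is why the table suggests asserting that the field is not `undefined`: the type system reports no problem in `unsafeSettings`, so only a runtime assertion or a guard such as `readSettings` exposes the gap.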
DO / DON'T
DO:
- Write tests immediately after finding a bug (before fixing it if possible)
- Test the API response shape, not the implementation
- Run tests as the first step of every bug-check
- Keep tests fast (< 1 second total with sandbox mode)
- Name tests after the bug they prevent (e.g., "BUG-R1 regression")
DON'T:
- Write tests for code that has never had a bug
- Trust AI self-review as a substitute for automated tests
- Skip sandbox path testing because "it's just mock data"
- Write integration tests when unit tests suffice
- Aim for coverage percentage — aim for regression prevention