
AI Regression Testing

Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.

When to Activate

  • AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
  • A bug was found and fixed — need to prevent re-introduction
  • Project has a sandbox/mock mode that can be leveraged for DB-free testing
  • Running /bug-check or similar review commands after code changes
  • Multiple code paths exist (sandbox vs production, feature flags, etc.)

The Core Problem

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
Real-world example (observed in production):
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)

Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue

Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)

Fix 4: Test caught it instantly on first run ✅
The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.

Sandbox-Mode API Testing

Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.

Setup (Vitest + Next.js App Router)

typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
typescript
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
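Application code can then branch on this flag. The route handlers later in this guide call an isSandboxMode() helper; a minimal sketch of such a guard (the exact implementation is project-specific) might be:

```typescript
// Sketch of a sandbox guard — assumes SANDBOX_MODE is the env flag
// set in __tests__/setup.ts; adapt to the project's actual convention.
function isSandboxMode(): boolean {
  return process.env.SANDBOX_MODE === "true";
}
```

Because setup.ts forces the flag on, every route exercised by the test suite runs its sandbox path — which is precisely the path AI fixes tend to forget.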

Test Helper for Next.js API Routes

typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };

  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }

  if (body) {
    reqHeaders["content-type"] = "application/json";
  }

  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };

  if (body) {
    init.body = JSON.stringify(body);
  }

  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}

Writing Regression Tests

The key principle: write tests for bugs that were found, not for code that works.
typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings",  // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);

    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);

    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
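The REQUIRED_FIELDS loop can also be factored into a small helper — a sketch, not part of the suite above — so that a failure names the missing fields instead of just failing:

```typescript
// Returns the required contract fields that are absent from a response payload.
function missingFields(
  data: Record<string, unknown>,
  required: string[],
): string[] {
  return required.filter((field) => !(field in data));
}
```

In a Vitest assertion this reads as `expect(missingFields(json.data, REQUIRED_FIELDS)).toEqual([])`, which prints the offending field names in the diff on failure.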

Testing Sandbox/Production Parity

The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
typescript
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);

    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});

Integrating Tests into Bug-Check Workflow

Custom Command Definition

markdown
<!-- .claude/commands/bug-check.md -->

Bug Check

Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:
npm run test       # Vitest test suite
npm run build      # TypeScript type check + build
  • If tests fail → report as highest priority bug
  • If build fails → report type errors as highest priority
  • Only proceed to Step 2 if both pass
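For reference, npm run test and npm run build assume scripts along these lines in package.json (a sketch — the actual project wiring may differ):

```json
{
  "scripts": {
    "test": "vitest run",
    "build": "next build"
  }
}
```

Using `vitest run` (not plain `vitest`) matters here: the bug-check step needs a one-shot run that exits with a status code, not watch mode.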

Step 2: Code Review (AI review)

  1. Sandbox / production path consistency
  2. API response shape matches frontend expectations
  3. SELECT clause completeness
  4. Error handling with rollback
  5. Optimistic update race conditions

Step 3: For each bug fixed, propose a regression test


The Workflow

User: "Check for bugs" (or "/bug-check")
  ├─ Step 1: npm run test
  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │   └─ PASS → Continue
  ├─ Step 2: npm run build
  │   ├─ FAIL → Type error found mechanically
  │   └─ PASS → Continue
  ├─ Step 3: AI code review (with known blind spots in mind)
  │   └─ Findings reported
  └─ Step 4: For each fix, write a regression test
      └─ Next bug-check catches if fix breaks

Common AI Regression Patterns

Pattern 1: Sandbox/Production Path Mismatch

Frequency: Most common (observed in 3 out of 4 regressions)
typescript
// ❌ AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } };  // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// ✅ Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
Test to catch it:
typescript
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);

  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});

Pattern 2: SELECT Clause Omission

Frequency: Common with Supabase/Prisma when adding new columns
typescript
// ❌ New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name")  // notification_settings not here
  .single();

return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// ✅ Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
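The bug is reproducible without a database: spreading a row that never contained the column cannot produce it. A minimal sketch, with a plain object standing in for the Supabase row:

```typescript
// A row as returned by a narrow SELECT — the new column was never fetched.
const row: Record<string, unknown> = { id: 1, email: "a@b.c", name: "A" };

// Re-mapping in the response layer cannot conjure the missing column:
const payload = { ...row, notification_settings: row.notification_settings };

// In memory the key exists (with value undefined), but JSON serialization
// drops undefined-valued keys — so after response.json() the field is gone,
// which is exactly what a `"field" in json.data` regression check catches.
const roundTripped = JSON.parse(JSON.stringify(payload));
```

This is also why the regression tests assert on the parsed response body rather than on in-memory objects.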

Pattern 3: Error State Leakage

Frequency: Moderate — when adding error handling to existing components
typescript
// ❌ Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// ✅ Clear related state on error
catch (err) {
  setReservations([]);  // Clear stale data
  setError("Failed to load");
}

Pattern 4: Optimistic Update Without Proper Rollback

typescript
// ❌ No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// ✅ Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems);  // Rollback
    alert("Failed to delete");
  }
};
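The same pattern stripped of React, as a framework-free sketch (hypothetical names): the caller captures the previous list and restores it when the delete fails.

```typescript
// Optimistically remove an item; restore the captured previous state
// if the delete callback reports failure (resolves false or throws).
async function removeWithRollback<T extends { id: string }>(
  items: T[],
  id: string,
  deleteItem: (id: string) => Promise<boolean>,
): Promise<T[]> {
  const prev = [...items]; // capture state before the optimistic update
  const next = items.filter((i) => i.id !== id);
  try {
    if (!(await deleteItem(id))) throw new Error("API error");
    return next;
  } catch {
    return prev; // rollback to the captured state
  }
}
```

A regression test for this pattern only needs to drive `deleteItem` to fail and assert the original list comes back.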

Strategy: Test Where Bugs Were Found

Don't aim for 100% coverage. Instead:
Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)
Why this works with AI development:
  1. AI tends to make the same category of mistake repeatedly
  2. Bugs cluster in complex areas (auth, multi-path logic, state management)
  3. Once tested, that exact regression cannot happen again
  4. Test count grows organically with bug fixes — no wasted effort

Quick Reference

| AI Regression Pattern | Test Strategy | Priority |
| --- | --- | --- |
| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
| SELECT clause omission | Assert all required fields in response | 🔴 High |
| Error state leakage | Assert state cleanup on error | 🟡 Medium |
| Missing rollback | Assert state restored on API failure | 🟡 Medium |
| Type cast masking null | Assert field is not undefined | 🟡 Medium |

DO / DON'T

DO:
  • Write tests immediately after finding a bug (before fixing it if possible)
  • Test the API response shape, not the implementation
  • Run tests as the first step of every bug-check
  • Keep tests fast (< 1 second total with sandbox mode)
  • Name tests after the bug they prevent (e.g., "BUG-R1 regression")
DON'T:
  • Write tests for code that has never had a bug
  • Trust AI self-review as a substitute for automated tests
  • Skip sandbox path testing because "it's just mock data"
  • Write integration tests when unit tests suffice
  • Aim for coverage percentage — aim for regression prevention