ai-regression-testing

AI Regression Testing

Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.

When to Activate

  • AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
  • A bug was found and fixed — need to prevent re-introduction
  • Project has a sandbox/mock mode that can be leveraged for DB-free testing
  • Running /bug-check or similar review commands after code changes
  • Multiple code paths exist (sandbox vs production, feature flags, etc.)

The Core Problem

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
Real-world example (observed in production):
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)

Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue

Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)

Fix 4: Test caught it instantly on first run (PASS)
The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.

Sandbox-Mode API Testing

Many projects with an AI-friendly architecture have a sandbox/mock mode. It is the key to fast, DB-free API testing.

Setup (Vitest + Next.js App Router)

typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
typescript
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
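
The route examples later in this document call `isSandboxMode()`, but the source never shows its body. The following is a minimal sketch of how such a flag check might look, assuming the flag is the `SANDBOX_MODE` variable set in the setup file above. The name `readSandboxFlag`, the `Env` type, and the rule that only the exact string "true" enables sandbox mode are all assumptions for illustration.

```typescript
// Hypothetical helper: the source uses isSandboxMode() without defining it.
type Env = Record<string, string | undefined>;

// Assumption: sandbox mode is on only when SANDBOX_MODE is exactly "true".
// The environment is passed in so the check can be tested without globals.
function readSandboxFlag(env: Env): boolean {
  return env.SANDBOX_MODE === "true";
}
```

In an application, `isSandboxMode()` could then be a thin wrapper that calls `readSandboxFlag(process.env)`.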

Test Helper for Next.js API Routes

typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };

  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }

  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };

  if (body) {
    init.body = JSON.stringify(body);
    reqHeaders["content-type"] = "application/json";
  }

  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}

Writing Regression Tests

The key principle: write tests for bugs that were found, not for code that works.
typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings",  // ← Added after bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);

    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);

    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
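
The field loop above stops at the first missing field. One possible refinement is to collect every missing field first, so a single failure message names all of them. The helper `missingFields` and the sample payload below are hypothetical and not part of the original project.

```typescript
// Hypothetical helper: returns every required field absent from the payload.
// A key whose value is null still counts as present.
function missingFields(
  payload: Record<string, unknown>,
  required: readonly string[],
): string[] {
  return required.filter((field) => !(field in payload));
}

// Sample payload shaped like the buggy response (notification_settings absent).
const buggyProfile = { id: "user-001", email: "a@example.com" };
const missing = missingFields(buggyProfile, [
  "id",
  "email",
  "notification_settings",
]);
// missing → ["notification_settings"]
```

Inside a Vitest test this could be asserted as `expect(missingFields(json.data, REQUIRED_FIELDS)).toEqual([])`.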

Testing Sandbox/Production Parity

The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
typescript
// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);

    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
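
Because of the `json.data.length > 0` guard, the test above passes without checking anything when the sandbox returns an empty list. A possible way to avoid that is sketched below, assuming the sandbox fixture for the test user is expected to contain at least one conversation. The helper name `everyItemHasKey` and its result shape are hypothetical.

```typescript
type CheckResult = { ok: boolean; reason: string };

// Hypothetical helper: fails on an empty list instead of passing vacuously,
// and reports the index of the first item that lacks the key.
function everyItemHasKey(
  items: ReadonlyArray<Record<string, unknown>>,
  key: string,
): CheckResult {
  if (items.length === 0) {
    return { ok: false, reason: "list is empty, so nothing was checked" };
  }
  for (let i = 0; i < items.length; i++) {
    if (!(key in items[i])) {
      return { ok: false, reason: `item ${i} is missing "${key}"` };
    }
  }
  return { ok: true, reason: "" };
}
```

If an empty list is a valid sandbox response in a given project, the emptiness check would need to be dropped or moved to a separate test.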

Integrating Tests into Bug-Check Workflow

Custom Command Definition

markdown
<!-- .claude/commands/bug-check.md -->

Bug Check

Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:
npm run test       # Vitest test suite
npm run build      # TypeScript type check + build
  • If tests fail → report as highest priority bug
  • If build fails → report type errors as highest priority
  • Only proceed to Step 2 if both pass

Step 2: Code Review (AI review)

  1. Sandbox / production path consistency
  2. API response shape matches frontend expectations
  3. SELECT clause completeness
  4. Error handling with rollback
  5. Optimistic update race conditions

Step 3: For each bug fixed, propose a regression test

The Workflow

User: "Run a bug check" (or "/bug-check")
  ├─ Step 1: npm run test
  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
  │   └─ PASS → Continue
  ├─ Step 2: npm run build
  │   ├─ FAIL → Type error found mechanically
  │   └─ PASS → Continue
  ├─ Step 3: AI code review (with known blind spots in mind)
  │   └─ Findings reported
  └─ Step 4: For each fix, write a regression test
      └─ Next bug-check catches if fix breaks
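
The ordering in the workflow above (mechanical checks first, AI review only after they pass) can be sketched as a small gate function. This is an illustration under assumptions, not the command's actual implementation: the `Step` type and `runGate` are hypothetical, and each step's `run` function is injected so that a real project could have it shell out to `npm run test` or `npm run build`.

```typescript
type Step = { name: string; run: () => boolean };
type StepResult = { name: string; passed: boolean };

// Runs steps in order and stops at the first failure, so later steps
// (such as the AI review) never run on a failing test suite or build.
function runGate(steps: readonly Step[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    const passed = step.run();
    results.push({ name: step.name, passed });
    if (!passed) break;
  }
  return results;
}
```

The returned list ends with the failing step, which makes it clear which mechanical check should be reported as the highest-priority finding.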

Common AI Regression Patterns

Pattern 1: Sandbox/Production Path Mismatch

Frequency: Most common (observed in 3 out of 4 regressions)
typescript
// FAIL: AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } };  // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };

// PASS: Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
Test to catch it:
typescript
it("sandbox and production return same fields", async () => {
  // In test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);

  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
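
Since the test environment forces sandbox mode, the test above exercises only the sandbox branch. If the two branches can be factored into separate builder functions, their key sets can be compared directly. The sketch below assumes such a refactoring; `buildSandboxProfile`, `buildProductionProfile`, and `keyDiff` are hypothetical names, and the sample row stands in for a database result.

```typescript
// Hypothetical builders standing in for the two branches of a route handler.
function buildSandboxProfile() {
  return { id: "user-001", email: "a@example.com", name: "Test", notification_settings: null };
}

function buildProductionProfile(row: Record<string, unknown>) {
  return {
    id: row.id,
    email: row.email,
    name: row.name,
    notification_settings: row.notification_settings ?? null,
  };
}

// Returns the keys that appear in one object but not in the other.
function keyDiff(a: object, b: object): { onlyInA: string[]; onlyInB: string[] } {
  const aKeys = Object.keys(a);
  const bKeys = Object.keys(b);
  return {
    onlyInA: aKeys.filter((k) => bKeys.indexOf(k) === -1),
    onlyInB: bKeys.filter((k) => aKeys.indexOf(k) === -1),
  };
}

const diff = keyDiff(
  buildSandboxProfile(),
  buildProductionProfile({ id: "1", email: "b@example.com", name: "B" }),
);
// Both lists are empty when the two branches return the same shape.
```

This compares top-level keys only. Nested objects would need a recursive comparison.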

Pattern 2: SELECT Clause Omission

Frequency: Common with Supabase/Prisma when adding new columns
typescript
// FAIL: New column added to response but not to SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name")  // notification_settings not here
  .single();

return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined

// PASS: Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
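
When columns are listed explicitly instead of using `SELECT *`, the select string itself can be checked against the response contract. The sketch below is one way to do that, assuming the select string is kept in a constant that a test can import. `missingFromSelect` is a hypothetical helper, and it handles only flat, comma-separated column lists (no aliases or nested selects).

```typescript
// Hypothetical helper: lists required fields that a comma-separated
// select string omits. Assumption: "*" selects every column.
function missingFromSelect(
  selectClause: string,
  required: readonly string[],
): string[] {
  const columns = selectClause.split(",").map((c) => c.trim());
  if (columns.indexOf("*") !== -1) return [];
  return required.filter((field) => columns.indexOf(field) === -1);
}

const omitted = missingFromSelect("id, email, name", [
  "id",
  "email",
  "notification_settings",
]);
// omitted → ["notification_settings"]
```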

Pattern 3: Error State Leakage

Frequency: Moderate — when adding error handling to existing components
typescript
// FAIL: Error state set but old data not cleared
try {
  // ... load reservations for the selected tab
} catch (err) {
  setError("Failed to load");
  // reservations still shows data from previous tab!
}

// PASS: Clear related state on error
try {
  // ... load reservations for the selected tab
} catch (err) {
  setReservations([]);  // Clear stale data
  setError("Failed to load");
}
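
The quick reference lists "Assert state cleanup on error" as the test strategy for this pattern. One way to make that assertable without rendering a component is to express the state transitions as pure functions. The sketch below assumes that approach; the `ReservationsState` type and both function names are hypothetical, and reservations are simplified to strings.

```typescript
type ReservationsState = { reservations: string[]; error: string | null };

// Pure transitions. In a React component these would map onto the
// setReservations / setError calls shown above.
function onLoadSuccess(
  prev: ReservationsState,
  reservations: string[],
): ReservationsState {
  return { ...prev, reservations, error: null };
}

function onLoadError(
  prev: ReservationsState,
  message: string,
): ReservationsState {
  // Stale rows are dropped in the same step that sets the error.
  return { ...prev, reservations: [], error: message };
}
```

A test can then load data, trigger an error, and assert that `reservations` is empty afterwards.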

Pattern 4: Optimistic Update Without Proper Rollback

typescript
// FAIL: No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If API fails, item is gone from UI but still in DB
};

// PASS: Capture previous state and rollback on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems);  // Rollback
    alert("Failed to delete the item");
  }
};
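
To assert "state restored on API failure" in a unit test, the snapshot and rollback logic can be separated from the network call. The following is a minimal sketch under that assumption; `optimisticRemove` and the `Item` type are hypothetical and not taken from the original project.

```typescript
type Item = { id: string };

// Returns the optimistic list plus a rollback function that restores
// a copy of the list as it was before the removal.
function optimisticRemove<T extends Item>(items: readonly T[], id: string) {
  const snapshot = items.slice();
  const next = items.filter((item) => item.id !== id);
  return { next, rollback: () => snapshot.slice() };
}
```

In `handleRemove`, the component would call `setItems(next)` before the request and `setItems(rollback())` if the DELETE fails.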

Strategy: Test Where Bugs Were Found

Don't aim for 100% coverage. Instead:
Bug found in /api/user/profile     → Write test for profile API
Bug found in /api/user/messages    → Write test for messages API
Bug found in /api/user/favorites   → Write test for favorites API
No bug in /api/user/notifications  → Don't write test (yet)
Why this works with AI development:
  1. AI tends to make the same category of mistake repeatedly
  2. Bugs cluster in complex areas (auth, multi-path logic, state management)
  3. Once tested, that exact regression is caught the next time the suite runs
  4. Test count grows organically with bug fixes — no wasted effort

Quick Reference

| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | High |
| SELECT clause omission | Assert all required fields in response | High |
| Error state leakage | Assert state cleanup on error | Medium |
| Missing rollback | Assert state restored on API failure | Medium |
| Type cast masking null | Assert field is not undefined | Medium |
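
The last pattern, "Type cast masking null", has no example elsewhere in this document. The sketch below shows one plausible reading of it: a type assertion makes the compiler treat a field as present even though the runtime value is missing. The `Profile` type, the sample JSON, and the `isDefined` helper are assumptions for illustration.

```typescript
type Profile = { id: string; notification_settings: object | null };

// The cast silences the compiler but does not add the missing field at runtime.
const raw: unknown = JSON.parse('{"id":"user-001"}');
const castProfile = raw as Profile;

// Hypothetical runtime check: true only when the key exists and its value
// is not undefined. A null value still counts as defined.
function isDefined(obj: object, key: string): boolean {
  return key in obj && (obj as Record<string, unknown>)[key] !== undefined;
}

const hasSettings = isDefined(castProfile, "notification_settings");
// hasSettings → false, even though the type says the field exists
```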

DO / DON'T

DO:
  • Write tests immediately after finding a bug (before fixing it if possible)
  • Test the API response shape, not the implementation
  • Run tests as the first step of every bug-check
  • Keep tests fast (< 1 second total with sandbox mode)
  • Name tests after the bug they prevent (e.g., "BUG-R1 regression")
DON'T:
  • Write tests for code that has never had a bug
  • Trust AI self-review as a substitute for automated tests
  • Skip sandbox path testing because "it's just mock data"
  • Write integration tests when unit tests suffice
  • Aim for coverage percentage — aim for regression prevention