db-seed

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Database Seed Generator

数据库种子生成器

Generate seed scripts that populate databases with realistic, domain-appropriate sample data. Reads your schema and produces ready-to-run seed files.
生成可向数据库填充真实、符合业务场景样例数据的种子脚本。读取你的schema并生成可直接运行的种子文件。

Workflow

工作流程

1. Find the Schema

1. 查找Schema

Scan the project for schema definitions:
SourceLocation pattern
Drizzle schema
src/db/schema.ts
,
src/schema/*.ts
,
db/schema.ts
D1 migrations
drizzle/*.sql
,
migrations/*.sql
Raw SQL
schema.sql
,
db/*.sql
Prisma
prisma/schema.prisma
Read all schema files. Build a mental model of:
  • Tables and their columns
  • Data types and constraints (NOT NULL, UNIQUE, DEFAULT)
  • Foreign key relationships (which tables reference which)
  • JSON fields stored as TEXT (common in D1/SQLite)
扫描项目以查找schema定义:
来源位置模式
Drizzle schema
src/db/schema.ts
,
src/schema/*.ts
,
db/schema.ts
D1迁移文件
drizzle/*.sql
,
migrations/*.sql
原生SQL
schema.sql
,
db/*.sql
Prisma
prisma/schema.prisma
读取所有schema文件,构建如下模型:
  • 表及其列信息
  • 数据类型与约束(NOT NULL、UNIQUE、DEFAULT)
  • 外键关联关系(哪些表引用了哪些表)
  • 以TEXT格式存储的JSON字段(D1/SQLite中常见)

2. Determine Seed Parameters

2. 确定种子参数

Ask the user:
ParameterOptionsDefault
Purposedev, demo, testingdev
Volumesmall (5-10 rows/table), medium (20-50), large (100+)small
Domain context"e-commerce store", "SaaS app", "blog", etc.Infer from schema
Output formatTypeScript (Drizzle), raw SQL, or bothMatch project's ORM
Purpose affects data quality:
  • dev: Varied data, some edge cases (empty fields, long strings, unicode)
  • demo: Polished data that looks good in screenshots and presentations
  • testing: Systematic data covering boundary conditions, duplicates, special characters
询问用户以下参数:
参数选项默认值
用途dev(开发)、demo(演示)、testing(测试)dev
数据量small(每张表5-10行)、medium(20-50行)、large(100+行)small
业务场景"电商平台"、"SaaS应用"、"博客"等从schema推断
输出格式TypeScript(Drizzle)、原生SQL或两者兼具匹配项目使用的ORM
用途会影响数据质量
  • 开发环境:多样化数据,包含部分边缘情况(空字段、长字符串、Unicode字符)
  • 演示环境:经过打磨的数据,在截图和演示中呈现效果更佳
  • 测试环境:覆盖边界条件、重复值、特殊字符的系统化数据

3. Plan Insert Order

3. 规划插入顺序

Build a dependency graph from foreign keys. Insert parent tables before children.
Example order for a blog schema:
1. users        (no dependencies)
2. categories   (no dependencies)
3. posts        (depends on users, categories)
4. comments     (depends on users, posts)
5. tags         (no dependencies)
6. post_tags    (depends on posts, tags)
Circular dependencies: If table A references B and B references A, use nullable foreign keys and insert in two passes (insert with NULL, then UPDATE).
基于外键构建依赖关系图,先插入父表再插入子表。
博客schema的示例插入顺序:
1. users        (无依赖)
2. categories   (无依赖)
3. posts        (依赖users、categories)
4. comments     (依赖users、posts)
5. tags         (无依赖)
6. post_tags    (依赖posts、tags)
循环依赖处理:如果表A引用表B且表B引用表A,可使用可空外键并分两次插入(先以NULL值插入,再执行UPDATE更新)。

4. Generate Realistic Data

4. 生成真实数据

Do NOT use generic placeholders like "test123", "foo@bar.com", or "Lorem ipsum". Generate data that matches the domain.
请勿使用通用占位符,如"test123"、"foo@bar.com"或"Lorem ipsum"。生成符合业务场景的数据。

Data Generation Patterns (no external libraries needed)

数据生成模式(无需外部库)

Names: Use a hardcoded list of common names. Mix genders and cultural backgrounds.
typescript
const firstNames = ['Sarah', 'James', 'Priya', 'Mohammed', 'Emma', 'Wei', 'Carlos', 'Aisha'];
const lastNames = ['Chen', 'Smith', 'Patel', 'Garcia', 'Kim', 'O\'Brien', 'Nguyen', 'Wilson'];
Emails: Derive from names —
sarah.chen@example.com
. Use
example.com
domain (RFC 2606 reserved).
Dates: Generate within a realistic range. Use ISO 8601 format for D1/SQLite.
typescript
const randomDate = (daysBack: number) => {
  const d = new Date();
  d.setDate(d.getDate() - Math.floor(Math.random() * daysBack));
  return d.toISOString();
};
IDs: Use
crypto.randomUUID()
for UUIDs, or sequential integers if the schema uses auto-increment.
Deterministic seeding: For reproducible data, use a seeded PRNG:
typescript
function seededRandom(seed: number) {
  return () => {
    seed = (seed * 16807) % 2147483647;
    return (seed - 1) / 2147483646;
  };
}
const rand = seededRandom(42); // Same seed = same data every time
Prices/amounts: Use realistic ranges.
(rand() * 900 + 100).toFixed(2)
for $1-$10 range.
Descriptions/content: Write 3-5 realistic variations per content type and cycle through them. Don't generate AI-sounding prose — write like real user data.
姓名:使用硬编码的常见姓名列表,混合不同性别和文化背景。
typescript
const firstNames = ['Sarah', 'James', 'Priya', 'Mohammed', 'Emma', 'Wei', 'Carlos', 'Aisha'];
const lastNames = ['Chen', 'Smith', 'Patel', 'Garcia', 'Kim', 'O\'Brien', 'Nguyen', 'Wilson'];
邮箱:从姓名衍生,例如
sarah.chen@example.com
。使用RFC 2606保留的
example.com
域名。
日期:在合理范围内生成,D1/SQLite使用ISO 8601格式。
typescript
const randomDate = (daysBack: number) => {
  const d = new Date();
  d.setDate(d.getDate() - Math.floor(Math.random() * daysBack));
  return d.toISOString();
};
ID:UUID使用
crypto.randomUUID()
,如果schema使用自增则使用连续整数。
确定性种子:如需可复现的数据,使用带种子的伪随机数生成器:
typescript
function seededRandom(seed: number) {
  return () => {
    seed = (seed * 16807) % 2147483647;
    return (seed - 1) / 2147483646;
  };
}
const rand = seededRandom(42); // 相同种子=每次生成相同数据
价格/金额:使用合理范围,例如
(rand() * 900 + 100).toFixed(2)
生成1-10美元区间的价格。
描述/内容:为每种内容类型编写3-5种真实变体并循环使用。不要生成AI风格的散文,要模拟真实用户数据。

5. Output Format

5. 输出格式

TypeScript (Drizzle ORM)

TypeScript(Drizzle ORM)

typescript
// scripts/seed.ts
import { drizzle } from 'drizzle-orm/d1';
import * as schema from '../src/db/schema';

export async function seed(db: ReturnType<typeof drizzle>) {
  console.log('Seeding database...');

  // Clear existing data (reverse dependency order)
  await db.delete(schema.comments);
  await db.delete(schema.posts);
  await db.delete(schema.users);

  // Insert users
  const users = [
    { id: crypto.randomUUID(), name: 'Sarah Chen', email: 'sarah@example.com', ... },
    // ...
  ];

  // D1 batch limit: 10 rows per INSERT
  for (let i = 0; i < users.length; i += 10) {
    await db.insert(schema.users).values(users.slice(i, i + 10));
  }

  // Insert posts (references users)
  const posts = [
    { id: crypto.randomUUID(), userId: users[0].id, title: '...', ... },
    // ...
  ];

  for (let i = 0; i < posts.length; i += 10) {
    await db.insert(schema.posts).values(posts.slice(i, i + 10));
  }

  console.log(`Seeded: ${users.length} users, ${posts.length} posts`);
}
Run with:
npx tsx scripts/seed.ts
For Cloudflare Workers, add a seed endpoint (remove before production):
typescript
app.post('/api/seed', async (c) => {
  const db = drizzle(c.env.DB);
  await seed(db);
  return c.json({ ok: true });
});
typescript
// scripts/seed.ts
import { drizzle } from 'drizzle-orm/d1';
import * as schema from '../src/db/schema';

export async function seed(db: ReturnType<typeof drizzle>) {
  console.log('Seeding database...');

  // 清除现有数据(反向依赖顺序)
  await db.delete(schema.comments);
  await db.delete(schema.posts);
  await db.delete(schema.users);

  // 插入用户
  const users = [
    { id: crypto.randomUUID(), name: 'Sarah Chen', email: 'sarah@example.com', ... },
    // ...
  ];

  // D1批量限制:每次INSERT最多10行
  for (let i = 0; i < users.length; i += 10) {
    await db.insert(schema.users).values(users.slice(i, i + 10));
  }

  // 插入文章(关联用户)
  const posts = [
    { id: crypto.randomUUID(), userId: users[0].id, title: '...', ... },
    // ...
  ];

  for (let i = 0; i < posts.length; i += 10) {
    await db.insert(schema.posts).values(posts.slice(i, i + 10));
  }

  console.log(`Seeded: ${users.length} users, ${posts.length} posts`);
}
运行命令:
npx tsx scripts/seed.ts
针对Cloudflare Workers,添加种子接口(上线前移除):
typescript
app.post('/api/seed', async (c) => {
  const db = drizzle(c.env.DB);
  await seed(db);
  return c.json({ ok: true });
});

Raw SQL (D1)

原生SQL(D1)

sql
-- seed.sql
-- Run: npx wrangler d1 execute DB_NAME --local --file=./scripts/seed.sql

-- Clear existing (reverse order)
DELETE FROM comments;
DELETE FROM posts;
DELETE FROM users;

-- Users
INSERT INTO users (id, name, email, created_at) VALUES
  ('uuid-1', 'Sarah Chen', 'sarah@example.com', '2025-01-15T10:30:00Z'),
  ('uuid-2', 'James Wilson', 'james@example.com', '2025-02-01T14:22:00Z');

-- Posts (max 10 rows per INSERT for D1)
INSERT INTO posts (id, user_id, title, body, created_at) VALUES
  ('post-1', 'uuid-1', 'Getting Started', 'Welcome to...', '2025-03-01T09:00:00Z');
sql
-- seed.sql
-- 运行命令: npx wrangler d1 execute DB_NAME --local --file=./scripts/seed.sql

-- 清除现有数据(反向顺序)
DELETE FROM comments;
DELETE FROM posts;
DELETE FROM users;

-- 用户数据
INSERT INTO users (id, name, email, created_at) VALUES
  ('uuid-1', 'Sarah Chen', 'sarah@example.com', '2025-01-15T10:30:00Z'),
  ('uuid-2', 'James Wilson', 'james@example.com', '2025-02-01T14:22:00Z');

-- 文章数据(D1每次INSERT最多10行)
INSERT INTO posts (id, user_id, title, body, created_at) VALUES
  ('post-1', 'uuid-1', 'Getting Started', 'Welcome to...', '2025-03-01T09:00:00Z');

6. Idempotency

6. 幂等性

Seed scripts must be safe to re-run:
typescript
// Option A: Delete-then-insert (simple, loses data)
await db.delete(schema.users);
await db.insert(schema.users).values(seedUsers);

// Option B: Upsert (preserves non-seed data)
for (const user of seedUsers) {
  await db.insert(schema.users)
    .values(user)
    .onConflictDoUpdate({ target: schema.users.id, set: user });
}
Default to Option A for dev/testing, Option B for demo (where users may have added their own data).
种子脚本必须可安全重复运行:
typescript
// 选项A:先删除再插入(简单,但会丢失现有数据)
await db.delete(schema.users);
await db.insert(schema.users).values(seedUsers);

// 选项B:Upsert(保留非种子数据)
for (const user of seedUsers) {
  await db.insert(schema.users)
    .values(user)
    .onConflictDoUpdate({ target: schema.users.id, set: user });
}
开发/测试环境默认使用选项A,演示环境默认使用选项B(用户可能已添加自有数据)。

D1-Specific Gotchas

D1专属注意事项

GotchaSolution
Max ~10 rows per INSERTBatch inserts in chunks of 10
No native BOOLEANUse INTEGER (0/1)
No native DATETIMEUse TEXT with ISO 8601 strings
JSON stored as TEXT
JSON.stringify()
before insert
Foreign keys always enforcedInsert parent tables first
100 bound parameter limitKeep batch size × columns < 100
注意事项解决方案
每次INSERT最多约10行将插入操作按10行一批分批执行
无原生BOOLEAN类型使用INTEGER(0/1)代替
无原生DATETIME类型使用TEXT类型存储ISO 8601格式字符串
JSON以TEXT格式存储插入前执行
JSON.stringify()
外键始终强制生效先插入父表
绑定参数上限为100确保批量大小 × 列数 < 100

Quality Rules

质量规则

  1. Match the domain — an e-commerce seed has products with real-sounding names and prices, not "Product 1"
  2. Vary the data — don't make every user "John Smith" or every price "$9.99"
  3. Include edge cases (for testing seeds) — empty strings, very long text, special characters, maximum values
  4. Reference real IDs — foreign keys must point to actually-inserted parent rows
  5. Print what was seeded — always log counts so the user knows it worked
  6. Document the run command — put it in a comment at the top of the file
  1. 匹配业务场景 — 电商种子数据应包含真实名称和价格的产品,而非"Product 1"
  2. 数据多样化 — 不要所有用户都叫"John Smith",所有价格都是"$9.99"
  3. 包含边缘情况(测试用种子) — 空字符串、超长文本、特殊字符、最大值
  4. 引用真实ID — 外键必须指向实际插入的父行ID
  5. 打印种子结果 — 始终记录数据量,让用户知晓执行成功
  6. 文档化运行命令 — 将命令放在文件顶部的注释中