db-seed
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDatabase Seed Generator
数据库种子生成器
Generate seed scripts that populate databases with realistic, domain-appropriate sample data. Reads your schema and produces ready-to-run seed files.
生成可向数据库填充真实、符合业务场景样例数据的种子脚本。读取你的schema并生成可直接运行的种子文件。
Workflow
工作流程
1. Find the Schema
1. 查找Schema
Scan the project for schema definitions:
| Source | Location pattern |
|---|---|
| Drizzle schema | |
| D1 migrations | |
| Raw SQL | |
| Prisma | |
Read all schema files. Build a mental model of:
- Tables and their columns
- Data types and constraints (NOT NULL, UNIQUE, DEFAULT)
- Foreign key relationships (which tables reference which)
- JSON fields stored as TEXT (common in D1/SQLite)
扫描项目以查找schema定义:
| 来源 | 位置模式 |
|---|---|
| Drizzle schema | |
| D1迁移文件 | |
| 原生SQL | |
| Prisma | |
读取所有schema文件,构建如下模型:
- 表及其列信息
- 数据类型与约束(NOT NULL、UNIQUE、DEFAULT)
- 外键关联关系(哪些表引用了哪些表)
- 以TEXT格式存储的JSON字段(D1/SQLite中常见)
2. Determine Seed Parameters
2. 确定种子参数
Ask the user:
| Parameter | Options | Default |
|---|---|---|
| Purpose | dev, demo, testing | dev |
| Volume | small (5-10 rows/table), medium (20-50), large (100+) | small |
| Domain context | "e-commerce store", "SaaS app", "blog", etc. | Infer from schema |
| Output format | TypeScript (Drizzle), raw SQL, or both | Match project's ORM |
Purpose affects data quality:
- dev: Varied data, some edge cases (empty fields, long strings, unicode)
- demo: Polished data that looks good in screenshots and presentations
- testing: Systematic data covering boundary conditions, duplicates, special characters
询问用户以下参数:
| 参数 | 选项 | 默认值 |
|---|---|---|
| 用途 | dev(开发)、demo(演示)、testing(测试) | dev |
| 数据量 | small(每张表5-10行)、medium(20-50行)、large(100+行) | small |
| 业务场景 | "电商平台"、"SaaS应用"、"博客"等 | 从schema推断 |
| 输出格式 | TypeScript(Drizzle)、原生SQL或两者兼具 | 匹配项目使用的ORM |
用途会影响数据质量:
- 开发环境:多样化数据,包含部分边缘情况(空字段、长字符串、Unicode字符)
- 演示环境:经过打磨的数据,在截图和演示中呈现效果更佳
- 测试环境:覆盖边界条件、重复值、特殊字符的系统化数据
3. Plan Insert Order
3. 规划插入顺序
Build a dependency graph from foreign keys. Insert parent tables before children.
Example order for a blog schema:
1. users (no dependencies)
2. categories (no dependencies)
3. posts (depends on users, categories)
4. comments (depends on users, posts)
5. tags (no dependencies)
6. post_tags (depends on posts, tags)Circular dependencies: If table A references B and B references A, use nullable foreign keys and insert in two passes (insert with NULL, then UPDATE).
基于外键构建依赖关系图,先插入父表再插入子表。
博客schema的示例插入顺序:
1. users (无依赖)
2. categories (无依赖)
3. posts (依赖users、categories)
4. comments (依赖users、posts)
5. tags (无依赖)
6. post_tags (依赖posts、tags)循环依赖处理:如果表A引用表B且表B引用表A,可使用可空外键并分两次插入(先以NULL值插入,再执行UPDATE更新)。
4. Generate Realistic Data
4. 生成真实数据
Do NOT use generic placeholders like "test123", "foo@bar.com", or "Lorem ipsum". Generate data that matches the domain.
请勿使用通用占位符,如"test123"、"foo@bar.com"或"Lorem ipsum"。生成符合业务场景的数据。
Data Generation Patterns (no external libraries needed)
数据生成模式(无需外部库)
Names: Use a hardcoded list of common names. Mix genders and cultural backgrounds.
typescript
const firstNames = ['Sarah', 'James', 'Priya', 'Mohammed', 'Emma', 'Wei', 'Carlos', 'Aisha'];
const lastNames = ['Chen', 'Smith', 'Patel', 'Garcia', 'Kim', 'O\'Brien', 'Nguyen', 'Wilson'];Emails: Derive from names — . Use domain (RFC 2606 reserved).
sarah.chen@example.comexample.comDates: Generate within a realistic range. Use ISO 8601 format for D1/SQLite.
typescript
const randomDate = (daysBack: number) => {
const d = new Date();
d.setDate(d.getDate() - Math.floor(Math.random() * daysBack));
return d.toISOString();
};IDs: Use for UUIDs, or sequential integers if the schema uses auto-increment.
crypto.randomUUID()Deterministic seeding: For reproducible data, use a seeded PRNG:
typescript
function seededRandom(seed: number) {
return () => {
seed = (seed * 16807) % 2147483647;
return (seed - 1) / 2147483646;
};
}
const rand = seededRandom(42); // Same seed = same data every timePrices/amounts: Use realistic ranges. for $1-$10 range.
(rand() * 900 + 100).toFixed(2)Descriptions/content: Write 3-5 realistic variations per content type and cycle through them. Don't generate AI-sounding prose — write like real user data.
姓名:使用硬编码的常见姓名列表,混合不同性别和文化背景。
typescript
const firstNames = ['Sarah', 'James', 'Priya', 'Mohammed', 'Emma', 'Wei', 'Carlos', 'Aisha'];
const lastNames = ['Chen', 'Smith', 'Patel', 'Garcia', 'Kim', 'O\'Brien', 'Nguyen', 'Wilson'];邮箱:从姓名衍生,例如。使用RFC 2606保留的域名。
sarah.chen@example.comexample.com日期:在合理范围内生成,D1/SQLite使用ISO 8601格式。
typescript
const randomDate = (daysBack: number) => {
const d = new Date();
d.setDate(d.getDate() - Math.floor(Math.random() * daysBack));
return d.toISOString();
};ID:UUID使用,如果schema使用自增则使用连续整数。
crypto.randomUUID()确定性种子:如需可复现的数据,使用带种子的伪随机数生成器:
typescript
function seededRandom(seed: number) {
return () => {
seed = (seed * 16807) % 2147483647;
return (seed - 1) / 2147483646;
};
}
const rand = seededRandom(42); // 相同种子=每次生成相同数据价格/金额:使用合理范围,例如生成1-10美元区间的价格。
(rand() * 900 + 100).toFixed(2)描述/内容:为每种内容类型编写3-5种真实变体并循环使用。不要生成AI风格的散文,要模拟真实用户数据。
5. Output Format
5. 输出格式
TypeScript (Drizzle ORM)
TypeScript(Drizzle ORM)
typescript
// scripts/seed.ts
import { drizzle } from 'drizzle-orm/d1';
import * as schema from '../src/db/schema';
export async function seed(db: ReturnType<typeof drizzle>) {
console.log('Seeding database...');
// Clear existing data (reverse dependency order)
await db.delete(schema.comments);
await db.delete(schema.posts);
await db.delete(schema.users);
// Insert users
const users = [
{ id: crypto.randomUUID(), name: 'Sarah Chen', email: 'sarah@example.com', ... },
// ...
];
// D1 batch limit: 10 rows per INSERT
for (let i = 0; i < users.length; i += 10) {
await db.insert(schema.users).values(users.slice(i, i + 10));
}
// Insert posts (references users)
const posts = [
{ id: crypto.randomUUID(), userId: users[0].id, title: '...', ... },
// ...
];
for (let i = 0; i < posts.length; i += 10) {
await db.insert(schema.posts).values(posts.slice(i, i + 10));
}
console.log(`Seeded: ${users.length} users, ${posts.length} posts`);
}Run with:
npx tsx scripts/seed.tsFor Cloudflare Workers, add a seed endpoint (remove before production):
typescript
app.post('/api/seed', async (c) => {
const db = drizzle(c.env.DB);
await seed(db);
return c.json({ ok: true });
});typescript
// scripts/seed.ts
import { drizzle } from 'drizzle-orm/d1';
import * as schema from '../src/db/schema';
export async function seed(db: ReturnType<typeof drizzle>) {
console.log('Seeding database...');
// 清除现有数据(反向依赖顺序)
await db.delete(schema.comments);
await db.delete(schema.posts);
await db.delete(schema.users);
// 插入用户
const users = [
{ id: crypto.randomUUID(), name: 'Sarah Chen', email: 'sarah@example.com', ... },
// ...
];
// D1批量限制:每次INSERT最多10行
for (let i = 0; i < users.length; i += 10) {
await db.insert(schema.users).values(users.slice(i, i + 10));
}
// 插入文章(关联用户)
const posts = [
{ id: crypto.randomUUID(), userId: users[0].id, title: '...', ... },
// ...
];
for (let i = 0; i < posts.length; i += 10) {
await db.insert(schema.posts).values(posts.slice(i, i + 10));
}
console.log(`Seeded: ${users.length} users, ${posts.length} posts`);
}运行命令:
npx tsx scripts/seed.ts针对Cloudflare Workers,添加种子接口(上线前移除):
typescript
app.post('/api/seed', async (c) => {
const db = drizzle(c.env.DB);
await seed(db);
return c.json({ ok: true });
});Raw SQL (D1)
原生SQL(D1)
sql
-- seed.sql
-- Run: npx wrangler d1 execute DB_NAME --local --file=./scripts/seed.sql
-- Clear existing (reverse order)
DELETE FROM comments;
DELETE FROM posts;
DELETE FROM users;
-- Users
INSERT INTO users (id, name, email, created_at) VALUES
('uuid-1', 'Sarah Chen', 'sarah@example.com', '2025-01-15T10:30:00Z'),
('uuid-2', 'James Wilson', 'james@example.com', '2025-02-01T14:22:00Z');
-- Posts (max 10 rows per INSERT for D1)
INSERT INTO posts (id, user_id, title, body, created_at) VALUES
('post-1', 'uuid-1', 'Getting Started', 'Welcome to...', '2025-03-01T09:00:00Z');sql
-- seed.sql
-- 运行命令: npx wrangler d1 execute DB_NAME --local --file=./scripts/seed.sql
-- 清除现有数据(反向顺序)
DELETE FROM comments;
DELETE FROM posts;
DELETE FROM users;
-- 用户数据
INSERT INTO users (id, name, email, created_at) VALUES
('uuid-1', 'Sarah Chen', 'sarah@example.com', '2025-01-15T10:30:00Z'),
('uuid-2', 'James Wilson', 'james@example.com', '2025-02-01T14:22:00Z');
-- 文章数据(D1每次INSERT最多10行)
INSERT INTO posts (id, user_id, title, body, created_at) VALUES
('post-1', 'uuid-1', 'Getting Started', 'Welcome to...', '2025-03-01T09:00:00Z');6. Idempotency
6. 幂等性
Seed scripts must be safe to re-run:
typescript
// Option A: Delete-then-insert (simple, loses data)
await db.delete(schema.users);
await db.insert(schema.users).values(seedUsers);
// Option B: Upsert (preserves non-seed data)
for (const user of seedUsers) {
await db.insert(schema.users)
.values(user)
.onConflictDoUpdate({ target: schema.users.id, set: user });
}Default to Option A for dev/testing, Option B for demo (where users may have added their own data).
种子脚本必须可安全重复运行:
typescript
// 选项A:先删除再插入(简单,但会丢失现有数据)
await db.delete(schema.users);
await db.insert(schema.users).values(seedUsers);
// 选项B:Upsert(保留非种子数据)
for (const user of seedUsers) {
await db.insert(schema.users)
.values(user)
.onConflictDoUpdate({ target: schema.users.id, set: user });
}开发/测试环境默认使用选项A,演示环境默认使用选项B(用户可能已添加自有数据)。
D1-Specific Gotchas
D1专属注意事项
| Gotcha | Solution |
|---|---|
| Max ~10 rows per INSERT | Batch inserts in chunks of 10 |
| No native BOOLEAN | Use INTEGER (0/1) |
| No native DATETIME | Use TEXT with ISO 8601 strings |
| JSON stored as TEXT | |
| Foreign keys always enforced | Insert parent tables first |
| 100 bound parameter limit | Keep batch size × columns < 100 |
| 注意事项 | 解决方案 |
|---|---|
| 每次INSERT最多约10行 | 将插入操作按10行一批分批执行 |
| 无原生BOOLEAN类型 | 使用INTEGER(0/1)代替 |
| 无原生DATETIME类型 | 使用TEXT类型存储ISO 8601格式字符串 |
| JSON以TEXT格式存储 | 插入前执行 |
| 外键始终强制生效 | 先插入父表 |
| 绑定参数上限为100 | 确保批量大小 × 列数 < 100 |
Quality Rules
质量规则
- Match the domain — an e-commerce seed has products with real-sounding names and prices, not "Product 1"
- Vary the data — don't make every user "John Smith" or every price "$9.99"
- Include edge cases (for testing seeds) — empty strings, very long text, special characters, maximum values
- Reference real IDs — foreign keys must point to actually-inserted parent rows
- Print what was seeded — always log counts so the user knows it worked
- Document the run command — put it in a comment at the top of the file
- 匹配业务场景 — 电商种子数据应包含真实名称和价格的产品,而非"Product 1"
- 数据多样化 — 不要所有用户都叫"John Smith",所有价格都是"$9.99"
- 包含边缘情况(测试用种子) — 空字符串、超长文本、特殊字符、最大值
- 引用真实ID — 外键必须指向实际插入的父行ID
- 打印种子结果 — 始终记录数据量,让用户知晓执行成功
- 文档化运行命令 — 将命令放在文件顶部的注释中