system-design-generator
System Design Generator
Create comprehensive system architecture plans from requirements.
System Design Document Template
System Design: [Feature/Product Name]
Overview
Brief description of what we're building and why.
Requirements
Functional
- User can upload videos (max 1GB)
- System processes video within 5 minutes
- User receives notification when complete
Non-Functional
- Handle 1000 uploads/day
- 99.9% uptime
- Process videos in <5 minutes (p95)
- Cost: <$0.50 per video
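The non-functional targets above translate into concrete budgets. The following is a small sketch of that arithmetic; the function name and the 30-day month are assumptions made for this example.

```typescript
// Allowed downtime implied by an availability target over a period of time.
function downtimeBudgetMinutes(availability: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return (1 - availability) * totalMinutes;
}

// 99.9% uptime over a 30-day month (the 30-day month is an assumption).
const monthlyDowntimeBudget = downtimeBudgetMinutes(0.999, 30); // about 43.2 minutes

// Upper bound on monthly spend implied by 1000 uploads/day at <$0.50 per video.
const monthlyCostCeilingUsd = 1000 * 30 * 0.5; // 15000
```

These numbers are ceilings to design against, not estimates of actual downtime or spend.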
High-Level Architecture
┌─────────┐      ┌──────────┐      ┌─────────────┐
│ Client  │─────▶│   API    │─────▶│   Upload    │
│         │      │ Gateway  │      │   Service   │
└─────────┘      └──────────┘      └─────────────┘
                                          │
                                          ▼
                                   ┌─────────────┐
                                   │   Storage   │
                                   │    (S3)     │
                                   └─────────────┘
                                          │
                                          ▼
                                   ┌─────────────┐
                                   │ Processing  │◀─┐
                                   │    Queue    │  │
                                   └─────────────┘  │
                                          │         │
                                          ▼         │
                                   ┌─────────────┐  │
                                   │  Processor  │──┘
                                   │   Workers   │
                                   └─────────────┘
                                          │
                                          ▼
                                   ┌─────────────┐
                                   │Notification │
                                   │   Service   │
                                   └─────────────┘
Components
1. API Gateway
Responsibilities:
- Authentication
- Rate limiting
- Request routing
Technology: Kong/AWS API Gateway
Scaling: Auto-scale based on requests/sec
2. Upload Service
Responsibilities:
- Generate pre-signed S3 URLs
- Validate file metadata
- Enqueue processing jobs
API:
POST /uploads
Request: { filename, size, content_type }
Response: { upload_url, upload_id }
Technology: Node.js + Express
Scaling: Horizontal (stateless)
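To illustrate the "validate file metadata" responsibility, the sketch below checks a `POST /uploads` request body against the 1GB limit from the requirements. The `validateUploadRequest` name, the allowed content types, and the error strings are assumptions made for this example; the design above does not specify them.

```typescript
// Shape of the POST /uploads request body, as listed in the API above.
interface UploadRequest {
  filename: string;
  size: number; // bytes
  content_type: string;
}

// 1GB limit from the functional requirements (read here as 2^30 bytes, an assumption).
const MAX_UPLOAD_BYTES = 1024 * 1024 * 1024;

// Hypothetical allow-list; the design does not say which formats are accepted.
const ALLOWED_CONTENT_TYPES = ["video/mp4", "video/quicktime", "video/webm"];

// Returns a list of validation errors; an empty list means the request is acceptable.
function validateUploadRequest(req: UploadRequest): string[] {
  const errors: string[] = [];
  if (req.filename.trim().length === 0) {
    errors.push("filename must not be empty");
  }
  if (!(req.size > 0) || req.size > MAX_UPLOAD_BYTES) {
    errors.push("size must be between 1 byte and 1GB");
  }
  if (ALLOWED_CONTENT_TYPES.indexOf(req.content_type) === -1) {
    errors.push("unsupported content_type: " + req.content_type);
  }
  return errors;
}
```

Because the client uploads straight to S3, this check only covers what the client declares. The size limit would likely also need to be enforced on the pre-signed URL or after upload.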
3. Storage (S3)
Responsibilities:
- Store raw videos
- Store processed outputs
- Serve content via CDN
Structure:
/uploads/{user_id}/{upload_id}/original.mp4
/processed/{user_id}/{upload_id}/output.mp4
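The key layout above can be captured in two small helpers so that the Upload Service and the workers always agree on object paths. This is only a sketch: the function names are illustrative, and dropping the leading slash is an assumption, since S3 object keys are usually written without one.

```typescript
// Key for the raw upload, following /uploads/{user_id}/{upload_id}/original.mp4.
function originalKey(userId: string, uploadId: string): string {
  return "uploads/" + userId + "/" + uploadId + "/original.mp4";
}

// Key for the transcoded output, following /processed/{user_id}/{upload_id}/output.mp4.
function processedKey(userId: string, uploadId: string): string {
  return "processed/" + userId + "/" + uploadId + "/output.mp4";
}
```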
4. Processing Queue
Responsibilities:
- Buffer processing jobs
- Ensure at-least-once delivery
- DLQ for failed jobs
Technology: AWS SQS
Configuration:
- Visibility timeout: 15 minutes
- DLQ after 3 retries
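The configuration above maps onto two SQS queue attributes, `VisibilityTimeout` and `RedrivePolicy`. The sketch below shows one way to express them as the string-valued attribute map SQS expects. The DLQ ARN is a placeholder, and note that `maxReceiveCount` counts receives, so 3 means three delivery attempts in total; use 4 if "3 retries" is meant to exclude the first attempt.

```typescript
interface QueueAttributes {
  VisibilityTimeout: string; // seconds, as a string
  RedrivePolicy: string;     // JSON-encoded policy
}

// Placeholder ARN for the dead-letter queue (hypothetical; substitute a real one).
const DLQ_ARN = "arn:aws:sqs:REGION:ACCOUNT_ID:video-processing-dlq";

const queueAttributes: QueueAttributes = {
  VisibilityTimeout: String(15 * 60), // 15 minutes expressed in seconds
  RedrivePolicy: JSON.stringify({
    deadLetterTargetArn: DLQ_ARN,
    maxReceiveCount: 3, // move to the DLQ once a message has been received 3 times
  }),
};
```

The 15-minute visibility timeout is three times the 5-minute processing target, which leaves room for slow jobs before a message becomes visible again.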
5. Processor Workers
Responsibilities:
- Transcode videos
- Generate thumbnails
- Update database
Technology: Python + FFmpeg
Scaling: Auto-scale on queue depth
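One simple way to auto-scale on queue depth is to aim for a fixed backlog per worker. The sketch below assumes that policy; the `targetBacklogPerWorker` parameter and the min/max bounds are hypothetical tuning knobs, not values from the design.

```typescript
// Desired worker count for a given queue depth, clamped to [minWorkers, maxWorkers].
function desiredWorkers(
  queueDepth: number,
  targetBacklogPerWorker: number,
  minWorkers: number,
  maxWorkers: number
): number {
  const wanted = Math.ceil(queueDepth / targetBacklogPerWorker);
  return Math.max(minWorkers, Math.min(maxWorkers, wanted));
}

// Example: with 25 queued jobs and a target of 2 jobs per worker, ask for 13 workers.
const example = desiredWorkers(25, 2, 1, 50);
```

With a 5-minute processing time, a target of 2 jobs per worker means a newly queued job should wait roughly 10 minutes at most, so the target should be chosen against the latency requirement.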
Data Flow
Upload Flow
- Client requests an upload URL from the Upload Service (POST /uploads)
- Upload Service generates a pre-signed S3 URL and returns it together with the upload_id
- Client uploads directly to S3
- Client notifies Upload Service of completion
- Upload Service enqueues processing job
Processing Flow
- Worker polls queue for jobs
- Downloads video from S3
- Processes video (transcode, thumbnail)
- Uploads results to S3
- Updates database status
- Sends notification
- Deletes message from queue
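The ordering of these steps matters: deleting the message last is what gives at-least-once processing. The sketch below shows the flow for a single message. Every interface in it is hypothetical and exists only for the example, and the calls are synchronous purely to keep it short; real S3, FFmpeg, and database calls would be asynchronous.

```typescript
interface QueueMessage {
  receiptHandle: string;
  uploadId: string;
}

// Side effects the worker depends on (hypothetical wrappers around S3, FFmpeg, etc.).
interface WorkerDeps {
  download(uploadId: string): string;                         // returns a local file path
  transcode(localPath: string): string;                       // returns the output file path
  uploadResult(uploadId: string, outputPath: string): string; // returns the processed URL
  markComplete(uploadId: string, processedUrl: string): void;
  notify(uploadId: string): void;
  deleteMessage(receiptHandle: string): void;
}

// Runs the processing steps in order. The message is deleted only after every
// earlier step has succeeded, so a crash or thrown error leaves it on the queue
// to be redelivered once the visibility timeout expires.
function handleMessage(msg: QueueMessage, deps: WorkerDeps): void {
  const localPath = deps.download(msg.uploadId);
  const outputPath = deps.transcode(localPath);
  const processedUrl = deps.uploadResult(msg.uploadId, outputPath);
  deps.markComplete(msg.uploadId, processedUrl);
  deps.notify(msg.uploadId);
  deps.deleteMessage(msg.receiptHandle);
}
```

A consequence of this ordering is that a job can run more than once, for example if the worker dies between notifying and deleting, so each step should tolerate being repeated.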
Data Model
```typescript
interface Upload {
  id: string;
  user_id: string;
  filename: string;
  size: number;
  status: 'pending' | 'processing' | 'complete' | 'failed';
  original_url: string;
  processed_url?: string;
  created_at: Date;
  processed_at?: Date;
}

interface ProcessingJob {
  upload_id: string;
  attempts: number;
  error?: string;
}
```
API Contract
Upload Endpoints
POST /uploads - Request upload URL
GET /uploads/:id - Get upload status
DELETE /uploads/:id - Cancel upload
GET /uploads - List user uploads
Webhooks
POST {webhook_url}
{
  "event": "upload.completed",
  "upload_id": "...",
  "status": "complete",
  "processed_url": "..."
}
Scaling Considerations
Current Capacity
- 1000 uploads/day = ~0.7 per minute on average (~1 per minute, rounded up)
- Single worker can process 1 video every 5 minutes
- Need 5 workers for current load (~3.5 busy on average; the rest is headroom for bursts)
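The worker count follows from arrival rate multiplied by processing time (Little's law). The sketch below works through that arithmetic using the exact daily average instead of the rounded per-minute figure; the function name and the `headroom` multiplier are assumptions introduced here.

```typescript
// Average number of busy workers = arrival rate x service time, rounded up.
// headroom > 1 adds spare capacity for bursts (a tuning choice, not from the design).
function workersNeeded(uploadsPerDay: number, minutesPerVideo: number, headroom: number): number {
  const uploadsPerMinute = uploadsPerDay / (24 * 60);
  return Math.ceil(uploadsPerMinute * minutesPerVideo * headroom);
}

const current = workersNeeded(1000, 5, 1);      // 4 workers at the exact average rate
const tenX = workersNeeded(10000, 5, 1);        // 35
const hundredX = workersNeeded(100000, 5, 1);   // 348
```

Rounding the rate up to a whole upload per minute yields 5 workers at current load, which amounts to roughly 44% headroom over the exact average.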
10x Scale (10,000/day)
- ~10 uploads per minute
- Need 50 workers
- Use spot instances for cost savings
- Add Redis cache for status checks
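A status cache would typically follow the cache-aside pattern: read from the cache, fall back to the database on a miss, and store the result with a short TTL. In the sketch below an in-memory object stands in for Redis; `createStatusCache`, `loadFromDb`, and the TTL value are assumptions for illustration only.

```typescript
type UploadStatus = "pending" | "processing" | "complete" | "failed";

interface CacheEntry {
  status: UploadStatus;
  expiresAtMs: number;
}

// Returns a lookup function that serves upload status from the cache when it can.
function createStatusCache(
  loadFromDb: (uploadId: string) => UploadStatus,
  ttlMs: number,
  now: () => number
): (uploadId: string) => UploadStatus {
  // Prototype-free object so that arbitrary IDs are safe to use as keys.
  const entries: { [uploadId: string]: CacheEntry | undefined } = Object.create(null);

  return function getStatus(uploadId: string): UploadStatus {
    const hit = entries[uploadId];
    if (hit !== undefined && hit.expiresAtMs > now()) {
      return hit.status; // cache hit: no database read
    }
    const status = loadFromDb(uploadId); // cache miss or expired entry
    entries[uploadId] = { status: status, expiresAtMs: now() + ttlMs };
    return status;
  };
}
```

Because status changes while a job runs, the TTL bounds how stale a response can be. Workers could also invalidate or overwrite the entry when they update the database.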
100x Scale (100,000/day)
- ~100 uploads per minute
- Partition by region
- Use Kafka instead of SQS
- Database sharding by user_id
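Sharding by user_id needs a stable mapping from user to shard. A minimal sketch, assuming simple hash-modulo placement (the hash function and the `shardFor` name are choices made for this example, not part of the design):

```typescript
// Maps a user ID to a shard index in [0, shardCount) using a simple
// polynomial string hash. The same user always lands on the same shard.
function shardFor(userId: string, shardCount: number): number {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    // >>> 0 keeps the running hash within unsigned 32-bit range.
    hash = (hash * 31 + userId.charCodeAt(i)) >>> 0;
  }
  return hash % shardCount;
}
```

Hash-modulo placement moves most users when the shard count changes. If resharding is expected, consistent hashing or a lookup table may be a better fit.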
Failure Modes
S3 Unavailable
- Impact: Uploads fail
- Mitigation: Multi-region S3 replication
Queue Backed Up
- Impact: Processing delays
- Mitigation: Auto-scale workers faster
Worker Crash During Processing
- Impact: Job retried
- Mitigation: Idempotent processing
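Idempotent processing means a redelivered job produces the same end state as a single run. One way to sketch it, under the assumption that the upload's status is checked before work starts and the output key is derived from the upload itself (the names below are hypothetical):

```typescript
type UploadStatus = "pending" | "processing" | "complete" | "failed";

interface UploadRecord {
  id: string;
  user_id: string;
  status: UploadStatus;
  processed_url?: string;
}

// Processes an upload at most once per successful completion.
function processIdempotently(
  upload: UploadRecord,
  transcode: (outputKey: string) => void
): UploadRecord {
  if (upload.status === "complete") {
    return upload; // redelivered message for finished work: nothing to do
  }
  // Deterministic key: a re-run overwrites the same object instead of creating a new one.
  const outputKey = "processed/" + upload.user_id + "/" + upload.id + "/output.mp4";
  transcode(outputKey);
  return { id: upload.id, user_id: upload.user_id, status: "complete", processed_url: outputKey };
}
```

In a real system the status check and update would need to be atomic (for example a conditional database write), since two workers could otherwise pick up the same job concurrently.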
Cost Estimate
Monthly (1000 uploads/day):
- S3 Storage: $50
- S3 Transfer: $100
- SQS: $10
- Workers (EC2): $300
- Database: $100
Total: ~$560/month
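The monthly estimate can be checked against the per-video cost requirement with a few lines of arithmetic. The sketch below assumes a 30-day month; the variable names are illustrative.

```typescript
// Line items from the monthly estimate above, in USD.
const monthlyCostsUsd = {
  s3Storage: 50,
  s3Transfer: 100,
  sqs: 10,
  workersEc2: 300,
  database: 100,
};

const totalMonthlyUsd =
  monthlyCostsUsd.s3Storage +
  monthlyCostsUsd.s3Transfer +
  monthlyCostsUsd.sqs +
  monthlyCostsUsd.workersEc2 +
  monthlyCostsUsd.database; // 560

const videosPerMonth = 1000 * 30; // assumes a 30-day month
const costPerVideoUsd = totalMonthlyUsd / videosPerMonth; // roughly $0.019
const withinBudget = costPerVideoUsd < 0.5; // requirement: <$0.50 per video
```

At this volume the estimate sits well under the $0.50 per video target, which leaves room for items the list omits, such as the API gateway and CDN delivery.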
Security
- Pre-signed URLs expire in 1 hour
- Videos in private S3 buckets
- CloudFront signed URLs for delivery
- Rate limiting per user
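Per-user rate limiting is commonly implemented as a token bucket. The sketch below is an in-memory version for illustration; the capacity, refill rate, and function names are assumptions. A gateway such as Kong or AWS API Gateway would normally provide this as configuration, not custom code.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

// Returns an allow(userId) function implementing one token bucket per user.
function createRateLimiter(
  capacity: number,
  refillPerSecond: number,
  now: () => number
): (userId: string) => boolean {
  // Prototype-free object so that arbitrary user IDs are safe to use as keys.
  const buckets: { [userId: string]: Bucket | undefined } = Object.create(null);

  return function allow(userId: string): boolean {
    const t = now();
    let bucket = buckets[userId];
    if (bucket === undefined) {
      bucket = { tokens: capacity, lastRefillMs: t }; // new users start with a full bucket
      buckets[userId] = bucket;
    }
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (t - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
    bucket.lastRefillMs = t;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    return false;
  };
}
```

The capacity sets the burst a user may send at once, and the refill rate sets the sustained request rate.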
Monitoring
Metrics:
- Upload success rate
- Processing time (p50, p95, p99)
- Queue depth
- Worker CPU/memory
- Error rate by type
Alerts:
- Queue depth >1000
- Processing time p95 >10 minutes
- Error rate >5%
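The alert thresholds above can be expressed as a pure function over a metrics snapshot. This is a sketch only: the snapshot shape, the nearest-rank percentile method, and the alert messages are assumptions. In practice the monitoring system would compute percentiles and evaluate the rules.

```typescript
interface MetricsSnapshot {
  queueDepth: number;
  processingMinutes: number[]; // per-job processing times in the window
  errorRate: number;           // fraction of failed jobs, 0..1
}

// Nearest-rank percentile; returns 0 for an empty sample.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  const value = sorted[index];
  return value === undefined ? 0 : value;
}

// Returns the names of the alerts that should fire for this snapshot.
function evaluateAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (m.queueDepth > 1000) alerts.push("queue depth > 1000");
  if (percentile(m.processingMinutes, 95) > 10) alerts.push("processing time p95 > 10 minutes");
  if (m.errorRate > 0.05) alerts.push("error rate > 5%");
  return alerts;
}
```

Note that the p95 alert threshold (10 minutes) is looser than the p95 requirement (5 minutes), so the requirement can be missed without an alert firing. A warning tier at 5 minutes may be worth adding.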
Open Questions
- Video retention policy? (30 days? 1 year?)
- Maximum video duration? (affects processing time)
- Regional data residency requirements?
Component Template
Component Name
Responsibilities:
- Primary responsibility
- Secondary responsibility
Technology Stack:
- Language: [Python/Node/Go]
- Framework: [Express/FastAPI/Gin]
- Database: [PostgreSQL/MongoDB]
API/Interface:
```typescript
interface ComponentAPI {
  method(params): ReturnType;
}
```
Scaling Strategy:
- Horizontal: Stateless, load balanced
- Vertical: Cache layer, connection pooling
Dependencies:
- Service A (for X)
- Database B (for persistence)
Failure Handling:
- Retry with exponential backoff
- Circuit breaker for downstream services
- Fallback to cached data
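As an illustration of "retry with exponential backoff", the sketch below computes the wait before each retry. It assumes the "full jitter" variant, where the delay is a random value between zero and a capped exponential; the base, cap, and function name are hypothetical.

```typescript
// Delay before retry number `attempt` (0-based). The exponential term doubles
// each attempt and is capped at capMs; the random factor spreads retries out so
// that many clients do not retry at the same instant.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number
): number {
  const exponential = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * exponential);
}

// Typical use: pass Math.random; a fixed function makes the schedule testable.
const firstDelay = backoffDelayMs(0, 100, 10000, Math.random);
```

Injecting the random source keeps the function deterministic under test. A circuit breaker would sit around the retried call and stop attempts entirely once failures pass a threshold.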
Best Practices
- Start with requirements: Functional + non-functional
- Draw diagrams first: Visual clarity
- Define boundaries: What's in scope vs out
- Document tradeoffs: Every choice has costs
- Plan for failure: What breaks and how to handle
- Consider scale: Current, 10x, 100x
- Estimate costs: Build vs buy decisions
- Leave open questions: Don't pretend to know everything
Output Checklist
- Requirements documented (functional + non-functional)
- High-level architecture diagram
- Component breakdown (3-7 components)
- Data flow documented
- Data model defined
- API contracts specified
- Scaling considerations (1x, 10x, 100x)
- Failure modes identified
- Cost estimate provided
- Security considerations
- Monitoring plan