ml-api-endpoint
ML API Endpoint Expert
Expert in designing and deploying machine learning API endpoints.
Core Principles
API Design
- Stateless Design: Each request carries all the information needed to serve it, so any replica can handle any request
- Consistent Response Format: Standardize success and error structures across all endpoints
- Versioning Strategy: Plan for model updates so old clients keep working when a new model ships
- Input Validation: Validate rigorously before inference; never pass unchecked data to the model
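The consistent-response principle can be sketched as a pair of helpers that build a shared envelope. This is a minimal illustration: the field names (`status`, `data`, `error`) are an assumed convention, not part of FastAPI or any standard.

```python
# Illustrative response envelope; field names are an assumed convention.
def success_response(data: dict, model_version: str) -> dict:
    return {"status": "ok", "data": data, "model_version": model_version, "error": None}

def error_response(message: str, code: str) -> dict:
    return {"status": "error", "data": None, "error": {"code": code, "message": message}}

resp = success_response({"prediction": 0.87}, model_version="v1")
err = error_response("Expected 10 features", code="INVALID_INPUT")
```

Because every endpoint returns the same shape, clients can branch on `status` alone instead of parsing each endpoint differently.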
FastAPI Implementation
Basic ML Endpoint
```python
from uuid import uuid4

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, validator
import joblib
import numpy as np

app = FastAPI(title="ML Model API", version="1.0.0")
model = None


@app.on_event("startup")
async def load_model():
    global model
    model = joblib.load("model.pkl")


def generate_request_id() -> str:
    """Generate a unique ID for tracing each prediction request."""
    return uuid4().hex


class PredictionInput(BaseModel):
    features: list[float]

    @validator('features')
    def validate_features(cls, v):
        if len(v) != 10:
            raise ValueError('Expected 10 features')
        return v


class PredictionResponse(BaseModel):
    prediction: float
    confidence: float | None = None
    model_version: str
    request_id: str


@app.post("/predict", response_model=PredictionResponse)
async def predict(input_data: PredictionInput):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    features = np.array([input_data.features])
    prediction = model.predict(features)[0]
    return PredictionResponse(
        prediction=float(prediction),
        model_version="v1",
        request_id=generate_request_id(),
    )
```
Batch Prediction
```python
class BatchInput(BaseModel):
    instances: list[list[float]]

    @validator('instances')
    def validate_batch_size(cls, v):
        if len(v) > 100:
            raise ValueError('Batch size cannot exceed 100')
        return v


@app.post("/predict/batch")
async def batch_predict(input_data: BatchInput):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    features = np.array(input_data.instances)
    predictions = model.predict(features)
    return {
        "predictions": predictions.tolist(),
        "count": len(predictions),
    }
```
Performance Optimization
Model Caching
```python
import hashlib
import time


class ModelCache:
    """In-memory TTL cache keyed by a hash of the input features."""

    def __init__(self, ttl_seconds=300):
        self.cache = {}
        self.ttl = ttl_seconds

    def get(self, features):
        key = hashlib.md5(str(features).encode()).hexdigest()
        if key in self.cache:
            result, timestamp = self.cache[key]
            if time.time() - timestamp < self.ttl:
                return result
        return None

    def set(self, features, prediction):
        key = hashlib.md5(str(features).encode()).hexdigest()
        self.cache[key] = (prediction, time.time())
```
Health Checks
```python
# Module-level counters; in production these would be updated by request
# middleware or replaced by a metrics library such as prometheus-client.
request_counter = 0
avg_latency = 0.0
error_rate = 0.0


@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "model_loaded": model is not None,
    }


@app.get("/metrics")
async def get_metrics():
    return {
        "requests_total": request_counter,
        "prediction_latency_avg": avg_latency,
        "error_rate": error_rate,
    }
```
Docker Deployment
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```
Best Practices
- Use async/await for I/O-bound operations such as loading artifacts or calling feature stores
- Validate data types, ranges, and business rules before inference
- Cache predictions for deterministic models
- Handle model failures with fallback responses instead of failing the request
- Log predictions, latencies, and errors for monitoring and debugging
- Support multiple model versions side by side
- Set memory and CPU limits on the serving containers
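The fallback principle above can be sketched as a wrapper that catches inference failures and degrades to a safe default. The `safe_predict` name and the default value are illustrative; a real service would also log the exception and pick a domain-appropriate fallback.

```python
def safe_predict(predict_fn, features, fallback=0.0):
    """Call the model, returning a default prediction if inference fails."""
    try:
        return {"prediction": predict_fn(features), "fallback_used": False}
    except Exception:
        # Degrade gracefully instead of failing the whole request.
        return {"prediction": fallback, "fallback_used": True}

def broken_model(features):
    raise RuntimeError("model unavailable")

result = safe_predict(broken_model, [1.0, 2.0])  # falls back instead of raising
```

Surfacing `fallback_used` in the response lets clients and monitoring distinguish real predictions from degraded ones.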