Loading...
Loading...
ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization.
npx skill4agent add alirezarezvani/claude-skills senior-ml-engineerFROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model/ /app/model/
COPY src/ /app/src/
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]| Option | Latency | Throughput | Use Case |
|---|---|---|---|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
from feast import Entity, Feature, FeatureView, FileSource
user = Entity(name="user_id", value_type=ValueType.INT64)
user_features = FeatureView(
name="user_features",
entities=["user_id"],
ttl=timedelta(days=1),
features=[
Feature(name="purchase_count_30d", dtype=ValueType.INT64),
Feature(name="avg_order_value", dtype=ValueType.FLOAT),
],
online=True,
source=FileSource(path="data/user_features.parquet"),
)| Trigger | Detection | Action |
|---|---|---|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
from abc import ABC, abstractmethod
from tenacity import retry, stop_after_attempt, wait_exponential
class LLMProvider(ABC):
@abstractmethod
def complete(self, prompt: str, **kwargs) -> str:
pass
@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
return provider.complete(prompt)| Provider | Input Cost | Output Cost |
|---|---|---|
| GPT-4 | $0.03/1K | $0.06/1K |
| GPT-3.5 | $0.0005/1K | $0.0015/1K |
| Claude 3 Opus | $0.015/1K | $0.075/1K |
| Claude 3 Haiku | $0.00025/1K | $0.00125/1K |
| Database | Hosting | Scale | Latency | Best For |
|---|---|---|---|---|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed | 500-1000 tokens | 50-100 | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
from scipy.stats import ks_2samp
def detect_drift(reference, current, threshold=0.05):
statistic, p_value = ks_2samp(reference, current)
return {
"drift_detected": p_value < threshold,
"ks_statistic": statistic,
"p_value": p_value
}| Metric | Warning | Critical |
|---|---|---|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
references/mlops_production_patterns.mdreferences/llm_integration_guide.mdreferences/rag_system_architecture.mdpython scripts/model_deployment_pipeline.py --model model.pkl --target stagingpython scripts/rag_system_builder.py --config rag_config.yaml --analyzepython scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy| Category | Tools |
|---|---|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |