Search Results: production-monitoring

Found 16 Skills

AI & Machine Learning | supercent-io/skills-templ...

agent-evaluation

Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, 8-step roadmap, and production integration.

10.1k

Backend Development | yonatangross/orchestkit

celery-advanced

Advanced Celery patterns including canvas workflows, priority queues, rate limiting, multi-queue routing, and production monitoring. Use when implementing complex task orchestration, task prioritization, or enterprise-grade background processing.

5 scripts / Checked

AI & Machine Learning | jeremylongshore/claude-co...

sagemaker-endpoint-deployer

SageMaker Endpoint Deployer: auto-activating skill for ML Deployment. Triggers on: sagemaker endpoint deployer. Part of the ML Deployment skill category.

Backend Development | giuseppe-trisciuoglio/dev...

spring-boot-actuator

Configure Spring Boot Actuator for production-grade monitoring, health probes, secured management endpoints, and Micrometer metrics across JVM services.

Backend Development | vintasoftware/django-ai-p...

django-celery-expert

Expert Django and Celery guidance for asynchronous task processing. Use when designing background tasks, configuring workers, handling retries and errors, optimizing task performance, implementing periodic tasks, or setting up production monitoring. Follows Celery best practices with Django integration patterns.

Backend Development | martinholovsky/claude-ski...

celery-expert

Expert Celery distributed task queue engineer specializing in async task processing, workflow orchestration, broker configuration (Redis/RabbitMQ), Celery Beat scheduling, and production monitoring. Deep expertise in task patterns (chains, groups, chords), retries, rate limiting, Flower monitoring, and security best practices. Use when designing distributed task systems, implementing background job processing, building workflow orchestration, or optimizing task queue performance.

AI & Machine Learning | arize-ai/phoenix

phoenix-evals

Build and run evaluators for AI/LLM applications using Phoenix.

DevOps & Cloud Services | bobmatnyc/claude-mpm-skil...

vercel-observability

Vercel observability for Web Analytics, Speed Insights, logs, tracing, alerts, and observability tooling. Use when monitoring performance or debugging production behavior on Vercel.

DevOps & Cloud Services | mystenlabs/skills

sui-publish

Publishing, upgrading, and deploying Sui Move packages. Use this skill when the user needs to publish a package, upgrade a published package, deploy to multiple networks, serialize transactions for multisig signing, run a local Sui network (localnet), prepare for Mainnet launch, monitor production deployments, or debug dry run failures. Also use when the user asks about sui client publish, sui client upgrade, UpgradeCap, upgrade policies, Published.toml, --serialize-output, localnet, mainnet launch checklist, gas estimation, multisig publishing, production monitoring, rollback, incident response, devInspectTransactionBlock, or --dry-run.

AI & Machine Learning | omidzamani/dspy-skills

dspy-debugging-observability

This skill should be used when the user asks to "debug DSPy programs", "trace LLM calls", "monitor production DSPy", "use MLflow with DSPy", mentions "inspect_history", "custom callbacks", "observability", "production monitoring", "cost tracking", or needs to debug, trace, and monitor DSPy applications in development and production.

DevOps & Cloud Services | dawiddutoit/custom-claude

clickhouse-operations

Complete ClickHouse operations guide for DevOps and SRE teams managing production deployments. Provides practical guidance on monitoring essential metrics (query latency, throughput, memory, disk), introspecting system tables, performance analysis, scaling strategies (vertical and horizontal), backup/disaster recovery, tuning at query/server/table levels, and troubleshooting common issues. Use when diagnosing ClickHouse problems, optimizing performance, planning capacity, setting up monitoring, implementing backups, or managing production clusters. Includes resource management strategies for disk space, connections, and background operations plus production checklists.

DevOps & Cloud Services | dadbodgeoff/drift

anomaly-detection

Rule-based anomaly detection for production systems with configurable thresholds, cooldown periods to prevent alert storms, and error pattern tracking for repeated failures.
