Loading...
Loading...
Found 776 Skills
GCP cloud resources including Compute Engine, GKE, Cloud Run, Pub/Sub, VPC networking, DNS, IAM, Secret Manager, and monitoring. Monitor GCP infrastructure, analyze resource usage, audit security posture, and manage organizational hierarchy across projects and folders.
How to launch distributed Megatron-LM training jobs on a SLURM cluster. Covers a minimal sbatch skeleton, environment-variable setup for torch.distributed.run, CUDA_DEVICE_MAX_CONNECTIONS rules across hardware and parallelism modes, container conventions, monitoring, and per-rank failure diagnosis.
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Implement uptime monitoring and status page systems for tracking service availability. Use when monitoring application uptime, creating status pages, or implementing health checks.
Expert in building scalable ML systems, from data pipelines and model training to production deployment and monitoring.
Expert database optimizer specializing in modern performance tuning, query optimization, and scalable architectures. Masters advanced indexing, N+1 resolution, multi-tier caching, partitioning strategies, and cloud database optimization. Handles complex query analysis, migration strategies, and performance monitoring. Use PROACTIVELY for database optimization, performance issues, or scalability challenges.
Linux system administration and monitoring
Programmatic social media and marketing research using free tools: pytrends (Google Trends), yars (Reddit without API keys), and Perplexity MCP (Twitter/TikTok/Web). Use when finding trending topics in a niche, tracking keyword velocity and volume, monitoring Reddit discussions, discovering what's going viral, or researching content opportunities before writing. Zero-cost research stack with built-in rate limiting. Complements content-marketing-social-listening skill with executable code.
Amazon Bedrock AgentCore Evaluations for testing and monitoring AI agent quality. 13 built-in evaluators plus custom LLM-as-Judge patterns. Use when testing agents, monitoring production quality, setting up alerts, or validating agent behavior.
Advanced Celery patterns including canvas workflows, priority queues, rate limiting, multi-queue routing, and production monitoring. Use when implementing complex task orchestration, task prioritization, or enterprise-grade background processing.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Use when preparing to deploy to production. Use when you need a pre-launch checklist, when setting up monitoring, when planning a staged rollout, or when you need a rollback strategy.