Found 675 Skills
Implements machine learning experiment tracking using MLflow or Weights & Biases. Configures the environment and provides code for logging parameters, metrics, and artifacts. Use when asked to "setup experiment tracking" or "initialize MLflow".
Systematic performance optimization and regression debugging for OneKey mobile app (iOS). Use when: (1) Fixing performance regressions - when metrics like tokensStartMs, tokensSpanMs, or functionCallCount have regressed and need to be brought back to normal levels, (2) Improving baseline performance - when there's a need to optimize cold start time or reduce function call overhead, (3) User requests performance optimization/improvement/debugging for the app's startup or home screen refresh flow.
Build and run LLM-as-judge evaluation pipelines using Amazon Bedrock Evaluation Jobs with pre-computed inference datasets. Use when setting up automated model evaluation, designing test scenarios, collecting pre-computed responses, configuring custom metrics, creating AWS infrastructure, running evaluation jobs, parsing results, and iterating on findings.
Braiins Manager - web-based dashboard for monitoring and managing Bitcoin mining operations with real-time metrics, alerts, and multi-user access control.
Testing framework for evaluating Databricks skills. Use when building test cases for skills, running skill evaluations, comparing skill versions, or creating ground truth datasets with the Generate-Review-Promote (GRP) pipeline. Triggers include "test skill", "evaluate skill", "skill regression", "ground truth", "GRP pipeline", "skill quality", and "skill metrics".
Set up comprehensive observability for Fireflies.ai integrations with metrics, traces, and alerts. Use when implementing monitoring for Fireflies.ai operations, setting up dashboards, or configuring alerting for Fireflies.ai integration health. Trigger with phrases like "fireflies monitoring", "fireflies metrics", "fireflies observability", "monitor fireflies", "fireflies alerts", "fireflies tracing".
Validate Godot GDScript files using gdlint, gdformat, gdradon, and LSP diagnostics. Use when users want to: (1) Check code quality after making changes, (2) Validate before committing, (3) Run code metrics analysis, (4) Run export validation, (5) Get real-time LSP diagnostics. Uses command-line tools directly and MCP tools for LSP integration.
Build, validate, and deploy LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. Use this skill whenever the user wants to: create an automated evaluator for subjective or nuanced failure modes, write a judge prompt for Pass/Fail assessment, split labeled data for judge development, measure judge alignment (TPR/TNR), estimate true success rates with bias correction, or set up CI evaluation pipelines. Also trigger when the user mentions "judge prompt", "automated eval", "LLM evaluator", "grading prompt", "alignment metrics", "true positive rate", or wants to move from manual trace review to automated evaluation. This skill covers the full lifecycle: prompt design → data splitting → iterative refinement → success rate estimation.
Detects unrealistic planning and hidden delivery risks like overcommitment, missing dependencies, resource mismatches, and undefined metrics. Use when reviewing quarterly roadmaps or sprint plans.
Use this skill when the user needs to build a financial model, calculate unit economics, understand MRR/ARR/churn, or figure out their quit number. Covers SaaS metrics, CAC/LTV, burn rate, cash flow modeling, and making unit economics legible for non-finance founders.
Profiles DAG execution performance including latency, token usage, cost, and resource consumption. Identifies bottlenecks and optimization opportunities. Activate on 'performance profile', 'execution metrics', 'latency analysis', 'token usage', 'cost analysis'. NOT for execution tracing (use dag-execution-tracer) or failure analysis (use dag-failure-analyzer).
Gate 1: Business requirements document - defines WHAT/WHY before HOW. Creates a PRD with a problem definition, user stories, and success metrics.