This skill should be used when the user asks to "validate a DataFrame with pandera", "write a pandera schema", "use pandera DataFrameModel", "add data validation to a pipeline", or needs guidance on pandera best practices for data quality.
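For orientation, a minimal pandera `DataFrameModel` sketch; the `OrderSchema` class, its columns, and its bounds are illustrative, not part of the skill itself:

```python
import pandas as pd
import pandera as pa
from pandera.typing import Series

class OrderSchema(pa.DataFrameModel):
    # Illustrative columns; adapt names and constraints to your data.
    order_id: Series[int] = pa.Field(unique=True)
    amount: Series[float] = pa.Field(ge=0)
    status: Series[str] = pa.Field(isin=["open", "shipped", "cancelled"])

    class Config:
        strict = True  # reject unexpected columns

df = pd.DataFrame({"order_id": [1, 2], "amount": [9.99, 0.0], "status": ["open", "shipped"]})
validated = OrderSchema.validate(df)  # raises SchemaError on violations
```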
Use this skill whenever the user mentions IP geolocation feeds, RFC 8805, geofeeds, or wants help creating, tuning, validating, or publishing a self-published IP geolocation feed in CSV format. The intended audience is network operators, ISPs, mobile carriers, cloud providers, hosting companies, IXPs, and satellite providers asking about IP geolocation accuracy or geofeed authoring best practices. Helps create, refine, and improve CSV-format IP geolocation feeds with opinionated recommendations beyond RFC 8805 compliance. Do NOT use for private or internal IP address management: this skill applies only to publicly routable IP addresses.
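As a reference point, a minimal sketch of the RFC 8805 line format (prefix, country, region, city, postal code) with a basic well-formedness check; the prefixes below are documentation ranges, not real allocations:

```python
import csv
import io
import ipaddress

# RFC 8805 fields: ip_prefix, country (ISO 3166-1 alpha-2),
# region (ISO 3166-2), city, postal_code (best left empty).
rows = [
    ("192.0.2.0/24", "US", "US-CA", "Los Angeles", ""),
    ("2001:db8::/32", "NL", "NL-NH", "Amsterdam", ""),
]

for prefix, country, region, city, postal in rows:
    ipaddress.ip_network(prefix)   # raises ValueError if the prefix is malformed
    assert len(country) in (0, 2)  # alpha-2 country code, or empty

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```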
Pro tips for B2B list building - source mixing, enrichment workflow, template usage, and efficiency principles. Use when building prospect lists, optimizing data quality, or improving prospecting efficiency.
Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.
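The exact schema depends on the selected model and technique, so as a hedged illustration only, here is a minimal JSONL well-formedness check assuming prompt/completion-style SFT records; `REQUIRED_KEYS` and the `train.jsonl` path are assumptions, not the skill's actual contract:

```python
import json

# Hypothetical SFT record shape; the real schema depends on the
# target model and technique (SFT vs. DPO vs. RLVR).
REQUIRED_KEYS = {"prompt", "completion"}

def audit_jsonl(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                print(f"line {lineno}: invalid JSON ({exc})")
                continue
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                print(f"line {lineno}: missing keys {sorted(missing)}")

audit_jsonl("train.jsonl")  # path is illustrative
```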
Comprehensive data validation using Pydantic v2 with data quality monitoring and schema alignment for PlanetScale PostgreSQL. Use when implementing API validation, database schema alignment, or data quality assurance. Triggers: 'validation', 'Pydantic', 'schema', 'data quality'.
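A minimal Pydantic v2 sketch of the kind of API-boundary validation this covers; the `UserIn` model and its fields are invented for illustration and should mirror the corresponding PostgreSQL column types:

```python
from pydantic import BaseModel, Field, field_validator

class UserIn(BaseModel):
    # Illustrative API payload; align field types with the
    # corresponding PostgreSQL column types.
    email: str = Field(max_length=320)
    age: int = Field(ge=0, le=150)

    @field_validator("email")
    @classmethod
    def must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("not an email address")
        return v

user = UserIn.model_validate({"email": "a@example.com", "age": 30})
print(user.model_dump())
```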
Data validation and pipeline testing utilities for ML training projects. Validates datasets, model checkpoints, training pipelines, and dependencies. Use when validating training data, checking model outputs, testing ML pipelines, verifying dependencies, debugging training failures, or ensuring data quality before training.
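For the dependency-verification piece, a small sketch using the standard library; the `EXPECTED` pins are placeholders and should come from your project's lockfile:

```python
from importlib.metadata import version, PackageNotFoundError

# Illustrative version prefixes; replace with your project's actual pins.
EXPECTED = {"torch": "2.", "numpy": "1."}

for pkg, prefix in EXPECTED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
        continue
    status = "ok" if installed.startswith(prefix) else f"unexpected ({installed})"
    print(f"{pkg} {installed}: {status}")
```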
Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
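A minimal pandas first pass over a dataset, covering profiling, missingness, and correlations; the `data.csv` path is illustrative:

```python
import pandas as pd

df = pd.read_csv("data.csv")  # path is illustrative

print(df.shape)
print(df.dtypes)
print(df.describe(include="all").T)                            # per-column profile
print(df.isna().mean().sort_values(ascending=False).head(10))  # worst missingness
print(df.corr(numeric_only=True))                              # pairwise correlations
```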
Validate and audit CSV data for quality, consistency, and completeness. Use when you need to check CSV files for data issues, missing values, or format inconsistencies.
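A sketch of the kind of consistency audit this implies, reading every column as text so formatting problems are visible; the `input.csv` path and report fields are illustrative:

```python
import pandas as pd

df = pd.read_csv("input.csv", dtype=str, keep_default_na=False)  # path is illustrative

report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "empty_cells": int((df == "").sum().sum()),
    "columns_with_stray_whitespace": [
        c for c in df.columns if (df[c] != df[c].str.strip()).any()
    ],
}
print(report)
```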
Complete 9-step Clay enrichment workflow for 90%+ data coverage plus 58 Clay templates across 8 categories. Use when building enrichment workflows, setting up Clay tables, or maximizing data quality.
Data pipeline expert for ETL, Apache Spark, Airflow, dbt, and data quality.
Decision-first data analysis with statistical rigor gates. Use when analyzing CSV, JSON, database exports, API responses, logs, or any structured data to support a business decision. Handles: trend analysis, cohort comparison, A/B test evaluation, distribution profiling, anomaly detection. Do NOT use for codebase analysis (use codebase-analyzer), codebase exploration (use explore-pipeline), or ML model training.
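As one example of a statistical rigor gate for A/B test evaluation, a two-proportion z-test sketch; the counts and the 0.05 threshold are illustrative choices, not prescriptions:

```python
from statsmodels.stats.proportion import proportions_ztest

# Illustrative counts: conversions and sample sizes for variants A and B.
conversions = [120, 150]
samples = [2400, 2500]

stat, p_value = proportions_ztest(conversions, samples)
print(f"z = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("difference is statistically significant at alpha = 0.05")
else:
    print("no significant difference; do not ship on this evidence alone")
```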
Set up, audit, and debug analytics tracking implementation — GA4, Google Tag Manager, event taxonomy, conversion tracking, and data quality. Use when building a tracking plan from scratch, auditing existing analytics for gaps or errors, debugging missing events, or setting up GTM. Trigger keywords: GA4 setup, Google Tag Manager, GTM, event tracking, analytics implementation, conversion tracking, tracking plan, event taxonomy, custom dimensions, UTM tracking, analytics audit, missing events, tracking broken. NOT for analyzing marketing campaign data — use campaign-analytics for that. NOT for BI dashboards — use product-analytics for in-product event analysis.
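One way to debug a missing event server-side is the GA4 Measurement Protocol validation endpoint, sketched below; the measurement ID, API secret, and event are placeholders:

```python
import requests

# Illustrative IDs; substitute your own measurement_id and api_secret.
url = "https://www.google-analytics.com/debug/mp/collect"
params = {"measurement_id": "G-XXXXXXX", "api_secret": "YOUR_SECRET"}
payload = {
    "client_id": "test.123",
    "events": [{"name": "sign_up", "params": {"method": "email"}}],
}

resp = requests.post(url, params=params, json=payload, timeout=10)
print(resp.json().get("validationMessages", []))  # empty list means the hit is valid
```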