Found 33 Skills
Optimize provider selection, routing, and credit usage across 150+ enrichment sources for company/contact intelligence.
Bronze/Silver/Gold layer design patterns and templates for building scalable data lakehouse architectures. Includes incremental processing, data quality checks, and optimization strategies.
Profile datasets to understand schema, quality, and characteristics. Use when analyzing data files (CSV, JSON, Parquet), discovering dataset properties, assessing data quality, or when user mentions data profiling, schema detection, data analysis, or quality metrics. Provides basic and intermediate profiling including distributions, uniqueness, and pattern detection.
Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.
Comprehensive data validation using Pydantic v2 with data quality monitoring and schema alignment for PlanetScale PostgreSQL. Use when implementing API validation, database schema alignment, or data quality assurance. Triggers: 'validation', 'Pydantic', 'schema', 'data quality'.
Data validation and pipeline testing utilities for ML training projects. Validates datasets, model checkpoints, training pipelines, and dependencies. Use when validating training data, checking model outputs, testing ML pipelines, verifying dependencies, debugging training failures, or ensuring data quality before training.
Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
Validate and audit CSV data for quality, consistency, and completeness. Use when you need to check CSV files for data issues, missing values, or format inconsistencies.
Run a comprehensive data quality assessment and produce a scorecard across 6 dimensions: completeness, uniqueness, consistency, timeliness, accuracy, validity. Use when the user asks about data quality, mentions data issues, wants to audit a table, is onboarding a new data source, or needs to validate pipeline output.
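As a rough illustration of the scorecard idea in the last entry, the sketch below computes two of the six listed dimensions (completeness and uniqueness) for a small in-memory table. It is not taken from any of the listed skills: the function names, the sample rows, and the choice to treat `None` and empty strings as missing are all assumptions made for the example.

```python
from typing import Any

# Values treated as "missing" in this sketch (an assumption, not a standard).
MISSING = (None, "")


def completeness(rows: list[dict[str, Any]], columns: list[str]) -> float:
    """Fraction of cells in the given columns that are not missing."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(
        1 for row in rows for col in columns if row.get(col) not in MISSING
    )
    return filled / total


def uniqueness(rows: list[dict[str, Any]], key: str) -> float:
    """Fraction of distinct values among the non-missing values of one column."""
    values = [row.get(key) for row in rows if row.get(key) not in MISSING]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def scorecard(
    rows: list[dict[str, Any]], columns: list[str], key: str
) -> dict[str, float]:
    """Collect the two scores into a small scorecard (hypothetical layout)."""
    return {
        "completeness": round(completeness(rows, columns), 3),
        "uniqueness": round(uniqueness(rows, key), 3),
    }


# Made-up sample data: one duplicated id and two missing emails.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": None},
]
card = scorecard(rows, ["id", "email"], "id")
print(card)
```

The remaining dimensions (consistency, timeliness, accuracy, validity) generally need reference data or business rules, so they are left out of this sketch.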