Loading...
Loading...
Found 42 Skills
Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.
Comprehensive data validation using Pydantic v2 with data quality monitoring and schema alignment for PlanetScale PostgreSQL. Use when implementing API validation, database schema alignment, or data quality assurance. Triggers: 'validation', 'Pydantic', 'schema', 'data quality'.
Data validation and pipeline testing utilities for ML training projects. Validates datasets, model checkpoints, training pipelines, and dependencies. Use when validating training data, checking model outputs, testing ML pipelines, verifying dependencies, debugging training failures, or ensuring data quality before training.
Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
Validate and audit CSV data for quality, consistency, and completeness. Use when you need to check CSV files for data issues, missing values, or format inconsistencies.
Data engineering patterns for ETL pipelines, data warehousing, Apache Spark, and data quality validation