Loading...
Loading...
Found 4 Skills
Generate synthetic training data when you don't have enough real examples. Use when you're starting from scratch with no data, need a proof of concept fast, have too few examples for optimization, can't use real customer data for privacy or compliance, need to fill gaps in edge cases, have unbalanced categories, added new categories, or changed your schema. Covers DSPy synthetic data generation, quality filtering, and bootstrapping from zero.
Scale AI integration. Manage Organizations, Users, Goals. Use when the user wants to interact with Scale AI data.
Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.
Chief Data Officer advisory for startups: AI training data rights and consent provenance, data product strategy (warehouse vs lakehouse vs mesh, build-vs-buy), B2B customer-data-as-asset valuation and M&A readiness, data team org evolution. Use when deciding whether to train models on customer data, choosing data architecture, valuing data for fundraising or M&A, sequencing data hires, or when user mentions CDO, chief data officer, data strategy, data mesh, lakehouse, training data, data product, data monetization, or customer data asset. NOT a tactical data engineering skill — strategic decisions only.