Search Results: dataset

Found 331 Skills

spice-connect-data

Connect Spice to data sources and query across them with federated SQL. Use when connecting to databases (Postgres, MySQL, DynamoDB), data lakes (S3, Delta Lake, Iceberg), warehouses (Snowflake, Databricks), files, APIs, or catalogs; configuring datasets; creating views; writing data; or setting up cross-source queries.

🇺🇸|EnglishTranslated

AI & Machine Learningawslabs/agent-plugins

dataset-evaluation

Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.

🇺🇸|EnglishTranslated

1 scripts/Checked

Data Processingdaemon-blockint-tech/agen...

data-scrubbing

Guides cleaning and standardizing tabular datasets before analysis, modeling, or reporting—profiling, quality rules, missing values, duplicates, outliers, type coercion, encoding fixes, record linkage, deduplication, high-level PII handling (not legal advice), actuarial/insurance field scrubbing, reproducible scrub pipelines, validation checks, and sign-off. Distinct from warehouse ETL or statistical modeling. Use when the user asks for "data scrubbing", "clean this dataset", "scrub the data", "data cleaning", "dedupe records", "handle missing values", "outlier treatment", "standardize columns", "data quality rules", "profile this table", or "prepare data for modeling". Not warehouse pipelines (data-warehouse-engineer), ML modeling (data-scientist, actuary), privacy programs (compliance-engineer), FinOps only (finops-analyst), or assumption governance (assumption-setting).

🇺🇸|EnglishTranslated

Data Processingcartodb/agent-skills

carto-find-spatial-data

Discover and subscribe to external spatial datasets via CARTO Data Observatory and partner catalogs.

🇺🇸|EnglishTranslated

AI & Machine Learningnousresearch/hermes-agent

huggingface-hub

HuggingFace hf CLI: search/download/upload models, datasets.

🇺🇸|EnglishTranslated

AI & Machine Learningsammcj/agentic-coding

piper-tts-training

Train custom TTS voices for Piper (ONNX format) using fine-tuning or from-scratch approaches. Use when creating new synthetic voices, fine-tuning existing Piper checkpoints, preparing audio datasets for TTS training, or deploying voice models to devices like Raspberry Pi or Home Assistant. Covers dataset preparation, Whisper-based validation, training configuration, and ONNX export.

🇺🇸|EnglishTranslated

1 scripts/Attention

AI & Machine Learninglangchain-ai/langsmith-sk...

langsmith-dataset

INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.

🇺🇸|EnglishTranslated

Data Processingstahura/domo-ai-vibe-rule...

cap-de-programmatic-filters

Control which data each viewer sees in an embedded Domo dashboard/card via server-side programmatic filters and dataset switching. Covers the OAuth → embed token flow, standard filters, SQL filters (OR/BETWEEN/LIKE), per-dataset targeting, datasetRedirects for multi-tenant architectures, and token size limits. Use for any per-viewer, per-role, or per-tenant data restrictions at embed time. Not for client-side JS API filtering (use cap-de-jsapi-filters).

🇺🇸|EnglishTranslated

AI & Machine Learningharbor-framework/harbor

publish

Publish a Harbor task or dataset to the registry. Use when the user wants to upload, publish, or share tasks or datasets/benchmarks on the Harbor registry.

🇺🇸|EnglishTranslated

Data Processingmicrosoft/skills-for-fabr...

powerbi-authoring-cli

Create, manage, and deploy Power BI semantic models inside Microsoft Fabric workspaces via `az rest` CLI against Fabric and Power BI REST APIs. Use when the user wants to: (1) create a semantic model from TMDL definition files, (2) retrieve or download semantic model definitions, (3) update a semantic model definition with modified TMDL, (4) trigger or manage dataset refresh operations, (5) configure data sources, parameters, or permissions, (6) deploy semantic models between pipeline stages. Covers Fabric Items API (CRUD) and Power BI Datasets API (refresh, data sources, permissions). For read-only DAX queries, use `powerbi-consumption-cli`. For fine-grained modeling changes, route to `powerbi-modeling-mcp`. Triggers: "create semantic model", "upload TMDL", "download semantic model TMDL", "refresh dataset", "semantic model deployment pipeline", "dataset permissions", "list dataset users", "semantic model authoring".

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

nemo-gym-pivot-datasets

Use when creating, validating, or documenting Nemo Gym pivot datasets from rollout, trajectory, chat-completion, Responses API, or tool-call artifacts. Covers Gym Responses-style row conversion, pivot selection, single-step tool-use configs, agent_ref alignment, verifier knobs, expected-action row contracts, and train/eval usage.

🇺🇸|EnglishTranslated

5 scripts/Checked

Data Processingwentorai/research-plugins

scraping-skills

6 web scraping & data collection skills. Trigger: collecting web data, finding datasets, API access for research. Design: ethical scraping methods with rate limiting and data quality checks.

🇺🇸|EnglishTranslated