Search Results: dataset

Found 288 Skills

hf-cli

Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces the now-deprecated `huggingface-cli` command.

🇺🇸 | English | Translated

AI & Machine Learning | galaxy-dawn/claude-schola...

architecture-design

Use ONLY when creating NEW registrable components in ML projects that require Factory/Registry patterns.
✅ USE when:
- Creating a new Dataset class (needs @register_dataset)
- Creating a new Model class (needs @register_model)
- Creating a new module directory with __init__.py factory
- Initializing a new ML project structure from scratch
- Adding new component types (Augmentation, CollateFunction, Metrics)
❌ DO NOT USE when:
- Modifying existing functions or methods
- Fixing bugs in existing code
- Adding helper functions or utilities
- Refactoring without adding new registrable components
- Simple code changes to a single file
- Modifying configuration files
- Reading or understanding existing code
Key indicator: Does the task require @register_* decorator or Factory pattern? If no, skip this skill.

🇺🇸 | English | Translated

4 scripts/Attention
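
For readers unfamiliar with the Factory/Registry pattern that the architecture-design entry refers to, here is a minimal sketch. The listing does not show the skill's own code, so everything here is hypothetical and for illustration only: the `DATASET_REGISTRY` dictionary, the `build_dataset` factory, the `ToyDataset` class, and the exact signature of `register_dataset`.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping a string name to a dataset class.
DATASET_REGISTRY: Dict[str, Type] = {}


def register_dataset(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a dataset class under `name`."""
    def decorator(cls: Type) -> Type:
        if name in DATASET_REGISTRY:
            raise ValueError(f"Dataset '{name}' is already registered")
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator


def build_dataset(name: str, **kwargs):
    """Factory: look up a registered class by name and instantiate it."""
    if name not in DATASET_REGISTRY:
        known = sorted(DATASET_REGISTRY)
        raise KeyError(f"Unknown dataset '{name}'; known: {known}")
    return DATASET_REGISTRY[name](**kwargs)


@register_dataset("toy")
class ToyDataset:
    """Illustrative dataset holding the integers 0..size-1."""

    def __init__(self, size: int = 3):
        self.items = list(range(size))

    def __len__(self) -> int:
        return len(self.items)


dataset = build_dataset("toy", size=5)
print(len(dataset))
```

With this shape, adding a new component means writing one decorated class, and callers select it by name from configuration, which is why the entry treats the decorator as the key indicator.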

Tools & Utilities | notque/claude-code-toolki...

pr-miner

Deterministic 3-phase GitHub PR review comment extraction: Authenticate, Mine, Validate. Use when mining tribal knowledge from PR reviews, extracting coding standards from review history, or building datasets for the Code Archaeologist agent. Use for "mine PRs", "extract review comments", "tribal knowledge", or "PR review history". Do NOT use for analyzing patterns, generating rules, or interpreting comments — that is the Code Archaeologist agent's responsibility.

🇺🇸 | English | Translated

2 scripts/Attention

Data Processing | dkyazzentwatwa/chatgpt-sk...

data-quality-auditor

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

🇺🇸 | English | Translated

1 scripts/Attention
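
As a rough illustration of the kinds of checks the data-quality-auditor entry names (missing values, duplicates, type issues), here is a standard-library sketch. It is not the skill's implementation: the `audit_rows` function, the report keys, and the sample rows are all hypothetical, and it assumes rows are dictionaries of hashable scalar values.

```python
from collections import Counter
from typing import Any, Dict, List

MISSING = (None, "")  # assumption: None and empty strings count as missing


def audit_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return counts of missing values, duplicate rows, and mixed-type columns."""
    columns = sorted({key for row in rows for key in row})

    # Missing values per column (absent keys count as missing too).
    missing = {
        col: sum(1 for row in rows if row.get(col) in MISSING)
        for col in columns
    }

    # Exact duplicate rows: every repeat beyond the first occurrence.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())

    # Columns whose non-missing values have more than one Python type.
    mixed_types = {}
    for col in columns:
        types = {
            type(row[col]).__name__
            for row in rows
            if row.get(col) not in MISSING
        }
        if len(types) > 1:
            mixed_types[col] = sorted(types)

    return {"missing": missing, "duplicates": duplicates, "mixed_types": mixed_types}


rows = [
    {"id": 1, "age": 34, "city": "Rome"},
    {"id": 2, "age": "41", "city": ""},
    {"id": 1, "age": 34, "city": "Rome"},
    {"id": 3, "age": None, "city": "Milan"},
]
report = audit_rows(rows)
print(report)
```

A real audit would likely add checks for inconsistencies such as out-of-range values or conflicting spellings, which depend on rules specific to each dataset.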

Data Processing | aborruso/opensdmx

sdmx-explorer

Guided, interactive exploration of statistical data via SDMX providers (Eurostat, OECD, ECB, World Bank, ISTAT, and others) using the opensdmx CLI. Use this skill whenever the user asks ANY question about statistics or data that could be answered with SDMX data — even if they don't mention SDMX, Eurostat, or any provider by name. Topics include demographics, economy, employment, births, deaths, population, prices, trade, health, agriculture, GDP, inflation, unemployment, fertility rates, migration, energy, education, poverty, housing, and any other statistical topic. Also use it when the user mentions a specific dataflow ID they want to explore. Trigger this skill even for implicit questions like "how many births were there in Italy last year?" or "I need EU unemployment data by age group" — these clearly need SDMX data even if the user doesn't say so. The skill guides the user step by step: discovers relevant datasets, proposes the most meaningful candidates, explores the schema using real constraints (not codelists), explains the dataset structure, and invites the user to make informed filter choices before fetching any data.

🇺🇸 | English | Translated

Data Processing | holistics/skills

write-aql

Write and run AQL (Analytic Query Language) queries to answer data questions. Use this whenever the user asks for data, wants to query a dataset, needs to filter/aggregate/join data, or asks about metrics and dimensions in Holistics.

🇺🇸 | English | Translated

Tools & Utilities | evoscientist/evoskills

paper-navigator

Find and read academic papers: disambiguate queries, discover papers (search, citation traversal, recommendations, arXiv monitoring, trending, GitHub search), evaluate (TLDR, citations, code, SOTA), and read with structured analysis (3-level strategy). Use when: finding papers, reading a paper, related work, citation analysis, research trends, SOTA results, datasets. Do NOT use for generating literature survey reports (use research-survey), generating research ideas (use research-ideation), writing a paper's Related Work section (use paper-writing), comparing/ranking research ideas (use research-ideation), or planning paper structure (use paper-planning).

🇺🇸 | English | Translated

13 scripts/Checked

AI & Machine Learning | jettyio/jettyio-skills

jetty

Manage Jetty workflows and assets. Use when the user wants to create, edit, run, deploy, debug, or monitor AI/ML workflows on Jetty. Also use when they mention collections, tasks, trajectories, datasets, models, labels, step templates, or workflow runs. Triggers include 'run workflow', 'create task', 'list collections', 'check trajectory', 'label trajectory', 'add label', 'deploy workflow', 'show results', 'download output', 'debug run', 'workflow failed', or any Jetty/mise/dock operations. Even if the user doesn't say 'Jetty' explicitly, use this skill whenever they're working with Jetty API endpoints, workflow JSON, or init_params.

🇺🇸 | English | Translated

1 scripts/Attention

AI & Machine Learning | onewave-ai/claude-skills

agent-swarm-deployer

Deploys swarms of sub-agents for massive parallel data processing tasks. Unlike agent-army (which is for code changes), this is for DATA tasks -- processing 1000 documents, analyzing datasets, bulk content generation. Configurable swarm size, task distribution, result aggregation, progress tracking, and error recovery.

🇺🇸 | English | Translated

Data Processing | duckdb/duckdb-skills

s3-explore

Explore and query data on S3, Cloudflare R2, GCS, MinIO, or any S3-compatible storage. Use when the user mentions an s3://, r2://, gs://, or gcs:// URL, asks "what's in this bucket", wants to list remote files, preview remote Parquet/CSV/JSON, or query data on object storage without downloading it. Also triggers when the user wants to know the size, schema, or row count of remote datasets.

🇺🇸 | English | Translated

Tools & Utilities | lovrabet/lovrabet-cli

lovrabet

Lovrabet Runtime CLI — Manage application directories, dataset queries, data CRUD, SQL execution, and BFF invocations via the lovrabet command. Trigger words: Cloud Diagram, lovrabet, lovrabet-cli, app list, dataset, data filter, data getOne, create, update, delete, sql exec, bff exec, accessKey, compress, jq.

🇨🇳 | Chinese | Translated

Data Processing | nvidia-nemo/datadesigner

data-designer

Use when the user wants to create a dataset, generate synthetic data, or build a data generation pipeline.

🇺🇸 | English | Translated

1 scripts/Checked