Search Results: dataset

Found 331 Skills

annotating-task-lineage

Annotate Airflow tasks with data lineage using inlets and outlets. Use when the user wants to add lineage metadata to tasks, specify input/output datasets, or enable lineage tracking for operators without built-in OpenLineage extraction.

🇺🇸 English | Translated

Data Processing | legout/data-platform-agen...

data-engineering-storage-remote-access-libraries-pyarrow-fs

Native Arrow filesystem integration with PyArrow. Optimized for Parquet workflows, zero-copy data transfer, predicate pushdown, and column pruning. Covers S3, GCS, HDFS with PyArrow datasets.

🇺🇸 English | Translated

AI & Machine Learning | famaoai-creator/gemini-sk...

ai-ethics-auditor

Audits AI systems for bias, fairness, and privacy. Analyzes prompts and datasets to ensure ethical and safe AI implementation.

🇺🇸 English | Translated

1 script | Checked

AI & Machine Learning | mlflow/skills

agent-evaluation

Use this when you need to EVALUATE, IMPROVE, or OPTIMIZE an existing LLM agent's output quality, including improving tool selection accuracy or answer quality, reducing costs, or fixing issues where the agent gives wrong or incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. Covers the end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).

🇺🇸 English | Translated

12 scripts | Attention

AI & Machine Learning | tiangong-ai/skills

dify-knowledge-base-search

Dify dataset retrieve API for knowledge base chunk search/testing. Use when integrating or debugging Dify knowledge base retrieval requests, retrieval_model options, or response shaping.

🇺🇸 English | Translated

AI & Machine Learning | sickn33/antigravity-aweso...

azure-ai-projects-ts

Build AI applications using Azure AI Projects SDK for JavaScript (@azure/ai-projects). Use when working with Foundry project clients, agents, connections, deployments, datasets, indexes, evaluations, or getting OpenAI clients.

🇺🇸 English | Translated

Backend Development | lovrabet/lovrabet-skill

lovrabet

Used when user requests involve dataset queries, SQL creation, and BFF development for the Lovrabet/Yuntoo platform. Trigger words: dataset, data table, custom SQL, filter, sql.execute, bff.execute, get_dataset_detail, validate_sql_content, save_or_update_custom_sql, save_or_update_bff_script, @lovrabet/sdk, MCP SQL workflow, multi-table association, lovrabet development.

🇨🇳 Chinese | Translated

Data Processing | aliyun/alibabacloud-aiops...

alibabacloud-dataworks-metadata

DataWorks metadata Skill for Alibaba Cloud — browse Data Map metadata and perform non-destructive writes via Aliyun CLI. READ scope: list/get catalogs, databases, tables, columns, partitions; query data lineage (upstream/downstream impact); list/get datasets & versions; list/get metadata collections (Category/Album) and entities inside them; preview dataset version content. WRITE scope (non-destructive only): update table & column business metadata; register lineage relationships; create/update datasets and versions; create/update metadata collections and add entities to them. This Skill exposes NO delete or remove APIs — every `delete-*` and `remove-*` operation is intentionally out of scope. For deletions, use the DataWorks console. Triggers: "dataworks metadata", "data map", "data lineage", "meta collection", "dataset", "catalog", "table info", "column info", "partition", "impact analysis", "register lineage", "create dataset", "update business metadata".

🇺🇸 English | Translated

Data Processing | spiceai/skills

spice-accelerators

Configure data accelerators for local materialization and caching in Spice (Arrow, DuckDB, SQLite, Cayenne, PostgreSQL, Turso). Use when asked to "accelerate data", "enable caching", "materialize dataset", "configure refresh", "set up local storage", "improve query performance", "choose an accelerator", or "configure snapshots".

🇺🇸 English | Translated

Data Processing | gemini-cli-extensions/big...

bigquery-data

Use this skill for large-scale data exploration and dataset management, such as when users need to find data assets or run SQL at scale. Provides metadata discovery and query execution across the data warehouse.

🇺🇸 English | Translated

6 scripts | Attention

Security & Compliance | vincenzoimp/academic-rese...

ethics-data-governance

Use when academic research involves human subjects, public web data, platform scraping, sensitive domains, privacy risk, dataset sharing, consent, IRB, licenses, or data retention.

🇺🇸 English | Translated

AI & Machine Learning | sundial-org/skills

training-data-curation

Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.

🇺🇸 English | Translated