Found 27 Skills
Plans new DataHub connectors by classifying the source system, researching it (via a dedicated agent or inline), and generating a _PLANNING.md blueprint with entity mapping and architecture decisions. Use when building a new connector, researching a source system for DataHub, or designing connector architecture. Triggers on: "plan a connector", "new connector for X", "research X for DataHub", "design connector for X", "create planning doc", or any request to plan/research/design a DataHub ingestion source.
Fetch journal articles published after a user-specified date from Crossref and insert them into the PostgreSQL `journals` table with DOI deduplication. Use when incrementally ingesting journal metadata from `journals_issn` into `journals`.
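As an illustrative sketch of that flow (not the skill's actual implementation): the code below assumes a `journals` table with a UNIQUE `doi` column plus `issn` and `title` columns, and one ISSN per row in `journals_issn` — both are assumptions, only the table names come from the description. It uses Crossref's cursor-paged `/works` endpoint with a `from-pub-date` filter.

```python
import requests
import psycopg2

def ingest_since(conn, issn: str, from_date: str) -> None:
    """Fetch works for one ISSN published on/after from_date and insert with DOI dedup.
    Call once per ISSN pulled from journals_issn."""
    url = "https://api.crossref.org/works"
    params = {
        "filter": f"issn:{issn},from-pub-date:{from_date}",
        "rows": 100,
        "cursor": "*",  # Crossref deep-paging cursor
    }
    with conn.cursor() as cur:
        while True:
            msg = requests.get(url, params=params, timeout=30).json()["message"]
            for item in msg["items"]:
                cur.execute(
                    """INSERT INTO journals (doi, issn, title)
                       VALUES (%s, %s, %s)
                       ON CONFLICT (doi) DO NOTHING""",  # DOI deduplication
                    (item["DOI"], issn, (item.get("title") or [None])[0]),
                )
            if not msg["items"]:
                break
            params["cursor"] = msg["next-cursor"]
    conn.commit()
```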
Execute KQL management commands (table management, ingestion, policies, functions, materialized views) against Fabric Eventhouse and KQL Databases via CLI. Use when the user wants to: (1) create or alter KQL tables, columns, or functions, (2) ingest data into an Eventhouse (inline, from storage, streaming), (3) configure retention, caching, or partitioning policies, (4) create or manage materialized views and update policies, (5) manage data mappings for ingestion pipelines, (6) deploy KQL schema via scripts. Triggers: "create kql table", "kql ingestion", "ingest into eventhouse", "kql function", "materialized view", "kql retention policy", "eventhouse schema", "kql authoring", "create eventhouse table", "kql mapping".
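A minimal sketch of driving such commands from Python uses the azure-kusto-data package's `execute_mgmt` call. The cluster URI, database, and table names below are placeholders, and the right authentication builder varies by environment:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-eventhouse>.kusto.fabric.microsoft.com"  # placeholder URI
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

db = "MyKqlDatabase"  # placeholder
# Create (or merge-extend) a table, then set a 30-day retention policy.
client.execute_mgmt(db, ".create-merge table Events (Timestamp: datetime, Value: real)")
client.execute_mgmt(
    db,
    """.alter table Events policy retention '{"SoftDeletePeriod": "30.00:00:00", "Recoverability": "Enabled"}'""",
)
# Inline ingestion is handy for quick tests, not production volumes.
client.execute_mgmt(db, ".ingest inline into table Events <| 2024-01-01T00:00:00Z,1.0")
```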
Execute authoring T-SQL (DDL, DML, data ingestion, transactions, schema changes) against Microsoft Fabric Data Warehouse and SQL endpoints from agentic CLI environments. Use when the user wants to: (1) create/alter/drop tables from terminal, (2) insert/update/delete/merge data via CLI, (3) run COPY INTO or OPENROWSET ingestion, (4) manage transactions or stored procedures, (5) perform schema evolution, (6) use time travel or snapshots, (7) generate ETL/ELT shell scripts, (8) create views/functions/procedures on Lakehouse SQLEP. Triggers: "create table in warehouse", "insert data via T-SQL", "load from ADLS", "COPY INTO", "run ETL with T-SQL", "alter warehouse table", "upsert with T-SQL", "merge into warehouse", "create T-SQL procedure", "warehouse time travel", "recover deleted warehouse data", "create warehouse schema", "deploy warehouse", "transaction conflict", "snapshot isolation error".
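As a hedged sketch of running that authoring T-SQL from a CLI script, the following uses pyodbc; the server name, auth mode, and storage URL are placeholders, and COPY INTO options depend on your file format and credentials:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>.datawarehouse.fabric.microsoft.com;"  # placeholder endpoint
    "Database=MyWarehouse;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,  # explicit BEGIN TRAN blocks are also an option
)
cur = conn.cursor()
cur.execute("CREATE TABLE dbo.Sales (Id INT NOT NULL, Amount DECIMAL(18, 2))")
# Bulk-load Parquet files from ADLS; the URL is illustrative.
cur.execute(
    "COPY INTO dbo.Sales "
    "FROM 'https://<account>.dfs.core.windows.net/<container>/sales/*.parquet' "
    "WITH (FILE_TYPE = 'PARQUET')"
)
```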
Use when reading from or writing to Neo4j with Apache Spark or Databricks using the Neo4j Connector for Apache Spark (org.neo4j:neo4j-connector-apache-spark). Covers SparkSession setup, DataFrame reads via labels/Cypher/relationship scan, DataFrame writes with SaveMode, node.keys for MERGE, relationship write mapping, partition and batch tuning, PySpark and Scala examples, Databricks cluster config, Databricks secrets for credentials, Delta Lake to Neo4j pipelines. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT handle the Python bolt driver — use neo4j-driver-python-skill. Does NOT handle GDS algorithms — use neo4j-gds-skill.
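For orientation, here is a short PySpark sketch of a label read and a MERGE-style write with this connector. The URL, credentials, and the assumed `id` column are placeholders, and the connector JAR must already be on the classpath (in Databricks, prefer secrets over literal passwords):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("neo4j-example").getOrCreate()

opts = {
    "url": "neo4j://localhost:7687",            # placeholder
    "authentication.basic.username": "neo4j",
    "authentication.basic.password": "<password>",
}

# Read every node with the :Person label into a DataFrame.
people = (spark.read.format("org.neo4j.spark.DataSource")
          .options(**opts)
          .option("labels", "Person")
          .load())

# Overwrite mode becomes a Cypher MERGE keyed on the node.keys columns.
(people.write.format("org.neo4j.spark.DataSource")
 .mode("Overwrite")
 .options(**opts)
 .option("labels", ":Person")
 .option("node.keys", "id")   # assumes an `id` column exists
 .save())
```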
Goldsky Turbo pipeline YAML reference — the authoritative source for field names, required vs optional fields, and valid values. Use whenever the user asks about specific YAML fields: what does `start_at: earliest` vs `latest` do, what fields does a postgres/clickhouse/kafka sink require, what is the `from:` field in a sink, how does `checkpoint` work, what's the syntax for `batch_size` or `primary_key`. Also use for validation errors like 'unknown field' or 'missing required field'. For interactive pipeline building end-to-end, use /turbo-builder instead.
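To make those fields concrete, here is a sketch of how they might sit together in a pipeline file. Only the field names called out above (`start_at`, `from`, `batch_size`, `primary_key`, a postgres sink) come from this reference; the surrounding nesting and names are assumptions, so check the authoritative field list before using:

```yaml
# Illustrative shape only: structure and key placement are assumed.
sources:
  my_source:
    start_at: earliest        # or `latest` to skip historical data
sinks:
  my_postgres_sink:
    type: postgres
    from: my_source           # names the source/transform this sink consumes
    batch_size: 1000
    primary_key: id
```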
Salesforce Data Cloud Prepare phase. TRIGGER when: user creates or manages Data Cloud data streams, DLOs, transforms, or Document AI configurations, or asks about ingestion into Data Cloud. DO NOT TRIGGER when: the task is connection setup only (use sf-datacloud-connect), DMOs and identity resolution (use sf-datacloud-harmonize), or query/search work (use sf-datacloud-retrieve).
Tinybird Python SDK for defining datasources, pipes, and queries in Python. Use when working with tinybird-sdk, Python Tinybird projects, or data ingestion and queries in Python.
Expert knowledge for Azure Data Manager for Agriculture development including limits & quotas, security, configuration, and integrations & coding patterns. Use when setting up BYOL creds/Private Link, ag data ingestion/IoT, AI/nutrient APIs, throttling, Event Grid logs, or other Azure Data Manager for Agriculture-related development tasks. Not for Azure Data Explorer (use azure-data-explorer), Azure Data Factory (use azure-data-factory), Azure Synapse Analytics (use azure-synapse-analytics), Azure Databricks (use azure-databricks).
Ingest and normalize market data into OHLCV vectors with HNSW indexing
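One hedged way to realize this is shown below with the hnswlib package: flatten fixed-length OHLCV windows into vectors, z-score normalize, and index with HNSW. The window length, normalization scheme, and cosine metric are all assumptions, not part of the skill:

```python
import numpy as np
import hnswlib

def to_vector(bars: np.ndarray) -> np.ndarray:
    """Flatten a (window, 5) OHLCV block and z-score normalize it."""
    v = bars.reshape(-1).astype(np.float32)
    return (v - v.mean()) / (v.std() + 1e-9)

window = 16                      # bars per vector (assumed)
dim = window * 5                 # open, high, low, close, volume
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=100_000, ef_construction=200, M=16)

# ohlcv: (n_bars, 5) array from your market-data feed (random placeholder here)
ohlcv = np.random.rand(1_000, 5)
vectors = np.stack([to_vector(ohlcv[i:i + window])
                    for i in range(len(ohlcv) - window)])
index.add_items(vectors, ids=np.arange(len(vectors)))

labels, distances = index.knn_query(vectors[:1], k=5)  # nearest similar windows
```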
Analyzes clinical trial protocols and generates CDISC-compliant (SDTM/ADaM) data schemas. Use when designing data ingestion pipelines for clinical research or preparing regulatory submissions.
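As a tiny illustration of the SDTM side, this pandas sketch maps raw demographics into a minimal DM domain. Only a handful of standard DM variables are shown, and the study ID and input columns are placeholders; real submissions need the full variable set and controlled-terminology checks:

```python
import pandas as pd

raw = pd.DataFrame({
    "subject": ["001", "002"],
    "age": [34, 58],
    "sex": ["F", "M"],
})

dm = pd.DataFrame({
    "STUDYID": "XYZ-101",                      # placeholder study ID
    "DOMAIN": "DM",                            # two-letter SDTM domain code
    "USUBJID": "XYZ-101-" + raw["subject"],    # unique subject identifier
    "AGE": raw["age"],
    "AGEU": "YEARS",                           # CDISC controlled terminology
    "SEX": raw["sex"],
})
print(dm)
```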
Comprehensive guide for Azure Data Explorer (ADX) and Kusto Query Language (KQL); use when writing/optimizing KQL queries, setting up ingestion, building dashboards, doing time-series/ML analysis, configuring management/security, or when users mention Kusto, KQL, ADX, Azure Data Explorer, or log analytics queries.
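For a concrete starting point, this sketch runs a KQL aggregation against ADX from Python via azure-kusto-data; the cluster URI, database, and table are placeholders:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<cluster>.<region>.kusto.windows.net"   # placeholder ADX URI
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

query = """
Events
| where Timestamp > ago(1d)
| summarize AvgValue = avg(Value) by bin(Timestamp, 1h)
| order by Timestamp asc
"""
response = client.execute("MyDatabase", query)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["AvgValue"])
```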