Search Results: data-ingestion

Found 28 Skills

clickhouse-best-practices

MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.

Data Processing | clickhouse/agent-skills

clickhouse-architecture-advisor

MUST USE when designing ClickHouse architectures, selecting between ingestion or modeling patterns, or translating best practices into workload-specific system designs. Complements clickhouse-best-practices with decision frameworks and explicit provenance labels.

Data Processing | motherduckdb/agent-skills

motherduck-build-data-pipeline

Design an end-to-end MotherDuck pipeline. Use when choosing raw, staging, and analytics boundaries, bulk ingestion paths, transformation sequencing, publication targets, or whether DuckLake is actually required.

9 scripts/Attention

Data Processing | neo4j-contrib/neo4j-skill...

neo4j-spark-skill

Use when reading from or writing to Neo4j with Apache Spark or Databricks using the Neo4j Connector for Apache Spark (org.neo4j:neo4j-connector-apache-spark). Covers SparkSession setup, DataFrame reads via labels/Cypher/relationship scan, DataFrame writes with SaveMode, node.keys for MERGE, relationship write mapping, partition and batch tuning, PySpark and Scala examples, Databricks cluster config, Databricks secrets for credentials, Delta Lake to Neo4j pipelines. Does NOT handle Cypher authoring — use neo4j-cypher-skill. Does NOT handle the Python bolt driver — use neo4j-driver-python-skill. Does NOT handle GDS algorithms — use neo4j-gds-skill.

Data Processing | materializeinc/agent-skil...

materialize-docs

Materialize documentation for SQL syntax, data ingestion, concepts, and best practices. Use when users ask about Materialize queries, sources, sinks, views, or clusters.

Data Processing | lanej/dotfiles

bigquery

Use bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. Modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.

Data Processing | jorgealves/agent_skills

clinical-trial-schema-designer

Analyzes clinical trial protocols and generates CDISC-compliant (SDTM/ADaM) data schemas. Use when designing data ingestion pipelines for clinical research or preparing regulatory submissions.

Data Processing | greptimeteam/docs

greptimedb-pipeline

Guide for creating a GreptimeDB Pipeline, which lets users add a processing layer between ingestion and storage to transform data.

Data Processing | tinybirdco/tinybird-agent...

tinybird-typescript-sdk-guidelines

Tinybird TypeScript SDK for defining datasources, pipes, and queries with full type inference. Use when working with @tinybirdco/sdk, TypeScript Tinybird projects, or type-safe data ingestion and queries.

DevOps & Cloud Services | microsoft/skills-for-fabr...

eventhouse-authoring-cli

Execute KQL management commands (table management, ingestion, policies, functions, materialized views) against Fabric Eventhouse and KQL Databases via CLI. Use when the user wants to: (1) create or alter KQL tables, columns, or functions; (2) ingest data into an Eventhouse (inline, from storage, streaming); (3) configure retention, caching, or partitioning policies; (4) create or manage materialized views and update policies; (5) manage data mappings for ingestion pipelines; (6) deploy KQL schema via scripts. Triggers: "create kql table", "kql ingestion", "ingest into eventhouse", "kql function", "materialized view", "kql retention policy", "eventhouse schema", "kql authoring", "create eventhouse table", "kql mapping".

Data Processing | untitled-data-company/dlt...

dlt-skill

Creates and maintains dlt (data load tool) pipelines from APIs, databases, and other sources. Use when the user wants to build or debug pipelines; use verified sources (e.g. Salesforce, GitHub, Stripe) or declarative REST API or custom Python; configure destinations (e.g. DuckDB, BigQuery, Snowflake); implement incremental loading; or edit .dlt config and secrets. Use when the user mentions data ingestion, dlt pipeline, dlt init, rest_api_source, incremental load, or pipeline dashboard.

5 scripts/Checked

AI & Machine Learning | garrytan/gbrain

cold-start

Day-one data bootstrapping for a new brain. Sequences the highest-leverage data sources to go from empty brain to useful brain in one session. Uses ClawVisor for safe credential handling — the agent never holds raw API keys. Covers Gmail import, calendar sync, contacts seeding, X/Twitter archive, conversation imports, and file archives. Use when a user has just finished gbrain setup and asks "now what?"
