Found 12 Skills
MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.
Guide for creating GreptimeDB Pipelines, which add a processing layer between ingestion and storage in GreptimeDB to transform data.
Use the `bigquery` CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. Modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. Handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.
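A minimal cost-awareness sketch of the dry-run idea, using the official google-cloud-bigquery Python client rather than the `bigquery` CLI itself; the project, dataset, and table names are placeholders.

```python
# Dry-run cost estimation against BigQuery with the google-cloud-bigquery
# Python client. Project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses gcloud application-default credentials

sql = "SELECT name, COUNT(*) AS n FROM `my-project.my_dataset.events` GROUP BY name"

# dry_run=True validates the query and reports bytes scanned without running it
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)

print(f"Query would process {job.total_bytes_processed / 1e9:.2f} GB")
```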
Materialize documentation for SQL syntax, data ingestion, concepts, and best practices. Use when users ask about Materialize queries, sources, sinks, views, or clusters.
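A minimal sketch of the view workflow, assuming a locally running Materialize instance reachable over the Postgres wire protocol; the host, credentials, and `purchases` table are placeholders.

```python
# Connect to Materialize (Postgres wire protocol) and define a view that
# Materialize keeps incrementally updated as the source table changes.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=6875, user="materialize", dbname="materialize"
)
conn.autocommit = True  # DDL outside an explicit transaction

with conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS revenue_by_day AS
        SELECT date_trunc('day', created_at) AS day, sum(amount) AS revenue
        FROM purchases
        GROUP BY 1
    """)
    cur.execute("SELECT * FROM revenue_by_day ORDER BY day DESC LIMIT 5")
    for row in cur.fetchall():
        print(row)
```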
Tinybird TypeScript SDK for defining datasources, pipes, and queries with full type inference. Use when working with @tinybirdco/sdk, TypeScript Tinybird projects, or type-safe data ingestion and queries.
AWS Kinesis expert. Use for stream processing, real-time data, and Kinesis patterns.
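A minimal producer/consumer sketch with boto3; the stream name, region, and shard id are placeholders.

```python
# Write one record to a Kinesis Data Stream and read it back from a single shard.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Producer: the partition key determines which shard receives the record
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps({"user_id": "u-42", "event": "page_view"}).encode("utf-8"),
    PartitionKey="u-42",
)

# Consumer: iterate a shard from its oldest available record
shard_iter = kinesis.get_shard_iterator(
    StreamName="clickstream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

for rec in kinesis.get_records(ShardIterator=shard_iter, Limit=100)["Records"]:
    print(rec["Data"])
```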
Goldsky Turbo pipeline YAML reference — the authoritative source for field names, required vs optional fields, and valid values. Use whenever the user asks about specific YAML fields: what does `start_at: earliest` vs `latest` do, what fields does a postgres/clickhouse/kafka sink require, what is the `from:` field in a sink, how does `checkpoint` work, what's the syntax for `batch_size` or `primary_key`. Also use for validation errors like 'unknown field' or 'missing required field'. For interactive pipeline building end-to-end, use /turbo-builder instead.
Data lake and lakehouse platform patterns: ingestion/CDC, transformations, open table formats (Iceberg/Delta/Hudi), query and serving engines (Trino/ClickHouse/DuckDB), orchestration, governance/lineage, cost and operations. Self-hosted and cloud options.
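A minimal sketch of the "query engine over open file formats" pattern, using DuckDB to read Parquet directly from object storage; the S3 path is a placeholder and AWS credentials are assumed to be available in the environment.

```python
# Lakehouse query pattern: a lightweight engine (DuckDB) scanning open-format
# files in place, with no ingestion step. The bucket/path is a placeholder.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")  # enables s3:// paths

df = con.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM read_parquet('s3://my-lake/events/*.parquet')
    GROUP BY event_date
    ORDER BY event_date
""").df()
print(df.head())
```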
Creates and maintains dlt (data load tool) pipelines from APIs, databases, and other sources. Use when the user wants to build or debug pipelines; use verified sources (e.g. Salesforce, GitHub, Stripe) or declarative REST API or custom Python; configure destinations (e.g. DuckDB, BigQuery, Snowflake); implement incremental loading; or edit .dlt config and secrets. Use when the user mentions data ingestion, dlt pipeline, dlt init, rest_api_source, incremental load, or pipeline dashboard.
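A minimal incremental-load sketch with dlt into DuckDB; the REST endpoint, its `since` parameter, and the `updated_at` cursor field are illustrative assumptions.

```python
# dlt pipeline with an incremental cursor and merge write disposition.
# The API endpoint and response shape are assumptions for illustration.
import dlt
import requests

@dlt.resource(primary_key="id", write_disposition="merge")
def issues(updated_at=dlt.sources.incremental("updated_at",
                                               initial_value="2024-01-01T00:00:00Z")):
    # Only fetch rows newer than the last stored cursor value
    resp = requests.get(
        "https://api.example.com/issues",
        params={"since": updated_at.last_value},
        timeout=30,
    )
    resp.raise_for_status()
    yield resp.json()

pipeline = dlt.pipeline(pipeline_name="issues", destination="duckdb", dataset_name="tracker")
info = pipeline.run(issues)
print(info)
```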
Fetch journal articles from Crossref published after a user-specified date and insert them into PostgreSQL `journals` with DOI deduplication. Use when incrementally ingesting journal metadata from `journals_issn` into `journals`.
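A minimal sketch of that ingestion step; the `journals` table columns and connection string are assumptions (a unique index on `doi` is presumed), while the Crossref `/journals/{issn}/works` endpoint and `from-pub-date` filter are part of the public REST API.

```python
# Fetch works published after a given date for one ISSN and upsert into
# PostgreSQL, deduplicating on DOI. Table columns are illustrative.
import psycopg2
import requests

def fetch_works(issn: str, from_date: str):
    url = f"https://api.crossref.org/journals/{issn}/works"
    params = {"filter": f"from-pub-date:{from_date}", "rows": 100}
    return requests.get(url, params=params, timeout=30).json()["message"]["items"]

def insert_works(conn, items):
    with conn.cursor() as cur:
        for it in items:
            cur.execute(
                """
                INSERT INTO journals (doi, title, published)
                VALUES (%s, %s, %s)
                ON CONFLICT (doi) DO NOTHING  -- deduplicate on DOI
                """,
                (
                    it["DOI"],
                    (it.get("title") or [None])[0],
                    it.get("created", {}).get("date-time"),
                ),
            )
    conn.commit()

conn = psycopg2.connect("dbname=research user=postgres")
insert_works(conn, fetch_works("1234-5678", "2024-01-01"))
```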
Analyzes clinical trial protocols and generates CDISC-compliant (SDTM/ADaM) data schemas. Use when designing data ingestion pipelines for clinical research or preparing regulatory submissions.
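A minimal sketch of what a generated SDTM Demographics (DM) record could look like as a typed schema; the variable subset follows the standard DM domain, and the study values are placeholders.

```python
# A typed record for a subset of the SDTM DM (Demographics) domain.
from dataclasses import dataclass

@dataclass
class DMRecord:
    STUDYID: str   # study identifier
    DOMAIN: str    # always "DM" for demographics
    USUBJID: str   # unique subject identifier
    RFSTDTC: str   # reference start date/time (ISO 8601)
    RFENDTC: str   # reference end date/time (ISO 8601)
    AGE: int
    AGEU: str      # age units, e.g. "YEARS"
    SEX: str       # "M", "F", or "U"
    RACE: str
    ARMCD: str     # planned arm code
    ARM: str       # planned arm description
    COUNTRY: str   # ISO 3166 country code

row = DMRecord("STUDY-001", "DM", "STUDY-001-0001", "2024-03-01", "2024-09-01",
               54, "YEARS", "F", "WHITE", "TRT-A", "Treatment A", "USA")
```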
Comprehensive guide for Azure Data Explorer (ADX) and Kusto Query Language (KQL); use when writing/optimizing KQL queries, setting up ingestion, building dashboards, doing time-series/ML analysis, configuring management/security, or when users mention Kusto, KQL, ADX, Azure Data Explorer, or log analytics queries.
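A minimal sketch of running a KQL query from Python with the azure-kusto-data client; the cluster URL, database, and `Logs` table are placeholders.

```python
# Execute a KQL time-series aggregation against Azure Data Explorer.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westeurope.kusto.windows.net"
)
client = KustoClient(kcsb)

query = """
Logs
| where Timestamp > ago(1h)
| summarize Errors = countif(Level == "Error") by bin(Timestamp, 5m)
| order by Timestamp asc
"""

response = client.execute("MyDatabase", query)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["Errors"])
```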