Search Results: etl

Found 48 Skills

Data Processing | k-dense-ai/claude-scienti...

polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

AI & Machine Learning | davila7/claude-code-templ...

cocoindex

Comprehensive toolkit for developing with the CocoIndex library. Use when users need to create data transformation pipelines (flows), write custom functions, or operate flows via CLI or API. Covers building ETL workflows for AI data processing, including embedding documents into vector databases, building knowledge graphs, creating search indexes, or processing data streams with incremental updates.

Data Processing | alirezarezvani/claude-ski...

senior-data-engineer

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

3 scripts | Attention

Data Processing | daffy0208/ai-dev-standard...

data-engineer

Expert in data pipelines, ETL processes, and data infrastructure

Data Processing | lanej/dotfiles

bigquery

Use the bigquery CLI (instead of `bq`) for all Google BigQuery and GCP data warehouse operations, including SQL query execution, data ingestion (streaming insert, bulk load, JSONL/CSV/Parquet), data extraction/export, dataset/table/view management, external tables, schema operations, query templates, cost estimation with dry-run, authentication with gcloud, data pipelines, ETL workflows, and MCP/LSP server integration for AI-assisted querying and editor support. It is a modern Rust-based replacement for the Python `bq` CLI with faster startup, better cost awareness, and streaming support. It handles both small-scale streaming inserts (<1000 rows) and large-scale bulk loading (>10MB files), with support for Cloud Storage integration.

Data Processing | dkyazzentwatwa/chatgpt-sk...

dataset-comparer

Compare two datasets to find differences, added/removed rows, changed values. Use for data validation, ETL verification, or tracking changes.

1 script | Checked
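
The kind of diff this skill describes (added rows, removed rows, changed values) can be sketched in plain Python. This is a minimal illustration, not the skill's own script: it assumes each dataset is a list of dicts that share a unique key column, and `compare_datasets` is a hypothetical helper name.

```python
from typing import Any

Row = dict[str, Any]


def compare_datasets(old: list[Row], new: list[Row], key: str) -> dict[str, Any]:
    """Diff two lists of records keyed by `key` (hypothetical helper).

    Returns the keys of added and removed rows, plus per-field
    (old_value, new_value) pairs for rows present in both datasets.
    """
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    # Set differences on the key columns give added and removed rows.
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())

    changed: dict[Any, dict[str, tuple[Any, Any]]] = {}
    for k in sorted(old_by_key.keys() & new_by_key.keys()):
        before, after = old_by_key[k], new_by_key[k]
        # Compare every field that appears in either version of the row
        # (a field missing on one side is treated as None).
        diffs = {
            field: (before.get(field), after.get(field))
            for field in before.keys() | after.keys()
            if before.get(field) != after.get(field)
        }
        if diffs:
            changed[k] = diffs
    return {"added": added, "removed": removed, "changed": changed}
```

Keying rows by an identifier, rather than comparing by position, keeps the diff stable when the two extracts come back in different orders, which is common when verifying ETL output.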

Data Processing | borghei/claude-skills

senior-data-engineer

Expert data engineering covering data pipelines, ETL/ELT, data warehousing, streaming, and data quality.

Data Processing | erichowens/some_claude_sk...

data-pipeline-engineer

Expert data engineer for ETL/ELT pipelines, streaming, data warehousing. Activate on: data pipeline, ETL, ELT, data warehouse, Spark, Kafka, Airflow, dbt, data modeling, star schema, streaming data, batch processing, data quality. NOT for: API design (use api-architect), ML training (use ML skills), dashboards (use design skills).

3 scripts | Attention

Data Processing | claude-office-skills/skil...

data-pipeline

Data pipeline and ETL automation: extract, transform, and load workflows for data integration and analytics.
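
The three extract, transform, and load stages named here can be illustrated with a small stdlib-only Python sketch. It is not the skill's actual workflow: the CSV columns `customer` and `amount`, the `sales` table, and the function names are hypothetical, and an SQLite connection stands in for a real warehouse.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator


def extract(csv_text: str) -> Iterator[dict[str, str]]:
    """Extract: parse CSV text into dicts; the header row supplies field names."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[dict[str, str]]) -> Iterator[tuple[str, float]]:
    """Transform: normalize customer names, cast amounts, drop unparseable rows."""
    for row in rows:
        try:
            customer = row["customer"].strip().lower()
            amount = float(row["amount"])
        except (KeyError, TypeError, AttributeError, ValueError):
            continue  # skip a malformed record instead of failing the whole batch
        yield customer, amount


def load(records: Iterable[tuple[str, float]], conn: sqlite3.Connection) -> int:
    """Load: insert cleaned records into the target table; return rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    cur = conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()
    return cur.rowcount
```

Because each stage is a generator, records stream through the pipeline one at a time, e.g. `load(transform(extract(text)), sqlite3.connect(":memory:"))`, so a large input never has to be fully materialized between stages.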

Data Processing | dengineproblem/agents-mon...

airbyte-connection-setup

Airbyte expert. Use for setting up ETL/ELT pipelines, connectors, data synchronization, and data pipelines.

Data Processing | rightnow-ai/openfang

data-pipeline

Data pipeline expert for ETL, Apache Spark, Airflow, dbt, and data quality

Data Processing | letta-ai/skills

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.
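
Field mapping plus priority-based conflict resolution with a conflict report, as this entry describes, might look like the following stdlib Python sketch. It assumes every source has already been parsed into a list of dicts (readers for JSON, CSV, Parquet, or XML are out of scope), and `merge_sources`, `priority`, and `field_map` are hypothetical names, not the skill's API.

```python
from typing import Any, Optional

Record = dict[str, Any]


def merge_sources(
    sources: dict[str, list[Record]],
    priority: list[str],
    key: str,
    field_map: Optional[dict[str, dict[str, str]]] = None,
) -> tuple[dict[Any, Record], list[dict[str, Any]]]:
    """Merge records from named sources into one dataset keyed by `key`.

    `priority` lists source names from most to least trusted. When sources
    disagree on a field, the higher-priority value is kept and the
    disagreement is appended to the conflict report.
    """
    field_map = field_map or {}
    merged: dict[Any, Record] = {}
    winner: dict[tuple[Any, str], str] = {}  # (record key, field) -> source that set it
    conflicts: list[dict[str, Any]] = []

    for source in priority:  # highest priority first, so its values are set first
        mapping = field_map.get(source, {})
        for raw in sources.get(source, []):
            # Rename source-specific field names to the unified schema.
            record = {mapping.get(f, f): v for f, v in raw.items()}
            rid = record[key]
            target = merged.setdefault(rid, {})  # same key => same merged record (dedup)
            for field, value in record.items():
                if field not in target:
                    target[field] = value
                    winner[(rid, field)] = source
                elif target[field] != value:
                    conflicts.append({
                        "key": rid,
                        "field": field,
                        "kept": target[field],
                        "kept_from": winner[(rid, field)],
                        "dropped": value,
                        "dropped_from": source,
                    })
    return merged, conflicts
```

Visiting sources in priority order means a field is never overwritten once set, so the resolution rule stays simple: first writer wins, and every later disagreement is logged rather than silently discarded.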
