Found 14 Skills
Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.
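A minimal sketch of the lazy, expression-based Polars API this entry refers to; the file name and column names are hypothetical:

```python
import polars as pl

# Lazily scan the CSV (hypothetical file), so nothing is read yet;
# Polars builds a query plan and optimizes it before execution.
lazy = (
    pl.scan_csv("events.csv")
      .filter(pl.col("amount") > 0)                 # predicate pushdown
      .group_by("user_id")
      .agg(pl.col("amount").sum().alias("total"))
)

# collect() triggers parallel execution over the Arrow-backed data.
df = lazy.collect()
print(df.head())
```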
Implement robust batch processing systems with job queues, schedulers, background tasks, and distributed workers. Use when processing large datasets, scheduled tasks, async operations, or resource-intensive computations.
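As a sketch of the job-queue-plus-workers pattern this skill implements, here is a minimal in-process version using only the standard library; production systems would typically use a broker-backed queue such as Celery or RQ:

```python
import queue
import threading

jobs: "queue.Queue[int]" = queue.Queue()

def worker() -> None:
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut this worker down
            jobs.task_done()
            break
        print(f"processing job {item}")
        jobs.task_done()

# Four background workers draining the shared queue in parallel.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(10):
    jobs.put(i)

jobs.join()                       # wait until all jobs are processed
for _ in threads:
    jobs.put(None)                # stop each worker
for t in threads:
    t.join()
```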
Use when "Polars", "fast dataframe", "lazy evaluation", "Arrow backend", or asking about "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", "expression API"
Merges data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use when combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication.
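A sketch of the core merge-with-priority idea, assuming two hypothetical input files, a simple field mapping to a shared `id` key, and a rule that CSV records win over JSON records:

```python
import pandas as pd

# Hypothetical inputs; rename source-specific keys onto a shared "id" field.
csv_df = pd.read_csv("customers.csv").rename(columns={"cust_id": "id"})
json_df = pd.read_json("customers.json").rename(columns={"customerId": "id"})

# Tag each record with its source, then keep the highest-priority one per id.
csv_df["_source"], json_df["_source"] = "csv", "json"
merged = pd.concat([csv_df, json_df], ignore_index=True)
priority = {"csv": 0, "json": 1}
merged["_rank"] = merged["_source"].map(priority)
deduped = merged.sort_values("_rank").drop_duplicates("id", keep="first")

# Conflict report: ids that appeared in more than one source.
conflicts = merged[merged.duplicated("id", keep=False)]
```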
Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.
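A minimal sketch of the kinds of checks this entry describes, using pandas; the sample frame and the mixed-type heuristic are illustrative assumptions:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Basic checks: missing values, duplicate rows, mixed-type columns."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # object columns holding more than one Python type hint at type issues
        "mixed_type_columns": [
            col for col in df.select_dtypes(include="object").columns
            if df[col].dropna().map(type).nunique() > 1
        ],
    }

df = pd.DataFrame({"a": [1, 1, None], "b": ["x", "x", 2]})
print(quality_report(df))
```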
Develops and executes Spark code on Dataproc clusters and Dataproc Serverless. Reads and writes data using BigLake Iceberg catalogs, BigQuery, and Spanner. Debugs execution failures. Use when:
- Writing Spark ETL pipelines on GCP.
- Training or running inference with ML models using Spark on GCP.
- Managing Spark clusters, jobs, batches, and interactive sessions.
Don't use when:
- Writing generic Python scripts that don't use Spark.
- Performing simple SQL queries that can be done directly in BigQuery.
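A sketch of a Dataproc-style PySpark job reading and writing BigQuery via the spark-bigquery connector (assumed to be on the cluster's classpath); the project, table, and bucket names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Read a BigQuery table through the spark-bigquery connector.
df = (
    spark.read.format("bigquery")
         .option("table", "my-project.raw.events")   # hypothetical table
         .load()
)

cleaned = df.filter(df["amount"] > 0)

# Indirect writes stage data in a GCS bucket before loading into BigQuery.
(
    cleaned.write.format("bigquery")
           .option("table", "my-project.curated.events")
           .option("temporaryGcsBucket", "my-staging-bucket")  # hypothetical
           .mode("overwrite")
           .save()
)
```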
Design ETL/ELT pipelines with proper orchestration, error handling, and monitoring. Use when building data pipelines, designing data workflows, or implementing data transformations.
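As a small illustration of the error handling and monitoring this skill designs around, here is a retry wrapper for pipeline steps; the step functions and backoff policy are illustrative assumptions, not a prescribed implementation:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def with_retries(step, attempts: int = 3, backoff: float = 1.0):
    """Run one pipeline step with retries, logging each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            log.exception("step %s failed (attempt %d/%d)",
                          step.__name__, attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)   # linear backoff between retries

def extract():
    return [{"id": 1, "amount": "42"}]      # stand-in for a real source

def transform():
    return [{**row, "amount": int(row["amount"])} for row in extract()]

rows = with_retries(transform)
log.info("loaded %d rows", len(rows))
```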
Use when asked to parse, normalize, standardize, or convert dates from various formats to consistent ISO 8601 or custom formats.
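A minimal sketch of the normalization this entry covers, using the third-party python-dateutil parser; the sample inputs are hypothetical:

```python
from dateutil import parser  # third-party: python-dateutil

# A few of the mixed formats to normalize; output is ISO 8601.
raw = ["03/04/2021", "April 3, 2021", "2021-04-03T12:30:00Z"]

for s in raw:
    dt = parser.parse(s, dayfirst=False)   # dayfirst disambiguates 03/04
    print(dt.date().isoformat())           # e.g. 2021-03-04
```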
DataWorks data development Skill. Create, configure, validate, deploy, update, move, and rename nodes and workflows. Manage components, file resources, and UDF functions. Covers 150+ node types: Shell, SQL, Python, DI, Flink, EMR, etc. Supports scheduled and manual workflow orchestration via aliyun CLI or Python SDK. WARNING: Supports mutating operations (Move, Rename) requiring explicit user confirmation. Delete operations are NOT supported by this skill. Triggers: DataWorks, data development nodes, workflows, FlowSpec, scheduling tasks, data integration, ETL pipelines, .spec.json. Also triggers for Alibaba Cloud data development, scheduling node configuration, FlowSpec format, or DI task orchestration.
Use when "data pipelines", "ETL", "data warehousing", "data lakes", or asking about "Airflow", "Spark", "dbt", "Snowflake", "BigQuery", "data modeling"
Covers AWS, GCP, and Azure data platforms, infrastructure as code, and cloud-native data solutions.
Generate synthetic test data with edge cases for ETL pipeline testing.
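A sketch of the idea, assuming a hypothetical two-column schema: generate mostly valid rows, but deliberately mix in edge cases (empty strings, non-numeric values, boundary magnitudes) to exercise an ETL pipeline's error paths:

```python
import csv
import random

# Edge-case values chosen to break naive numeric parsing downstream.
EDGE_AMOUNTS = ["", "0", "-1", "9" * 18, "NaN"]

def make_row(i: int) -> dict:
    if random.random() < 0.2:               # 20% of rows are edge cases
        return {"id": i, "amount": random.choice(EDGE_AMOUNTS)}
    return {"id": i, "amount": str(random.randint(1, 10_000))}

with open("synthetic.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "amount"])
    writer.writeheader()
    writer.writerows(make_row(i) for i in range(1_000))
```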