Search Results: etl-pipeline

Found 30 Skills

multi-source-data-merger

This skill provides guidance for merging data from multiple heterogeneous sources (JSON, CSV, Parquet, XML, etc.) into a unified dataset. Use this skill when tasks involve combining records from different file formats, applying field mappings, resolving conflicts based on priority rules, or generating merged outputs with conflict reports. Applicable to ETL pipelines, data consolidation, and record deduplication scenarios.

Data Processing | dkyazzentwatwa/chatgpt-sk...

data-quality-auditor

Assess data quality with checks for missing values, duplicates, type issues, and inconsistencies. Use for data validation, ETL pipelines, or dataset documentation.

1 scripts/Attention

Data Processing | gemini-cli-extensions/dat...

gcp-spark

Develops and executes Spark code on Dataproc Clusters and Serverless. Reads and writes data using BigLake Iceberg catalogs, BigQuery, and Spanner. Debugs execution failures.

Use when:
- Writing Spark ETL pipelines on GCP.
- Training ML models or running inference with Spark on GCP.
- Managing Spark clusters, jobs, batches, and interactive sessions.

Don't use when:
- Writing generic Python scripts that don't use Spark.
- Performing simple SQL queries that can be done directly in BigQuery.

Data Processing | aradotso/data-skills

harvard-artifacts-data-engineering-pipeline

Build ETL pipelines and analytics dashboards using the Harvard Art Museums API with Python, SQL, and Streamlit

DevOps & Cloud Services | pluginagentmarketplace/cu...

cloud-platforms

AWS, GCP, Azure data platforms, infrastructure as code, and cloud-native data solutions

1 scripts/Checked

Data Processing | aliyun/alibabacloud-aiops...

alibabacloud-dataworks-datastudio-develop

DataWorks data development Skill. Create, configure, validate, deploy, update, move, and rename nodes and workflows. Manage components, file resources, and UDFs. Covers 150+ node types: Shell, SQL, Python, DI, Flink, EMR, etc. Supports scheduled and manual workflow orchestration via aliyun CLI or Python SDK. WARNING: Supports mutating operations (Move, Rename) requiring explicit user confirmation. Delete operations are NOT supported by this skill. Triggers: DataWorks, data development nodes, workflows, FlowSpec, scheduling tasks, data integration, ETL pipelines, .spec.json. Also triggers for Alibaba Cloud data development, scheduling node configuration, FlowSpec format, or DI task orchestration.

7 scripts/Attention

Data Processing | aradotso/data-skills

harvard-art-museum-data-pipeline

ETL pipeline and analytics application for Harvard Art Museums API with SQL storage and Streamlit visualization

Data Processing | aradotso/data-skills

harvard-artifacts-collection-data-engineering-analytics

End-to-end data engineering and analytics application using Harvard Art Museums API with ETL pipelines, SQL analytics, and Streamlit visualization

Data Processing | aradotso/data-skills

harvard-art-museums-data-engineering-pipeline

End-to-end ETL pipeline for Harvard Art Museums API with SQL analytics and Streamlit visualization

Data Processing | microsoftdocs/agent-skill...

azure-data-factory

Expert knowledge for Azure Data Factory development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when designing ADF pipelines, mapping data flows, SHIR/SSIS IR, SAP CDC, or CI/CD with ARM/DevOps, and other Azure Data Factory related development tasks. Not for Azure Synapse Analytics (use azure-synapse-analytics), Azure Databricks (use azure-databricks), Azure Stream Analytics (use azure-stream-analytics), Azure Data Explorer (use azure-data-explorer).

Data Processing | aradotso/data-skills

harvard-artifacts-data-engineering-analytics

Build ETL pipelines and analytics dashboards for Harvard Art Museums API data using Python, SQL, and Streamlit

Data Processing | aradotso/data-skills

harvard-artifacts-etl-analytics

Build ETL pipelines and analytics dashboards using Harvard Art Museums API with SQL and Streamlit
