Search Results: lake

Found 52 Skills

Data Processingaws/agent-toolkit-for-aws

ingesting-into-data-lake

Import data into the AWS data lake from S3 files, local uploads, JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS, Aurora), Amazon Redshift, Snowflake, BigQuery, DynamoDB, or existing Glue catalog tables (migration). Default target is S3 Tables; standard Iceberg on a general purpose bucket is supported where S3 Tables is not adopted. Handles one-time loads, recurring pipelines, migrations. Triggers on: import data, load data, ingest, sync database, migrate table, move data to AWS, set up pipeline, ETL, pull from Snowflake, query BigQuery into S3, export DynamoDB, CTAS, convert to Iceberg. Do NOT use for setting up or troubleshooting Glue connections (use connecting-to-data-source), creating empty tables (use creating-data-lake-table), running queries (use querying-data-lake), finding tables by fuzzy name (use finding-data-lake-assets), catalog audit (use exploring-data-catalog), or SaaS platforms like Salesforce, ServiceNow, SAP, MongoDB, Kafka.

🇺🇸|EnglishTranslated

Data Processingmotherduckdb/agent-skills

motherduck-migrate-to-motherduck

Plan a migration onto MotherDuck. Use when moving from Snowflake, Redshift, PostgreSQL, dbt-heavy stacks, or lakehouse tooling and the key decisions are target pattern, cutover slices, validation, rollback, and native-versus-DuckLake posture.

🇺🇸|EnglishTranslated

2 scripts/Checked

Platform Servicespubnub/skills

pubnub-history

Retrieves historical PubNub messages via Message Persistence (Storage & Playback). Covers timetoken-based pagination, per-channel ordering guarantees, offline catch-up flows, retention configuration, and the "catch-up tool not a data lake" principle. Use when fetching past messages, paginating with timetokens, building offline-resume UI, retrieving messages with actions, or configuring retention.

🇺🇸|EnglishTranslated

Data Processingeyadsibai/ltk

data-engineering

Use when "data pipelines", "ETL", "data warehousing", "data lakes", or asking about "Airflow", "Spark", "dbt", "Snowflake", "BigQuery", "data modeling"

🇺🇸|EnglishTranslated

Tools & Utilitiesmvanhorn/printing-press-l...

pp-espn

Use this skill whenever the user asks about live sports scores, standings, team stats, game summaries (with box score, leaders, scoring plays, odds, and win probability), NFL / NBA / MLB / NHL / NCAA / MLS / EPL / WNBA games, team schedules, polls, or rankings. ESPN sports CLI with live scores across 10 leagues, offline search, head-to-head comparisons, and rich per-game summary payloads. No API key required. Triggers on natural phrasings like 'what's the score of the Lakers game', 'Patriots schedule this week', 'NFL standings', 'box score for tonight's Mavs game', 'Chiefs vs Eagles head to head', 'who's on top of the AP poll'.

🇺🇸|EnglishTranslated

Data Processingjosiahsiegel/claude-plugi...

powerbi-core

Core Power BI data modeling, source connectivity, and platform fundamentals. PROACTIVELY activate for: (1) Power BI data modeling and star-schema design, (2) relationships (active/inactive, bidirectional, USERELATIONSHIP), (3) data-source selection (DirectQuery vs Import vs Direct Lake vs composite), (4) incremental refresh setup, (5) gateway configuration (on-prem and VNet gateways), (6) streaming datasets and push-data scenarios, (7) Dataflow Gen2 basics, (8) Power BI common gotchas and pitfalls (bidirectional filtering, AutoExist, blank-row), (9) workspace identity and OAuth2 / service-principal auth, (10) semantic model architecture review. Provides: star-schema templates, mode-selection matrix, incremental refresh recipe, gateway setup steps, and a common-gotchas reference.

🇺🇸|EnglishTranslated

Data Processingspiceai/skills

spice-connect-data

Connect Spice to data sources and query across them with federated SQL. Use when connecting to databases (Postgres, MySQL, DynamoDB), data lakes (S3, Delta Lake, Iceberg), warehouses (Snowflake, Databricks), files, APIs, or catalogs; configuring datasets; creating views; writing data; or setting up cross-source queries.

🇺🇸|EnglishTranslated

Data Processingdatabricks/databricks-age...

databricks-jobs

Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.

🇺🇸|EnglishTranslated

Data Processingclickhouse/agent-skills

chdb-datastore

Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code — same API, 10-100x faster on large datasets. Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, etc.) with cross-source joins. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data across different sources — even if they don't explicitly mention chdb or DataStore. Do NOT use for raw SQL queries, ClickHouse server administration, or non-Python languages.

🇺🇸|EnglishTranslated

1 scripts/Checked

Data Processingaws/agent-toolkit-for-aws

finding-data-lake-assets

Resolve data lake and lakehouse asset references across Glue Data Catalog, S3, S3 Tables, and Redshift. Triggers on: find the table, where is our data, which table has, locate dataset, find data for, search catalog, what tables match, Redshift table, lakehouse table, data lake table, warehouse table, reverse lookup S3 path. Do NOT use for: full catalog audits (use exploring-data-catalog), running queries (use querying-data-lake), creating tables (use creating-data-lake-table).

🇺🇸|EnglishTranslated

Data Processingaws/agent-toolkit-for-aws

creating-data-lake-table

Create managed Iceberg tables using Amazon S3 Tables (s3tables API namespace) with automatic compaction and snapshot management. Sets up table bucket, namespace, table, schema, Glue catalog registration, partitioning, IAM access control. Triggers on: create table, data lake table, analytics table, structured data storage, S3 Tables, Iceberg, Athena table, partitioning strategy, access permissions. Do NOT use for: importing files (use ingesting-into-data-lake), vector storage (use storing-and-querying-vectors), querying existing tables (use querying-data-lake), or locating existing table (use finding-data-lake-assets).

🇺🇸|EnglishTranslated

Data Processingaliyun/alibabacloud-aiops...

alibabacloud-dlf-manage

Query Catalog, database, and table metadata resources in Alibaba Cloud Data Lake Formation (DLF). Provides read-only queries via the DLF OpenAPI Python SDK, supporting listing and viewing Catalogs, databases, tables with their detailed information and Schema definitions. Use cases: "list available Catalogs", "list databases", "view table schema", "search tables", "search tables by name", "fuzzy search", "view DLF metadata", "what databases are in the data lake", "what columns does a table have", "find tables whose name contains xxx". This Skill only contains read-only operations — no create, modify, or delete operations.

🇺🇸|EnglishTranslated

1 scripts/Attention