Search Results: parquet

Found 26 Skills

huggingface-datasets

Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.

Data Processing | legout/data-platform-agen...

data-engineering-storage-remote-access-libraries-pyarrow-fs

Native Arrow filesystem integration with PyArrow. Optimized for Parquet workflows, zero-copy data transfer, predicate pushdown, and column pruning. Covers S3, GCS, HDFS with PyArrow datasets.

Data Processing | tondevrel/scientific-agen...

duckdb

An analytical in-process SQL database management system. Designed for fast analytical queries (OLAP). Highly interoperable with Python's data ecosystem (Pandas, NumPy, Arrow, Polars). Supports querying files (CSV, Parquet, JSON) directly without an ingestion step. Use for complex SQL queries on Pandas/Polars data, querying large Parquet/CSV files directly, joining data from different sources, analytical pipelines, local datasets too big for Excel, intermediate data storage and feature engineering for ML.

Data Processing | duckdb/duckdb-skills

convert-file

Convert any data file to another format: CSV, Parquet, JSON, Excel, GeoJSON, and more. Use when the user says "convert to parquet", "save as xlsx", "export as JSON", "make this a CSV", "turn into parquet", or any variation of format-to-format conversion for data files. Also triggers when the user wants to write Parquet, Excel, or other binary formats that Claude cannot produce natively.

Data Processing | opensensenova/sensenova-s...

sn-da-large-file-analysis

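High-performance analysis engine for Excel datasets of 10,000 rows or more. Provides openpyxl read_only streaming reads (iter_rows handles 100,000+ rows), Parquet conversion for faster access, memory optimization, chunked processing, and large-file write modes. **Use this skill proactively in any of the following cases**: ① the dataset has ≥ 10k rows (triggered by the row-count assessment step of sn-da-excel-workflow); ② the user uses a trigger phrase, in Chinese or English: large file / big data / performance optimization / out of memory / OOM / million rows / 100k rows / streaming read / Parquet / chunked processing; ③ calling pd.read_excel() directly causes a timeout or runs out of memory; ④ the user explicitly asks for high-performance processing of a large dataset. Not for routine Excel analysis under 10k rows (sn-da-excel-workflow is sufficient there).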

Data Processing | aws/agent-toolkit-for-aws

exporting-rds-to-s3

Exports Amazon RDS or Aurora database snapshots to Amazon S3 in Apache Parquet format for analytics, backup, or data migration. Handles snapshot selection or creation, IAM role setup, KMS encryption, S3 bucket preparation, export task execution, progress monitoring, and data verification. Use when exporting RDS/Aurora data to S3 for Athena, Glue, or Redshift Spectrum consumption.

Data Processing | clickhouse/agent-skills

chdb-sql

In-process ClickHouse SQL engine for Python: run ClickHouse SQL queries directly on local files, remote databases, and cloud storage without a server. Use when the user wants to write SQL queries against Parquet/CSV/JSON files, use ClickHouse table functions (mysql(), s3(), postgresql(), iceberg(), deltaLake(), etc.), build stateful analytical pipelines with Session, use parametrized queries, window functions, or other advanced ClickHouse SQL features. Also use when the user explicitly mentions chdb.query(), ClickHouse SQL syntax, or wants cross-source SQL joins. Do NOT use for pandas-style DataFrame operations; use chdb-datastore instead.

157

1 script/Checked

Data Processing | fusionet24/aiskills

data-profiler

Profile datasets to understand schema, quality, and characteristics. Use when analyzing data files (CSV, JSON, Parquet), discovering dataset properties, assessing data quality, or when user mentions data profiling, schema detection, data analysis, or quality metrics. Provides basic and intermediate profiling including distributions, uniqueness, and pattern detection.

Data Processing | catalyst-cooperative/agen...

datapackage

Explore and query any dataset annotated with a Frictionless Data Package descriptor (datapackage.json). Use this skill whenever a user wants to discover what tables or resources a dataset contains, look up column names and descriptions, surface usage warnings embedded in metadata, or understand how to load data from Parquet files, DuckDB or SQLite databases, or CSV files described by a datapackage.json. Also use when the user has a datapackage.json and wants to know what's in it, how to query it efficiently, or how to connect its metadata to actual data files. Pairs well with dataset-specific skills (like `pudl`) that layer domain knowledge on top.

AI & Machine Learning | nvidia/skills

tao-route-visual-changenet-samples

Routes the weakest VCN samples (output of `tao-analyze-gaps-visual-changenet`) into per-augmentation-module subsets — one parquet for k-NN mining, one for AnomalyGen (Cosmos SDG) — based on each module's label eligibility. Use as the immediate next step after DEFT gap analysis in a VCN AOI SDA iteration.

6 scripts/Attention

Data Processing | arthur0824hao/skills

skill-system-eda

Exploratory Data Analysis skill for CSV and parquet datasets with deterministic profiling, drift/anomaly scans, contract generation and validation, and optional memory writeback into skill-system-memory. The implementation is Polars-first (lazy scan for large files and early `--sample` head), includes high-cardinality guards for profile/importance/contract flows, and supports categorical correlation with Cramer's V. Use when building or reviewing tabular fraud/risk/data-quality workflows, profiling new datasets, checking leakage or drift, or saving/validating data contracts.

2 scripts/Attention

Data Processing | brojonat/llmsrules

tabular-eda

Profile a new tabular dataset before modeling. Find target leakage, missing data patterns, high-cardinality categoricals, near-constant features, redundant pairs, and non-linear relationships that Pearson correlation misses. Use whenever the user hands you a CSV or parquet and asks "what should I do with this?" Always run this skill before training any model on data you haven't seen before.

1 script/Checked