Loading...
Loading...
Found 85 Skills
Use when "Polars", "fast dataframe", "lazy evaluation", "Arrow backend", or asking about "pandas alternative", "parallel dataframe", "large CSV processing", "ETL pipeline", "expression API"
Fast in-process analytical database for SQL queries on DataFrames, CSV, Parquet, JSON files, and more. Use when user wants to perform SQL analytics on data files or Python DataFrames (pandas, Polars), run complex aggregations, joins, or window functions, or query external data sources without loading into memory. Best for analytical workloads, OLAP queries, and data exploration.
Smart Excel/CSV file parsing with intelligent routing based on file complexity analysis. Analyzes file structure (merged cells, row count, table layout) using lightweight metadata scanning, then recommends optimal processing strategy - either high-speed Pandas mode for standard tables or semantic HTML mode for complex reports. Use when processing Excel/CSV files with unknown or varying structure where optimization between speed and accuracy is needed.
Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
Use when tasks involve creating, editing, analyzing, or formatting spreadsheets (`.xlsx`, `.csv`, `.tsv`) using Python (`openpyxl`, `pandas`), especially when formulas, references, and formatting need to be preserved and verified.
Python data analysis with pandas, numpy, and analytics libraries
Data analysis expert for statistics, visualization, pandas, and exploration
Design ETL workflows with data validation using tools like Pandas, Dask, or PySpark. Use when building robust data processing systems in Python.
Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse. TRIGGERS - data pipeline, polars vs pandas, arrow format, clickhouse ml, efficient loading, zero-copy, memory optimization.
A Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Great for exploring relationships between variables and visualizing distributions. Use for statistical data visualization, exploratory data analysis (EDA), relationship plots, distribution plots, categorical comparisons, regression visualization, heatmaps, cluster maps, and creating publication-quality statistical graphics from Pandas DataFrames.
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.