Loading...
Loading...
Found 5 Skills
Native Arrow filesystem integration with PyArrow. Optimized for Parquet workflows, zero-copy data transfer, predicate pushdown, and column pruning. Covers S3, GCS, HDFS with PyArrow datasets.
Delta Lake integration with cloud storage (S3, GCS, Azure). Covers storage_options, PyArrow filesystem, time travel, and partitioned writes.
Reading and writing data with Pandas from/to cloud storage (S3, GCS, Azure) using fsspec and PyArrow filesystems.
Refactor Pandas code to improve maintainability, readability, and performance. Identifies and fixes loops/.iterrows() that should be vectorized, overuse of .apply() where vectorized alternatives exist, chained indexing patterns, inplace=True usage, inefficient dtypes, missing method chaining opportunities, complex filters, merge operations without validation, and SettingWithCopyWarning patterns. Applies Pandas 2.0+ features including PyArrow backend, Copy-on-Write, vectorized operations, method chaining, .query()/.eval(), optimized dtypes, and pipeline patterns.
Pandas data manipulation with DataFrames. Use for data analysis.