pandas-best-practices

Original：🇺🇸 English

Translated

Best practices for Pandas data manipulation, analysis, and DataFrame operations in Python

11installs

Sourcemindrally/skills

Added on2026-02-07

NPX Install

npx skill4agent add mindrally/skills pandas-best-practices

SKILL.md Content

View Translation Comparison →

Pandas Best Practices

Expert guidelines for Pandas development, focusing on data manipulation, analysis, and efficient DataFrame operations.

Code Style and Structure

Write concise, technical responses with accurate Python examples
Prioritize reproducibility in data analysis workflows
Use functional programming; avoid unnecessary classes
Prefer vectorized operations over explicit loops
Use descriptive variable names reflecting data content
Follow PEP 8 style guidelines

DataFrame Creation and I/O

Use

pd.read_csv()

pd.read_excel()

pd.read_json()

with appropriate parameters

Specify
```
dtype
```
parameter to ensure correct data types on load
Use
```
parse_dates
```
for automatic datetime parsing
Set
```
index_col
```
when the data has a natural index column
Use
```
chunksize
```
for reading large files incrementally

Data Selection

Use
```
.loc[]
```
for label-based indexing
Use
```
.iloc[]
```
for integer position-based indexing
Avoid chained indexing (e.g.,
```
df['col'][0]
```
) - use
```
.loc
```
or
```
.iloc
```
instead
Use boolean indexing for conditional selection:
```
df[df['col'] > value]
```
Use
```
.query()
```
method for complex filtering conditions

Method Chaining

Prefer method chaining for data transformations when possible
Use
```
.pipe()
```
for applying custom functions in a chain
Chain operations like
```
.assign()
```
,
```
.query()
```
,
```
.groupby()
```
,
```
.agg()
```
Keep chains readable by breaking across multiple lines

Data Cleaning and Validation

Missing Data

Check for missing data with
```
.isna()
```
and
```
.info()
```
Handle missing data appropriately:
```
.fillna()
```
,
```
.dropna()
```
, or imputation
Use
```
pd.NA
```
for nullable integer and boolean types
Document decisions about missing data handling

Data Quality Checks

Implement data quality checks at the beginning of analysis
Validate data types with
```
.dtypes
```
and convert as needed
Check for duplicates with
```
.duplicated()
```
and handle appropriately
Use
```
.describe()
```
for quick statistical overview

Type Conversion

Use
```
.astype()
```
for explicit type conversion
Use
```
pd.to_datetime()
```
for date parsing
Use
```
pd.to_numeric()
```
with
```
errors='coerce'
```
for safe numeric conversion
Utilize categorical data types for low-cardinality string columns

Grouping and Aggregation

GroupBy Operations

Use
```
.groupby()
```
for efficient aggregation operations
Specify aggregation functions with
```
.agg()
```
for multiple operations
Use named aggregation for clearer output column names
Consider
```
.transform()
```
for broadcasting results back to original shape

Pivot Tables and Reshaping

Use
```
.pivot_table()
```
for multi-dimensional aggregation
Use
```
.melt()
```
to convert wide to long format
Use
```
.pivot()
```
to convert long to wide format
Use
```
.stack()
```
and
```
.unstack()
```
for hierarchical index manipulation

Performance Optimization

Memory Efficiency

Use categorical data types for low-cardinality strings
Downcast numeric types when appropriate
Use
```
pd.eval()
```
and
```
.eval()
```
for large expression evaluation

Computation Speed

Use vectorized operations instead of
```
.apply()
```
with row-wise functions
Prefer built-in aggregation functions over custom ones
Use
```
.values
```
or
```
.to_numpy()
```
for NumPy operations when faster

Avoiding Common Pitfalls

Avoid iterating with
```
.iterrows()
```
- use vectorized operations
Don't modify DataFrames while iterating
Be aware of SettingWithCopyWarning - use
```
.copy()
```
when needed
Avoid growing DataFrames row by row - collect in list and create once

Time Series Operations

Use
```
DatetimeIndex
```
for time series data
Leverage
```
.resample()
```
for time-based aggregation
Use
```
.shift()
```
and
```
.diff()
```
for lag operations
Use
```
.rolling()
```
and
```
.expanding()
```
for window calculations

Merging and Joining

Use
```
.merge()
```
for SQL-style joins
Specify
```
how
```
parameter: 'inner', 'outer', 'left', 'right'
Use
```
validate
```
parameter to check join cardinality
Use
```
.concat()
```
for stacking DataFrames

Key Conventions

Import as
```
import pandas as pd
```
Use
```
snake_case
```
for column names when possible
Document data sources and transformations
Keep notebooks reproducible with clear cell execution order