Loading...
Loading...
Validate and audit CSV data for quality, consistency, and completeness. Use when you need to check CSV files for data issues, missing values, or format inconsistencies.
npx skill4agent add aig787/agpm csv-data-auditorimport pandas as pd
import numpy as np
from datetime import datetime
def audit_csv(file_path, delimiter=','):
"""Perform comprehensive CSV audit"""
# Load CSV
try:
df = pd.read_csv(file_path, delimiter=delimiter)
except Exception as e:
return {"error": f"Failed to load CSV: {str(e)}"}
audit_report = {
"file_info": {
"rows": len(df),
"columns": len(df.columns),
"headers": list(df.columns)
},
"issues": []
}
# Check for missing values
missing_counts = df.isnull().sum()
for col, count in missing_counts.items():
if count > 0:
percentage = (count / len(df)) * 100
audit_report["issues"].append({
"type": "missing_values",
"column": col,
"count": count,
"percentage": round(percentage, 2)
})
# Check for duplicates
duplicate_rows = df.duplicated().sum()
if duplicate_rows > 0:
audit_report["issues"].append({
"type": "duplicates",
"count": duplicate_rows
})
# Data type validation
for col in df.columns:
# Check for mixed types in object columns
if df[col].dtype == 'object':
sample_values = df[col].dropna().head(10)
# Add specific type checks based on column name patterns
return audit_report# CSV Audit Report
## File Information
- File: data.csv
- Size: 15.2 MB
- Rows: 50,000
- Columns: 12
- Headers: id, name, email, age, join_date, salary, department, ...
## Issues Found
### Critical Issues
1. **Missing Values**:
- `email`: 2,500 missing (5.0%)
- `salary`: 150 missing (0.3%)
### Warnings
1. **Inconsistent Date Format**:
- `join_date`: Mix of ISO and US formats detected
- Examples: 2023-01-15, 01/15/2023, 15-Jan-2023
2. **Potential Outliers**:
- `age`: Values 0 and 150 detected
- `salary`: Extremely high values > $1M
### Recommendations
1. Clean up email field - contact data source
2. Standardize date format to ISO 8601
3. Validate age and salary ranges
4. Remove or investigate duplicate rows
## Summary
- Overall Quality: Good
- Issues to Fix: 3 critical, 5 warnings
- Estimated Fix Time: 2-3 hourschardet