# Anomaly Detector
Audience: Data engineers and analysts detecting outliers in datasets.

Goal: Provide production-ready anomaly detection functions for single-column, multi-column, and time-series data.
## Scripts
Import the detection functions from `scripts/anomaly_detection.py`:

```python
from scripts.anomaly_detection import (
    detect_anomalies_zscore,
    detect_anomalies_iqr,
    detect_anomalies_modified_zscore,
    detect_anomalies_isolation_forest,
    detect_anomalies_lof,
    detect_anomalies_rolling,
    detect_anomalies_stl,
    detect_anomalies_ensemble,
)
```
## Method Selection
| Method | Best For | Limitations |
|---|---|---|
| Z-Score | Approximately normal distributions | Mean and standard deviation are inflated by the outliers themselves |
| IQR | Skewed distributions | Less sensitive overall |
| Modified Z-Score | Robust detection (median/MAD) | Slower computation than the standard z-score |
| Isolation Forest | High-dimensional data | Requires tuning (e.g., `contamination`) |
| LOF | Local density anomalies | Computationally expensive on large datasets |
| Rolling | Time-series with trends | Sensitive to window size |
| STL | Seasonal time-series | Requires a known seasonal period |
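
The Modified Z-Score row refers to the robust variant of the z-score that replaces the mean and standard deviation with the median and the median absolute deviation (MAD). A minimal sketch of that rule, assuming the common Iglewicz-Hoaglin form (a 0.6745 scaling constant and a 3.5 cutoff); `modified_zscore_mask` is a hypothetical helper, not necessarily how `detect_anomalies_modified_zscore` is implemented:

```python
import pandas as pd


def modified_zscore_mask(values: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag points whose modified z-score exceeds `threshold` (hypothetical helper)."""
    median = values.median()
    # Median absolute deviation: a robust stand-in for the standard deviation
    mad = (values - median).abs().median()
    if mad == 0:
        # More than half the values are identical, so the score is undefined
        return pd.Series(False, index=values.index)
    # 0.6745 rescales MAD to match the standard deviation of a normal distribution
    modified_z = 0.6745 * (values - median) / mad
    return modified_z.abs() > threshold


data = pd.Series([10.0, 11.0, 9.5, 10.2, 10.8, 55.0])
print(modified_zscore_mask(data).tolist())  # → [False, False, False, False, False, True]
```

Because a single extreme value barely moves the median or the MAD, the spike at 55.0 scores about 60 here, far above the cutoff.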
## Usage Examples
### Single Column Detection
```python
import pandas as pd
from scripts.anomaly_detection import detect_anomalies_zscore, detect_anomalies_iqr

df = pd.read_csv('data.csv')

# Z-score method (good for normal distributions)
anomalies_z = detect_anomalies_zscore(df['value'], threshold=3.0)

# IQR method (robust to skewed data)
anomalies_iqr = detect_anomalies_iqr(df['value'], multiplier=1.5)

print(f"Z-score found {anomalies_z.sum()} anomalies")
print(f"IQR found {anomalies_iqr.sum()} anomalies")
```
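
The two rules in the example above can be sketched with plain pandas. This assumes `detect_anomalies_zscore` flags points where |x - mean| / std exceeds `threshold`, and `detect_anomalies_iqr` flags points outside the Tukey fences Q1 - k·IQR and Q3 + k·IQR with k = `multiplier`; the helpers `zscore_mask` and `iqr_mask` are hypothetical, and the script may differ in details such as NaN handling or the standard-deviation convention.

```python
import pandas as pd


def zscore_mask(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag points more than `threshold` sample standard deviations from the mean."""
    std = values.std()  # sample standard deviation (ddof=1)
    if std == 0:
        return pd.Series(False, index=values.index)
    return ((values - values.mean()) / std).abs() > threshold


def iqr_mask(values: pd.Series, multiplier: float = 1.5) -> pd.Series:
    """Flag points outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return (values < lower) | (values > upper)


data = pd.Series([10.0, 11.0, 9.5, 10.2, 10.8, 55.0])
print(zscore_mask(data).sum())  # → 0 (the outlier inflates the std; its |z| is about 2.04)
print(iqr_mask(data).sum())     # → 1
```

On this six-point sample the z-score flags nothing: with n points and the sample standard deviation, no |z| can exceed (n - 1)/√n, about 2.04 for n = 6, so the outlier masks itself. The IQR fences come from quartiles that do not depend on how extreme the largest value is, so 55.0 is caught.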
### Multi-Column with Isolation Forest
```python
from scripts.anomaly_detection import detect_anomalies_isolation_forest

numeric_cols = ['revenue', 'quantity', 'price']
anomalies = detect_anomalies_isolation_forest(df, numeric_cols, contamination=0.01)
df_anomalies = df[anomalies]
```
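
Isolation Forest and LOF are both available in scikit-learn. A minimal sketch, assuming the script wraps `sklearn.ensemble.IsolationForest` and `sklearn.neighbors.LocalOutlierFactor` and returns a boolean mask aligned with `df.index` (which is what `df[anomalies]` above requires); `isolation_forest_mask`, `lof_mask`, and the standardization step before LOF are illustrative assumptions. Both estimators label outliers as -1 in `fit_predict`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def isolation_forest_mask(df, columns, contamination=0.01, random_state=42):
    """Boolean mask of rows that IsolationForest labels as outliers (-1)."""
    X = df[columns].to_numpy()
    model = IsolationForest(contamination=contamination, random_state=random_state)
    return pd.Series(model.fit_predict(X) == -1, index=df.index)


def lof_mask(df, columns, n_neighbors=20, contamination=0.01):
    """Boolean mask of rows that LOF labels as outliers (-1).
    Features are standardized first because LOF is distance-based."""
    X = StandardScaler().fit_transform(df[columns].to_numpy())
    model = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
    return pd.Series(model.fit_predict(X) == -1, index=df.index)


# Synthetic demo data: 200 standard-normal rows plus one planted outlier
rng = np.random.default_rng(0)
cols = ['revenue', 'quantity', 'price']
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=cols)
demo.loc[0] = [8.0, 8.0, 8.0]

print(isolation_forest_mask(demo, cols).sum())
print(lof_mask(demo, cols)[0])
```

`contamination` sets the score threshold so that roughly that fraction of rows is flagged (about 2 of 200 here), which is the tuning the Method Selection table refers to.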
### Ensemble Approach (Recommended)
```python
from scripts.anomaly_detection import detect_anomalies_ensemble

results = detect_anomalies_ensemble(
    df,
    columns=['revenue', 'quantity'],
    methods=['zscore', 'iqr', 'isolation_forest'],
    min_agreement=2,  # Flag if 2+ methods agree
)
confirmed_anomalies = df[results['is_anomaly']]
```
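
The `min_agreement` parameter amounts to a vote across per-method masks. A minimal sketch of just the voting step, assuming each method has already produced one boolean flag per row; `ensemble_vote` and the `votes` column are hypothetical, while `is_anomaly` mirrors the key used above. How the script reduces several columns to one flag per row is not shown here.

```python
import pandas as pd


def ensemble_vote(masks: dict, min_agreement: int = 2) -> pd.DataFrame:
    """Count how many methods flag each row and mark rows where at least
    `min_agreement` methods agree (hypothetical helper)."""
    flags = pd.DataFrame(masks).astype(bool)  # one column per method
    votes = flags.sum(axis=1)
    return pd.DataFrame({'votes': votes, 'is_anomaly': votes >= min_agreement})


masks = {
    'zscore': pd.Series([False, True, False, True]),
    'iqr': pd.Series([False, True, True, False]),
    'isolation_forest': pd.Series([False, True, False, True]),
}
result = ensemble_vote(masks, min_agreement=2)
print(result['is_anomaly'].tolist())  # → [False, True, False, True]
```

Raising `min_agreement` trades recall for precision: with `min_agreement=3`, only the second row would be flagged.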
### Time-Series Anomalies
```python
from scripts.anomaly_detection import detect_anomalies_rolling, detect_anomalies_stl

# Rolling window (for trending data)
anomalies_rolling = detect_anomalies_rolling(df['daily_sales'], window=7, n_std=2.0)

# STL decomposition (for seasonal data)
anomalies_stl = detect_anomalies_stl(df['monthly_revenue'], period=12, threshold=3.0)
```
## Dependencies
```
pandas
numpy
scikit-learn  # For Isolation Forest, LOF
statsmodels   # For STL decomposition
```