great-expectations

Great Expectations

Audience: Data engineers building validated data pipelines.
Goal: Provide GX patterns for expectation-based validation and monitoring.

Scripts

Execute GX functions from scripts/expectations.py:
python
from scripts.expectations import (
    get_pandas_context,
    add_dataframe_asset,
    create_basic_suite,
    run_validation
)

Usage Examples

Quick Setup

python
from scripts.expectations import get_pandas_context, add_dataframe_asset

# df is assumed to be an existing pandas DataFrame holding the user records
context, datasource = get_pandas_context("my_datasource")
batch_request = add_dataframe_asset(datasource, "users", df)

Create Expectation Suite

python
from scripts.expectations import create_basic_suite

columns_config = {
    'user_id': {'not_null': True, 'unique': True, 'type': 'int'},
    'age': {'min': 0, 'max': 150},
    'status': {'values': ['active', 'inactive', 'pending']},
    'email': {'regex': r'^[\w\.-]+@[\w\.-]+\.\w+$'}
}

suite = create_basic_suite(context, "user_suite", columns_config)
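
The source does not show how create_basic_suite interprets columns_config. As a rough sketch, one plausible mapping from the config keys above to standard GX snake_case expectation types might look like the following. The helper name config_to_expectations and the output dict shape are hypothetical and not taken from scripts/expectations.py; the 'type' value is passed through unchanged as an assumption.

```python
from typing import Any, Dict, List


def _expectation(expectation_type: str, column: str, **kwargs: Any) -> Dict[str, Any]:
    """Build one expectation entry (hypothetical shape: type name plus kwargs)."""
    return {"expectation_type": expectation_type, "kwargs": {"column": column, **kwargs}}


def config_to_expectations(columns_config: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Translate a columns_config mapping into a flat list of expectation entries.

    Hypothetical helper: the real create_basic_suite may interpret keys differently.
    """
    expectations: List[Dict[str, Any]] = []
    for column, rules in columns_config.items():
        if rules.get("not_null"):
            expectations.append(_expectation("expect_column_values_to_not_be_null", column))
        if rules.get("unique"):
            expectations.append(_expectation("expect_column_values_to_be_unique", column))
        if "type" in rules:
            # Assumption: the type string is forwarded as-is to the type_ argument.
            expectations.append(
                _expectation("expect_column_values_to_be_of_type", column, type_=rules["type"])
            )
        if "min" in rules or "max" in rules:
            expectations.append(
                _expectation(
                    "expect_column_values_to_be_between",
                    column,
                    min_value=rules.get("min"),
                    max_value=rules.get("max"),
                )
            )
        if "values" in rules:
            expectations.append(
                _expectation("expect_column_values_to_be_in_set", column, value_set=rules["values"])
            )
        if "regex" in rules:
            expectations.append(
                _expectation("expect_column_values_to_match_regex", column, regex=rules["regex"])
            )
    return expectations


if __name__ == "__main__":
    example_config = {
        "user_id": {"not_null": True, "unique": True, "type": "int"},
        "age": {"min": 0, "max": 150},
    }
    for entry in config_to_expectations(example_config):
        print(entry["expectation_type"], entry["kwargs"])
```

Keeping the mapping as plain data makes it easy to review which expectations a config produces before anything is registered with a GX context.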

Run Validation

python
from scripts.expectations import run_validation

results = run_validation(
    context,
    checkpoint_name="user_checkpoint",
    batch_request=batch_request,
    suite_name="user_suite"
)

if results['success']:
    print("All expectations passed!")
else:
    for failure in results['failures']:
        print(f"Failed: {failure['expectation']} on {failure['column']}")
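
In a pipeline it is often more useful to stop on failure than to print. The sketch below assumes the result shape used above (a 'success' flag and a 'failures' list whose items carry 'expectation' and 'column'); the names ValidationFailed, failures_by_column, and require_success are hypothetical and not part of scripts/expectations.py.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ValidationFailed(Exception):
    """Raised when a validation result reports at least one failed expectation."""


def failures_by_column(results: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group failed expectation names by the column they apply to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for failure in results.get("failures", []):
        # Assumption: table-level failures may have no column, so use a placeholder.
        column = failure.get("column") or "<table>"
        grouped[column].append(failure["expectation"])
    return dict(grouped)


def require_success(results: Dict[str, Any]) -> None:
    """Return quietly on success; raise ValidationFailed with a summary otherwise."""
    if results["success"]:
        return
    grouped = failures_by_column(results)
    details = "; ".join(
        f"{column}: {', '.join(names)}" for column, names in sorted(grouped.items())
    )
    raise ValidationFailed(f"Validation failed ({details})")


if __name__ == "__main__":
    sample = {
        "success": False,
        "failures": [
            {"expectation": "expect_column_values_to_not_be_null", "column": "user_id"},
            {"expectation": "expect_column_values_to_be_between", "column": "age"},
        ],
    }
    try:
        require_success(sample)
    except ValidationFailed as exc:
        print(exc)
```

Raising an exception lets an orchestrator mark the task as failed instead of relying on someone reading the logs.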

Common Expectations Reference

| Category | Expectation | Description |
| --- | --- | --- |
| Table | ExpectTableRowCountToBeBetween | Row count range |
| Existence | ExpectColumnToExist | Column must exist |
| Nulls | ExpectColumnValuesToNotBeNull | No null values |
| Range | ExpectColumnValuesToBeBetween | Value bounds |
| Set | ExpectColumnValuesToBeInSet | Allowed values |
| Pattern | ExpectColumnValuesToMatchRegex | Regex match |
| Unique | ExpectColumnValuesToBeUnique | No duplicates |
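
To make the column-level descriptions concrete, here is a plain-Python illustration of what each check asserts about a list of values. This is not GX code and the function names are hypothetical. It is a simplification: options such as `mostly` are left out, nulls are skipped in every check except the not-null one (an assumption of this sketch), and the regex check uses `re.search`.

```python
import re
from typing import Any, Iterable, List, Optional


def _present(values: Iterable[Any]) -> List[Any]:
    """Drop None values so the remaining checks only look at real entries."""
    return [v for v in values if v is not None]


def values_not_null(values: Iterable[Any]) -> bool:
    """Nulls: every value is present."""
    return all(v is not None for v in values)


def values_between(values: Iterable[Any], min_value: Optional[float] = None,
                   max_value: Optional[float] = None) -> bool:
    """Range: every present value lies within the inclusive bounds."""
    return all(
        (min_value is None or v >= min_value) and (max_value is None or v <= max_value)
        for v in _present(values)
    )


def values_in_set(values: Iterable[Any], value_set: Iterable[Any]) -> bool:
    """Set: every present value is one of the allowed values."""
    allowed = set(value_set)
    return all(v in allowed for v in _present(values))


def values_match_regex(values: Iterable[Any], regex: str) -> bool:
    """Pattern: every present value, as a string, matches the regex."""
    pattern = re.compile(regex)
    return all(pattern.search(str(v)) for v in _present(values))


def values_unique(values: Iterable[Any]) -> bool:
    """Unique: no present value appears more than once."""
    present = _present(values)
    return len(present) == len(set(present))


if __name__ == "__main__":
    ages = [34, 27, None, 151]
    emails = ["a@example.com", "not-an-email"]
    print("ages not null:", values_not_null(ages))
    print("ages in 0..150:", values_between(ages, 0, 150))
    print("emails valid:", values_match_regex(emails, r"^[\w\.-]+@[\w\.-]+\.\w+$"))
```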

Data Docs

python
# Build and open HTML reports
context.build_data_docs()
context.open_data_docs()

Directory Structure

目录结构

great_expectations/
├── great_expectations.yml     # Config
├── expectations/              # Expectation suites (JSON)
├── checkpoints/               # Checkpoint definitions
├── plugins/                   # Custom expectations
└── uncommitted/
    ├── data_docs/            # Generated HTML docs
    └── validations/          # Validation results
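
As a quick sanity check, a script could confirm that a project folder contains the entries from the tree above. This is a minimal sketch using only pathlib; the function name missing_entries is hypothetical, and it assumes every entry in the tree should already exist, which may not hold for generated folders under uncommitted/ before the first validation run.

```python
from pathlib import Path
from typing import List, Union

# Entries taken from the directory tree above, relative to great_expectations/.
EXPECTED_ENTRIES = [
    "great_expectations.yml",
    "expectations",
    "checkpoints",
    "plugins",
    "uncommitted/data_docs",
    "uncommitted/validations",
]


def missing_entries(project_root: Union[str, Path]) -> List[str]:
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries("great_expectations")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```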

When to Use Great Expectations

| Use Case | GX | Alternative |
| --- | --- | --- |
| Pipeline monitoring | Yes | - |
| Data warehouse validation | Yes | - |
| Automated data docs | Yes | - |
| Simple DataFrame checks | - | Pandera |
| Record-level API validation | - | Pydantic |

Dependencies

great_expectations>=0.18
pandas