great-expectations
Great Expectations
Audience: Data engineers building validated data pipelines.
Goal: Provide GX patterns for expectation-based validation and monitoring.
Scripts
Execute GX functions from `scripts/expectations.py`:

```python
from scripts.expectations import (
    get_pandas_context,
    add_dataframe_asset,
    create_basic_suite,
    run_validation,
)
```

Usage Examples
Quick Setup

```python
from scripts.expectations import get_pandas_context, add_dataframe_asset

# df is an existing pandas DataFrame to validate
context, datasource = get_pandas_context("my_datasource")
batch_request = add_dataframe_asset(datasource, "users", df)
```

Create Expectation Suite

```python
from scripts.expectations import create_basic_suite

# Per-column rules: null/uniqueness/type checks, numeric bounds,
# allowed values, and a regex pattern
columns_config = {
    'user_id': {'not_null': True, 'unique': True, 'type': 'int'},
    'age': {'min': 0, 'max': 150},
    'status': {'values': ['active', 'inactive', 'pending']},
    'email': {'regex': r'^[\w\.-]+@[\w\.-]+\.\w+$'}
}

suite = create_basic_suite(context, "user_suite", columns_config)
```

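To make the shorthand keys in `columns_config` more concrete, here is a minimal sketch of how they could map onto built-in GX expectation types. The helper names `expand_column_config` and `expand_columns_config` are hypothetical, and the mapping is an assumption about what `create_basic_suite` does internally, not a description of the actual script. The expectation type names and their keyword arguments are the standard GX ones.

```python
from typing import Any, Dict, List


def expand_column_config(column: str, rules: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate one column's shorthand rules into expectation configurations.

    Hypothetical helper: the key-to-expectation mapping below is an assumption.
    """
    expectations: List[Dict[str, Any]] = []

    def add(expectation_type: str, **kwargs: Any) -> None:
        # Every column-level expectation takes the column name as a kwarg.
        expectations.append(
            {"expectation_type": expectation_type,
             "kwargs": {"column": column, **kwargs}}
        )

    if rules.get("not_null"):
        add("expect_column_values_to_not_be_null")
    if rules.get("unique"):
        add("expect_column_values_to_be_unique")
    if "type" in rules:
        # The type string is passed through unchanged.
        add("expect_column_values_to_be_of_type", type_=rules["type"])
    if "min" in rules or "max" in rules:
        # A missing bound stays None, which leaves that side unbounded.
        add("expect_column_values_to_be_between",
            min_value=rules.get("min"), max_value=rules.get("max"))
    if "values" in rules:
        add("expect_column_values_to_be_in_set", value_set=list(rules["values"]))
    if "regex" in rules:
        add("expect_column_values_to_match_regex", regex=rules["regex"])
    return expectations


def expand_columns_config(
    columns_config: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Expand every column in the config, preserving column order."""
    return [
        expectation
        for column, rules in columns_config.items()
        for expectation in expand_column_config(column, rules)
    ]


# Usage with a subset of the config shown above
example_config = {
    "user_id": {"not_null": True, "unique": True, "type": "int"},
    "age": {"min": 0, "max": 150},
}
for expectation in expand_columns_config(example_config):
    print(expectation["expectation_type"], expectation["kwargs"])
```

Keeping the expansion as plain dictionaries makes the mapping easy to unit test without a GX context; each dictionary can then be turned into an expectation on the suite.
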
Run Validation

```python
from scripts.expectations import run_validation

results = run_validation(
    context,
    checkpoint_name="user_checkpoint",
    batch_request=batch_request,
    suite_name="user_suite"
)

# Report the overall outcome, then each failed expectation
if results['success']:
    print("All expectations passed!")
else:
    for failure in results['failures']:
        print(f"Failed: {failure['expectation']} on {failure['column']}")
```

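The `results` dictionary above has a `success` flag and a `failures` list. The sketch below shows one way such a summary could be derived from a checkpoint result. It assumes the nested dictionary layout produced by `CheckpointResult.to_json_dict()` in GX 0.18 (`run_results` → `validation_result` → `results`); the function name `summarize_checkpoint_result` is hypothetical, and whether `run_validation` works this way is an assumption. The sample input is hand-written for illustration, not real GX output.

```python
from typing import Any, Dict, List


def summarize_checkpoint_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a checkpoint result dict into {'success': ..., 'failures': [...]}.

    Hypothetical helper; assumes the GX 0.18 JSON layout described above.
    """
    failures: List[Dict[str, Any]] = []
    for run in result.get("run_results", {}).values():
        validation = run.get("validation_result", {})
        for item in validation.get("results", []):
            if item.get("success"):
                continue  # only failed expectations are reported
            config = item.get("expectation_config", {})
            failures.append({
                "expectation": config.get("expectation_type"),
                # Table-level expectations have no column, so this may be None.
                "column": config.get("kwargs", {}).get("column"),
            })
    return {"success": bool(result.get("success")), "failures": failures}


# Hand-written sample mimicking the assumed layout
sample_result = {
    "success": False,
    "run_results": {
        "run-1": {
            "validation_result": {
                "success": False,
                "results": [
                    {"success": True,
                     "expectation_config": {
                         "expectation_type": "expect_column_values_to_not_be_null",
                         "kwargs": {"column": "user_id"}}},
                    {"success": False,
                     "expectation_config": {
                         "expectation_type": "expect_column_values_to_be_between",
                         "kwargs": {"column": "age", "min_value": 0, "max_value": 150}}},
                ],
            }
        }
    },
}

summary = summarize_checkpoint_result(sample_result)
for failure in summary["failures"]:
    print(f"Failed: {failure['expectation']} on {failure['column']}")
```
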
Common Expectations Reference
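
| Category | Expectation | Description |
|---|---|---|
| Table | `expect_table_row_count_to_be_between` | Row count range |
| Existence | `expect_column_to_exist` | Column must exist |
| Nulls | `expect_column_values_to_not_be_null` | No null values |
| Range | `expect_column_values_to_be_between` | Value bounds |
| Set | `expect_column_values_to_be_in_set` | Allowed values |
| Pattern | `expect_column_values_to_match_regex` | Regex match |
| Unique | `expect_column_values_to_be_unique` | No duplicates |
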
Data Docs

```python
# Build and open HTML reports
context.build_data_docs()
context.open_data_docs()
```

Directory Structure

```
great_expectations/
├── great_expectations.yml    # Config
├── expectations/             # Expectation suites (JSON)
├── checkpoints/              # Checkpoint definitions
├── plugins/                  # Custom expectations
└── uncommitted/
    ├── data_docs/            # Generated HTML docs
    └── validations/          # Validation results
```

When to Use Great Expectations
| Use Case | GX | Alternative |
|---|---|---|
| Pipeline monitoring | ✓ | - |
| Data warehouse validation | ✓ | - |
| Automated data docs | ✓ | - |
| Simple DataFrame checks | - | Pandera |
| Record-level API validation | - | Pydantic |
Dependencies

```
great_expectations>=0.18
pandas
```