Data validation using Great Expectations. Expectation suites, checkpoints, and data docs for pipeline monitoring.
npx skill4agent add majesticlabs-dev/majestic-marketplace great-expectations
scripts/expectations.py
from scripts.expectations import (
    get_pandas_context,
    add_dataframe_asset,
    create_basic_suite,
    run_validation
)
from scripts.expectations import get_pandas_context, add_dataframe_asset
context, datasource = get_pandas_context("my_datasource")
batch_request = add_dataframe_asset(datasource, "users", df)
from scripts.expectations import create_basic_suite
columns_config = {
    'user_id': {'not_null': True, 'unique': True, 'type': 'int'},
    'age': {'min': 0, 'max': 150},
    'status': {'values': ['active', 'inactive', 'pending']},
    'email': {'regex': r'^[\w\.-]+@[\w\.-]+\.\w+$'}
}
suite = create_basic_suite(context, "user_suite", columns_config)
from scripts.expectations import run_validation
results = run_validation(
    context,
    checkpoint_name="user_checkpoint",
    batch_request=batch_request,
    suite_name="user_suite"
)
if results['success']:
    print("All expectations passed!")
else:
    for failure in results['failures']:
        print(f"Failed: {failure['expectation']} on {failure['column']}")
| Category | Expectation | Description |
|---|---|---|
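| Table | `expect_table_row_count_to_be_between` | Row count range |
| Existence | `expect_column_to_exist` | Column must exist |
| Nulls | `expect_column_values_to_not_be_null` | No null values |
| Range | `expect_column_values_to_be_between` | Value bounds |
| Set | `expect_column_values_to_be_in_set` | Allowed values |
| Pattern | `expect_column_values_to_match_regex` | Regex match |
| Unique | `expect_column_values_to_be_unique` | No duplicates |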
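The categories in the table line up with the keys used in `columns_config` earlier. As a rough illustration of that correspondence, the sketch below turns such a config into a list of expectation specs. The function `config_to_expectations` and the spec layout are hypothetical and written only for this example; the actual `create_basic_suite` in `scripts/expectations.py` may work differently. Only the expectation names and their keyword arguments are taken from Great Expectations itself.

```python
from typing import Any


def _spec(expectation_type: str, column: str, **kwargs: Any) -> dict:
    """Build one expectation spec as a plain dict (hypothetical layout)."""
    return {"expectation_type": expectation_type, "kwargs": {"column": column, **kwargs}}


def config_to_expectations(columns_config: dict) -> list:
    """Translate a columns_config mapping into expectation specs.

    Assumed key meanings, mirroring the config shown earlier:
    not_null, unique, type, min/max, values, regex.
    """
    specs = []
    for column, rules in columns_config.items():
        # Every configured column is expected to exist.
        specs.append(_spec("expect_column_to_exist", column))
        if rules.get("not_null"):
            specs.append(_spec("expect_column_values_to_not_be_null", column))
        if rules.get("unique"):
            specs.append(_spec("expect_column_values_to_be_unique", column))
        if "type" in rules:
            specs.append(_spec("expect_column_values_to_be_of_type", column, type_=rules["type"]))
        if "min" in rules or "max" in rules:
            specs.append(
                _spec(
                    "expect_column_values_to_be_between",
                    column,
                    min_value=rules.get("min"),
                    max_value=rules.get("max"),
                )
            )
        if "values" in rules:
            specs.append(_spec("expect_column_values_to_be_in_set", column, value_set=list(rules["values"])))
        if "regex" in rules:
            specs.append(_spec("expect_column_values_to_match_regex", column, regex=rules["regex"]))
    return specs


if __name__ == "__main__":
    for spec in config_to_expectations({"age": {"min": 0, "max": 150}}):
        print(spec["expectation_type"], spec["kwargs"])
```

Keeping the translation as plain data makes it easy to inspect or unit-test a config before any Great Expectations context is involved.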
# Build and open HTML reports
context.build_data_docs()
context.open_data_docs()
great_expectations/
├── great_expectations.yml # Config
├── expectations/ # Expectation suites (JSON)
├── checkpoints/ # Checkpoint definitions
├── plugins/ # Custom expectations
└── uncommitted/
    ├── data_docs/ # Generated HTML docs
    └── validations/ # Validation results
| Use Case | GX | Alternative |
|---|---|---|
| Pipeline monitoring | ✓ | - |
| Data warehouse validation | ✓ | - |
| Automated data docs | ✓ | - |
| Simple DataFrame checks | - | Pandera |
| Record-level API validation | - | Pydantic |
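For the pipeline monitoring use case, a common pattern is to stop the pipeline step when validation fails. The sketch below assumes the results dictionary has the shape used in the `run_validation` example earlier (a `success` flag plus a `failures` list whose items carry `expectation` and `column` keys). The names `ValidationFailed`, `summarize_failures`, and `enforce` are hypothetical helpers invented for this illustration, and the sample dictionary is hand-written rather than real output.

```python
from collections import Counter


class ValidationFailed(Exception):
    """Raised when a validation run reports failed expectations."""


def summarize_failures(results: dict) -> dict:
    """Count failed expectations per column; table-level failures go under '<table>'."""
    counts = Counter(f.get("column") or "<table>" for f in results.get("failures", []))
    return dict(counts)


def enforce(results: dict) -> None:
    """Return quietly on success, otherwise raise with a per-column summary."""
    if results["success"]:
        return
    summary = summarize_failures(results)
    details = ", ".join(f"{col}: {n}" for col, n in sorted(summary.items()))
    raise ValidationFailed(f"Validation failed ({details})")


if __name__ == "__main__":
    # Hand-built example in the assumed shape, not real validation output.
    sample = {
        "success": False,
        "failures": [
            {"expectation": "expect_column_values_to_not_be_null", "column": "user_id"},
            {"expectation": "expect_column_values_to_be_between", "column": "age"},
        ],
    }
    try:
        enforce(sample)
    except ValidationFailed as exc:
        print(exc)
```

Raising an exception rather than printing lets an orchestrator mark the task as failed without extra glue code.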
great_expectations>=0.18
pandas