unity-catalog-governance

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Unity Catalog Governance Skill

Unity Catalog治理技能

Overview

概述

Unity Catalog provides centralized governance for data and AI assets across Databricks workspaces. This skill covers governance patterns, security models, access control, and compliance frameworks.
Unity Catalog为Databricks工作区中的数据和AI资产提供集中式治理。本技能涵盖治理模式、安全模型、访问控制以及合规框架。

When to Use This Skill

何时使用本技能

Use this skill when you need to:
  • Design Unity Catalog security architecture
  • Implement RBAC/ABAC access controls
  • Configure row-level security and column masking
  • Set up data classification and tagging
  • Implement compliance requirements (GDPR, HIPAA, SOC2)
  • Audit permissions and access patterns
当你需要以下操作时,可使用本技能:
  • 设计Unity Catalog安全架构
  • 实现RBAC/ABAC访问控制
  • 配置行级安全与列掩码
  • 建立数据分类与标签体系
  • 满足合规要求(GDPR、HIPAA、SOC2)
  • 审计权限与访问模式

Core Concepts

核心概念

Three-Level Namespace

三级命名空间

catalog.schema.table
production.customers.profiles
catalog.schema.table
production.customers.profiles

Securable Objects

可管控对象

  • Metastore: Top-level container
  • Catalog: Database container
  • Schema: Table container
  • Table/View: Data objects
  • Volume: File storage
  • Function: Callable routines
  • External Location: External storage
  • Storage Credential: Cloud credentials
  • Metastore:顶层容器
  • Catalog:数据库容器
  • Schema:表容器
  • Table/View:数据对象
  • Volume:文件存储
  • Function:可调用例程
  • External Location:外部存储
  • Storage Credential:云凭证

Privilege Model

权限模型

sql
-- Catalog privileges
USE CATALOG, CREATE SCHEMA, USE SCHEMA, CREATE TABLE,
CREATE FUNCTION, CREATE VOLUME, ALL PRIVILEGES, OWNERSHIP

-- Schema privileges
USE SCHEMA, CREATE TABLE, CREATE FUNCTION, CREATE VOLUME,
SELECT, MODIFY, READ FILES, WRITE FILES

-- Table privileges
SELECT, MODIFY, READ METADATA

-- Function privileges
EXECUTE

-- Volume privileges
READ FILES, WRITE FILES
sql
-- Catalog privileges
USE CATALOG, CREATE SCHEMA, USE SCHEMA, CREATE TABLE,
CREATE FUNCTION, CREATE VOLUME, ALL PRIVILEGES, OWNERSHIP

-- Schema privileges
USE SCHEMA, CREATE TABLE, CREATE FUNCTION, CREATE VOLUME,
SELECT, MODIFY, READ FILES, WRITE FILES

-- Table privileges
SELECT, MODIFY, READ METADATA

-- Function privileges
EXECUTE

-- Volume privileges
READ FILES, WRITE FILES

Governance Patterns

治理模式

Pattern 1: Environment Isolation

模式1:环境隔离

sql
-- Separate catalogs per environment
CREATE CATALOG IF NOT EXISTS dev;
CREATE CATALOG IF NOT EXISTS staging;
CREATE CATALOG IF NOT EXISTS production;

-- Grant appropriate access
GRANT USE CATALOG, CREATE SCHEMA ON CATALOG dev TO `developers`;
GRANT USE CATALOG, USE SCHEMA ON CATALOG staging TO `testers`;
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG production TO `analysts`;
sql
-- Separate catalogs per environment
CREATE CATALOG IF NOT EXISTS dev;
CREATE CATALOG IF NOT EXISTS staging;
CREATE CATALOG IF NOT EXISTS production;

-- Grant appropriate access
GRANT USE CATALOG, CREATE SCHEMA ON CATALOG dev TO `developers`;
GRANT USE CATALOG, USE SCHEMA ON CATALOG staging TO `testers`;
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG production TO `analysts`;

Pattern 2: Data Domain Organization

模式2:数据域组织

sql
-- Organize by business domain
CREATE CATALOG business_domains;
CREATE SCHEMA business_domains.customer_domain;
CREATE SCHEMA business_domains.financial_domain;
CREATE SCHEMA business_domains.product_domain;

-- Assign domain ownership
ALTER SCHEMA business_domains.customer_domain
  SET OWNER TO `customer_data_team`;

-- Domain-specific access
GRANT SELECT ON SCHEMA business_domains.customer_domain TO `customer_analytics`;
GRANT MODIFY ON SCHEMA business_domains.customer_domain TO `customer_engineering`;
sql
-- Organize by business domain
CREATE CATALOG business_domains;
CREATE SCHEMA business_domains.customer_domain;
CREATE SCHEMA business_domains.financial_domain;
CREATE SCHEMA business_domains.product_domain;

-- Assign domain ownership
ALTER SCHEMA business_domains.customer_domain
  SET OWNER TO `customer_data_team`;

-- Domain-specific access
GRANT SELECT ON SCHEMA business_domains.customer_domain TO `customer_analytics`;
GRANT MODIFY ON SCHEMA business_domains.customer_domain TO `customer_engineering`;

Pattern 3: Medallion with Security

模式3:带安全管控的Medallion架构

sql
-- Bronze: Raw data - limited access
CREATE SCHEMA production.bronze;
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA production.bronze TO `data_engineers`;

-- Silver: Refined data - broader access
CREATE SCHEMA production.silver;
GRANT USE SCHEMA, SELECT ON SCHEMA production.silver TO `data_analysts`;
GRANT MODIFY ON SCHEMA production.silver TO `data_engineers`;

-- Gold: Business data - wide access
CREATE SCHEMA production.gold;
GRANT USE SCHEMA, SELECT ON SCHEMA production.gold TO `business_users`;
GRANT MODIFY ON SCHEMA production.gold TO `analytics_engineers`;
sql
-- Bronze: Raw data - limited access
CREATE SCHEMA production.bronze;
GRANT USE SCHEMA, SELECT, MODIFY ON SCHEMA production.bronze TO `data_engineers`;

-- Silver: Refined data - broader access
CREATE SCHEMA production.silver;
GRANT USE SCHEMA, SELECT ON SCHEMA production.silver TO `data_analysts`;
GRANT MODIFY ON SCHEMA production.silver TO `data_engineers`;

-- Gold: Business data - wide access
CREATE SCHEMA production.gold;
GRANT USE SCHEMA, SELECT ON SCHEMA production.gold TO `business_users`;
GRANT MODIFY ON SCHEMA production.gold TO `analytics_engineers`;

Pattern 4: Row-Level Security

模式4:行级安全

sql
-- Regional data access control
CREATE FUNCTION governance.regional_filter(user_region STRING, data_region STRING)
RETURNS BOOLEAN
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('global_access') THEN TRUE
  WHEN user_region = data_region THEN TRUE
  ELSE FALSE
END;

-- Apply row filter
ALTER TABLE production.customers.orders
SET ROW FILTER governance.regional_filter(current_user_region(), region);
sql
-- Regional data access control
CREATE FUNCTION governance.regional_filter(user_region STRING, data_region STRING)
RETURNS BOOLEAN
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('global_access') THEN TRUE
  WHEN user_region = data_region THEN TRUE
  ELSE FALSE
END;

-- Apply row filter
ALTER TABLE production.customers.orders
SET ROW FILTER governance.regional_filter(current_user_region(), region);

Pattern 5: Column Masking

模式5:列掩码

sql
-- Create masking function
CREATE FUNCTION governance.mask_email(email STRING)
RETURNS STRING
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('pii_admin') THEN email
  WHEN IS_ACCOUNT_GROUP_MEMBER('pii_viewer') THEN
    CONCAT(SUBSTRING(email, 1, 3), '***@', SPLIT(email, '@')[1])
  ELSE 'REDACTED'
END;

-- Apply to column
ALTER TABLE production.customers.profiles
ALTER COLUMN email SET MASK governance.mask_email;
sql
-- Create masking function
CREATE FUNCTION governance.mask_email(email STRING)
RETURNS STRING
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('pii_admin') THEN email
  WHEN IS_ACCOUNT_GROUP_MEMBER('pii_viewer') THEN
    CONCAT(SUBSTRING(email, 1, 3), '***@', SPLIT(email, '@')[1])
  ELSE 'REDACTED'
END;

-- Apply to column
ALTER TABLE production.customers.profiles
ALTER COLUMN email SET MASK governance.mask_email;

Pattern 6: Data Classification

模式6:数据分类

sql
-- Create classification tags
CREATE TAG IF NOT EXISTS governance.sensitivity
  VALUES ('PUBLIC', 'INTERNAL', 'CONFIDENTIAL', 'RESTRICTED');

CREATE TAG IF NOT EXISTS governance.data_domain
  VALUES ('CUSTOMER', 'FINANCIAL', 'EMPLOYEE', 'PRODUCT');

-- Apply classification
ALTER TABLE production.customers.profiles
SET TAGS (
  'governance.sensitivity' = 'RESTRICTED',
  'governance.data_domain' = 'CUSTOMER'
);

ALTER TABLE production.customers.profiles
ALTER COLUMN ssn SET TAGS ('governance.sensitivity' = 'RESTRICTED');
sql
-- Create classification tags
CREATE TAG IF NOT EXISTS governance.sensitivity
  VALUES ('PUBLIC', 'INTERNAL', 'CONFIDENTIAL', 'RESTRICTED');

CREATE TAG IF NOT EXISTS governance.data_domain
  VALUES ('CUSTOMER', 'FINANCIAL', 'EMPLOYEE', 'PRODUCT');

-- Apply classification
ALTER TABLE production.customers.profiles
SET TAGS (
  'governance.sensitivity' = 'RESTRICTED',
  'governance.data_domain' = 'CUSTOMER'
);

ALTER TABLE production.customers.profiles
ALTER COLUMN ssn SET TAGS ('governance.sensitivity' = 'RESTRICTED');

Security Best Practices

安全最佳实践

1. Least Privilege Access

1. 最小权限访问

sql
-- ❌ BAD: Overly permissive
GRANT ALL PRIVILEGES ON CATALOG production TO `all_users`;

-- ✅ GOOD: Minimal required permissions
GRANT USE CATALOG ON CATALOG production TO `analysts`;
GRANT USE SCHEMA ON SCHEMA production.gold TO `analysts`;
GRANT SELECT ON SCHEMA production.gold TO `analysts`;
sql
-- ❌ BAD: Overly permissive
GRANT ALL PRIVILEGES ON CATALOG production TO `all_users`;

-- ✅ GOOD: Minimal required permissions
GRANT USE CATALOG ON CATALOG production TO `analysts`;
GRANT USE SCHEMA ON SCHEMA production.gold TO `analysts`;
GRANT SELECT ON SCHEMA production.gold TO `analysts`;

2. Separation of Duties

2. 职责分离

python
undefined
python
undefined

Define distinct roles

Define distinct roles

ROLES = { "data_consumer": ["USE CATALOG", "USE SCHEMA", "SELECT"], "data_producer": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"], "data_steward": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY", "ALTER"], "platform_admin": ["ALL PRIVILEGES"] # Minimal assignments }
undefined
ROLES = { "data_consumer": ["USE CATALOG", "USE SCHEMA", "SELECT"], "data_producer": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"], "data_steward": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY", "ALTER"], "platform_admin": ["ALL PRIVILEGES"] # Minimal assignments }
undefined

3. Service Principal Ownership

3. 服务主体所有权

sql
-- Use service principals for ownership, not individuals
ALTER SCHEMA production.customers
  SET OWNER TO `data-engineering-sp`;

-- Transfer from individual to service principal
ALTER TABLE production.customers.profiles
  SET OWNER TO `customer-domain-sp`;
sql
-- Use service principals for ownership, not individuals
ALTER SCHEMA production.customers
  SET OWNER TO `data-engineering-sp`;

-- Transfer from individual to service principal
ALTER TABLE production.customers.profiles
  SET OWNER TO `customer-domain-sp`;

4. Audit and Monitor

4. 审计与监控

sql
-- Query audit logs
SELECT
  user_identity.email,
  request_params.full_name_arg as object_accessed,
  action_name,
  event_time
FROM system.access.audit
WHERE event_date >= CURRENT_DATE - INTERVAL 7 DAYS
  AND action_name IN ('GET_TABLE', 'READ_TABLE')
ORDER BY event_time DESC;
sql
-- Query audit logs
SELECT
  user_identity.email,
  request_params.full_name_arg as object_accessed,
  action_name,
  event_time
FROM system.access.audit
WHERE event_date >= CURRENT_DATE - INTERVAL 7 DAYS
  AND action_name IN ('GET_TABLE', 'READ_TABLE')
ORDER BY event_time DESC;

5. Regular Access Reviews

5. 定期访问评审

python
def quarterly_access_review(catalog: str):
    """Review and recertify access quarterly."""
    # Get all grants
    grants = spark.sql(f"SHOW GRANTS ON CATALOG {catalog}")

    # Identify excessive access
    excessive = grants.filter("privilege = 'ALL PRIVILEGES'")

    # Find unused permissions
    unused = find_unused_grants(grants)

    # Generate review report
    report = generate_review_report(excessive, unused)
    send_to_data_stewards(report)
python
def quarterly_access_review(catalog: str):
    """Review and recertify access quarterly."""
    # Get all grants
    grants = spark.sql(f"SHOW GRANTS ON CATALOG {catalog}")

    # Identify excessive access
    excessive = grants.filter("privilege = 'ALL PRIVILEGES'")

    # Find unused permissions
    unused = find_unused_grants(grants)

    # Generate review report
    report = generate_review_report(excessive, unused)
    send_to_data_stewards(report)

Compliance Frameworks

合规框架

GDPR Compliance Pattern

GDPR合规模式

python
class GDPRCompliance:
    """GDPR compliance implementation."""

    def setup_data_inventory(self):
        """Article 30: Records of processing."""
        # Tag all PII
        self.classify_pii_data()
        # Document processing purposes
        self.document_purposes()

    def implement_right_to_erasure(self):
        """Article 17: Right to erasure."""
        # Create deletion workflow
        self.create_deletion_api()
        # Setup cascade delete
        self.configure_lineage()

    def enable_data_portability(self):
        """Article 20: Right to data portability."""
        # Export user data
        self.create_export_api()
python
class GDPRCompliance:
    """GDPR compliance implementation."""

    def setup_data_inventory(self):
        """Article 30: Records of processing."""
        # Tag all PII
        self.classify_pii_data()
        # Document processing purposes
        self.document_purposes()

    def implement_right_to_erasure(self):
        """Article 17: Right to erasure."""
        # Create deletion workflow
        self.create_deletion_api()
        # Setup cascade delete
        self.configure_lineage()

    def enable_data_portability(self):
        """Article 20: Right to data portability."""
        # Export user data
        self.create_export_api()

HIPAA Compliance Pattern

HIPAA合规模式

python
class HIPAACompliance:
    """HIPAA compliance implementation."""

    def implement_access_controls(self):
        """Technical Safeguards: Access Control."""
        # Unique user identification
        self.enforce_sso()
        # Automatic logoff
        self.configure_session_timeout()
        # Audit controls
        self.enable_comprehensive_logging()

    def enable_encryption(self):
        """Technical Safeguards: Encryption."""
        # Encryption at rest
        self.enable_catalog_encryption()
        # Encryption in transit
        self.enforce_tls()
python
class HIPAACompliance:
    """HIPAA compliance implementation."""

    def implement_access_controls(self):
        """Technical Safeguards: Access Control."""
        # Unique user identification
        self.enforce_sso()
        # Automatic logoff
        self.configure_session_timeout()
        # Audit controls
        self.enable_comprehensive_logging()

    def enable_encryption(self):
        """Technical Safeguards: Encryption."""
        # Encryption at rest
        self.enable_catalog_encryption()
        # Encryption in transit
        self.enforce_tls()

Templates

模板

See
/templates/
directory for:
  • governance-framework: Complete governance setup
  • rbac-configuration: Role-based access control
  • compliance-checklist: Regulatory compliance validation
  • audit-procedures: Regular audit workflows
请查看
/templates/
目录获取:
  • governance-framework:完整治理搭建方案
  • rbac-configuration:角色型访问控制配置
  • compliance-checklist:监管合规验证清单
  • audit-procedures:定期审计流程

Examples

示例

See
/examples/
directory for:
  • regional-access-control: Multi-region data isolation
  • pii-protection: PII masking and encryption
  • compliance-reporting: Automated compliance reports
请查看
/examples/
目录获取:
  • regional-access-control:多区域数据隔离
  • pii-protection:PII数据掩码与加密
  • compliance-reporting:自动化合规报告

Related Skills

相关技能

  • data-classification
    : Data classification and tagging
  • compliance-automation
    : Automated compliance checks
  • access-management
    : RBAC/ABAC implementation
  • data-classification
    :数据分类与标签
  • compliance-automation
    :自动化合规检查
  • access-management
    :RBAC/ABAC实现

References

参考资料