alibabacloud-dlf-manage


DLF Data Lake Metadata Query

Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).
CRITICAL: Use only the Python SDK script provided by this Skill. All operations go through the DLF Python SDK (`alibabacloud-dlfnext20250310`) via `scripts/dlf_metadata_query.py`. This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.
  • DO NOT attempt access via any shell-based command-line client; DLF is not exposed through one in this Skill
  • DO NOT use curl, wget, or other HTTP clients to call the DLF API directly
  • MUST use the `scripts/dlf_metadata_query.py` script provided by this Skill, which wraps the DLF Python SDK
  • All query operations are executed via `python3 scripts/dlf_metadata_query.py <action> [options]`

Architecture

Catalog (Data Catalog)
  └── Database
        └── Table
              ├── Schema (column definitions)
              ├── PartitionKeys (partition keys)
              ├── PrimaryKeys (primary keys)
              └── Options (table properties)
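The hierarchy above can be pictured as nested records. The following is only an illustrative Python model: the class names, field names, and the `table_path` helper are assumptions chosen for readability, not the structures returned by the DLF SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    # Hypothetical local model; fields mirror the tree above, not the SDK types.
    name: str
    schema: List[dict] = field(default_factory=list)         # column definitions
    partition_keys: List[str] = field(default_factory=list)  # partition keys
    primary_keys: List[str] = field(default_factory=list)    # primary keys
    options: Dict[str, str] = field(default_factory=dict)    # table properties


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    catalog_id: str
    databases: Dict[str, Database] = field(default_factory=dict)


def table_path(catalog: Catalog, database: str, table: str) -> str:
    """Return 'catalog_id.database.table'; raises KeyError if a level is missing."""
    tbl = catalog.databases[database].tables[table]
    return f"{catalog.catalog_id}.{database}.{tbl.name}"
```

The point of the model is that a table is always addressed through its Catalog and Database, which is why the table-level commands need all three identifiers.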

Installation

Install the dependencies:
    pip install -r requirements.txt
`requirements.txt` pins the full transitive dependency closure (including `alibabacloud-dlfnext20250310==3.0.0`) for reproducible installs.
Pre-check: Python SDK dependency
    python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"
If the import fails, run `pip install -r requirements.txt`.
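The same pre-check can be done from Python without triggering an `ImportError`. This is a minimal sketch, assuming only the import name `alibabacloud_dlfnext20250310` shown in the pre-check; the `sdk_installed` helper is hypothetical and not part of the Skill.

```python
import importlib.util

# Import name taken from the pre-check command in this section.
SDK_MODULE = "alibabacloud_dlfnext20250310"


def sdk_installed(module_name: str = SDK_MODULE) -> bool:
    """Return True if the top-level package can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("SDK OK")
    else:
        print("SDK missing; run: pip install -r requirements.txt")
```

Because `find_spec` only locates the package, this confirms that it is installed but not that the pinned version matches `requirements.txt`.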

Authentication

Pre-check: Alibaba Cloud Credentials Required
Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):
  1. Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)
  2. Configuration file (~/.alibabacloud/credentials)
  3. ECS Instance RAM Role
  4. OIDC Role ARN
Security Rules:
  • NEVER read, echo, or print AK/SK values
  • NEVER ask the user to input AK/SK directly in the conversation or command line
  • NEVER explicitly handle or pass AK/SK in code — rely on the default credential chain
See https://help.aliyun.com/document_detail/378659.html for credential configuration details.

RAM Permissions

This Skill only involves read-only operations (List / Get). See references/ram-policies.md for the full permission list.
[MUST] Permission Failure Handling: When any command or API call fails due to a permission error at any point during execution, follow this process:
  1. Read `references/ram-policies.md` to get the full list of permissions required by this Skill
  2. Pause and wait until the user confirms that the required permissions have been granted

Parameter Confirmation

IMPORTANT: Parameter Confirmation. Before invoking the API, confirm the following user-specific parameters with the user; do not assume them. The one exception is the region: it defaults to cn-hangzhou, so if the user does not specify one, use the default without asking.
| Parameter | Required | Description | Default |
|---|---|---|---|
| region | No | Region ID | cn-hangzhou |
| catalog_name | Conditional | Catalog name (`--catalog`, required for GetCatalog) | - |
| catalog_id | Conditional | Catalog ID (`--catalog-id`, required when querying databases/tables, e.g. clg-paimon-xxxx) | - |
| database | Conditional | Database name (`--database`) | - |
| table | Conditional | Table name (`--table`) | - |
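One way to apply the confirmation rule is to check, before running anything, which flags an action still lacks. This is a hedged sketch: the `REQUIRED_FLAGS` mapping is compiled from the example commands in this document, and `missing_parameters` and `resolve_region` are hypothetical helpers. The script itself remains the authority on what it accepts.

```python
# Required flags per action, compiled from the example commands in this document.
# This mapping is an assumption for illustration, not read from the script.
REQUIRED_FLAGS = {
    "list-catalogs": [],
    "get-catalog": ["catalog"],
    "get-catalog-by-id": ["id"],
    "list-databases": ["catalog-id"],
    "list-database-details": ["catalog-id"],
    "get-database": ["catalog-id", "database"],
    "list-tables": ["catalog-id", "database"],
    "list-table-details": ["catalog-id", "database"],
    "get-table": ["catalog-id", "database", "table"],
}

DEFAULT_REGION = "cn-hangzhou"


def missing_parameters(action, provided):
    """Return the flags that still need to be confirmed with the user."""
    if action not in REQUIRED_FLAGS:
        raise ValueError(f"unknown action: {action}")
    return [flag for flag in REQUIRED_FLAGS[action] if not provided.get(flag)]


def resolve_region(provided):
    """Region is the one parameter that falls back to a default without asking."""
    return provided.get("region") or DEFAULT_REGION
```

An empty list from `missing_parameters` means the command can be built; anything else is a question for the user rather than a value to guess.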

Core Workflow

The script automatically reads AK/SK from environment variables and reports a clear error if they are missing. Region defaults to cn-hangzhou; use the default if the user does not specify one.
You MUST use `scripts/dlf_metadata_query.py` to query metadata. Do not use shell-based command-line clients or curl. Actions are in kebab-case.
CRITICAL: list vs. list-*-details. Pick the lightest action that satisfies the request.
  • For listing names / IDs (including fuzzy search): use `list-databases` / `list-tables`. These call the `ListDatabases` / `ListTables` APIs.
  • For full attributes / Schema / properties: use `list-database-details` / `list-table-details` / `get-database` / `get-table`. These call the heavier `List*Details` / `Get*` APIs.
  • Default to the lightweight `list-*` action unless the user explicitly asks for full configuration, Schema, or properties. Calling `list-*-details` when only names are needed is incorrect.
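The selection rule can be written down as a small decision function. This is a sketch only: `choose_action` and its parameters are hypothetical, and it covers just the database and table actions named in this section.

```python
def choose_action(resource, wants_details=False, single_name=None):
    """Pick the lightest action for a 'database' or 'table' request.

    resource: "database" or "table"
    wants_details: True only if the user explicitly asked for Schema,
        properties, or full configuration
    single_name: set when the request targets ONE specific resource
    """
    if resource not in ("database", "table"):
        raise ValueError(f"unsupported resource: {resource}")
    if not wants_details:
        return f"list-{resource}s"       # ListDatabases / ListTables
    if single_name:
        return f"get-{resource}"         # GetDatabase / GetTable
    return f"list-{resource}-details"    # ListDatabaseDetails / ListTableDetails
```

Note that the details branch is reached only when `wants_details` is explicitly set, which mirrors the rule that the lightweight action is the default.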

Query Operations

---- Catalog ----

1. List all Catalogs (names + minimal info; preferred for listing/searching)

    python3 scripts/dlf_metadata_query.py list-catalogs

2. Fuzzy-search Catalogs by name (uses ListCatalogs)

    python3 scripts/dlf_metadata_query.py list-catalogs --pattern test

3. Get Catalog details (by name); use only when full Catalog config is needed

    python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

4. Get Catalog details (by ID); use only when full Catalog config is needed

    python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

---- Database ----

5. List databases (NAMES only; DEFAULT for "list / show / which databases", calls ListDatabases)

    python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

6. List database details (full attributes, calls ListDatabaseDetails); use ONLY when the user asks for properties / configs / location / owner

    python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

7. Get a single database's details (calls GetDatabase); use when the user asks for ONE specific database's full info

    python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

---- Table ----

8. List tables (NAMES only; DEFAULT for "list / show / which tables", calls ListTables)

    python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)

    python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

10. List table details with Schema (calls ListTableDetails); use ONLY when the user explicitly asks for Schema / columns / properties of all tables

    python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

11. Get a single table's details with Schema (calls GetTable); use when the user asks for ONE specific table's Schema

    python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>

Specify a region (defaults to cn-hangzhou) by adding `--region`, for example `--region cn-shanghai`.

Typical Query Flow

1. list-catalogs          → get catalog_name and catalog_id (names only)
2. list-databases         → use catalog_id to view available database names
3. list-tables            → use catalog_id + database to view available table names
4. get-table              → use catalog_id + database + table to view ONE table's Schema
Only step 4 (`get-table`) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight `list-*` actions.
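The four steps can be assembled programmatically as argument lists. This is a minimal sketch, assuming the flag names shown in this document and assuming that `--region` is accepted by every action; `build_command` and `query_flow` are hypothetical helpers, not part of the Skill.

```python
SCRIPT = "scripts/dlf_metadata_query.py"


def build_command(action, **options):
    """Build an argv list; keyword names map to flags (catalog_id -> --catalog-id)."""
    argv = ["python3", SCRIPT, action]
    for key, value in options.items():
        if value is not None:  # omitted options fall back to the script's defaults
            argv += ["--" + key.replace("_", "-"), str(value)]
    return argv


def query_flow(catalog_id, database, table, region=None):
    """Return the commands for steps 1-4, in order."""
    return [
        build_command("list-catalogs", region=region),
        build_command("list-databases", catalog_id=catalog_id, region=region),
        build_command("list-tables", catalog_id=catalog_id, database=database,
                      region=region),
        build_command("get-table", catalog_id=catalog_id, database=database,
                      table=table, region=region),
    ]
```

Each list can be passed to `subprocess.run` as-is. Building argv lists rather than shell strings means user-supplied names are not interpreted by a shell.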

Fuzzy Search

All list operations support the `--pattern` argument for fuzzy name matching, using `%` as the wildcard. Use the lightweight `list-*` action for pattern search unless the user explicitly asks for the full Schema / properties of every match.

Search Catalogs whose name contains "test"

    python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

Search databases whose name starts with "prod_"

    python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

Search tables whose name starts with "user" (DEFAULT; calls ListTables)

    python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

> **Anti-pattern**: do not use `list-table-details --pattern ...` to search by name. That calls `ListTableDetails` and is heavier than required. Reach for `list-table-details` only when the user has explicitly asked for the Schema / columns of every matching table.
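To make the difference between prefix and contains patterns concrete, here is a local illustration of `%` matching. It assumes `%` stands for any run of characters (as in SQL `LIKE`) and that every other character is literal. The real matching is done by the DLF service and may differ, for example in case sensitivity or in how `_` is treated. The helper names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a %-wildcard pattern into a regular expression string."""
    # Escape the literal pieces, then join them with "match anything".
    return ".*".join(re.escape(part) for part in pattern.split("%"))


def matches(pattern, name):
    """True if the whole name matches the pattern under the assumptions above."""
    return re.fullmatch(like_to_regex(pattern), name) is not None
```

Under these assumptions `user%` matches `user_events` but not `app_user`, while `%test%` matches any name containing `test`.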

Output Format

  • List operations: `{"count": N, "items": [...]}`
  • Get operations: a single JSON object
  • Errors: `{"error": "...", "hint": "..."}`
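A caller can tell the three shapes apart by their top-level keys. This is a sketch under stated assumptions: the output is a single JSON document, and the shape of individual items is not specified here, so items are passed through untouched. `parse_output` is a hypothetical helper.

```python
import json


def parse_output(raw):
    """Classify one JSON document printed by the script.

    Returns a (kind, payload) pair where kind is "error", "list", or "object".
    """
    data = json.loads(raw)
    if isinstance(data, dict) and "error" in data:
        return "error", {"error": data["error"], "hint": data.get("hint")}
    if isinstance(data, dict) and "count" in data and "items" in data:
        return "list", data["items"]
    return "object", data
```

Checking for `error` first matters: a failed call should never be mistaken for an empty result.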

Verification

If `list-catalogs` returns the Catalog list, the connection and permissions are working:
    python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou
See references/verification-method.md for detailed verification steps.

Best Practices

  1. Prefer the lightweight `list-*` action over `list-*-details` / `get-*`. When the task only requires listing resource names, IDs, or fuzzy matching, you MUST use `list-catalogs` / `list-databases` / `list-tables` (which call `ListCatalogs` / `ListDatabases` / `ListTables`). Only use `list-*-details` or `get-*` when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.
  2. List before Get: use `list-catalogs` to obtain catalog_id first, then use catalog_id to query databases and tables.
  3. Use fuzzy search with the lightweight action: the `--pattern` argument supports fuzzy matching; use it on `list-tables` (not `list-table-details`) unless full Schema is also requested.
  4. Pagination: use `--max-results` and `--page-token` for paginated queries when there is a lot of data.
  5. Catalog ID vs Name: when querying Database/Table, use `catalog_id` (e.g. clg-paimon-xxxx), not the catalog name.
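The pagination practice amounts to a loop that keeps requesting pages until no token comes back. This document does not say how the next token appears in the script's output, so this sketch leaves that to a caller-supplied function. `fetch_all` and `fetch_page` are hypothetical names.

```python
def fetch_all(fetch_page, max_results=100):
    """Collect items across pages.

    fetch_page(max_results, page_token) must return (items, next_token),
    with next_token falsy on the last page. It would wrap one script call
    using --max-results and --page-token; where the next token is found in
    the output is an assumption the caller has to supply.
    """
    items, token = [], None
    while True:
        page, token = fetch_page(max_results, token)
        items.extend(page)
        if not token:
            return items
```

The first request is made with no token, and the loop stops as soon as a page arrives without one.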

References

| Reference | Description |
|---|---|
| references/related-apis.md | Full API list and parameter descriptions |
| references/ram-policies.md | RAM permission policy |
| references/acceptance-criteria.md | Acceptance criteria |
| references/verification-method.md | Verification method |
| DLF API overview | Official API documentation |
| DLF product documentation | Product documentation |
| Python SDK PyPI | SDK version info |