alibabacloud-dlf-manage

DLF Data Lake Metadata Query

Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).

CRITICAL: Use only the Python SDK script provided by this Skill. All operations go through the DLF Python SDK (`alibabacloud-dlfnext20250310`) via `scripts/dlf_metadata_query.py`. This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.

DO NOT attempt access via any shell-based command-line client; DLF is not exposed through one in this Skill.

DO NOT use curl, wget, or other HTTP clients to call the DLF API directly.

MUST use the `scripts/dlf_metadata_query.py` script provided by this Skill, which wraps the DLF Python SDK.

All query operations are executed via `python3 scripts/dlf_metadata_query.py <action> [options]`.

Architecture

Catalog (Data Catalog)
  └── Database
        └── Table
              ├── Schema (column definitions)
              ├── PartitionKeys (partition keys)
              ├── PrimaryKeys (primary keys)
              └── Options (table properties)

Installation

Install the dependencies:

pip install -r requirements.txt

`requirements.txt` pins the full transitive dependency closure (including `alibabacloud-dlfnext20250310==3.0.0`) for reproducible installs.

Pre-check: Python SDK dependency

python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"

If the import fails because the SDK is not installed, run `pip install -r requirements.txt`.
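The same pre-check can be run from inside Python without spawning a second interpreter. The following is a minimal sketch using only the standard library; the helper name `sdk_available` is illustrative and is not part of this Skill.

```python
import importlib.util


def sdk_available(module_name: str = "alibabacloud_dlfnext20250310") -> bool:
    """Return True if the top-level package can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_available():
        print("SDK OK")
    else:
        print("SDK missing: run `pip install -r requirements.txt`")
```

Locating the package rather than importing it keeps the check cheap, at the cost of not catching a broken install whose files are present but fail at import time.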

Authentication

Pre-check: Alibaba Cloud Credentials Required

Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):

Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)

Configuration file (~/.alibabacloud/credentials)

ECS Instance RAM Role

OIDC Role ARN

Security Rules:

NEVER read, echo, or print AK/SK values

NEVER ask the user to input AK/SK directly in the conversation or command line

NEVER explicitly handle or pass AK/SK in code — rely on the default credential chain

See https://help.aliyun.com/document_detail/378659.html for credential configuration details.
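As a rough illustration of the first two sources in the priority list, the sketch below reports which source is locally detectable without ever reading or printing a secret value. It is an assumption-laden approximation: the helper name `detect_credential_source` is hypothetical, the real resolution is done by the SDK's credential chain, and the ECS and OIDC sources cannot be confirmed from local state alone.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def detect_credential_source(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Name the first locally detectable source; secret values are never read."""
    # Presence check only: the values themselves are not accessed.
    if "ALIBABA_CLOUD_ACCESS_KEY_ID" in env and "ALIBABA_CLOUD_ACCESS_KEY_SECRET" in env:
        return "environment variables"
    config = (home or Path.home()) / ".alibabacloud" / "credentials"
    if config.is_file():
        return "configuration file"
    # ECS instance RAM role and OIDC role ARN are resolved remotely by the SDK.
    return None
```

A `None` result does not mean authentication will fail; it only means neither local source was found and the chain will fall through to the role-based sources.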

RAM Permissions

This Skill only involves read-only operations (List / Get). See references/ram-policies.md for the full permission list.

[MUST] Permission Failure Handling: When any command or API call fails due to a permission error at any point during execution, follow this process:
1. Read `references/ram-policies.md` to get the full list of permissions required by this Skill.
2. Pause and wait until the user confirms that the required permissions have been granted.

Parameter Confirmation

IMPORTANT: Parameter Confirmation — Before invoking the API, the following user-specific parameters must be confirmed with the user; do not assume them. Region defaults to cn-hangzhou; if the user does not specify one, use the default without asking.

| Parameter | Required | Description | Default |
| --- | --- | --- | --- |
| `region` | No | Region ID | cn-hangzhou |
| `catalog_name` | Conditional | Catalog name (`--catalog`; required for GetCatalog) | - |
| `catalog_id` | Conditional | Catalog ID (`--catalog-id`; required when querying databases/tables, e.g. clg-paimon-xxxx) | - |
| `database` | Conditional | Database name (`--database`) | - |
| `table` | Conditional | Table name (`--table`) | - |

Core Workflow

The script picks up credentials automatically (for example, AK/SK from environment variables, via the default credential chain) and reports a clear error if they are missing. Region defaults to cn-hangzhou; use the default if the user does not specify one.

You MUST use `scripts/dlf_metadata_query.py` to query metadata. Do not use shell-based command-line clients or curl. Actions are in kebab-case.

CRITICAL: list vs. list-*-details. Pick the lightest action that satisfies the request.

For listing names / IDs (including fuzzy search): use `list-databases` / `list-tables`. These call the `ListDatabases` / `ListTables` API.

For full attributes / Schema / properties: use `list-database-details` / `list-table-details` / `get-database` / `get-table`. These call the heavier `*-details` / `Get*` APIs.

Default to the lightweight `list-*` action unless the user explicitly asks for full configuration, Schema, or properties. Calling `list-*-details` when only names are needed is incorrect.
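The selection rule above can be written down as a small decision function. This is only a sketch of the rule as stated here; the function name `pick_action` and its parameters are illustrative, and the Skill's script does not expose such a helper.

```python
def pick_action(resource: str, want_details: bool = False, single: bool = False) -> str:
    """Map a request to the lightest action name listed in this document."""
    if resource not in ("database", "table"):
        raise ValueError("resource must be 'database' or 'table'")
    if not want_details:
        return f"list-{resource}s"        # names / IDs / fuzzy search
    if single:
        return f"get-{resource}"          # full info for one specific resource
    return f"list-{resource}-details"     # full attributes for every match
```

For example, `pick_action("table")` yields `list-tables`, while `pick_action("table", want_details=True, single=True)` yields `get-table`.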

Query Operations

---- Catalog ----

1. List all Catalogs (names + minimal info — preferred for listing/searching)

python3 scripts/dlf_metadata_query.py list-catalogs

2. Fuzzy-search Catalogs by name (uses ListCatalogs)

python3 scripts/dlf_metadata_query.py list-catalogs --pattern test

3. Get Catalog details (by name) — use only when full Catalog config is needed

python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

4. Get Catalog details (by ID) — use only when full Catalog config is needed

python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

---- Database ----

5. List databases (NAMES only — DEFAULT for "list / show / which databases", calls ListDatabases)

python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

6. List database details (full attributes, calls ListDatabaseDetails) — use ONLY when the user asks for properties / configs / location / owner

python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

7. Get a single database's details (calls GetDatabase) — use when the user asks for ONE specific database's full info

python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

---- Table ----

8. List tables (NAMES only — DEFAULT for "list / show / which tables", calls ListTables)

python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)

python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

10. List table details with Schema (calls ListTableDetails) — use ONLY when the user explicitly asks for Schema / columns / properties of all tables

python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

11. Get a single table's details with Schema (calls GetTable) — use when the user asks for ONE specific table's Schema

python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>


Specify region (defaults to cn-hangzhou): add `--region cn-shanghai`


Typical Query Flow

1. list-catalogs          → get catalog_name and catalog_id (names only)
2. list-databases         → use catalog_id to view available database names
3. list-tables            → use catalog_id + database to view available table names
4. get-table              → use catalog_id + database + table to view ONE table's Schema

Only step 4 (`get-table`) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight `list-*` actions.
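The four steps can be expressed as argument lists ready to hand to `subprocess.run`. This is a sketch under assumptions: `build_command` and `typical_flow` are hypothetical helper names, the database and table names in the usage note are placeholders, and in practice each step's output would be inspected before the next command is built.

```python
from typing import List, Optional

SCRIPT = "scripts/dlf_metadata_query.py"


def build_command(
    action: str,
    *,
    catalog_id: Optional[str] = None,
    database: Optional[str] = None,
    table: Optional[str] = None,
    region: Optional[str] = None,
) -> List[str]:
    """Build an argv list using only flags that appear in this document."""
    cmd = ["python3", SCRIPT, action]
    for flag, value in (
        ("--catalog-id", catalog_id),
        ("--database", database),
        ("--table", table),
        ("--region", region),
    ):
        if value is not None:
            cmd += [flag, value]
    return cmd


def typical_flow(catalog_id: str, database: str, table: str) -> List[List[str]]:
    """Steps 1-3 use lightweight list-* actions; only step 4 is a details call."""
    return [
        build_command("list-catalogs"),
        build_command("list-databases", catalog_id=catalog_id),
        build_command("list-tables", catalog_id=catalog_id, database=database),
        build_command("get-table", catalog_id=catalog_id, database=database, table=table),
    ]
```

Each returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; passing a list rather than a single string avoids shell quoting issues with patterns and names.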

Fuzzy Search

All list operations support the `--pattern` argument for fuzzy name matching, using `%` as the wildcard. Use the lightweight `list-*` action for pattern search unless the user explicitly asks for the full Schema / properties of every match.

Search Catalogs whose name contains "test"

python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

Search databases whose name starts with "prod_"

python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

Search tables whose name starts with "user" (DEFAULT — calls ListTables)

python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%


> **Anti-pattern**: do not use `list-table-details --pattern ...` to search by name. That calls `ListTableDetails` and is heavier than required. Reach for `list-table-details` only when the user has explicitly asked for the Schema / columns of every matching table.
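The examples above suggest that `%` matches any run of characters ("contains" as `%test%`, "starts with" as `prod_%`). The sketch below reproduces that reading locally so a pattern can be sanity-checked before it is sent. It is an assumption about the semantics, not the service's implementation: matching happens server-side and may differ, for instance in case sensitivity or in how `_` is treated. The name `like_match` is illustrative.

```python
import re


def like_match(pattern: str, name: str) -> bool:
    """Treat '%' as 'any run of characters'; every other character is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, name) is not None
```

Under this reading, `like_match("user%", "user_profile")` holds while `like_match("user%", "app_user")` does not.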

Output Format

List operations:
```
{"count": N, "items": [...]}
```
Get operations: a single JSON object
Errors:
```
{"error": "...", "hint": "..."}
```
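A caller can normalize the three output shapes above into one code path. The following is a minimal sketch that relies only on the `count`, `items`, `error`, and `hint` keys shown here; the names `parse_output` and `DlfQueryError` are illustrative, and the fields inside each item (such as `name` in the usage note) are placeholders rather than documented output.

```python
import json
from typing import Any, Dict, List


class DlfQueryError(RuntimeError):
    """Raised when the script output has the documented error shape."""


def parse_output(raw: str) -> List[Dict[str, Any]]:
    """Normalize list, get, and error outputs into a list of items."""
    data = json.loads(raw)
    if isinstance(data, dict) and "error" in data:
        raise DlfQueryError(f"{data['error']} (hint: {data.get('hint', '')})")
    if isinstance(data, dict) and "count" in data and "items" in data:
        return list(data["items"])  # list operations
    return [data]  # get operations return a single JSON object
```

For instance, `parse_output('{"count": 1, "items": [{"name": "a"}]}')` returns a one-element list, and an error payload raises `DlfQueryError` carrying both the message and the hint.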

Verification

If `list-catalogs` returns the Catalog list, the connection and permissions are working:

python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou

See references/verification-method.md for detailed verification steps.

Best Practices

Prefer the lightweight `list-*` action over `list-*-details` / `get-*`. When the task only requires listing resource names, IDs, or fuzzy matching, you MUST use `list-catalogs` / `list-databases` / `list-tables` (which call `ListCatalogs` / `ListDatabases` / `ListTables`). Only use `list-*-details` or `get-*` when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.

List before Get: use `list-catalogs` to obtain `catalog_id` first, then use `catalog_id` to query databases and tables.

Use fuzzy search with the lightweight action: the `--pattern` argument supports fuzzy matching; use it on `list-tables` (not `list-table-details`) unless full Schema is also requested.

Pagination: use `--max-results` and `--page-token` for paginated queries when there is a lot of data.

Catalog ID vs Name: when querying Database/Table, use `catalog_id` (e.g. clg-paimon-xxxx), not the catalog name.
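A paginated query usually becomes a loop that feeds each page's continuation token back into the next request. The sketch below shows that loop shape only. It makes a loud assumption: the name of the field carrying the next token is not documented here, so `next_page_token` is a hypothetical placeholder, as are the helper name `iter_items` and the injected `fetch_page` callable (which would wrap a script call using `--page-token`).

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    token_field: str = "next_page_token",  # hypothetical field name
) -> Iterator[Any]:
    """Yield items page by page until no continuation token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get(token_field)
        if not token:
            break
```

Injecting `fetch_page` keeps the loop testable without network access; check the real output of a paginated call for the actual token field before relying on this.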

References

| Reference | Description |
| --- | --- |
| references/related-apis.md | Full API list and parameter descriptions |
| references/ram-policies.md | RAM permission policy |
| references/acceptance-criteria.md | Acceptance criteria |
| references/verification-method.md | Verification method |
| DLF API overview | Official API documentation |
| DLF product documentation | Product documentation |
| Python SDK PyPI | SDK version info |
