alibabacloud-dlf-manage
DLF Data Lake Metadata Query
Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).
CRITICAL: Use only the Python SDK script provided by this Skill. All operations go through the DLF Python SDK (`alibabacloud-dlfnext20250310`) via `scripts/dlf_metadata_query.py`. This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.
- DO NOT attempt access via any shell-based command-line client; DLF is not exposed through one in this Skill
- DO NOT use curl, wget, or other HTTP clients to call the DLF API directly
- MUST use the `scripts/dlf_metadata_query.py` script provided by this Skill, which wraps the DLF Python SDK
- All query operations are executed via `python3 scripts/dlf_metadata_query.py <action> [options]`
Architecture
```
Catalog (Data Catalog)
└── Database
    └── Table
        ├── Schema (column definitions)
        ├── PartitionKeys (partition keys)
        ├── PrimaryKeys (primary keys)
        └── Options (table properties)
```
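The hierarchy above can be pictured as nested records. The sketch below is only an illustrative model: the class names, field names, and the `table_path` helper are assumptions chosen to mirror the tree, not types exported by the DLF SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    """Illustrative table record (hypothetical, not an SDK type)."""
    name: str
    schema: List[dict] = field(default_factory=list)        # column definitions
    partition_keys: List[str] = field(default_factory=list)
    primary_keys: List[str] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)   # table properties


@dataclass
class Database:
    """Illustrative database record holding tables by name."""
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    """Illustrative catalog record holding databases by name."""
    name: str
    catalog_id: str
    databases: Dict[str, Database] = field(default_factory=dict)


def table_path(catalog: Catalog, database: str, table: str) -> str:
    """Return the three-level path that identifies one table."""
    tbl = catalog.databases[database].tables[table]
    return f"{catalog.catalog_id}.{database}.{tbl.name}"
```

The point of the model is that a table is always addressed through its catalog and database, which is why the table-level actions later in this document take all three identifiers.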
Installation
```bash
pip install -r requirements.txt
```

requirements.txt

```
alibabacloud-dlfnext20250310==3.0.0
```

Pre-check: Python SDK dependency

```bash
python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"
```

If not installed, run `pip install -r requirements.txt`.
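The same dependency pre-check can be done from Python without importing the package. This is a minimal sketch: the helper name `sdk_installed` is hypothetical, and the only thing taken from this document is the module name `alibabacloud_dlfnext20250310`.

```python
import importlib.util

SDK_MODULE = "alibabacloud_dlfnext20250310"


def sdk_installed(module_name: str = SDK_MODULE) -> bool:
    """Return True if the top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("SDK OK")
    else:
        print("SDK missing: run `pip install -r requirements.txt`")
```

Locating the module instead of importing it keeps the check free of side effects, at the cost of not catching a broken installation whose import would fail.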
Authentication
Pre-check: Alibaba Cloud Credentials Required

Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):
- Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)
- Configuration file (~/.alibabacloud/credentials)
- ECS Instance RAM Role
- OIDC Role ARN
Security Rules:
- NEVER read, echo, or print AK/SK values
- NEVER ask the user to input AK/SK directly in the conversation or command line
- NEVER explicitly handle or pass AK/SK in code — rely on the default credential chain
See https://help.aliyun.com/document_detail/378659.html for credential configuration details.
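A pre-check can confirm that some credential source is configured without ever touching the secret values. The sketch below is an assumption-laden illustration: the function name is hypothetical, it only tests for the presence of the two environment variable names and the credentials file listed above, and it does not probe the ECS RAM role or OIDC sources.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

ENV_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def detect_credential_sources(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> List[str]:
    """Report which local credential sources appear configured.

    Only presence is checked; values are never read, returned, or printed.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    found = []
    if all(name in env for name in ENV_VARS):
        found.append("environment variables")
    if (home / ".alibabacloud" / "credentials").is_file():
        found.append("configuration file")
    return found
```

An empty result does not prove that authentication will fail, since a role-based source may still apply; it only means neither of the two local sources was detected.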
RAM Permissions
This Skill only involves read-only operations (List / Get). See references/ram-policies.md for the full permission list.
[MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:
- Read `references/ram-policies.md` to get the full list of permissions required by this Skill
- Pause and wait until the user confirms that the required permissions have been granted
Parameter Confirmation
IMPORTANT: Parameter Confirmation — Before invoking the API, the following user-specific parameters must be confirmed with the user; do not assume them. Region defaults to cn-hangzhou; if the user does not specify one, use the default without asking.
| Parameter | Required | Description | Default |
|---|---|---|---|
| `--region` | No | Region ID | cn-hangzhou |
| `--catalog` | Conditional | Catalog name | - |
| `--catalog-id` / `--id` | Conditional | Catalog ID | - |
| `--database` | Conditional | Database name | - |
| `--table` | Conditional | Table name | - |
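The confirmation rule can be sketched as a pre-flight check that refuses to build a command while a user-specific parameter is still unknown, and falls back to `cn-hangzhou` for the region. The per-action flag lists below are read off the example commands in this document; the `build_command` function and the `REQUIRED_FLAGS` table are hypothetical helpers, not part of the script.

```python
from typing import Dict, List, Optional

DEFAULT_REGION = "cn-hangzhou"

# Flags each action needs, as used in this document's example commands.
REQUIRED_FLAGS: Dict[str, List[str]] = {
    "list-catalogs": [],
    "get-catalog": ["catalog"],
    "get-catalog-by-id": ["id"],
    "list-databases": ["catalog-id"],
    "list-database-details": ["catalog-id"],
    "get-database": ["catalog-id", "database"],
    "list-tables": ["catalog-id", "database"],
    "list-table-details": ["catalog-id", "database"],
    "get-table": ["catalog-id", "database", "table"],
}


def build_command(
    action: str, flags: Dict[str, str], region: Optional[str] = None
) -> List[str]:
    """Build the argv for one action, or raise if a parameter is unconfirmed."""
    missing = [name for name in REQUIRED_FLAGS[action] if not flags.get(name)]
    if missing:
        needed = ", ".join("--" + name for name in missing)
        raise ValueError(f"confirm with the user first: {needed}")
    cmd = ["python3", "scripts/dlf_metadata_query.py", action]
    for name, value in flags.items():
        cmd += [f"--{name}", value]
    # Region is the one parameter that may be defaulted without asking.
    cmd += ["--region", region or DEFAULT_REGION]
    return cmd
```

Raising instead of guessing keeps the "do not assume" rule enforceable in code: the caller has to go back to the user before any API call is made.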
Core Workflow
The script automatically reads AK/SK from environment variables and reports a clear error if they are missing. Region defaults to cn-hangzhou; use the default if the user does not specify one.
You MUST use `scripts/dlf_metadata_query.py` to query metadata. Do not use shell-based command-line clients or curl. Actions are in kebab-case.

CRITICAL: list vs. list-*-details: pick the lightest action that satisfies the request.
- For listing names / IDs (including fuzzy search): use `list-databases` / `list-tables`. These call the `ListDatabases` / `ListTables` APIs.
- For full attributes / Schema / properties: use `list-database-details` / `list-table-details` / `get-database` / `get-table`. These call the heavier `*-details` / `Get*` APIs.
- Default to the lightweight `list-*` action unless the user explicitly asks for full configuration, Schema, or properties. Calling `list-*-details` when only names are needed is incorrect.
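The selection rule above can be written down as a small decision function. This is a sketch of the rule only: `choose_action` and its parameters are hypothetical, and the Catalog branch assumes that `get-catalog` is the details action for Catalogs, since this document lists no `list-catalog-details` action.

```python
def choose_action(resource: str, wants_details: bool = False, single: bool = False) -> str:
    """Pick the lightest action name that satisfies the request.

    resource:      "catalog", "database", or "table"
    wants_details: the user explicitly asked for Schema / properties / config
    single:        the request targets one named resource
    """
    if resource not in ("catalog", "database", "table"):
        raise ValueError(f"unknown resource: {resource}")
    if not wants_details:
        # Names, IDs, and fuzzy search all stay on the lightweight listing.
        return f"list-{resource}s"
    if single or resource == "catalog":
        return f"get-{resource}"
    return f"list-{resource}-details"
```

For example, `choose_action("table")` yields `list-tables`, while `choose_action("table", wants_details=True, single=True)` yields `get-table`.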
Query Operations
```bash
# ---- Catalog ----

# 1. List all Catalogs (names + minimal info; preferred for listing/searching)
python3 scripts/dlf_metadata_query.py list-catalogs

# 2. Fuzzy-search Catalogs by name (uses ListCatalogs)
python3 scripts/dlf_metadata_query.py list-catalogs --pattern test

# 3. Get Catalog details (by name); use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

# 4. Get Catalog details (by ID); use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

# ---- Database ----

# 5. List databases (NAMES only; DEFAULT for "list / show / which databases", calls ListDatabases)
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

# 6. List database details (full attributes, calls ListDatabaseDetails); use ONLY when the user asks for properties / configs / location / owner
python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

# 7. Get a single database's details (calls GetDatabase); use when the user asks for ONE specific database's full info
python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

# ---- Table ----

# 8. List tables (NAMES only; DEFAULT for "list / show / which tables", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

# 9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

# 10. List table details with Schema (calls ListTableDetails); use ONLY when the user explicitly asks for Schema / columns / properties of all tables
python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

# 11. Get a single table's details with Schema (calls GetTable); use when the user asks for ONE specific table's Schema
python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>

# Specify region (defaults to cn-hangzhou): add `--region cn-shanghai`
```

Typical Query Flow
1. list-catalogs → get catalog_name and catalog_id (names only)
2. list-databases → use catalog_id to view available database names
3. list-tables → use catalog_id + database to view available table names
4. get-table → use catalog_id + database + table to view ONE table's Schema

Only step 4 (`get-table`) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight `list-*` actions.
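Steps 2 to 4 of the flow above could be driven from Python roughly as follows. This is a sketch under stated assumptions: it assumes the script prints its JSON result on standard output, the helper names `run_action` and `describe_table` are hypothetical, and error handling is left out for brevity. The `runner` parameter exists only so the flow can be exercised without calling the real script.

```python
import json
import subprocess
from typing import Any, Callable

SCRIPT = "scripts/dlf_metadata_query.py"


def run_action(action: str, *args: str, runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run one script action and return its parsed JSON output."""
    cmd = ["python3", SCRIPT, action, *args]
    result = runner(cmd, capture_output=True, text=True)
    return json.loads(result.stdout)


def describe_table(
    catalog_id: str,
    database: str,
    table: str,
    runner: Callable[..., Any] = subprocess.run,
) -> dict:
    """Two lightweight listings, then a single details call for one table."""
    databases = run_action("list-databases", "--catalog-id", catalog_id, runner=runner)
    tables = run_action(
        "list-tables", "--catalog-id", catalog_id, "--database", database, runner=runner
    )
    details = run_action(
        "get-table",
        "--catalog-id", catalog_id,
        "--database", database,
        "--table", table,
        runner=runner,
    )
    return {
        "database_count": databases["count"],
        "table_count": tables["count"],
        "table": details,
    }
```

Only the last call reaches a details API; the two listings use the documented `{"count": N, "items": [...]}` envelope and nothing more.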
Fuzzy Search
All list operations support the `--pattern` argument for fuzzy name matching, using `%` as the wildcard. Use the lightweight `list-*` action for pattern search unless the user explicitly asks for the full Schema / properties of every match.

```bash
# Search Catalogs whose name contains "test"
python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

# Search databases whose name starts with "prod_"
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

# Search tables whose name starts with "user" (DEFAULT; calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%
```

> **Anti-pattern**: do not use `list-table-details --pattern ...` to search by name. That calls `ListTableDetails` and is heavier than required. Reach for `list-table-details` only when the user has explicitly asked for the Schema / columns of every matching table.
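To make the `%` wildcard concrete, the sketch below reproduces the matching locally. It is an approximation for illustration, not the service's implementation: it assumes `%` stands for any run of characters (including none), that every other character is literal, and that a pattern must cover the whole name. The function name `matches` is hypothetical.

```python
import re


def matches(pattern: str, name: str) -> bool:
    """Return True if name fits pattern, treating % as 'any run of characters'."""
    # Escape each literal chunk, then join the chunks with a regex wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, name) is not None
```

Under these assumptions `prod_%` accepts `prod_sales` but rejects `preprod_sales`, and `%test%` accepts any name containing `test`.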
Output Format
- List operations: `{"count": N, "items": [...]}`
- Get operations: a single JSON object
- Errors: `{"error": "...", "hint": "..."}`
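A caller can tell the three shapes apart by their keys. The sketch below is one possible way to do that; `parse_output` and `DlfQueryError` are hypothetical names, and it assumes a successful Get result never carries a top-level `error` key.

```python
import json
from typing import Any


class DlfQueryError(RuntimeError):
    """Raised when the script reports an error envelope."""

    def __init__(self, error: str, hint: str = "") -> None:
        super().__init__(f"{error} (hint: {hint})" if hint else error)
        self.error = error
        self.hint = hint


def parse_output(text: str) -> Any:
    """Return list items or a single object; raise on an error envelope."""
    payload = json.loads(text)
    if isinstance(payload, dict) and "error" in payload:
        raise DlfQueryError(payload["error"], payload.get("hint", ""))
    if isinstance(payload, dict) and "count" in payload and "items" in payload:
        return payload["items"]  # list operation
    return payload  # get operation
```

Surfacing `hint` on the exception keeps the script's remediation advice available to whatever handles the failure, for example the permission-failure process described earlier.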
Verification
If `list-catalogs` returns the Catalog list, the connection and permissions are working:

```bash
python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou
```

See references/verification-method.md for detailed verification steps.
Best Practices
- Prefer the lightweight `list-*` action over `list-*-details` / `get-*`. When the task only requires listing resource names, IDs, or fuzzy matching, you MUST use `list-catalogs` / `list-databases` / `list-tables` (which call `ListCatalogs` / `ListDatabases` / `ListTables`). Only use `list-*-details` or `get-*` when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.
- List before Get: use list-catalogs to obtain catalog_id first, then use catalog_id to query databases and tables.
- Use fuzzy search with the lightweight action: the `--pattern` argument supports fuzzy matching; use it on `list-tables` (not `list-table-details`) unless full Schema is also requested.
- Pagination: use `--max-results` and `--page-token` for paginated queries when there is a lot of data.
- Catalog ID vs Name: when querying Database/Table, use `catalog_id` (e.g. clg-paimon-xxxx), not the catalog name.
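The pagination advice amounts to a fetch-until-no-token loop. The sketch below shows only that loop and rests on an assumption this document does not confirm: the name of the output field that carries the next token. `next_page_token` is a placeholder, and `fetch_page` is a hypothetical caller-supplied function that would run a list action with `--max-results` and, when given a token, `--page-token`.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_all_items(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    token_key: str = "next_page_token",  # placeholder field name, not documented here
) -> Iterator[Any]:
    """Yield items from every page until no continuation token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get(token_key)
        if not token:
            break
```

Check references/related-apis.md for the actual continuation field before relying on this; the loop itself does not change once the right key is supplied.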
References
| Reference | Description |
|---|---|
| references/related-apis.md | Full API list and parameter descriptions |
| references/ram-policies.md | RAM permission policy |
| references/acceptance-criteria.md | Acceptance criteria |
| references/verification-method.md | Verification method |
| DLF API overview | Official API documentation |
| DLF product documentation | Product documentation |
| Python SDK PyPI | SDK version info |