DLF Data Lake Metadata Query

Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).

CRITICAL: Use only the Python SDK script provided by this Skill. All operations go through the DLF Python SDK (
alibabacloud-dlfnext20250310
) via
scripts/dlf_metadata_query.py
. This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.
DO NOT attempt access via any shell-based command-line client — DLF is not exposed through one in this Skill

DO NOT use curl, wget, or other HTTP clients to call the DLF API directly
MUST use the
scripts/dlf_metadata_query.py
script provided by this Skill, which wraps the DLF Python SDK
All query operations are executed via
python3 scripts/dlf_metadata_query.py <action> [options]

Architecture

Catalog (Data Catalog)
  └── Database
        └── Table
              ├── Schema (column definitions)
              ├── PartitionKeys (partition keys)
              ├── PrimaryKeys (primary keys)
              └── Options (table properties)

Installation

bash

pip install -r requirements.txt

requirements.txt

pins the full transitive dependency closure (including

alibabacloud-dlfnext20250310==3.0.0

) for reproducible installs.

Pre-check: Python SDK dependency
bash
python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"
If not installed, run
pip install -r requirements.txt
.

Authentication

Pre-check: Alibaba Cloud Credentials Required

Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):

Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)

Configuration file (~/.alibabacloud/credentials)

ECS Instance RAM Role

OIDC Role ARN

Security Rules:

NEVER read, echo, or print AK/SK values

NEVER ask the user to input AK/SK directly in the conversation or command line

NEVER explicitly handle or pass AK/SK in code — rely on the default credential chain

See https://help.aliyun.com/document_detail/378659.html for credential configuration details.

RAM Permissions

This Skill only involves read-only operations (List / Get). See references/ram-policies.md for the full permission list.

[MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:
Read
references/ram-policies.md
to get the full list of permissions required by this SKILL
Pause and wait until the user confirms that the required permissions have been granted

Parameter Confirmation

IMPORTANT: Parameter Confirmation — Before invoking the API, the following user-specific parameters must be confirmed with the user; do not assume them. Region defaults to cn-hangzhou; if the user does not specify one, use the default without asking.

Parameter	Required	Description	Default
`region`	No	Region ID	cn-hangzhou
`catalog_name`	Conditional	Catalog name ( `--catalog` , required for GetCatalog)	-
`catalog_id`	Conditional	Catalog ID ( `--catalog-id` , required when querying databases/tables, e.g. clg-paimon-xxxx)	-
`database`	Conditional	Database name ( `--database` )	-
`table`	Conditional	Table name ( `--table` )	-

Core Workflow

The script automatically reads AK/SK from environment variables and reports a clear error if they are missing. Region defaults to cn-hangzhou; use the default if the user does not specify one.

You MUST use

scripts/dlf_metadata_query.py

to query metadata. Do not use shell-based command-line clients or curl. Actions are in kebab-case.

CRITICAL — list vs. list-*-details: pick the lightest action that satisfies the request.
For listing names / IDs (including fuzzy search): use
list-databases
/
list-tables
. These call the
ListDatabases
/
ListTables
API.
For full attributes / Schema / properties: use
list-database-details
/
list-table-details
/
get-database
/
get-table
. These call the heavier
*-details
/
Get*
APIs.
Default to the lightweight
list-*
action unless the user explicitly asks for full configuration, Schema, or properties. Calling
list-*-details
when only names are needed is incorrect.

Query Operations

bash

# ---- Catalog ----

# 1. List all Catalogs (names + minimal info — preferred for listing/searching)
python3 scripts/dlf_metadata_query.py list-catalogs

# 2. Fuzzy-search Catalogs by name (uses ListCatalogs)
python3 scripts/dlf_metadata_query.py list-catalogs --pattern test

# 3. Get Catalog details (by name) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

# 4. Get Catalog details (by ID) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

# ---- Database ----

# 5. List databases (NAMES only — DEFAULT for "list / show / which databases", calls ListDatabases)
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

# 6. List database details (full attributes, calls ListDatabaseDetails) — use ONLY when the user asks for properties / configs / location / owner
python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

# 7. Get a single database's details (calls GetDatabase) — use when the user asks for ONE specific database's full info
python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

# ---- Table ----

# 8. List tables (NAMES only — DEFAULT for "list / show / which tables", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

# 9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

# 10. List table details with Schema (calls ListTableDetails) — use ONLY when the user explicitly asks for Schema / columns / properties of all tables
python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

# 11. Get a single table's details with Schema (calls GetTable) — use when the user asks for ONE specific table's Schema
python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>

Specify region (defaults to cn-hangzhou): add

--region cn-shanghai

Typical Query Flow

1. list-catalogs          → get catalog_name and catalog_id (names only)
2. list-databases         → use catalog_id to view available database names
3. list-tables            → use catalog_id + database to view available table names
4. get-table              → use catalog_id + database + table to view ONE table's Schema

Only step 4 (
get-table
) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight
list-*
actions.

Fuzzy Search

All list operations support the

--pattern

argument for fuzzy name matching, using

as the wildcard. Use the lightweight
list-*
action for pattern search unless the user explicitly asks for the full Schema / properties of every match.

bash

# Search Catalogs whose name contains "test"
python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

# Search databases whose name starts with "prod_"
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

# Search tables whose name starts with "user" (DEFAULT — calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

Anti-pattern: do not use
list-table-details --pattern ...
to search by name. That calls
ListTableDetails
and is heavier than required. Reach for
list-table-details
only when the user has explicitly asked for the Schema / columns of every matching table.

Output Format

List operations:
```
{"count": N, "items": [...]}
```
Get operations: a single JSON object
Errors:
```
{"error": "...", "hint": "..."}
```

Verification

list-catalogs

returns the Catalog list, the connection and permissions are working:

bash

python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou

See references/verification-method.md for detailed verification steps.

Best Practices

Prefer the lightweight
list-*
action over
list-*-details
/
get-*
. When the task only requires listing resource names, IDs, or fuzzy matching, you MUST use
```
list-catalogs
```
/
```
list-databases
```
/
```
list-tables
```
(which call
```
ListCatalogs
```
/
```
ListDatabases
```
/
```
ListTables
```
). Only use
```
list-*-details
```
or
```
get-*
```
when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.
List before Get: use list-catalogs to obtain catalog_id first, then use catalog_id to query databases and tables.
Use fuzzy search with the lightweight action: the
```
--pattern
```
argument supports fuzzy matching; use it on
```
list-tables
```
(not
```
list-table-details
```
) unless full Schema is also requested.
Pagination: use
```
--max-results
```
and
```
--page-token
```
for paginated queries when there is a lot of data.
Catalog ID vs Name: when querying Database/Table, use
```
catalog_id
```
(e.g. clg-paimon-xxxx), not the catalog name.

References

Reference	Description
references/related-apis.md	Full API list and parameter descriptions
references/ram-policies.md	RAM permission policy
references/acceptance-criteria.md	Acceptance criteria
references/verification-method.md	Verification method
DLF API overview	Official API documentation
DLF product documentation	Product documentation
Python SDK PyPI	SDK version info

alibabacloud-dlf-manage

NPX Install

Tags

SKILL.md Content

DLF Data Lake Metadata Query

Architecture

Installation

Authentication

RAM Permissions

Parameter Confirmation

Core Workflow

Query Operations

Typical Query Flow

Fuzzy Search

Output Format

Verification

Best Practices

References