read-file
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseYou are helping the user read and analyze a data file using DuckDB.
Filename given:
Question:
$0${1:-describe the data}你正在帮助用户使用DuckDB读取和分析数据文件。
给定文件名:
问题:
$0${1:-describe the data}Step 1 — Read it
步骤1 — 读取文件
RESOLVED_PATH$0/findRun a single DuckDB command that defines the macro inline and reads the file.
read_anyFor remote files, prepend the necessary LOAD/SECRET before the macro:
| Protocol | Prepend |
|---|---|
| |
| |
| |
| |
For local files, no prefix needed.
bash
duckdb -csv -c "
CREATE OR REPLACE MACRO read_any(file_name) AS TABLE
WITH json_case AS (FROM read_json_auto(file_name))
, csv_case AS (FROM read_csv(file_name))
, parquet_case AS (FROM read_parquet(file_name))
, avro_case AS (FROM read_avro(file_name))
, blob_case AS (FROM read_blob(file_name))
, spatial_case AS (FROM st_read(file_name))
, excel_case AS (FROM read_xlsx(file_name))
, sqlite_case AS (FROM sqlite_scan(file_name, (SELECT name FROM sqlite_master(file_name) LIMIT 1)))
, ipynb_case AS (
WITH nb AS (FROM read_json_auto(file_name))
SELECT cell_idx, cell.cell_type,
array_to_string(cell.source, '') AS source,
cell.execution_count
FROM nb, UNNEST(cells) WITH ORDINALITY AS t(cell, cell_idx)
ORDER BY cell_idx
)
FROM query_table(
CASE
WHEN file_name ILIKE '%.json' OR file_name ILIKE '%.jsonl' OR file_name ILIKE '%.ndjson' OR file_name ILIKE '%.geojson' OR file_name ILIKE '%.geojsonl' OR file_name ILIKE '%.har' THEN 'json_case'
WHEN file_name ILIKE '%.csv' OR file_name ILIKE '%.tsv' OR file_name ILIKE '%.tab' OR file_name ILIKE '%.txt' THEN 'csv_case'
WHEN file_name ILIKE '%.parquet' OR file_name ILIKE '%.pq' THEN 'parquet_case'
WHEN file_name ILIKE '%.avro' THEN 'avro_case'
WHEN file_name ILIKE '%.xlsx' OR file_name ILIKE '%.xls' THEN 'excel_case'
WHEN file_name ILIKE '%.shp' OR file_name ILIKE '%.gpkg' OR file_name ILIKE '%.fgb' OR file_name ILIKE '%.kml' THEN 'spatial_case'
WHEN file_name ILIKE '%.ipynb' THEN 'ipynb_case'
WHEN file_name ILIKE '%.db' OR file_name ILIKE '%.sqlite' OR file_name ILIKE '%.sqlite3' THEN 'sqlite_case'
ELSE 'blob_case'
END
);
DESCRIBE FROM read_any('RESOLVED_PATH');
SELECT count(*) AS row_count FROM read_any('RESOLVED_PATH');
FROM read_any('RESOLVED_PATH') LIMIT 20;
"If this fails:
- → invoke
duckdb: command not foundand retry./duckdb-skills:install-duckdb - Missing extension (e.g. spatial files, xlsx, sqlite) → retry with or
INSTALL spatial; LOAD spatial;prepended before the macro.INSTALL sqlite_scanner; LOAD sqlite_scanner; - Wrong reader / parse error → use the correct function directly instead of
read_*.read_any
RESOLVED_PATH$0/find运行单个DuckDB命令,内联定义宏并读取文件。
read_any针对远程文件,需要在宏之前添加必要的LOAD/SECRET语句:
| 协议 | 前置语句 |
|---|---|
| |
| |
| |
| |
针对本地文件,无需添加前缀。
bash
duckdb -csv -c "
CREATE OR REPLACE MACRO read_any(file_name) AS TABLE
WITH json_case AS (FROM read_json_auto(file_name))
, csv_case AS (FROM read_csv(file_name))
, parquet_case AS (FROM read_parquet(file_name))
, avro_case AS (FROM read_avro(file_name))
, blob_case AS (FROM read_blob(file_name))
, spatial_case AS (FROM st_read(file_name))
, excel_case AS (FROM read_xlsx(file_name))
, sqlite_case AS (FROM sqlite_scan(file_name, (SELECT name FROM sqlite_master(file_name) LIMIT 1)))
, ipynb_case AS (
WITH nb AS (FROM read_json_auto(file_name))
SELECT cell_idx, cell.cell_type,
array_to_string(cell.source, '') AS source,
cell.execution_count
FROM nb, UNNEST(cells) WITH ORDINALITY AS t(cell, cell_idx)
ORDER BY cell_idx
)
FROM query_table(
CASE
WHEN file_name ILIKE '%.json' OR file_name ILIKE '%.jsonl' OR file_name ILIKE '%.ndjson' OR file_name ILIKE '%.geojson' OR file_name ILIKE '%.geojsonl' OR file_name ILIKE '%.har' THEN 'json_case'
WHEN file_name ILIKE '%.csv' OR file_name ILIKE '%.tsv' OR file_name ILIKE '%.tab' OR file_name ILIKE '%.txt' THEN 'csv_case'
WHEN file_name ILIKE '%.parquet' OR file_name ILIKE '%.pq' THEN 'parquet_case'
WHEN file_name ILIKE '%.avro' THEN 'avro_case'
WHEN file_name ILIKE '%.xlsx' OR file_name ILIKE '%.xls' THEN 'excel_case'
WHEN file_name ILIKE '%.shp' OR file_name ILIKE '%.gpkg' OR file_name ILIKE '%.fgb' OR file_name ILIKE '%.kml' THEN 'spatial_case'
WHEN file_name ILIKE '%.ipynb' THEN 'ipynb_case'
WHEN file_name ILIKE '%.db' OR file_name ILIKE '%.sqlite' OR file_name ILIKE '%.sqlite3' THEN 'sqlite_case'
ELSE 'blob_case'
END
);
DESCRIBE FROM read_any('RESOLVED_PATH');
SELECT count(*) AS row_count FROM read_any('RESOLVED_PATH');
FROM read_any('RESOLVED_PATH') LIMIT 20;
"如果执行失败:
- → 调用
duckdb: command not found后重试。/duckdb-skills:install-duckdb - 缺少扩展(例如空间文件、xlsx、sqlite)→ 在宏之前添加或
INSTALL spatial; LOAD spatial;后重试。INSTALL sqlite_scanner; LOAD sqlite_scanner; - 读取器不匹配/解析错误 → 直接使用对应的函数替代
read_*。read_any
Step 2 — Answer
步骤2 — 回复用户
Using the schema, row count, and sample rows, answer:
${1:-describe the data: summarize column types, row count, and any notable patterns.}基于返回的表结构、行数和示例行,回复以下问题:
${1:-describe the data: summarize column types, row count, and any notable patterns.}