convert-file

You are helping the user convert a data file from one format to another using DuckDB.
Input file: `$0`
Output file: `${1:-}`

Step 1 — Resolve input and output

Input: `$0`. If it's a bare filename (no `/`), resolve it to a full path with `find "$PWD" -name "$0" -not -path '*/.git/*' 2>/dev/null | head -1`.
Output: If `$1` is provided, use it as the output path. If not, default to the same stem as the input with a `.parquet` extension (e.g., `data.csv` → `data.parquet`).
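The resolution rules above can be sketched as two small shell helpers. This is a minimal sketch, not part of the original skill: the function names `resolve_input` and `resolve_output` are hypothetical, and the sketch assumes a path containing `/` (including a remote URL) should be passed through unchanged.

```shell
# Hypothetical helper: resolve a bare filename to a full path under $PWD.
# Anything containing a slash is assumed to be a usable path already.
resolve_input() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   find "$PWD" -name "$1" -not -path '*/.git/*' 2>/dev/null | head -1 ;;
  esac
}

# Hypothetical helper: use the explicit output path if given,
# otherwise swap the input's extension for .parquet.
resolve_output() {
  ro_input=$1
  ro_output=${2:-}
  if [ -n "$ro_output" ]; then
    printf '%s\n' "$ro_output"
    return 0
  fi
  ro_dir=$(dirname -- "$ro_input")
  ro_base=$(basename -- "$ro_input")
  printf '%s/%s.parquet\n' "$ro_dir" "${ro_base%.*}"
}
```

Splitting the directory from the base name before stripping the extension avoids mangling paths whose directories contain a dot.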
Infer the output format from the output file extension:
| Extension | Format clause |
|---|---|
| `.parquet`, `.pq` | (default, no clause needed) |
| `.csv` | `(FORMAT csv, HEADER)` |
| `.tsv` | `(FORMAT csv, HEADER, DELIMITER '\t')` |
| `.json` | `(FORMAT json, ARRAY true)` |
| `.jsonl`, `.ndjson` | `(FORMAT json, ARRAY false)` |
| `.xlsx` | `(FORMAT xlsx)`; requires `INSTALL excel; LOAD excel;` |
| `.geojson` | `(FORMAT GDAL, DRIVER 'GeoJSON')`; requires `LOAD spatial;` |
| `.gpkg` | `(FORMAT GDAL, DRIVER 'GPKG')`; requires `LOAD spatial;` |
| `.shp` | `(FORMAT GDAL, DRIVER 'ESRI Shapefile')`; requires `LOAD spatial;` |
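The extension table above maps naturally onto a `case` statement. The following is a sketch under assumptions: `format_clause` and `output_extension_loads` are hypothetical names, extensions are matched case-insensitively (the original does not say either way), and an unrecognized extension is treated as an error.

```shell
# Hypothetical helper: print the COPY format clause for an output path.
# Prints nothing for Parquet, where the table says no clause is needed.
format_clause() {
  fc_ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$fc_ext" in
    parquet|pq)   ;;
    csv)          printf '%s\n' "(FORMAT csv, HEADER)" ;;
    tsv)          printf '%s\n' "(FORMAT csv, HEADER, DELIMITER '\t')" ;;
    json)         printf '%s\n' "(FORMAT json, ARRAY true)" ;;
    jsonl|ndjson) printf '%s\n' "(FORMAT json, ARRAY false)" ;;
    xlsx)         printf '%s\n' "(FORMAT xlsx)" ;;
    geojson)      printf '%s\n' "(FORMAT GDAL, DRIVER 'GeoJSON')" ;;
    gpkg)         printf '%s\n' "(FORMAT GDAL, DRIVER 'GPKG')" ;;
    shp)          printf '%s\n' "(FORMAT GDAL, DRIVER 'ESRI Shapefile')" ;;
    *)            printf 'unsupported output extension: %s\n' "$fc_ext" >&2
                  return 1 ;;
  esac
}

# Hypothetical helper: print the extension statements the output format needs.
output_extension_loads() {
  case "$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')" in
    xlsx)             printf '%s\n' "INSTALL excel; LOAD excel;" ;;
    geojson|gpkg|shp) printf '%s\n' "LOAD spatial;" ;;
  esac
}
```

Using `printf '%s'` rather than `echo` keeps the `\t` in the TSV clause as a literal backslash sequence for DuckDB to interpret.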

Step 2 — Convert

Run a single DuckDB command. Prepend extension loads as needed based on both the input and output formats.
bash
duckdb -c "
<EXTENSION_LOADS>
COPY (FROM '<INPUT_PATH>') TO '<OUTPUT_PATH>' <FORMAT_CLAUSE>;
"
For remote inputs (`s3://`, `https://`, etc.), prepend the same protocol setup as `read-file`:

| Protocol | Prepend |
|---|---|
| `s3://` | `LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);` |
| `gs://` / `gcs://` | `LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);` |
| `https://` / `http://` | `LOAD httpfs;` |
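The protocol table can be expressed as a prefix match on the input path. This is a sketch only; `protocol_setup` is a hypothetical name, and local paths are assumed to need no setup.

```shell
# Hypothetical helper: print the statements to prepend for a remote input.
# Prints nothing for local paths.
protocol_setup() {
  case "$1" in
    s3://*)
      printf '%s\n' "LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);" ;;
    gs://*|gcs://*)
      printf '%s\n' "LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);" ;;
    https://*|http://*)
      printf '%s\n' "LOAD httpfs;" ;;
  esac
}
```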
If the user mentions partitioning (e.g., "partition by year"), add `PARTITION_BY (col)` to the format clause. This only works with Parquet and CSV output.
If the user mentions compression (e.g., "use zstd"), add `CODEC 'zstd'` for Parquet output.
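To make Step 2 concrete, the sketch below assembles the SQL text for the `COPY` command and builds a Parquet clause with optional partitioning and compression. The names `sql_quote`, `build_convert_sql`, and `parquet_options` are hypothetical. Two assumptions go beyond the original: single quotes in paths are escaped by doubling them, and `FORMAT parquet` is written explicitly whenever options are added, since a partitioned write targets a directory whose name carries no extension to infer the format from.

```shell
# Hypothetical helper: escape single quotes for use inside a SQL string literal.
sql_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Hypothetical helper: print the statements to pass to `duckdb -c`.
# $1 input path, $2 output path, $3 format clause (optional),
# $4 extension loads and protocol setup (optional).
build_convert_sql() {
  bc_in=$(sql_quote "$1")
  bc_out=$(sql_quote "$2")
  if [ -n "${4:-}" ]; then
    printf '%s\n' "$4"
  fi
  printf "COPY (FROM '%s') TO '%s'%s;\n" "$bc_in" "$bc_out" "${3:+ $3}"
}

# Hypothetical helper: build a Parquet clause.
# $1 partition column (optional), $2 compression codec (optional).
parquet_options() {
  po_opts="FORMAT parquet"
  if [ -n "${1:-}" ]; then po_opts="$po_opts, PARTITION_BY ($1)"; fi
  if [ -n "${2:-}" ]; then po_opts="$po_opts, CODEC '$2'"; fi
  printf '(%s)\n' "$po_opts"
}
```

A call might then look like `duckdb -c "$(build_convert_sql data.csv out_dir "$(parquet_options year zstd)")"`, where `data.csv`, `out_dir`, and `year` are placeholder values.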

Step 3 — Report

On success, report:
  • Input file and detected format
  • Output file, format, and size (`ls -lh`)
  • Row count if quick to compute
On failure:
  • `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
  • Missing extension → install it and retry
  • Input parse error → suggest the user check the input format or try `/duckdb-skills:read-file` first to inspect it
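The success report could be gathered with a helper along these lines. This is a sketch with assumptions: `report_output` is a hypothetical name, the output is a single file rather than a partitioned directory, its path contains no single quotes, and the row count is skipped silently when `duckdb` is missing or cannot read the file directly.

```shell
# Hypothetical helper: print the output file's size and, if possible, its row count.
report_output() {
  rp_out=$1
  if [ ! -e "$rp_out" ]; then
    printf 'Output missing: %s\n' "$rp_out" >&2
    return 1
  fi
  # Fifth column of `ls -lh` is the human-readable size.
  rp_size=$(ls -lh -- "$rp_out" | awk '{print $5}')
  printf 'Output: %s (%s)\n' "$rp_out" "$rp_size"
  # Row count is best effort only.
  if command -v duckdb >/dev/null 2>&1; then
    if rp_rows=$(duckdb -noheader -list -c "SELECT count(*) FROM '$rp_out';" 2>/dev/null); then
      printf 'Rows: %s\n' "$rp_rows"
    fi
  fi
  return 0
}
```

The `command -v duckdb` check is also a natural place to detect the `duckdb: command not found` case before attempting the conversion.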