convert-file

You are helping the user convert a data file from one format to another using DuckDB.

Input file: `$0`

Output file: `${1:-}`

Step 1 — Resolve input and output

Input: `$0`. If it's a bare filename (no `/`), resolve it to a full path with `find "$PWD" -name "$0" -not -path '*/.git/*' 2>/dev/null | head -1`.

Output: If `$1` is provided, use it as the output path. If not, default to the same stem as the input with a `.parquet` extension (e.g., `data.csv` → `data.parquet`).
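
Step 1 can be sketched in bash as two small functions. This is a minimal illustration, not part of the skill: `resolve_input` and `default_output` are hypothetical names, and treating any argument that contains `/` (including remote URLs such as `s3://...`) as already resolved is an assumption. `default_output` strips only the last extension of the file name, so dots in directory names are left alone. A bare filename that `find` cannot locate yields empty output.

```bash
# Hypothetical helpers for Step 1 (names are illustrative, not part of any tool).

# Resolve the input: anything containing "/" (a path or a URL) is used as-is;
# a bare filename is searched for under $PWD, skipping .git directories.
resolve_input() {
  local in="$1"
  case "$in" in
    */*) printf '%s\n' "$in" ;;
    *) find "$PWD" -name "$in" -not -path '*/.git/*' 2>/dev/null | head -1 ;;
  esac
}

# Default output: same directory and stem as the input, with a .parquet
# extension. Only the file name's last extension is stripped.
default_output() {
  local in="$1" dir="" base="$1"
  case "$in" in
    */*) dir="${in%/*}/"; base="${in##*/}" ;;
  esac
  printf '%s%s.parquet\n' "$dir" "${base%.*}"
}
```

For example, `default_output data.csv` prints `data.parquet`, and `default_output /tmp/v1.2/data.csv` prints `/tmp/v1.2/data.parquet`.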

Infer the output format from the output file extension:

| Extension | Format clause |
|---|---|
| `.parquet`, `.pq` | (default, no clause needed) |
| `.csv` | `(FORMAT csv, HEADER)` |
| `.tsv` | `(FORMAT csv, HEADER, DELIMITER '\t')` |
| `.json` | `(FORMAT json, ARRAY true)` |
| `.jsonl`, `.ndjson` | `(FORMAT json, ARRAY false)` |
| `.xlsx` | `(FORMAT xlsx)` (requires `INSTALL excel; LOAD excel;`) |
| `.geojson` | `(FORMAT GDAL, DRIVER 'GeoJSON')` (requires `LOAD spatial;`) |
| `.gpkg` | `(FORMAT GDAL, DRIVER 'GPKG')` (requires `LOAD spatial;`) |
| `.shp` | `(FORMAT GDAL, DRIVER 'ESRI Shapefile')` (requires `LOAD spatial;`) |
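
The extension table can be encoded as lookup functions. A bash sketch under stated assumptions: `format_clause`, `extension_loads`, and `lower` are hypothetical helpers, matching is case-insensitive, and an empty result from `format_clause` corresponds to the table's no-clause Parquet row. Because Step 2 asks for loads based on both the input and output formats, `extension_loads` could be applied to either path.

```bash
# Hypothetical helpers encoding the extension table.

# Lowercase a string portably (bash 3.2 on macOS lacks ${var,,}).
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# Print the COPY format clause for an output path; prints nothing for Parquet.
format_clause() {
  case "$(lower "$1")" in
    *.parquet|*.pq)   ;;  # default format, no clause needed
    *.csv)            printf '%s\n' "(FORMAT csv, HEADER)" ;;
    *.tsv)            printf '%s\n' "(FORMAT csv, HEADER, DELIMITER '\t')" ;;
    *.json)           printf '%s\n' "(FORMAT json, ARRAY true)" ;;
    *.jsonl|*.ndjson) printf '%s\n' "(FORMAT json, ARRAY false)" ;;
    *.xlsx)           printf '%s\n' "(FORMAT xlsx)" ;;
    *.geojson)        printf '%s\n' "(FORMAT GDAL, DRIVER 'GeoJSON')" ;;
    *.gpkg)           printf '%s\n' "(FORMAT GDAL, DRIVER 'GPKG')" ;;
    *.shp)            printf '%s\n' "(FORMAT GDAL, DRIVER 'ESRI Shapefile')" ;;
    *) printf 'unsupported output extension: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Print the extension loads a path's format needs (nothing if none).
extension_loads() {
  case "$(lower "$1")" in
    *.xlsx)                 printf '%s\n' "INSTALL excel; LOAD excel;" ;;
    *.geojson|*.gpkg|*.shp) printf '%s\n' "LOAD spatial;" ;;
  esac
}
```

For example, `format_clause out.TSV` prints `(FORMAT csv, HEADER, DELIMITER '\t')` with `\t` kept as two characters, exactly as written in the table, and an unlisted extension such as `.txt` returns a non-zero status.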

Step 2 — Convert

Run a single DuckDB command. Prepend extension loads as needed based on both the input and output formats.

```bash
duckdb -c "
<EXTENSION_LOADS>
COPY (FROM '<INPUT_PATH>') TO '<OUTPUT_PATH>' <FORMAT_CLAUSE>;
"
```

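When the placeholders are filled in by plain string substitution, a path containing a single quote breaks the SQL literal; SQL escapes `'` by doubling it. A minimal bash sketch, with hypothetical helpers `sql_quote` and `build_copy_sql`, that assembles the statement from the template:

```bash
# Hypothetical helper: wrap a value in a SQL single-quoted literal,
# doubling any embedded single quotes (it's -> 'it''s').
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper: assemble the Step 2 statement.
#   $1 extension loads (may be empty)   $2 input path
#   $3 output path                      $4 format clause (may be empty)
build_copy_sql() {
  local loads="$1" in="$2" out="$3" clause="$4"
  # ${clause:+ $clause} adds a leading space only when a clause is present.
  printf '%s\nCOPY (FROM %s) TO %s%s;\n' \
    "$loads" "$(sql_quote "$in")" "$(sql_quote "$out")" "${clause:+ $clause}"
}
```

For example, `duckdb -c "$(build_copy_sql "" data.csv data.jsonl "(FORMAT json, ARRAY false)")"` runs the conversion. The output of the command substitution is not re-expanded by the shell, so `$` or backticks in paths pass through unchanged.
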
For remote inputs (`s3://`, `https://`, etc.), prepend the same protocol setup as the `read-file` skill:

| Protocol | Prepend |
|---|---|
| `s3://` | `LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);` |
| `gs://` / `gcs://` | `LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);` |
| `https://` / `http://` | `LOAD httpfs;` |
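
The protocol table amounts to a prefix match on the input path. A bash sketch with a hypothetical `protocol_loads` helper; prefixes are matched case-sensitively (an assumption), and local paths get nothing prepended:

```bash
# Hypothetical helper: print the setup to prepend for a remote input,
# following the protocol table; prints nothing for local files.
protocol_loads() {
  case "$1" in
    s3://*)             printf '%s\n' "LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);" ;;
    gs://*|gcs://*)     printf '%s\n' "LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);" ;;
    http://*|https://*) printf '%s\n' "LOAD httpfs;" ;;
    *)                  ;;  # local path: no setup needed
  esac
}
```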

If the user mentions partitioning (e.g., "partition by year"), add `PARTITION_BY (col)` to the format clause. The output path then names a directory of partitioned files rather than a single file. This only works with Parquet and CSV output.

If the user mentions compression (e.g., "use zstd"), add `CODEC 'zstd'` for Parquet output.
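
Adding `PARTITION_BY` or `CODEC` means either opening a new clause (the Parquet clause is empty) or splicing the option into an existing one before its closing parenthesis. A bash sketch with a hypothetical `add_option` helper, which assumes any non-empty clause ends in `)`:

```bash
# Hypothetical helper: append one COPY option to a format clause.
# An empty clause becomes "(<option>)"; otherwise the option is inserted
# before the clause's closing parenthesis.
add_option() {
  local clause="$1" opt="$2"
  if [ -z "$clause" ]; then
    printf '(%s)\n' "$opt"
  else
    # ${clause%?} drops the last character, assumed to be ")".
    printf '%s, %s)\n' "${clause%?}" "$opt"
  fi
}
```

For a partitioned, zstd-compressed Parquet output, `add_option "$(add_option "(FORMAT parquet)" "PARTITION_BY (year)")" "CODEC 'zstd'"` yields `(FORMAT parquet, PARTITION_BY (year), CODEC 'zstd')`. Starting from an explicit `(FORMAT parquet)` rather than the empty clause avoids relying on extension-based inference when the output directory name has no extension.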

Step 3 — Report

On success, report:

- Input file and detected format
- Output file, format, and size (`ls -lh`)
- Row count, if quick to compute
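
A bash sketch of the success report, using hypothetical helpers `ext_of` and `report_success`. It approximates the format by the file extension (an assumption), uses `du -sh` instead of `ls -lh` when partitioned output produced a directory, and only attempts a row count for single-file output when the `duckdb` CLI is on `PATH` (using its `-noheader` and `-list` output flags):

```bash
# Hypothetical helper: extension of a path's last component, or "unknown".
ext_of() {
  local base="${1##*/}"
  case "$base" in
    *.*) printf '%s\n' "${base##*.}" ;;
    *)   printf 'unknown\n' ;;
  esac
}

# Hypothetical helper for the Step 3 success report.
report_success() {
  local in="$1" out="$2" size q rows
  if [ -d "$out" ]; then
    size=$(du -sh -- "$out" | cut -f1)            # partitioned output is a directory
  else
    size=$(ls -lh -- "$out" | awk '{print $5}')   # 5th column of ls -lh is the size
  fi
  printf 'Input:  %s (%s)\n' "$in" "$(ext_of "$in")"
  printf 'Output: %s (%s, %s)\n' "$out" "$(ext_of "$out")" "$size"
  # Optional row count: single-file output only, and only if duckdb is available.
  if [ -f "$out" ] && command -v duckdb >/dev/null 2>&1; then
    q=$(printf '%s' "$out" | sed "s/'/''/g")      # escape quotes for the SQL literal
    if rows=$(duckdb -noheader -list -c "SELECT count(*) FROM '$q';" 2>/dev/null); then
      printf 'Rows:   %s\n' "$rows"
    fi
  fi
  return 0
}
```

If the count query fails (for example, an `.xlsx` output read without the excel extension loaded), the row line is simply omitted.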

On failure:

- `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
- Missing extension → install it and retry
- Input parse error → suggest the user check the input format, or try `/duckdb-skills:read-file` first to inspect it
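
The failure rules can be folded into a wrapper. In this hypothetical bash sketch (`with_installs` and `convert_with_retry` are illustrative names), a missing `duckdb` binary triggers delegation, and any other failure triggers one retry with an `INSTALL` added before each `LOAD`. Retrying on every error rather than detecting a missing extension specifically is a simplifying assumption; a parse error simply fails twice and should then be reported as described in the failure rules. Running `INSTALL` for an extension that is already installed is harmless.

```bash
# Hypothetical helper: turn each "LOAD <ext>;" into "INSTALL <ext>; LOAD <ext>;".
with_installs() {
  printf '%s\n' "$1" | sed -E 's/LOAD ([A-Za-z0-9_]+);/INSTALL \1; LOAD \1;/g'
}

# Hypothetical wrapper around Step 2.
#   $1 extension loads   $2 the COPY statement
convert_with_retry() {
  local loads="$1" copy="$2"
  if ! command -v duckdb >/dev/null 2>&1; then
    echo "duckdb: command not found; delegate to /duckdb-skills:install-duckdb" >&2
    return 127
  fi
  # First attempt as-is; on failure, retry once with INSTALLs added.
  duckdb -c "$(printf '%s\n%s' "$loads" "$copy")" ||
    duckdb -c "$(printf '%s\n%s' "$(with_installs "$loads")" "$copy")"
}
```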
