duckdb-docs


You are helping the user find relevant DuckDB or DuckLake documentation.

Query:

$@

Follow these steps in order.


Step 1 — Check DuckDB is installed


```bash
command -v duckdb
```

If not found, delegate to `/duckdb-skills:install-duckdb` and then continue.
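The same check can be wrapped in a small function, which makes the hand-off to the installer explicit. This is a minimal sketch. `duckdb_available` is a hypothetical helper name and is not part of DuckDB or this skill.

```bash
# Hypothetical helper: exit status 0 if the duckdb CLI is on PATH, 1 otherwise.
duckdb_available() {
  # command -v succeeds only when the name resolves to a command.
  command -v duckdb >/dev/null 2>&1
}

if duckdb_available; then
  echo "duckdb found at $(command -v duckdb)"
else
  echo "duckdb not found; delegate to /duckdb-skills:install-duckdb" >&2
fi
```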


Step 2 — Ensure required extensions are installed


```bash
duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"
```

If this fails, report the error and stop.
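You can confirm afterwards that both extensions are installed by querying DuckDB's built-in `duckdb_extensions()` table function, which reports an `installed` flag for each extension. The sketch below assumes the CLI's `-list` and `-noheader` output flags. `extensions_ready` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds only if both httpfs and fts are installed.
extensions_ready() {
  # duckdb_extensions() lists known extensions with an "installed" flag;
  # -list -noheader makes the CLI print just the bare count.
  count=$(duckdb :memory: -list -noheader -c \
    "SELECT count(*) FROM duckdb_extensions()
     WHERE extension_name IN ('httpfs', 'fts') AND installed;") || return 1
  [ "$count" = "2" ]
}
```

For example, `extensions_ready || duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"` runs the install only when the check fails.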


Step 3 — Choose the data source and extract search terms


The query is:

$@


Data source selection


There are two search indexes available:

| Index | Remote URL | Local cache filename | Versions | Use when |
|---|---|---|---|---|
| DuckDB docs + blog | `https://duckdb.org/data/docs-search.duckdb` | `duckdb-docs.duckdb` | `lts`, `current`, `blog` | Default: any DuckDB question |
| DuckLake docs | `https://ducklake.select/data/docs-search.duckdb` | `ducklake-docs.duckdb` | `stable`, `preview` | Query mentions DuckLake, catalogs, or DuckLake-specific features |

Both indexes share the same schema:

| Column | Type | Description |
|---|---|---|
| `chunk_id` | `VARCHAR` (PK) | e.g. `stable/sql/functions/numeric#absx` |
| `page_title` | `VARCHAR` | Page title from front matter |
| `section` | `VARCHAR` | Section heading (null for page intros) |
| `breadcrumb` | `VARCHAR` | e.g. `SQL > Functions > Numeric` |
| `url` | `VARCHAR` | URL path with anchor |
| `version` | `VARCHAR` | See the index table above |
| `text` | `TEXT` | Full markdown of the chunk |

By default, search the DuckDB docs and filter to `version = 'lts'`. Use a different version when:

- The user explicitly asks about `current`/nightly features → `version = 'current'`
- The user asks about a blog post or wants background/motivation → `version = 'blog'`
- The user asks about DuckLake → search the DuckLake index with `version = 'stable'`

When unsure, omit the version filter to search across all versions.
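A keyword match can roughly approximate the selection rules above. This is a hypothetical sketch: `pick_index` is not part of the skill, and it only catches explicit mentions of `ducklake`, `blog`, or `nightly`. Subtler cases, such as a request for background or a question about catalogs, still need judgment.

```bash
# Hypothetical keyword heuristic for Step 3.
# Prints three fields: REMOTE_URL CACHE_FILENAME VERSION
pick_index() {
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *ducklake*)
      echo "https://ducklake.select/data/docs-search.duckdb ducklake-docs.duckdb stable" ;;
    *blog*)
      echo "https://duckdb.org/data/docs-search.duckdb duckdb-docs.duckdb blog" ;;
    *nightly*)
      echo "https://duckdb.org/data/docs-search.duckdb duckdb-docs.duckdb current" ;;
    *)
      echo "https://duckdb.org/data/docs-search.duckdb duckdb-docs.duckdb lts" ;;
  esac
}

# Split the fields into the placeholder names used in Steps 4 and 5.
read -r REMOTE_URL CACHE_FILENAME VERSION <<EOF
$(pick_index "How do DuckLake snapshots work?")
EOF
echo "$REMOTE_URL $CACHE_FILENAME $VERSION"
```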


Search terms


If the input is a natural language question (e.g. "how do I find the most frequent value"), extract the key technical terms (nouns, function names, SQL keywords) to form a compact BM25 query string. Drop stop words like "how", "do", "I", "the".

If the input is already a function name or technical term (e.g. `arg_max`, `GROUP BY ALL`), use it as-is.

Use the extracted terms as `SEARCH_QUERY` in Step 5.
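Here is a rough sketch of the stop-word step. It assumes whitespace-separated words and a small, hand-picked stop-word list. Both the list and the name `extract_terms` are hypothetical, so extend or replace them as needed.

```bash
# Hypothetical helper: strip punctuation and stop words to form a BM25 query.
extract_terms() {
  terms=""
  set -f  # split on whitespace without expanding glob characters like *
  for word in $1; do
    clean=$(printf '%s' "$word" | tr -d '?!.,')
    lower=$(printf '%s' "$clean" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      ''|how|do|does|i|the|a|an|is|are|what|can|to|my) continue ;;
    esac
    terms="${terms:+$terms }$clean"  # append with a single separating space
  done
  set +f
  printf '%s\n' "$terms"
}

extract_terms "how do I find the most frequent value?"
```

Function names and SQL keywords pass through unchanged because they are not in the stop-word list, consistent with using them as-is.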


Step 4 — Ensure local cache is fresh


The cache lives at `$HOME/.duckdb/docs/CACHE_FILENAME`, where `CACHE_FILENAME` is `duckdb-docs.duckdb` or `ducklake-docs.duckdb` per Step 3.

First, ensure the directory exists:

```bash
mkdir -p "$HOME/.duckdb/docs"
```

Then check whether the cache file exists and is fresh (≤2 days old):

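```bash
CACHE_FILE="$HOME/.duckdb/docs/CACHE_FILENAME"
if [ -f "$CACHE_FILE" ]; then
    # Try GNU stat (-c %Y) first, then BSD/macOS stat (-f %m). The reverse order
    # misbehaves on Linux, where -f means "file system status" and prints extra output.
    MTIME=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
    CACHE_AGE_DAYS=$(( ( $(date +%s) - MTIME ) / 86400 ))
else
    # Missing cache: treat it as stale so it gets fetched.
    CACHE_AGE_DAYS=999
fi
echo "Cache age: $CACHE_AGE_DAYS days"
```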

If `CACHE_AGE_DAYS` ≤ 2 → skip to Step 5.
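If the `stat` flag differences are a concern, POSIX `find -mtime` can express the same freshness test. It compares a file's age in whole days, discarding the remainder. This is a sketch, and `cache_is_fresh` is a hypothetical helper name.

```bash
# Hypothetical helper: exit status 0 means "fresh, skip to Step 5".
cache_is_fresh() {
  # A missing cache is never fresh.
  [ -f "$1" ] || return 1
  # -mtime +2 matches when the age in whole days exceeds 2, so an empty
  # result is the same condition as CACHE_AGE_DAYS <= 2.
  [ -z "$(find "$1" -mtime +2)" ]
}

if cache_is_fresh "$HOME/.duckdb/docs/duckdb-docs.duckdb"; then
  echo "cache is fresh"
else
  echo "cache is stale or missing"
fi
```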

Otherwise (stale or missing) → fetch the index:

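```bash
# Remove leftovers from an interrupted fetch; COPY FROM DATABASE could
# otherwise fail on tables that already exist in the .tmp file.
rm -f "$HOME/.duckdb/docs/CACHE_FILENAME.tmp" "$HOME/.duckdb/docs/CACHE_FILENAME.tmp.wal"
duckdb -c "
LOAD httpfs;
LOAD fts;
ATTACH 'REMOTE_URL' AS remote (READ_ONLY);
ATTACH '$HOME/.duckdb/docs/CACHE_FILENAME.tmp' AS tmp;
COPY FROM DATABASE remote TO tmp;
" && mv "$HOME/.duckdb/docs/CACHE_FILENAME.tmp" "$HOME/.duckdb/docs/CACHE_FILENAME"
```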

Replace `REMOTE_URL` and `CACHE_FILENAME` per Step 3. If the fetch fails (network error), report the error and stop.


Step 5 — Search the docs


```bash
duckdb "$HOME/.duckdb/docs/CACHE_FILENAME" -readonly -json -c "
LOAD fts;
SELECT
    chunk_id, page_title, section, breadcrumb, url, version, text,
    fts_main_docs_chunks.match_bm25(chunk_id, 'SEARCH_QUERY') AS score
FROM docs_chunks
WHERE score IS NOT NULL
  AND version = 'VERSION'
ORDER BY score DESC
LIMIT 8;
"
```

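Replace `CACHE_FILENAME`, `SEARCH_QUERY`, and `VERSION` per Step 3. Because `SEARCH_QUERY` sits inside a single-quoted SQL string, double any single quote it contains (`'` → `''`). Remove the `AND version = 'VERSION'` line if searching across all versions.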

If the user's question could benefit from both DuckDB docs and blog results, run two queries (one with `version = 'lts'`, one with `version = 'blog'`) or omit the version filter entirely.
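Filling in the Step 5 placeholders by hand is error-prone when the query contains a single quote. The sketch below escapes quotes by doubling them, which is the standard SQL string-literal rule. It also drops the version filter when `VERSION` is empty. The helper name `build_search_sql` is hypothetical.

```bash
# Hypothetical helper: print the Step 5 SQL for QUERY and optional VERSION.
build_search_sql() {
  # Double single quotes so both values are valid SQL string literals.
  q=$(printf '%s' "$1" | sed "s/'/''/g")
  v=$(printf '%s' "$2" | sed "s/'/''/g")
  version_filter=""
  if [ -n "$v" ]; then
    version_filter="  AND version = '$v'"
  fi
  cat <<EOF
LOAD fts;
SELECT
    chunk_id, page_title, section, breadcrumb, url, version, text,
    fts_main_docs_chunks.match_bm25(chunk_id, '$q') AS score
FROM docs_chunks
WHERE score IS NOT NULL
$version_filter
ORDER BY score DESC
LIMIT 8;
EOF
}
```

The output can be passed straight to the CLI, e.g. `duckdb "$HOME/.duckdb/docs/duckdb-docs.duckdb" -readonly -json -c "$(build_search_sql 'arg_max' lts)"`.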


Step 6 — Handle errors


- Extension not installed (`httpfs` or `fts` not found): run `duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"` and retry.
- `ATTACH` fails / network unreachable: inform the user that the docs index is unavailable and suggest checking their internet connection. The DuckDB index is hosted at `https://duckdb.org/data/docs-search.duckdb` and the DuckLake index at `https://ducklake.select/data/docs-search.duckdb`.
- No results (all scores NULL or empty result set): broaden the query by dropping the least specific term or trying a single-word version, then retry Step 5. If there are still no results, tell the user no matching documentation was found and suggest visiting https://duckdb.org/docs or https://ducklake.select/docs directly.
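The "no results" fallback can be automated as a loop that retries with progressively shorter queries. This sketch makes a few assumptions:

- The extracted terms are ordered roughly from most to least important, so the last term is dropped first.
- It searches the DuckDB index across all versions.
- An empty result may show up as either no output or `[]`, so both are treated as "no results".

`run_search` and `search_with_fallback` are hypothetical helpers.

```bash
# Hypothetical: run the Step 5 query (all versions) and print the JSON rows.
run_search() {
  q=$(printf '%s' "$1" | sed "s/'/''/g")  # escape quotes for the SQL literal
  duckdb "$HOME/.duckdb/docs/duckdb-docs.duckdb" -readonly -json -c "
LOAD fts;
SELECT chunk_id, page_title, section, url, text,
       fts_main_docs_chunks.match_bm25(chunk_id, '$q') AS score
FROM docs_chunks
WHERE score IS NOT NULL
ORDER BY score DESC
LIMIT 8;"
}

# Hypothetical: retry with shorter queries until something matches.
search_with_fallback() {
  query=$1
  while [ -n "$query" ]; do
    result=$(run_search "$query") || result=""
    # Treat empty output or an empty JSON array as "no results".
    if [ -n "$result" ] && [ "$result" != "[]" ]; then
      printf '%s\n' "$result"
      return 0
    fi
    case "$query" in
      *" "*) query=${query% *} ;;  # drop the last term and retry
      *) query="" ;;               # single term already tried: give up
    esac
  done
  return 1
}
```

A non-zero exit status from `search_with_fallback` is the cue to tell the user that no matching documentation was found.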


Step 7 — Present results


For each result chunk returned (ordered by score descending), format as:


```
{section} — {page_title}

{url}

{text}
```


Then synthesize a concise answer to the user's original question (`$@`) based on the retrieved documentation. If the chunks answer the question directly, put this answer first and show the formatted chunks after it as sources. Otherwise, present the chunks first and follow them with the synthesis.
