duckdb-docs

You are helping the user find relevant DuckDB or DuckLake documentation.
Query:
$@
Follow these steps in order.

Step 1 — Check DuckDB is installed

```bash
command -v duckdb
```
If not found, delegate to `/duckdb-skills:install-duckdb` and then continue.

Step 2 — Ensure required extensions are installed

```bash
duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"
```
If this fails, report the error and stop.

Step 3 — Choose the data source and extract search terms

The query is:
$@

Data source selection

There are two search indexes available:

| Index | Remote URL | Local cache filename | Versions | Use when |
|---|---|---|---|---|
| DuckDB docs + blog | https://duckdb.org/data/docs-search.duckdb | `duckdb-docs.duckdb` | `lts`, `current`, `blog` | Default: any DuckDB question |
| DuckLake docs | https://ducklake.select/data/docs-search.duckdb | `ducklake-docs.duckdb` | `stable`, `preview` | Query mentions DuckLake, catalogs, or DuckLake-specific features |

Both indexes share the same schema:

| Column | Type | Description |
|---|---|---|
| `chunk_id` | `VARCHAR` (PK) | e.g. `stable/sql/functions/numeric#absx` |
| `page_title` | `VARCHAR` | Page title from front matter |
| `section` | `VARCHAR` | Section heading (`NULL` for page intros) |
| `breadcrumb` | `VARCHAR` | e.g. `SQL > Functions > Numeric` |
| `url` | `VARCHAR` | URL path with anchor |
| `version` | `VARCHAR` | See table above |
| `text` | `TEXT` | Full markdown of the chunk |

By default, search the DuckDB docs and filter to `version = 'lts'`. Use a different version when:
  • The user explicitly asks about `current`/nightly features → `version = 'current'`
  • The user asks about a blog post or wants background/motivation → `version = 'blog'`
  • The user asks about DuckLake → search the DuckLake index with `version = 'stable'`
  • When unsure, omit the version filter to search across all versions.
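As a rough illustration, the selection rules above can be approximated in shell with simple keyword matching. This is a sketch under stated assumptions and is not part of the skill. The function name `choose_index` and its keyword lists are hypothetical, and the sketch cannot express the "when unsure" case, which still needs judgment. It prints the cache filename, remote URL, and version filter used in Steps 4 and 5.

```bash
# Hypothetical helper: choose the search index and version filter for a query.
# Keyword lists are illustrative assumptions, not part of any DuckDB tooling.
# Prints three space-separated fields: CACHE_FILENAME REMOTE_URL VERSION
choose_index() {
    local q
    # Lowercase portably (bash 3.2 on macOS lacks ${var,,}).
    q=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

    case "$q" in
        *ducklake*)
            # DuckLake questions go to the DuckLake index.
            echo "ducklake-docs.duckdb https://ducklake.select/data/docs-search.duckdb stable" ;;
        *nightly*)
            # Explicit nightly questions use the current (unreleased) docs.
            echo "duckdb-docs.duckdb https://duckdb.org/data/docs-search.duckdb current" ;;
        *blog*|*motivation*|*background*)
            # Background and design rationale live in the blog.
            echo "duckdb-docs.duckdb https://duckdb.org/data/docs-search.duckdb blog" ;;
        *)
            # Default: DuckDB docs, LTS version.
            echo "duckdb-docs.duckdb https://duckdb.org/data/docs-search.duckdb lts" ;;
    esac
}
```

For example, `read -r CACHE_FILENAME REMOTE_URL VERSION <<< "$(choose_index "$QUERY")"` fills all three placeholders at once. `QUERY` here stands for the user's query.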

Search terms

If the input is a natural language question (e.g. "how do I find the most frequent value"), extract the key technical terms (nouns, function names, SQL keywords) to form a compact BM25 query string. Drop stop words like "how", "do", "I", "the".
If the input is already a function name or technical term (e.g. `arg_max`, `GROUP BY ALL`), use it as-is.
Use the extracted terms as `SEARCH_QUERY` in Step 5.
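Here is a minimal sketch of the stop-word filtering described above. It assumes a small, hand-picked stop-word list, and both the list and the name `extract_terms` are illustrative rather than taken from DuckDB. The function lowercases its input. That is likely harmless, assuming the index was built with the `fts` extension's default lowercasing. Underscores are kept so identifiers such as `arg_max` survive intact.

```bash
# Hypothetical helper: reduce a natural-language question to BM25 search terms
# by dropping stop words. The stop-word list is an assumption; extend it as needed.
extract_terms() {
    # Space-padded so each entry can be matched as a whole word.
    local stop=" how do does i the a an to in of is are what can with for my me "
    local word out=""
    # Lowercase, then turn every character except letters, digits, underscores
    # and newlines into a space. This also removes glob characters such as *,
    # so the unquoted expansion in the for loop is safe.
    for word in $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c '[:alnum:]_\n' ' '); do
        case "$stop" in
            *" $word "*) ;;            # stop word: skip it
            *) out="$out $word" ;;     # content word: keep it
        esac
    done
    # Drop the single leading space added by the loop.
    printf '%s\n' "${out# }"
}
```

For example, `extract_terms "how do I find the most frequent value"` prints `find most frequent value`.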

Step 4 — Ensure local cache is fresh

The cache lives at `$HOME/.duckdb/docs/CACHE_FILENAME` (where `CACHE_FILENAME` is `duckdb-docs.duckdb` or `ducklake-docs.duckdb`, per Step 3).
First, ensure the directory exists:
```bash
mkdir -p "$HOME/.duckdb/docs"
```
Then check whether the cache file exists and is fresh (≤2 days old):
```bash
CACHE_FILE="$HOME/.duckdb/docs/CACHE_FILENAME"
if [ -f "$CACHE_FILE" ]; then
    # Try GNU stat (Linux) first, then BSD stat (macOS). On Linux, `stat -f`
    # means "file system status", so the BSD form would not return the mtime.
    MTIME=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
    CACHE_AGE_DAYS=$(( ( $(date +%s) - MTIME ) / 86400 ))
else
    CACHE_AGE_DAYS=999
fi
echo "Cache age: $CACHE_AGE_DAYS days"
```
If `CACHE_AGE_DAYS` ≤ 2 → skip to Step 5.
Otherwise (stale or missing) → fetch the index:
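```bash
# Discard leftovers from an interrupted fetch so ATTACH starts from an empty database.
rm -f "$HOME/.duckdb/docs/CACHE_FILENAME.tmp" "$HOME/.duckdb/docs/CACHE_FILENAME.tmp.wal"
duckdb -c "
LOAD httpfs;
LOAD fts;
ATTACH 'REMOTE_URL' AS remote (READ_ONLY);
ATTACH '$HOME/.duckdb/docs/CACHE_FILENAME.tmp' AS tmp;
COPY FROM DATABASE remote TO tmp;
" && mv "$HOME/.duckdb/docs/CACHE_FILENAME.tmp" "$HOME/.duckdb/docs/CACHE_FILENAME"
```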
Replace `REMOTE_URL` and `CACHE_FILENAME` per Step 3. If the fetch fails (network error), report the error and stop.
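Step 4 can also be packaged as two reusable shell functions. This is a sketch: the names `cache_age_days` and `refresh_cache` are hypothetical, and the fetch reuses this step's SQL unchanged. `cache_age_days` tries GNU `stat -c %Y` before BSD `stat -f %m`, because on Linux `stat -f` selects file-system mode instead of taking a format string. `refresh_cache` removes any leftover `.tmp` file (and its write-ahead log, if one exists) before fetching. It renames the download into place only when DuckDB exits successfully, so a failed fetch leaves the previous cache untouched.

```bash
# Print the age of a file in whole days, or 999 if it does not exist.
cache_age_days() {
    local file=$1 mtime
    if [ ! -f "$file" ]; then
        echo 999
        return
    fi
    # GNU stat first (Linux), BSD stat as the fallback (macOS).
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
    echo $(( ( $(date +%s) - mtime ) / 86400 ))
}

# Copy the remote index at REMOTE_URL into CACHE_FILE atomically.
# Assumes neither argument contains a single quote (they are pasted into SQL).
refresh_cache() {
    local remote_url=$1 cache_file=$2
    local tmp="$cache_file.tmp"
    mkdir -p "$(dirname "$cache_file")" || return 1
    # Remove leftovers (database and its WAL) from an interrupted earlier run.
    rm -f "$tmp" "$tmp.wal"
    if duckdb -c "
LOAD httpfs;
LOAD fts;
ATTACH '$remote_url' AS remote (READ_ONLY);
ATTACH '$tmp' AS tmp;
COPY FROM DATABASE remote TO tmp;
"; then
        # Same directory, so the rename replaces the old cache in one step.
        mv "$tmp" "$cache_file"
    else
        # Keep the previous cache; drop the partial download.
        rm -f "$tmp" "$tmp.wal"
        return 1
    fi
}
```

A typical call is `[ "$(cache_age_days "$HOME/.duckdb/docs/$CACHE_FILENAME")" -le 2 ] || refresh_cache "$REMOTE_URL" "$HOME/.duckdb/docs/$CACHE_FILENAME"`.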

Step 5 — Search the docs

```bash
duckdb "$HOME/.duckdb/docs/CACHE_FILENAME" -readonly -json -c "
LOAD fts;
SELECT
    chunk_id, page_title, section, breadcrumb, url, version, text,
    fts_main_docs_chunks.match_bm25(chunk_id, 'SEARCH_QUERY') AS score
FROM docs_chunks
WHERE score IS NOT NULL
  AND version = 'VERSION'
ORDER BY score DESC
LIMIT 8;
"
```
Replace `CACHE_FILENAME`, `SEARCH_QUERY`, and `VERSION` per Step 3. Remove the `AND version = 'VERSION'` line if searching across all versions. If `SEARCH_QUERY` contains a single quote, double it (`''`) so the SQL string literal stays valid.
If the user's question could benefit from both DuckDB docs and blog results, run two queries (one with `version = 'lts'`, one with `version = 'blog'`) or omit the version filter entirely.
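Filling in the Step 5 template by hand is error-prone, so here is a sketch of a helper that builds the SQL, assuming bash. The name `build_search_sql` is hypothetical. It doubles single quotes in the search terms (standard SQL string escaping) so input such as `don't` cannot end the string literal early. It omits the version filter when `VERSION` is empty.

```bash
# Hypothetical helper: print the Step 5 SQL with the placeholders filled in.
# Both values are embedded as SQL string literals, so any single quote is
# doubled. An empty VERSION means "search across all versions".
build_search_sql() {
    local q="'"                                  # a literal single quote
    local search_query version escaped version_filter=""
    search_query=$1
    version=$2
    escaped=${search_query//$q/$q$q}             # ' becomes ''
    if [ -n "$version" ]; then
        version_filter="  AND version = '${version//$q/$q$q}'"
    fi
    # Unquoted EOF: the variables are expanded; the SQL has no other $ or backticks.
    cat <<EOF
LOAD fts;
SELECT
    chunk_id, page_title, section, breadcrumb, url, version, text,
    fts_main_docs_chunks.match_bm25(chunk_id, '$escaped') AS score
FROM docs_chunks
WHERE score IS NOT NULL
$version_filter
ORDER BY score DESC
LIMIT 8;
EOF
}
```

Usage: `duckdb "$HOME/.duckdb/docs/$CACHE_FILENAME" -readonly -json -c "$(build_search_sql "$SEARCH_QUERY" "$VERSION")"`. With an empty `VERSION`, the filter line is left blank, which DuckDB ignores.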

Step 6 — Handle errors

  • Extension not installed (`httpfs` or `fts` not found): run `duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"` and retry.
  • ATTACH fails / network unreachable: inform the user that the docs index is unavailable and suggest checking their internet connection. The DuckDB index is hosted at https://duckdb.org/data/docs-search.duckdb and the DuckLake index at https://ducklake.select/data/docs-search.duckdb.
  • No results (all scores NULL or empty result set): try broadening the query (drop the least specific term, or try a single-word version of the query), then retry Step 5. If there are still no results, tell the user no matching documentation was found and suggest visiting https://duckdb.org/docs or https://ducklake.select/docs directly.
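The broadening fallback can be sketched as a function that prints candidate queries in order. Each candidate is retried in Step 5 until one returns rows. Deciding which term is "least specific" is a judgment call, and this sketch simply assumes it is the last remaining term. The name `broaden_query` is hypothetical.

```bash
# Hypothetical helper: print broader fallback queries, one per line.
# Assumption: the last remaining term is the least specific one.
broaden_query() {
    local -a terms
    local n i
    read -r -a terms <<< "$1"      # split the query on whitespace
    n=${#terms[@]}
    # First drop trailing terms one at a time, keeping at least two terms.
    for (( i = n - 1; i >= 2; i-- )); do
        echo "${terms[*]:0:i}"
    done
    # Then try every term on its own as a single-word query.
    for (( i = 0; i < n; i++ )); do
        echo "${terms[i]}"
    done
}
```

For example, `broaden_query "window function frame"` prints `window function`, then `window`, `function`, and `frame` on separate lines.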

Step 7 — Present results

For each result chunk returned (ordered by score descending), format it as:
```
{section} — {page_title}
{url}
{text}
```

Based on the retrieved documentation, synthesize a concise answer to the user's original question (`$@`). If the chunks directly answer the question, lead with that answer and then present the chunks as its sources. Otherwise, present the chunks first and follow them with the synthesized answer.