ktx-ai-data-context-layer

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

ktx AI Data Context Layer Skill

ktx AI 数据上下文层 Skill

Skill by ara.so — AI Agent Skills collection.
ara.so 提供的 Skill — AI Agent Skills 集合。

Overview

概述

ktx is a self-improving context layer that teaches AI agents how to query data warehouses accurately. It automatically builds and maintains approved metric definitions, detects joinable columns, absorbs business knowledge from wikis and semantic layers, and exposes everything through CLI and MCP (Model Context Protocol) tools.
Key capabilities:
  • Ingests from dbt, MetricFlow, LookML, Looker, Metabase, Notion
  • Auto-detects joinable columns and resolves fan/chasm traps
  • Combines warehouse metadata, semantic layers, and wiki knowledge
  • Exposes search and query tools via MCP for agent execution
  • Read-only by design — never writes to your warehouse
Supported warehouses: PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite
ktx 是一个自我优化的上下文层,能够教会AI Agent如何精准查询数据仓库。它会自动构建并维护已核准的指标定义,检测可关联列,从知识库和语义层中吸收业务知识,并通过CLI和MCP(Model Context Protocol)工具对外提供所有功能。
核心功能:
  • 支持从dbt、MetricFlow、LookML、Looker、Metabase、Notion导入数据
  • 自动检测可关联列并解决扇形/鸿沟陷阱问题
  • 整合仓库元数据、语义层和知识库内容
  • 通过MCP提供搜索和查询工具供Agent调用
  • 设计为只读模式——绝不会向你的数据仓库写入内容
支持的数据仓库: PostgreSQL、Snowflake、BigQuery、ClickHouse、MySQL、SQL Server、SQLite

Installation

安装

Global CLI Install

全局CLI安装

bash
npm install -g @kaelio/ktx
bash
npm install -g @kaelio/ktx

Project-Scoped Install

项目级安装

bash
npm install --save-dev @kaelio/ktx
npx ktx setup
bash
npm install --save-dev @kaelio/ktx
npx ktx setup

Verify Installation

验证安装

bash
ktx --version
ktx status
bash
ktx --version
ktx status

Initial Setup

初始设置

Interactive Setup Wizard

交互式设置向导

bash
ktx setup
The setup wizard will:
  1. Create or resume a
    ktx.yaml
    project configuration
  2. Configure LLM provider (Anthropic API, Vertex AI, AI Gateway, or Claude Code)
  3. Configure embedding provider (OpenAI, Voyage AI, or Google)
  4. Set up database connections (warehouse credentials)
  5. Configure context sources (dbt, Looker, Metabase, Notion, etc.)
  6. Run initial ingestion to build context
  7. Install agent integration (Codex, Claude Code, Cursor, OpenCode)
bash
ktx setup
设置向导将完成以下操作:
  1. 创建或恢复
    ktx.yaml
    项目配置文件
  2. 配置LLM提供商(Anthropic API、Vertex AI、AI Gateway或Claude Code)
  3. 配置嵌入提供商(OpenAI、Voyage AI或Google)
  4. 设置数据库连接(数据仓库凭证)
  5. 配置上下文源(dbt、Looker、Metabase、Notion等)
  6. 运行初始导入以构建上下文
  7. 安装Agent集成(Codex、Claude Code、Cursor、OpenCode)

Project Structure

项目结构

my-analytics-project/
├── ktx.yaml                         # Main configuration (commit this)
├── semantic-layer/warehouse/        # Generated semantic sources (commit)
├── wiki/global/                     # Shared business context (commit)
├── wiki/user/alice/                 # User-scoped notes (commit)
├── raw-sources/warehouse/           # Ingest artifacts (commit)
└── .ktx/                            # Local state and secrets (git-ignore)
my-analytics-project/
├── ktx.yaml                         # 主配置文件(需提交到版本控制)
├── semantic-layer/warehouse/        # 生成的语义源文件(需提交)
├── wiki/global/                     # 共享业务上下文(需提交)
├── wiki/user/alice/                 # 用户专属笔记(需提交)
├── raw-sources/warehouse/           # 导入产物(需提交)
└── .ktx/                            # 本地状态和密钥文件(需忽略git提交)

Manual Configuration

手动配置

Create
ktx.yaml
:
yaml
version: 1
project:
  name: my-analytics
  llm:
    provider: anthropic
    model: claude-sonnet-4-6
  embeddings:
    provider: openai
    model: text-embedding-3-small

databases:
  warehouse:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    schema: public
    # Credentials from env vars or .ktx/secrets.yaml

context_sources:
  dbt_main:
    type: dbt
    project_dir: ../dbt-project
    profiles_dir: ~/.dbt
    target: prod
Store secrets in
.ktx/secrets.yaml
(never commit):
yaml
databases:
  warehouse:
    user: ${POSTGRES_USER}
    password: ${POSTGRES_PASSWORD}

llm:
  anthropic_api_key: ${ANTHROPIC_API_KEY}

embeddings:
  openai_api_key: ${OPENAI_API_KEY}
创建
ktx.yaml
文件:
yaml
version: 1
project:
  name: my-analytics
  llm:
    provider: anthropic
    model: claude-sonnet-4-6
  embeddings:
    provider: openai
    model: text-embedding-3-small

databases:
  warehouse:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    schema: public
    # 凭证来自环境变量或 .ktx/secrets.yaml

context_sources:
  dbt_main:
    type: dbt
    project_dir: ../dbt-project
    profiles_dir: ~/.dbt
    target: prod
将密钥存储在
.ktx/secrets.yaml
(绝不要提交到版本控制):
yaml
databases:
  warehouse:
    user: ${POSTGRES_USER}
    password: ${POSTGRES_PASSWORD}

llm:
  anthropic_api_key: ${ANTHROPIC_API_KEY}

embeddings:
  openai_api_key: ${OPENAI_API_KEY}

Core Commands

核心命令

Check Project Status

检查项目状态

bash
ktx status
Example output:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main, looker_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
bash
ktx status
示例输出:
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main, looker_main)
ktx context built: yes
Agent integration ready: yes (codex:project)

Build Context (Ingestion)

构建上下文(数据导入)

Ingest from all configured sources:
bash
ktx ingest
Ingest specific connection:
bash
ktx ingest --connection warehouse
Ingest specific context source:
bash
ktx ingest --source dbt_main
Force re-ingestion:
bash
ktx ingest --force
从所有已配置的源导入数据:
bash
ktx ingest
从指定连接导入数据:
bash
ktx ingest --connection warehouse
从指定上下文源导入数据:
bash
ktx ingest --source dbt_main
强制重新导入:
bash
ktx ingest --force

Search Semantic Layer

搜索语义层

bash
undefined
bash
undefined

Search for metrics and dimensions

搜索指标和维度

ktx sl "revenue" ktx sl "customer lifetime value"
ktx sl "revenue" ktx sl "customer lifetime value"

JSON output for scripting

输出JSON格式用于脚本开发

ktx sl "churn rate" --json
undefined
ktx sl "churn rate" --json
undefined

Search Wiki

搜索知识库

bash
undefined
bash
undefined

Search business knowledge

搜索业务知识

ktx wiki "refund policy" ktx wiki "how to calculate mrr"
ktx wiki "refund policy" ktx wiki "how to calculate mrr"

JSON output

输出JSON格式

ktx wiki "data retention" --json
undefined
ktx wiki "data retention" --json
undefined

Start MCP Server

启动MCP服务器

bash
undefined
bash
undefined

Auto-detect project and start MCP server

自动检测项目并启动MCP服务器

ktx mcp start
ktx mcp start

Specify project directory

指定项目目录

ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project

Custom port

自定义端口

ktx mcp start --port 3000
undefined
ktx mcp start --port 3000
undefined

LLM Configuration

LLM配置

Anthropic API

Anthropic API

yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
  api_key: ${ANTHROPIC_API_KEY}  # In .ktx/secrets.yaml
yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
  api_key: ${ANTHROPIC_API_KEY}  # 配置在 .ktx/secrets.yaml

Google Vertex AI

Google Vertex AI

yaml
llm:
  provider: vertex
  model: claude-sonnet-4-6@20250514
  project_id: my-gcp-project
  region: us-central1
  # Uses Application Default Credentials
yaml
llm:
  provider: vertex
  model: claude-sonnet-4-6@20250514
  project_id: my-gcp-project
  region: us-central1
  # 使用应用默认凭证

AI Gateway

AI Gateway

yaml
llm:
  provider: ai-gateway
  model: claude-sonnet-4-6
  base_url: https://gateway.example.com/v1
  api_key: ${GATEWAY_API_KEY}
yaml
llm:
  provider: ai-gateway
  model: claude-sonnet-4-6
  base_url: https://gateway.example.com/v1
  api_key: ${GATEWAY_API_KEY}

Claude Code Session (Local)

Claude Code Session(本地)

yaml
llm:
  provider: claude-code
  # Uses local Claude Code session via SDK
yaml
llm:
  provider: claude-code
  # 通过SDK使用本地Claude Code会话

Database Connections

数据库连接

PostgreSQL

PostgreSQL

yaml
databases:
  analytics:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    schema: public
    user: ${PG_USER}
    password: ${PG_PASSWORD}
yaml
databases:
  analytics:
    type: postgres
    host: localhost
    port: 5432
    database: analytics
    schema: public
    user: ${PG_USER}
    password: ${PG_PASSWORD}

Snowflake

Snowflake

yaml
databases:
  warehouse:
    type: snowflake
    account: xy12345.us-east-1
    warehouse: COMPUTE_WH
    database: ANALYTICS
    schema: PUBLIC
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    role: ANALYST
yaml
databases:
  warehouse:
    type: snowflake
    account: xy12345.us-east-1
    warehouse: COMPUTE_WH
    database: ANALYTICS
    schema: PUBLIC
    user: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    role: ANALYST

BigQuery

BigQuery

yaml
databases:
  bigquery:
    type: bigquery
    project_id: my-gcp-project
    dataset: analytics
    credentials_path: ${GOOGLE_APPLICATION_CREDENTIALS}
yaml
databases:
  bigquery:
    type: bigquery
    project_id: my-gcp-project
    dataset: analytics
    credentials_path: ${GOOGLE_APPLICATION_CREDENTIALS}

ClickHouse

ClickHouse

yaml
databases:
  events:
    type: clickhouse
    host: localhost
    port: 8123
    database: analytics
    user: ${CLICKHOUSE_USER}
    password: ${CLICKHOUSE_PASSWORD}
yaml
databases:
  events:
    type: clickhouse
    host: localhost
    port: 8123
    database: analytics
    user: ${CLICKHOUSE_USER}
    password: ${CLICKHOUSE_PASSWORD}

Context Sources

上下文源

dbt

dbt

yaml
context_sources:
  dbt_prod:
    type: dbt
    project_dir: ../dbt-project
    profiles_dir: ~/.dbt
    target: prod
    include_tests: true
    include_docs: true
yaml
context_sources:
  dbt_prod:
    type: dbt
    project_dir: ../dbt-project
    profiles_dir: ~/.dbt
    target: prod
    include_tests: true
    include_docs: true

Looker

Looker

yaml
context_sources:
  looker_main:
    type: looker
    base_url: https://company.looker.com
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}
    project: analytics
yaml
context_sources:
  looker_main:
    type: looker
    base_url: https://company.looker.com
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}
    project: analytics

Metabase

Metabase

yaml
context_sources:
  metabase:
    type: metabase
    base_url: https://metabase.company.com
    username: ${METABASE_USER}
    password: ${METABASE_PASSWORD}
    database_id: 1
yaml
context_sources:
  metabase:
    type: metabase
    base_url: https://metabase.company.com
    username: ${METABASE_USER}
    password: ${METABASE_PASSWORD}
    database_id: 1

Notion

Notion

yaml
context_sources:
  notion_wiki:
    type: notion
    api_key: ${NOTION_API_KEY}
    database_ids:
      - abc123def456
      - ghi789jkl012
yaml
context_sources:
  notion_wiki:
    type: notion
    api_key: ${NOTION_API_KEY}
    database_ids:
      - abc123def456
      - ghi789jkl012

Semantic Layer Usage

语义层使用

Search for Metrics

搜索指标

typescript
// Via CLI
$ ktx sl "monthly recurring revenue"
Output structure:
json
{
  "results": [
    {
      "type": "metric",
      "name": "mrr",
      "description": "Monthly Recurring Revenue",
      "calculation": "SUM(subscription_amount)",
      "dimensions": ["customer_id", "plan_type"],
      "filters": ["status = 'active'"],
      "source": "dbt_prod",
      "score": 0.95
    }
  ]
}
typescript
// 通过CLI执行
$ ktx sl "monthly recurring revenue"
输出结构:
json
{
  "results": [
    {
      "type": "metric",
      "name": "mrr",
      "description": "Monthly Recurring Revenue",
      "calculation": "SUM(subscription_amount)",
      "dimensions": ["customer_id", "plan_type"],
      "filters": ["status = 'active'"],
      "source": "dbt_prod",
      "score": 0.95
    }
  ]
}

Query Planning (Python API)

查询规划(Python API)

python
undefined
python
undefined

Install ktx semantic layer engine

安装ktx语义层引擎

uv add ktx-sl

uv add ktx-sl

from ktx_sl import SemanticLayer, Query
from ktx_sl import SemanticLayer, Query

Load semantic layer from ktx project

从ktx项目加载语义层

sl = SemanticLayer.from_project("/path/to/ktx-project")
sl = SemanticLayer.from_project("/path/to/ktx-project")

Define query

定义查询

query = Query( metrics=["mrr", "customer_count"], dimensions=["plan_type", "region"], filters=[{"field": "signup_date", "op": ">=", "value": "2024-01-01"}], order_by=[{"field": "mrr", "direction": "desc"}] )
query = Query( metrics=["mrr", "customer_count"], dimensions=["plan_type", "region"], filters=[{"field": "signup_date", "op": ">=", "value": "2024-01-01"}], order_by=[{"field": "mrr", "direction": "desc"}] )

Generate SQL

生成SQL语句

sql = sl.plan(query) print(sql)
undefined
sql = sl.plan(query) print(sql)
undefined

Semantic Layer YAML Format

语义层YAML格式

Example
semantic-layer/warehouse/subscriptions.yaml
:
yaml
version: 1
source: warehouse
table: public.subscriptions

metrics:
  - name: mrr
    description: Monthly Recurring Revenue
    type: sum
    sql: amount
    filters:
      - status = 'active'
      - billing_period = 'monthly'
    
  - name: customer_count
    description: Distinct active customers
    type: count_distinct
    sql: customer_id
    filters:
      - status = 'active'

dimensions:
  - name: customer_id
    type: string
    sql: customer_id
    
  - name: plan_type
    type: string
    sql: plan_type
    
  - name: region
    type: string
    sql: customer_region

joins:
  - to: customers
    type: left
    on: subscriptions.customer_id = customers.id
示例
semantic-layer/warehouse/subscriptions.yaml
yaml
version: 1
source: warehouse
table: public.subscriptions

metrics:
  - name: mrr
    description: Monthly Recurring Revenue
    type: sum
    sql: amount
    filters:
      - status = 'active'
      - billing_period = 'monthly'
    
  - name: customer_count
    description: Distinct active customers
    type: count_distinct
    sql: customer_id
    filters:
      - status = 'active'

dimensions:
  - name: customer_id
    type: string
    sql: customer_id
    
  - name: plan_type
    type: string
    sql: plan_type
    
  - name: region
    type: string
    sql: customer_region

joins:
  - to: customers
    type: left
    on: subscriptions.customer_id = customers.id

Wiki Management

知识库管理

Add Wiki Content Manually

手动添加知识库内容

Create
wiki/global/refund-policy.md
:
markdown
---
title: Refund Policy
tags: [finance, customer-success]
---
创建
wiki/global/refund-policy.md
markdown
---
title: Refund Policy
tags: [finance, customer-success]
---

Refund Policy

退款政策

Customers can request refunds within 30 days of purchase.
客户可在购买后30天内申请退款。

Calculation Rules

计算规则

  • Full refund: within 7 days
  • Prorated refund: 8-30 days
  • No refund: after 30 days
  • 全额退款:购买后7天内
  • 按比例退款:购买后8-30天
  • 不予退款:购买30天后

Impact on Metrics

对指标的影响

Refunds reduce
net_revenue
but not
gross_revenue
.
undefined
退款会减少
net_revenue
但不会影响
gross_revenue
undefined

Search Wiki from CLI

从CLI搜索知识库

bash
ktx wiki "refund policy"
ktx wiki "how to calculate churn"
bash
ktx wiki "refund policy"
ktx wiki "how to calculate churn"

Wiki Structure

知识库结构

wiki/
├── global/                    # Shared team knowledge
│   ├── metrics-glossary.md
│   ├── refund-policy.md
│   └── data-quality-rules.md
└── user/alice/                # User-scoped notes
    └── analysis-notes.md
wiki/
├── global/                    # 团队共享知识
│   ├── metrics-glossary.md
│   ├── refund-policy.md
│   └── data-quality-rules.md
└── user/alice/                # 用户专属笔记
    └── analysis-notes.md

MCP Integration

MCP集成

Start MCP Server

启动MCP服务器

bash
undefined
bash
undefined

Auto-detect project

自动检测项目

ktx mcp start
ktx mcp start

Explicit project path

指定项目路径

ktx mcp start --project-dir /path/to/analytics
undefined
ktx mcp start --project-dir /path/to/analytics
undefined

Claude Desktop Configuration

Claude Desktop配置

Add to
~/Library/Application Support/Claude/claude_desktop_config.json
:
json
{
  "mcpServers": {
    "ktx-analytics": {
      "command": "ktx",
      "args": [
        "mcp",
        "start",
        "--project-dir",
        "/path/to/analytics"
      ]
    }
  }
}
添加配置到
~/Library/Application Support/Claude/claude_desktop_config.json
json
{
  "mcpServers": {
    "ktx-analytics": {
      "command": "ktx",
      "args": [
        "mcp",
        "start",
        "--project-dir",
        "/path/to/analytics"
      ]
    }
  }
}

Codex Integration

Codex集成

bash
undefined
bash
undefined

Install ktx skill in Codex project

在Codex项目中安装ktx skill

npx skills add Kaelio/ktx --skill ktx
npx skills add Kaelio/ktx --skill ktx

Or via natural language

或通过自然语言指令

"Run npx skills add Kaelio/ktx --skill ktx and configure ktx for this project"

"Run npx skills add Kaelio/ktx --skill ktx and configure ktx for this project"

undefined
undefined

Available MCP Tools

可用的MCP工具

Once MCP server is running, agents can use:
  • ktx_search_semantic_layer
    — Search metrics and dimensions
  • ktx_search_wiki
    — Search business knowledge
  • ktx_get_metric_definition
    — Get detailed metric spec
  • ktx_list_connections
    — List configured warehouses
  • ktx_validate_query
    — Validate SQL against semantic layer
MCP服务器启动后,Agent可使用以下工具:
  • ktx_search_semantic_layer
    — 搜索指标和维度
  • ktx_search_wiki
    — 搜索业务知识
  • ktx_get_metric_definition
    — 获取详细指标规范
  • ktx_list_connections
    — 列出已配置的数据仓库
  • ktx_validate_query
    — 根据语义层验证SQL语句

Common Patterns

常用模式

Full Setup Workflow

完整设置流程

bash
undefined
bash
undefined

1. Install ktx

1. 安装ktx

npm install -g @kaelio/ktx
npm install -g @kaelio/ktx

2. Create project

2. 创建项目

cd /path/to/analytics ktx setup
cd /path/to/analytics ktx setup

3. Configure via interactive prompts:

3. 通过交互式提示完成配置:

- LLM: Anthropic API with claude-sonnet-4-6

- LLM:使用claude-sonnet-4-6的Anthropic API

- Embeddings: OpenAI text-embedding-3-small

- 嵌入:OpenAI text-embedding-3-small

- Database: PostgreSQL connection

- 数据库:PostgreSQL连接

- Sources: dbt project

- 源:dbt项目

4. Verify setup

4. 验证设置

ktx status
ktx status

5. Build context

5. 构建上下文

ktx ingest
ktx ingest

6. Test search

6. 测试搜索功能

ktx sl "revenue" ktx wiki "metric definitions"
ktx sl "revenue" ktx wiki "metric definitions"

7. Start MCP for agents

7. 启动MCP供Agent使用

ktx mcp start
undefined
ktx mcp start
undefined

Incremental Updates

增量更新

bash
undefined
bash
undefined

Add new dbt models

添加新的dbt模型

cd ../dbt-project dbt run --models new_model
cd ../dbt-project dbt run --models new_model

Ingest updates into ktx

将更新导入ktx

cd ../analytics ktx ingest --source dbt_prod
cd ../analytics ktx ingest --source dbt_prod

Verify new metrics are available

验证新指标是否可用

ktx sl "new_metric"
undefined
ktx sl "new_metric"
undefined

Multi-Warehouse Setup

多数据仓库设置

yaml
databases:
  production:
    type: postgres
    host: prod.db.company.com
    database: analytics
    schema: public
    user: ${PROD_DB_USER}
    password: ${PROD_DB_PASSWORD}
    
  events:
    type: clickhouse
    host: events.company.com
    database: analytics
    user: ${CLICKHOUSE_USER}
    password: ${CLICKHOUSE_PASSWORD}

context_sources:
  dbt_prod:
    type: dbt
    project_dir: ../dbt-project
    target: prod
    connection: production
    
  events_raw:
    type: raw
    connection: events
yaml
databases:
  production:
    type: postgres
    host: prod.db.company.com
    database: analytics
    schema: public
    user: ${PROD_DB_USER}
    password: ${PROD_DB_PASSWORD}
    
  events:
    type: clickhouse
    host: events.company.com
    database: analytics
    user: ${CLICKHOUSE_USER}
    password: ${CLICKHOUSE_PASSWORD}

context_sources:
  dbt_prod:
    type: dbt
    project_dir: ../dbt-project
    target: prod
    connection: production
    
  events_raw:
    type: raw
    connection: events

CI/CD Integration

CI/CD集成

yaml
undefined
yaml
undefined

.github/workflows/ktx-ingest.yml

.github/workflows/ktx-ingest.yml

name: Update ktx Context on: push: branches: [main] paths: - 'dbt-project/' - 'wiki/'
jobs: ingest: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20
      
  - name: Install ktx
    run: npm install -g @kaelio/ktx
    
  - name: Configure secrets
    run: |
      mkdir -p .ktx
      echo "databases:" > .ktx/secrets.yaml
      echo "  warehouse:" >> .ktx/secrets.yaml
      echo "    user: ${{ secrets.DB_USER }}" >> .ktx/secrets.yaml
      echo "    password: ${{ secrets.DB_PASSWORD }}" >> .ktx/secrets.yaml
      
  - name: Ingest context
    run: ktx ingest --project-dir .
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      
  - name: Commit updates
    run: |
      git config user.name "ktx-bot"
      git config user.email "bot@company.com"
      git add semantic-layer/ raw-sources/
      git commit -m "Update ktx context" || exit 0
      git push
undefined
name: Update ktx Context on: push: branches: [main] paths: - 'dbt-project/' - 'wiki/'
jobs: ingest: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20
      
  - name: Install ktx
    run: npm install -g @kaelio/ktx
    
  - name: Configure secrets
    run: |
      mkdir -p .ktx
      echo "databases:" > .ktx/secrets.yaml
      echo "  warehouse:" >> .ktx/secrets.yaml
      echo "    user: ${{ secrets.DB_USER }}" >> .ktx/secrets.yaml
      echo "    password: ${{ secrets.DB_PASSWORD }}" >> .ktx/secrets.yaml
      
  - name: Ingest context
    run: ktx ingest --project-dir .
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      
  - name: Commit updates
    run: |
      git config user.name "ktx-bot"
      git config user.email "bot@company.com"
      git add semantic-layer/ raw-sources/
      git commit -m "Update ktx context" || exit 0
      git push
undefined

Troubleshooting

故障排除

"No LLM provider configured"

"No LLM provider configured"

Solution: Run
ktx setup
to configure LLM provider, or manually add to
ktx.yaml
:
yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
Add API key to
.ktx/secrets.yaml
:
yaml
llm:
  anthropic_api_key: ${ANTHROPIC_API_KEY}
解决方案: 运行
ktx setup
配置LLM提供商,或手动添加到
ktx.yaml
yaml
llm:
  provider: anthropic
  model: claude-sonnet-4-6
将API密钥添加到
.ktx/secrets.yaml
yaml
llm:
  anthropic_api_key: ${ANTHROPIC_API_KEY}

"Database connection failed"

"Database connection failed"

Check:
  1. Credentials in
    .ktx/secrets.yaml
    are correct
  2. Network access to database host
  3. Read-only user has SELECT permissions
Test connection:
bash
undefined
检查项:
  1. .ktx/secrets.yaml
    中的凭证是否正确
  2. 是否能够网络访问数据库主机
  3. 只读用户是否拥有SELECT权限
测试连接:
bash
undefined

PostgreSQL example

PostgreSQL示例

psql -h localhost -U ${POSTGRES_USER} -d analytics -c "SELECT 1"
undefined
psql -h localhost -U ${POSTGRES_USER} -d analytics -c "SELECT 1"
undefined

"ktx context built: no"

"ktx context built: no"

Solution: Run ingestion:
bash
ktx ingest --force
Check for errors in output. Common issues:
  • Missing database permissions
  • Invalid dbt profiles.yml
  • Unreachable context source URLs
解决方案: 运行导入命令:
bash
ktx ingest --force
检查输出中的错误信息。常见问题:
  • 缺少数据库权限
  • dbt profiles.yml无效
  • 上下文源URL无法访问

"No results found" when searching

"No results found" 搜索无结果

Check:
  1. Ingestion completed successfully:
    ktx status
  2. Semantic layer YAML files exist:
    ls semantic-layer/
  3. Wiki content exists:
    ls wiki/global/
Re-run ingestion:
bash
ktx ingest --source dbt_prod --force
检查项:
  1. 导入是否成功完成:
    ktx status
  2. 语义层YAML文件是否存在:
    ls semantic-layer/
  3. 知识库内容是否存在:
    ls wiki/global/
重新运行导入:
bash
ktx ingest --source dbt_prod --force

MCP Server Not Starting

MCP服务器无法启动

Check:
  1. No other process on port 3000:
    lsof -i :3000
  2. Project directory exists and has
    ktx.yaml
  3. Agent client configuration points to correct project path
Specify explicit project:
bash
ktx mcp start --project-dir /full/path/to/project
检查项:
  1. 端口3000是否被其他进程占用:
    lsof -i :3000
  2. 项目目录是否存在且包含
    ktx.yaml
  3. Agent客户端配置是否指向正确的项目路径
指定明确的项目路径:
bash
ktx mcp start --project-dir /full/path/to/project

dbt Ingestion Fails

dbt导入失败

Common issues:
  1. profiles.yml not found: Set
    profiles_dir
    in
    ktx.yaml
  2. Target not found: Verify
    target
    name matches profiles.yml
  3. Compilation errors: Run
    dbt compile
    manually to verify dbt project
Debug:
bash
undefined
常见问题:
  1. profiles.yml未找到:
    ktx.yaml
    中设置
    profiles_dir
  2. Target未找到: 验证
    target
    名称与profiles.yml中的一致
  3. 编译错误: 手动运行
    dbt compile
    验证dbt项目
调试:
bash
undefined

Verify dbt configuration

验证dbt配置

cd ../dbt-project dbt debug --profiles-dir ~/.dbt --target prod
cd ../dbt-project dbt debug --profiles-dir ~/.dbt --target prod

Check ktx can read dbt

检查ktx是否能读取dbt

ktx ingest --source dbt_prod --log-level debug
undefined
ktx ingest --source dbt_prod --log-level debug
undefined

Embedding Search Returns Poor Results

嵌入搜索结果质量不佳

Tune embedding quality:
  1. Use higher-quality embedding model:
yaml
embeddings:
  provider: voyage
  model: voyage-3
  1. Add more descriptive metadata to semantic layer YAML:
yaml
metrics:
  - name: mrr
    description: Monthly Recurring Revenue from active subscriptions
    tags: [revenue, subscription, saas]
    business_context: |
      MRR is the primary metric for tracking subscription health.
      Includes only active, monthly-billed customers.
优化嵌入质量:
  1. 使用更高质量的嵌入模型:
yaml
embeddings:
  provider: voyage
  model: voyage-3
  1. 为语义层YAML添加更多描述性元数据:
yaml
metrics:
  - name: mrr
    description: Monthly Recurring Revenue from active subscriptions
    tags: [revenue, subscription, saas]
    business_context: |
      MRR是跟踪订阅业务健康状况的核心指标。
      仅包含活跃的月付客户。

Large Projects Performance

大型项目性能优化

For projects with 100+ tables:
  1. Limit ingestion scope:
yaml
context_sources:
  dbt_prod:
    type: dbt
    include_patterns:
      - "marts/**"
      - "metrics/**"
    exclude_patterns:
      - "staging/**"
      - "tests/**"
  1. Increase sampling limits:
yaml
databases:
  warehouse:
    type: postgres
    sampling:
      max_tables: 500
      rows_per_table: 10000
  1. Run incremental ingestion:
bash
undefined
对于包含100+表的项目:
  1. 限制导入范围:
yaml
context_sources:
  dbt_prod:
    type: dbt
    include_patterns:
      - "marts/**"
      - "metrics/**"
    exclude_patterns:
      - "staging/**"
      - "tests/**"
  1. 提高采样限制:
yaml
databases:
  warehouse:
    type: postgres
    sampling:
      max_tables: 500
      rows_per_table: 10000
  1. 运行增量导入:
bash
undefined

Only ingest changed sources

仅导入有变化的源

ktx ingest --incremental
undefined
ktx ingest --incremental
undefined

Advanced Configuration

高级配置

Custom Sampling Strategy

自定义采样策略

yaml
databases:
  warehouse:
    type: postgres
    sampling:
      enabled: true
      max_tables: 200
      rows_per_table: 5000
      include_patterns:
        - "public.*"
        - "analytics.*"
      exclude_patterns:
        - "*_tmp"
        - "*_staging"
yaml
databases:
  warehouse:
    type: postgres
    sampling:
      enabled: true
      max_tables: 200
      rows_per_table: 5000
      include_patterns:
        - "public.*"
        - "analytics.*"
      exclude_patterns:
        - "*_tmp"
        - "*_staging"

Join Detection Tuning

关联检测调优

yaml
project:
  join_detection:
    enabled: true
    min_confidence: 0.85
    max_cardinality_ratio: 0.1
    require_foreign_keys: false
yaml
project:
  join_detection:
    enabled: true
    min_confidence: 0.85
    max_cardinality_ratio: 0.1
    require_foreign_keys: false

Wiki Ingestion from Git

从Git导入知识库

yaml
context_sources:
  team_wiki:
    type: git
    repo_url: https://github.com/company/analytics-wiki
    branch: main
    path: docs/
    auth_token: ${GITHUB_TOKEN}
yaml
context_sources:
  team_wiki:
    type: git
    repo_url: https://github.com/company/analytics-wiki
    branch: main
    path: docs/
    auth_token: ${GITHUB_TOKEN}

Project Resolution

项目解析规则

ktx resolves project directory in this order:
  1. --project-dir
    flag
  2. KTX_PROJECT_DIR
    environment variable
  3. Nearest
    ktx.yaml
    in current or parent directories
  4. Current working directory
For scripting, always use explicit path:
bash
ktx ingest --project-dir /path/to/project
ktx按以下顺序解析项目目录:
  1. --project-dir
    参数
  2. KTX_PROJECT_DIR
    环境变量
  3. 当前目录或父目录中最近的
    ktx.yaml
    文件
  4. 当前工作目录
脚本开发时,请始终使用明确路径:
bash
ktx ingest --project-dir /path/to/project

Environment Variables

环境变量

VariablePurpose
KTX_PROJECT_DIR
Default project directory
ANTHROPIC_API_KEY
Anthropic API key
OPENAI_API_KEY
OpenAI API key
POSTGRES_USER
PostgreSQL username
POSTGRES_PASSWORD
PostgreSQL password
SNOWFLAKE_USER
Snowflake username
SNOWFLAKE_PASSWORD
Snowflake password
GOOGLE_APPLICATION_CREDENTIALS
GCP service account key path
GITHUB_TOKEN
GitHub personal access token
Store in
.env
(git-ignored) or
.ktx/secrets.yaml
.
变量名用途
KTX_PROJECT_DIR
默认项目目录
ANTHROPIC_API_KEY
Anthropic API密钥
OPENAI_API_KEY
OpenAI API密钥
POSTGRES_USER
PostgreSQL用户名
POSTGRES_PASSWORD
PostgreSQL密码
SNOWFLAKE_USER
Snowflake用户名
SNOWFLAKE_PASSWORD
Snowflake密码
GOOGLE_APPLICATION_CREDENTIALS
GCP服务账号密钥路径
GITHUB_TOKEN
GitHub个人访问令牌
可存储在
.env
(需忽略git提交)或
.ktx/secrets.yaml
中。

CLI Reference Summary

CLI参考摘要

CommandPurpose
ktx setup
Create/resume project, configure providers
ktx status
Check project readiness
ktx ingest
Build context from configured sources
ktx sl <query>
Search semantic layer
ktx wiki <query>
Search wiki knowledge
ktx mcp start
Start MCP server for agents
ktx config get
Show current configuration
ktx config set
Update configuration
ktx connections list
List database connections
ktx sources list
List context sources
命令用途
ktx setup
创建/恢复项目,配置提供商
ktx status
检查项目就绪状态
ktx ingest
从已配置源构建上下文
ktx sl <query>
搜索语义层
ktx wiki <query>
搜索知识库
ktx mcp start
启动MCP服务器供Agent使用
ktx config get
显示当前配置
ktx config set
更新配置
ktx connections list
列出数据库连接
ktx sources list
列出上下文源