ktx-ai-data-agents-context-layer

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

ktx AI Data Agents Context Layer

ktx AI数据Agent上下文层

Skill by ara.so — AI Agent Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query data warehouses accurately. It automatically builds a semantic layer from approved metric definitions, detects joinable columns, ingests business knowledge from wikis/dbt/BI tools, and exposes everything through CLI and MCP (Model Context Protocol) for agent execution.
Unlike general-purpose agents that re-explore your warehouse on every question or traditional semantic layers that require constant manual upkeep, ktx learns from company knowledge, maps your data stack automatically, and serves combined context to agents at runtime.
ara.so提供的Skill——AI Agent技能合集。
ktx是一个自我优化的上下文层,它教会AI Agent如何精准查询数据仓库。它会自动从已批准的指标定义构建语义层,可检测可关联的列,从知识库/dbt/BI工具中提取业务知识,并通过CLI和MCP(Model Context Protocol,模型上下文协议)将所有内容暴露给Agent执行。
与每次提问都要重新探索数据仓库的通用Agent,或需要持续手动维护的传统语义层不同,ktx会从企业知识中学习,自动映射你的数据栈,并在运行时为Agent提供合并后的上下文。

Installation

安装

Global CLI Installation

全局CLI安装

bash
npm install -g @kaelio/ktx
bash
npm install -g @kaelio/ktx

Project-Specific Installation

项目专属安装

bash
npm install --save-dev @kaelio/ktx
Or use with
npx
:
bash
npx @kaelio/ktx setup
bash
npm install --save-dev @kaelio/ktx
或使用
npx
bash
npx @kaelio/ktx setup

Quick Start

快速开始

Initial Setup

初始设置

bash
undefined
bash
undefined

Initialize or resume a ktx project

初始化或恢复ktx项目

ktx setup
ktx setup

Check project readiness

检查项目就绪状态

ktx status

The `ktx setup` command:
- Creates or resumes a local ktx project
- Configures LLM and embedding providers
- Sets up database connections
- Configures context sources (dbt, Looker, Metabase, Notion)
- Builds initial context
- Installs agent integration
ktx status

`ktx setup`命令会:
- 创建或恢复本地ktx项目
- 配置LLM和嵌入模型提供商
- 设置数据库连接
- 配置上下文源(dbt、Looker、Metabase、Notion)
- 构建初始上下文
- 安装Agent集成

Expected Status Output

预期状态输出

text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)

Project Structure

项目结构

text
my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources (commit)
├── wiki/global/                     # Shared business context (commit)
├── wiki/user/<user-id>/             # User-scoped notes (commit)
├── raw-sources/<connection-id>/     # Ingest artifacts and reports (commit)
└── .ktx/                            # Local state and secrets (git-ignore)
Commit:
ktx.yaml
,
semantic-layer/
,
wiki/
,
raw-sources/

Git-ignore:
.ktx/
text
my-project/
├── ktx.yaml                         # 项目配置文件
├── semantic-layer/<connection-id>/  # YAML语义源(需提交)
├── wiki/global/                     # 共享业务上下文(需提交)
├── wiki/user/<user-id>/             # 用户专属笔记(需提交)
├── raw-sources/<connection-id>/     # 提取的工件和报告(需提交)
└── .ktx/                            # 本地状态和密钥(需Git忽略)
需提交
ktx.yaml
semantic-layer/
wiki/
raw-sources/

Git忽略
.ktx/

Configuration

配置

ktx.yaml Structure

ktx.yaml结构

yaml
project:
  name: my-analytics-project
  version: 1.0.0

llm:
  provider: anthropic  # or google-vertex, ai-gateway, claude-code
  model: claude-sonnet-4-6
  # API key stored in .ktx/secrets.json

embeddings:
  provider: openai
  model: text-embedding-3-small
  # API key stored in .ktx/secrets.json

databases:
  warehouse:
    type: postgres  # or snowflake, bigquery, clickhouse, mysql, mssql, sqlite
    host: db.example.com
    port: 5432
    database: analytics
    schema: public
    # credentials stored in .ktx/secrets.json

context_sources:
  dbt_main:
    type: dbt
    manifest_path: ./target/manifest.json
  
  looker_main:
    type: looker
    base_url: https://company.looker.com
    # API credentials in .ktx/secrets.json
  
  notion_wiki:
    type: notion
    # API token in .ktx/secrets.json
yaml
project:
  name: my-analytics-project
  version: 1.0.0

llm:
  provider: anthropic  # 或google-vertex、ai-gateway、claude-code
  model: claude-sonnet-4-6
  # API密钥存储在.ktx/secrets.json中

embeddings:
  provider: openai
  model: text-embedding-3-small
  # API密钥存储在.ktx/secrets.json中

databases:
  warehouse:
    type: postgres  # 或snowflake、bigquery、clickhouse、mysql、mssql、sqlite
    host: db.example.com
    port: 5432
    database: analytics
    schema: public
    # 凭据存储在.ktx/secrets.json中

context_sources:
  dbt_main:
    type: dbt
    manifest_path: ./target/manifest.json
  
  looker_main:
    type: looker
    base_url: https://company.looker.com
    # API凭据存储在.ktx/secrets.json中
  
  notion_wiki:
    type: notion
    # API令牌存储在.ktx/secrets.json中

Environment Variables

环境变量

ktx respects these environment variables:
bash
export KTX_PROJECT_DIR=/path/to/project
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
ktx支持以下环境变量:
bash
export KTX_PROJECT_DIR=/path/to/project
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

Core Commands

核心命令

Building Context

构建上下文

bash
undefined
bash
undefined

Ingest all configured sources

提取所有已配置的源

ktx ingest
ktx ingest

Ingest specific connection

提取特定连接的源

ktx ingest --connection warehouse
ktx ingest --connection warehouse

Ingest specific context source

提取特定上下文源

ktx ingest --source dbt_main
ktx ingest --source dbt_main

Force re-ingestion (skip cache)

强制重新提取(跳过缓存)

ktx ingest --force
undefined
ktx ingest --force
undefined

Searching Context

搜索上下文

bash
undefined
bash
undefined

Search semantic layer (metrics, dimensions, entities)

搜索语义层(指标、维度、实体)

ktx sl "revenue" ktx sl "customer lifetime value"
ktx sl "revenue" ktx sl "customer lifetime value"

Search wiki pages

搜索知识库页面

ktx wiki "refund policy" ktx wiki "how to calculate churn"
ktx wiki "refund policy" ktx wiki "how to calculate churn"

List all semantic sources

列出所有语义源

ktx sl list
ktx sl list

Show specific metric details

显示特定指标详情

ktx sl show "monthly_recurring_revenue"
undefined
ktx sl show "monthly_recurring_revenue"
undefined

MCP Server for Agents

面向Agent的MCP服务器

bash
undefined
bash
undefined

Start MCP server (required for agent integration)

启动MCP服务器(Agent集成必需)

ktx mcp start
ktx mcp start

Start with specific project

使用特定项目启动

ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project

Check MCP status

检查MCP状态

ktx mcp status
undefined
ktx mcp status
undefined

Project Management

项目管理

bash
undefined
bash
undefined

Validate project configuration

验证项目配置

ktx validate
ktx validate

Show detailed status

显示详细状态

ktx status --verbose
ktx status --verbose

Clear all context and rebuild

清除所有上下文并重新构建

ktx clear && ktx ingest
undefined
ktx clear && ktx ingest
undefined

Using ktx with AI Agents

与AI Agent配合使用

Claude Code / Codex Integration

Claude Code / Codex集成

Once
ktx setup
completes, the agent integration is automatic. Ensure the MCP server is running:
bash
ktx mcp start
Then ask the agent:
text
Query the warehouse using ktx to find monthly recurring revenue for the last 6 months
text
Search ktx wiki for our customer segmentation strategy
ktx setup
完成后,Agent集成会自动完成。确保MCP服务器正在运行:
bash
ktx mcp start
然后向Agent提问:
text
使用ktx查询数据仓库,找出过去6个月的月度经常性收入
text
搜索ktx知识库,查看我们的客户细分策略

Cursor / OpenCode Integration

Cursor / OpenCode集成

Add ktx as an MCP server in your agent configuration, then use natural language:
text
Use ktx to explore available metrics related to user engagement
在Agent配置中添加ktx作为MCP服务器,然后使用自然语言提问:
text
使用ktx探索与用户参与度相关的可用指标

Real-World Examples

实际应用示例

Defining a Semantic Metric

定义语义指标

Create a file in
semantic-layer/warehouse/metrics.yaml
:
yaml
metrics:
  - name: monthly_recurring_revenue
    description: Sum of all active subscription revenue for the month
    type: simple
    sql: "SUM(subscription_amount)"
    base_entity: subscription
    filters:
      - field: status
        operator: equals
        value: active
    dimensions:
      - plan_type
      - customer_segment
    
  - name: customer_ltv
    description: Customer lifetime value - total revenue per customer
    type: derived
    sql: |
      SUM(order_total) / COUNT(DISTINCT customer_id)
    base_entity: order
    dimensions:
      - acquisition_channel
      - cohort_month
semantic-layer/warehouse/metrics.yaml
中创建文件:
yaml
metrics:
  - name: monthly_recurring_revenue
    description: Sum of all active subscription revenue for the month
    type: simple
    sql: "SUM(subscription_amount)"
    base_entity: subscription
    filters:
      - field: status
        operator: equals
        value: active
    dimensions:
      - plan_type
      - customer_segment
    
  - name: customer_ltv
    description: Customer lifetime value - total revenue per customer
    type: derived
    sql: |
      SUM(order_total) / COUNT(DISTINCT customer_id)
    base_entity: order
    dimensions:
      - acquisition_channel
      - cohort_month

Adding Wiki Context

添加知识库上下文

Create a file in
wiki/global/analytics-definitions.md
:
markdown
undefined
wiki/global/analytics-definitions.md
中创建文件:
markdown
undefined

Analytics Definitions

分析定义

Revenue Recognition

收入确认

We recognize revenue on a cash basis when payment is received, not when the invoice is sent.
我们在收到付款时按收付实现制确认收入,而非开具发票时。

Customer Churn

客户流失

A customer is considered churned if they have not had an active subscription for 90 consecutive days. Paused subscriptions do not count as churn.
如果客户连续90天没有活跃订阅,则视为流失。暂停的订阅不算作流失。

Cohort Analysis

cohort分析

Customer cohorts are defined by their first purchase month. Use the
first_order_date
field to group customers into cohorts.

After adding wiki content, re-ingest:

```bash
ktx ingest --source wiki
客户群组根据其首次购买月份定义。使用
first_order_date
字段将客户分组为不同群组。

添加知识库内容后,重新提取:

```bash
ktx ingest --source wiki

Querying from TypeScript/Node

从TypeScript/Node查询

typescript
import { execSync } from 'child_process';

// Search semantic layer
const metrics = JSON.parse(
  execSync('ktx sl "revenue" --json', { encoding: 'utf-8' })
);

console.log('Available revenue metrics:', metrics);

// Search wiki
const wikiResults = JSON.parse(
  execSync('ktx wiki "churn" --json', { encoding: 'utf-8' })
);

console.log('Wiki pages about churn:', wikiResults);
typescript
import { execSync } from 'child_process';

// 搜索语义层
const metrics = JSON.parse(
  execSync('ktx sl "revenue" --json', { encoding: 'utf-8' })
);

console.log('可用收入指标:', metrics);

// 搜索知识库
const wikiResults = JSON.parse(
  execSync('ktx wiki "churn" --json', { encoding: 'utf-8' })
);

console.log('关于流失的知识库页面:', wikiResults);

Programmatic Context Building

程序化构建上下文

typescript
import { KtxProject } from '@kaelio/ktx';

async function buildContext() {
  const project = await KtxProject.load('/path/to/project');
  
  // Ingest all sources
  await project.ingest({ force: false });
  
  // Query semantic layer
  const revenueMetrics = await project.semanticLayer.search('revenue');
  
  console.log('Found metrics:', revenueMetrics.map(m => m.name));
}

buildContext();
typescript
import { KtxProject } from '@kaelio/ktx';

async function buildContext() {
  const project = await KtxProject.load('/path/to/project');
  
  // 提取所有源
  await project.ingest({ force: false });
  
  // 查询语义层
  const revenueMetrics = await project.semanticLayer.search('revenue');
  
  console.log('找到的指标:', revenueMetrics.map(m => m.name));
}

buildContext();

Integrating with dbt

与dbt集成

Ensure your dbt project has a compiled
manifest.json
:
bash
cd /path/to/dbt-project
dbt compile
Configure in
ktx.yaml
:
yaml
context_sources:
  dbt_main:
    type: dbt
    manifest_path: ../dbt-project/target/manifest.json
    include_models: true
    include_sources: true
    include_metrics: true
Then ingest:
bash
ktx ingest --source dbt_main
ktx will:
  • Extract model descriptions and column definitions
  • Import dbt metrics as semantic-layer metrics
  • Detect relationships from
    ref()
    and foreign keys
  • Flag contradictions between dbt docs and warehouse metadata
确保你的dbt项目已编译出
manifest.json
bash
cd /path/to/dbt-project
dbt compile
ktx.yaml
中配置:
yaml
context_sources:
  dbt_main:
    type: dbt
    manifest_path: ../dbt-project/target/manifest.json
    include_models: true
    include_sources: true
    include_metrics: true
然后提取:
bash
ktx ingest --source dbt_main
ktx会:
  • 提取模型描述和列定义
  • 将dbt指标导入为语义层指标
  • ref()
    和外键检测关系
  • 标记dbt文档与数据仓库元数据之间的矛盾

Common Patterns

常见模式

Multi-Warehouse Setup

多数据仓库设置

yaml
databases:
  production:
    type: snowflake
    account: xy12345.us-east-1
    database: PROD_DB
    warehouse: COMPUTE_WH
  
  staging:
    type: postgres
    host: staging-db.internal
    database: staging_analytics
Ingest separately:
bash
ktx ingest --connection production
ktx ingest --connection staging
yaml
databases:
  production:
    type: snowflake
    account: xy12345.us-east-1
    database: PROD_DB
    warehouse: COMPUTE_WH
  
  staging:
    type: postgres
    host: staging-db.internal
    database: staging_analytics
分别提取:
bash
ktx ingest --connection production
ktx ingest --connection staging

Custom Semantic Entities

自定义语义实体

Define entities (e.g., tables or views) in
semantic-layer/warehouse/entities.yaml
:
yaml
entities:
  - name: customer
    description: Customer dimension table
    table: public.customers
    primary_key: customer_id
    columns:
      - name: customer_id
        type: integer
      - name: email
        type: string
      - name: plan_type
        type: string
      - name: created_at
        type: timestamp

  - name: subscription
    description: Active and historical subscriptions
    table: public.subscriptions
    primary_key: subscription_id
    foreign_keys:
      - column: customer_id
        references_entity: customer
        references_column: customer_id
semantic-layer/warehouse/entities.yaml
中定义实体(如表或视图):
yaml
entities:
  - name: customer
    description: Customer dimension table
    table: public.customers
    primary_key: customer_id
    columns:
      - name: customer_id
        type: integer
      - name: email
        type: string
      - name: plan_type
        type: string
      - name: created_at
        type: timestamp

  - name: subscription
    description: Active and historical subscriptions
    table: public.subscriptions
    primary_key: subscription_id
    foreign_keys:
      - column: customer_id
        references_entity: customer
        references_column: customer_id

Searching with Filters

带筛选的搜索

bash
undefined
bash
undefined

Search only metrics (not dimensions or entities)

仅搜索指标(不包括维度或实体)

ktx sl "conversion" --type metric
ktx sl "conversion" --type metric

Search with semantic similarity threshold

带语义相似度阈值的搜索

ktx sl "customer value" --threshold 0.7
ktx sl "customer value" --threshold 0.7

Full-text search in wiki

知识库全文搜索

ktx wiki "refund AND policy"
undefined
ktx wiki "refund AND policy"
undefined

Troubleshooting

故障排除

MCP Server Not Starting

MCP服务器无法启动

Symptom:
ktx mcp start
exits immediately or shows connection errors.
Solutions:
bash
undefined
症状
ktx mcp start
立即退出或显示连接错误。
解决方案
bash
undefined

Check project status first

先检查项目状态

ktx status
ktx status

Ensure LLM and embeddings are configured

确保LLM和嵌入模型已配置

ktx setup
ktx setup

Verify project directory

验证项目目录

ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project

Check for port conflicts (default 3000)

检查端口冲突(默认3000)

lsof -i :3000
lsof -i :3000

Start on different port

使用不同端口启动

ktx mcp start --port 3001
undefined
ktx mcp start --port 3001
undefined

Ingestion Fails for Database

数据库提取失败

Symptom:
ktx ingest --connection warehouse
fails with authentication or timeout errors.
Solutions:
bash
undefined
症状
ktx ingest --connection warehouse
因认证或超时错误失败。
解决方案
bash
undefined

Verify credentials in .ktx/secrets.json

验证.ktx/secrets.json中的凭据

cat .ktx/secrets.json
cat .ktx/secrets.json

Test connection manually

手动测试连接

psql -h db.example.com -U your_user -d analytics
psql -h db.example.com -U your_user -d analytics

Check firewall / network access

检查防火墙/网络访问

telnet db.example.com 5432
telnet db.example.com 5432

Use read-only user for safety

使用只读用户以确保安全

ktx never writes, but explicit read-only is best practice

ktx默认不会写入数据,但显式只读是最佳实践

undefined
undefined

No Results from Search

搜索无结果

Symptom:
ktx sl "revenue"
returns empty results.
Solutions:
bash
undefined
症状
ktx sl "revenue"
返回空结果。
解决方案
bash
undefined

Ensure context is built

确保上下文已构建

ktx status
ktx status

Re-ingest if needed

必要时重新提取

ktx ingest --force
ktx ingest --force

Check semantic-layer directory

检查semantic-layer目录

ls semantic-layer/warehouse/
ls semantic-layer/warehouse/

Verify embeddings are configured

验证嵌入模型已配置

ktx status --verbose
undefined
ktx status --verbose
undefined

dbt Manifest Not Found

dbt清单未找到

Symptom:
ktx ingest --source dbt_main
reports missing manifest.
Solutions:
bash
undefined
症状
ktx ingest --source dbt_main
报告缺少清单。
解决方案
bash
undefined

Compile dbt project first

先编译dbt项目

cd /path/to/dbt-project dbt compile
cd /path/to/dbt-project dbt compile

Verify manifest path in ktx.yaml

验证ktx.yaml中的清单路径

cat ktx.yaml | grep manifest_path
cat ktx.yaml | grep manifest_path

Use absolute path if needed

必要时使用绝对路径

context_sources: dbt_main: type: dbt manifest_path: /absolute/path/to/target/manifest.json
undefined
context_sources: dbt_main: type: dbt manifest_path: /absolute/path/to/target/manifest.json
undefined

Contradictions Flagged

标记矛盾

Symptom: ktx reports contradictions between dbt docs and warehouse metadata.
Expected behavior: ktx flags these for human review. Check:
bash
undefined
症状:ktx报告dbt文档与数据仓库元数据之间存在矛盾。
预期行为:ktx会标记这些矛盾供人工审核。检查:
bash
undefined

Review flagged contradictions

查看标记的矛盾

cat raw-sources/warehouse/contradictions.json
cat raw-sources/warehouse/contradictions.json

Common causes:

常见原因:

- dbt docs out of sync with warehouse

- dbt文档与数据仓库不同步

- Recent schema changes not yet in dbt

- 最近的架构变更未同步到dbt

- Typos in column names or descriptions

- 列名或描述中的拼写错误


Resolve by updating dbt docs or warehouse schema, then re-ingest.

通过更新dbt文档或数据仓库架构解决问题,然后重新提取。

Agent Can't Access ktx

Agent无法访问ktx

Symptom: Agent says "ktx not found" or doesn't see semantic layer.
Solutions:
bash
undefined
症状:Agent显示“ktx未找到”或无法看到语义层。
解决方案
bash
undefined

Ensure MCP server is running

确保MCP服务器正在运行

ktx mcp start
ktx mcp start

Check agent integration status

检查Agent集成状态

ktx status | grep "Agent integration"
ktx status | grep "Agent integration"

Re-run setup to refresh integration

重新运行setup以刷新集成

ktx setup
ktx setup

Verify agent is configured to use MCP

验证Agent已配置为使用MCP

(check agent settings for MCP server configuration)

(检查Agent设置中的MCP服务器配置)

undefined
undefined

Advanced Usage

高级用法

Custom LLM Configuration

自定义LLM配置

yaml
llm:
  provider: ai-gateway
  endpoint: https://gateway.example.com/v1
  model: custom-model-v2
  max_tokens: 4096
  temperature: 0.1
yaml
llm:
  provider: ai-gateway
  endpoint: https://gateway.example.com/v1
  model: custom-model-v2
  max_tokens: 4096
  temperature: 0.1

Scripting with ktx

使用ktx编写脚本

bash
#!/bin/bash
bash
#!/bin/bash

Daily context refresh script

每日上下文刷新脚本

export KTX_PROJECT_DIR=/path/to/project
export KTX_PROJECT_DIR=/path/to/project

Pull latest dbt changes

拉取最新dbt变更

cd /path/to/dbt-project && git pull && dbt compile
cd /path/to/dbt-project && git pull && dbt compile

Rebuild ktx context

重建ktx上下文

cd $KTX_PROJECT_DIR ktx ingest --force
cd $KTX_PROJECT_DIR ktx ingest --force

Restart MCP server

重启MCP服务器

pkill -f "ktx mcp" || true ktx mcp start --daemon
undefined
pkill -f "ktx mcp" || true ktx mcp start --daemon
undefined

Filtering Ingestion

筛选提取内容

yaml
databases:
  warehouse:
    type: postgres
    # ... connection details ...
    include_schemas:
      - public
      - analytics
    exclude_tables:
      - tmp_*
      - staging_*
    sample_size: 1000  # rows to sample per table
yaml
databases:
  warehouse:
    type: postgres
    # ... 连接详情 ...
    include_schemas:
      - public
      - analytics
    exclude_tables:
      - tmp_*
      - staging_*
    sample_size: 1000  # 每个表要采样的行数

Key Resources

关键资源

Security Notes

安全说明

  • ktx connections are read-only by design
  • Credentials stored in
    .ktx/secrets.json
    (git-ignored)
  • No data sent to hosted services - only to your configured LLM provider
  • Use environment variables for CI/CD:
    ANTHROPIC_API_KEY
    ,
    OPENAI_API_KEY
    , etc.
  • ktx连接默认是只读
  • 凭据存储在
    .ktx/secrets.json
    中(已Git忽略)
  • 不会将数据发送到托管服务——仅发送到你配置的LLM提供商
  • 在CI/CD中使用环境变量:
    ANTHROPIC_API_KEY
    OPENAI_API_KEY