ktx-ai-data-agents-context-layer
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
Chinesektx AI Data Agents Context Layer
ktx AI数据Agent上下文层
Skill by ara.so — AI Agent Skills collection.
ktx is a self-improving context layer that teaches AI agents how to query data warehouses accurately. It automatically builds a semantic layer from approved metric definitions, detects joinable columns, ingests business knowledge from wikis/dbt/BI tools, and exposes everything through CLI and MCP (Model Context Protocol) for agent execution.
Unlike general-purpose agents that re-explore your warehouse on every question or traditional semantic layers that require constant manual upkeep, ktx learns from company knowledge, maps your data stack automatically, and serves combined context to agents at runtime.
由ara.so提供的Skill——AI Agent技能合集。
ktx是一个自我优化的上下文层,它教会AI Agent如何精准查询数据仓库。它会自动从已批准的指标定义构建语义层,可检测可关联的列,从知识库/dbt/BI工具中提取业务知识,并通过CLI和MCP(Model Context Protocol,模型上下文协议)将所有内容暴露给Agent执行。
与每次提问都要重新探索数据仓库的通用Agent,或需要持续手动维护的传统语义层不同,ktx会从企业知识中学习,自动映射你的数据栈,并在运行时为Agent提供合并后的上下文。
Installation
安装
Global CLI Installation
全局CLI安装
bash
npm install -g @kaelio/ktxbash
npm install -g @kaelio/ktxProject-Specific Installation
项目专属安装
bash
npm install --save-dev @kaelio/ktxOr use with :
npxbash
npx @kaelio/ktx setupbash
npm install --save-dev @kaelio/ktx或使用:
npxbash
npx @kaelio/ktx setupQuick Start
快速开始
Initial Setup
初始设置
bash
undefinedbash
undefinedInitialize or resume a ktx project
初始化或恢复ktx项目
ktx setup
ktx setup
Check project readiness
检查项目就绪状态
ktx status
The `ktx setup` command:
- Creates or resumes a local ktx project
- Configures LLM and embedding providers
- Sets up database connections
- Configures context sources (dbt, Looker, Metabase, Notion)
- Builds initial context
- Installs agent integrationktx status
`ktx setup`命令会:
- 创建或恢复本地ktx项目
- 配置LLM和嵌入模型提供商
- 设置数据库连接
- 配置上下文源(dbt、Looker、Metabase、Notion)
- 构建初始上下文
- 安装Agent集成Expected Status Output
预期状态输出
text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)Project Structure
项目结构
text
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/<connection-id>/ # YAML semantic sources (commit)
├── wiki/global/ # Shared business context (commit)
├── wiki/user/<user-id>/ # User-scoped notes (commit)
├── raw-sources/<connection-id>/ # Ingest artifacts and reports (commit)
└── .ktx/ # Local state and secrets (git-ignore)Commit: , , ,
Git-ignore:
ktx.yamlsemantic-layer/wiki/raw-sources/Git-ignore:
.ktx/text
my-project/
├── ktx.yaml # 项目配置文件
├── semantic-layer/<connection-id>/ # YAML语义源(需提交)
├── wiki/global/ # 共享业务上下文(需提交)
├── wiki/user/<user-id>/ # 用户专属笔记(需提交)
├── raw-sources/<connection-id>/ # 提取的工件和报告(需提交)
└── .ktx/ # 本地状态和密钥(需Git忽略)需提交:、、、
Git忽略:
ktx.yamlsemantic-layer/wiki/raw-sources/Git忽略:
.ktx/Configuration
配置
ktx.yaml Structure
ktx.yaml结构
yaml
project:
name: my-analytics-project
version: 1.0.0
llm:
provider: anthropic # or google-vertex, ai-gateway, claude-code
model: claude-sonnet-4-6
# API key stored in .ktx/secrets.json
embeddings:
provider: openai
model: text-embedding-3-small
# API key stored in .ktx/secrets.json
databases:
warehouse:
type: postgres # or snowflake, bigquery, clickhouse, mysql, mssql, sqlite
host: db.example.com
port: 5432
database: analytics
schema: public
# credentials stored in .ktx/secrets.json
context_sources:
dbt_main:
type: dbt
manifest_path: ./target/manifest.json
looker_main:
type: looker
base_url: https://company.looker.com
# API credentials in .ktx/secrets.json
notion_wiki:
type: notion
# API token in .ktx/secrets.jsonyaml
project:
name: my-analytics-project
version: 1.0.0
llm:
provider: anthropic # 或google-vertex、ai-gateway、claude-code
model: claude-sonnet-4-6
# API密钥存储在.ktx/secrets.json中
embeddings:
provider: openai
model: text-embedding-3-small
# API密钥存储在.ktx/secrets.json中
databases:
warehouse:
type: postgres # 或snowflake、bigquery、clickhouse、mysql、mssql、sqlite
host: db.example.com
port: 5432
database: analytics
schema: public
# 凭据存储在.ktx/secrets.json中
context_sources:
dbt_main:
type: dbt
manifest_path: ./target/manifest.json
looker_main:
type: looker
base_url: https://company.looker.com
# API凭据存储在.ktx/secrets.json中
notion_wiki:
type: notion
# API令牌存储在.ktx/secrets.json中Environment Variables
环境变量
ktx respects these environment variables:
bash
export KTX_PROJECT_DIR=/path/to/project
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.jsonktx支持以下环境变量:
bash
export KTX_PROJECT_DIR=/path/to/project
export ANTHROPIC_API_KEY=your-key-here
export OPENAI_API_KEY=your-key-here
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.jsonCore Commands
核心命令
Building Context
构建上下文
bash
undefinedbash
undefinedIngest all configured sources
提取所有已配置的源
ktx ingest
ktx ingest
Ingest specific connection
提取特定连接的源
ktx ingest --connection warehouse
ktx ingest --connection warehouse
Ingest specific context source
提取特定上下文源
ktx ingest --source dbt_main
ktx ingest --source dbt_main
Force re-ingestion (skip cache)
强制重新提取(跳过缓存)
ktx ingest --force
undefinedktx ingest --force
undefinedSearching Context
搜索上下文
bash
undefinedbash
undefinedSearch semantic layer (metrics, dimensions, entities)
搜索语义层(指标、维度、实体)
ktx sl "revenue"
ktx sl "customer lifetime value"
ktx sl "revenue"
ktx sl "customer lifetime value"
Search wiki pages
搜索知识库页面
ktx wiki "refund policy"
ktx wiki "how to calculate churn"
ktx wiki "refund policy"
ktx wiki "how to calculate churn"
List all semantic sources
列出所有语义源
ktx sl list
ktx sl list
Show specific metric details
显示特定指标详情
ktx sl show "monthly_recurring_revenue"
undefinedktx sl show "monthly_recurring_revenue"
undefinedMCP Server for Agents
面向Agent的MCP服务器
bash
undefinedbash
undefinedStart MCP server (required for agent integration)
启动MCP服务器(Agent集成必需)
ktx mcp start
ktx mcp start
Start with specific project
使用特定项目启动
ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project
Check MCP status
检查MCP状态
ktx mcp status
undefinedktx mcp status
undefinedProject Management
项目管理
bash
undefinedbash
undefinedValidate project configuration
验证项目配置
ktx validate
ktx validate
Show detailed status
显示详细状态
ktx status --verbose
ktx status --verbose
Clear all context and rebuild
清除所有上下文并重新构建
ktx clear && ktx ingest
undefinedktx clear && ktx ingest
undefinedUsing ktx with AI Agents
与AI Agent配合使用
Claude Code / Codex Integration
Claude Code / Codex集成
Once completes, the agent integration is automatic. Ensure the MCP server is running:
ktx setupbash
ktx mcp startThen ask the agent:
text
Query the warehouse using ktx to find monthly recurring revenue for the last 6 monthstext
Search ktx wiki for our customer segmentation strategyktx setupbash
ktx mcp start然后向Agent提问:
text
使用ktx查询数据仓库,找出过去6个月的月度经常性收入text
搜索ktx知识库,查看我们的客户细分策略Cursor / OpenCode Integration
Cursor / OpenCode集成
Add ktx as an MCP server in your agent configuration, then use natural language:
text
Use ktx to explore available metrics related to user engagement在Agent配置中添加ktx作为MCP服务器,然后使用自然语言提问:
text
使用ktx探索与用户参与度相关的可用指标Real-World Examples
实际应用示例
Defining a Semantic Metric
定义语义指标
Create a file in :
semantic-layer/warehouse/metrics.yamlyaml
metrics:
- name: monthly_recurring_revenue
description: Sum of all active subscription revenue for the month
type: simple
sql: "SUM(subscription_amount)"
base_entity: subscription
filters:
- field: status
operator: equals
value: active
dimensions:
- plan_type
- customer_segment
- name: customer_ltv
description: Customer lifetime value - total revenue per customer
type: derived
sql: |
SUM(order_total) / COUNT(DISTINCT customer_id)
base_entity: order
dimensions:
- acquisition_channel
- cohort_month在中创建文件:
semantic-layer/warehouse/metrics.yamlyaml
metrics:
- name: monthly_recurring_revenue
description: Sum of all active subscription revenue for the month
type: simple
sql: "SUM(subscription_amount)"
base_entity: subscription
filters:
- field: status
operator: equals
value: active
dimensions:
- plan_type
- customer_segment
- name: customer_ltv
description: Customer lifetime value - total revenue per customer
type: derived
sql: |
SUM(order_total) / COUNT(DISTINCT customer_id)
base_entity: order
dimensions:
- acquisition_channel
- cohort_monthAdding Wiki Context
添加知识库上下文
Create a file in :
wiki/global/analytics-definitions.mdmarkdown
undefined在中创建文件:
wiki/global/analytics-definitions.mdmarkdown
undefinedAnalytics Definitions
分析定义
Revenue Recognition
收入确认
We recognize revenue on a cash basis when payment is received, not when
the invoice is sent.
我们在收到付款时按收付实现制确认收入,而非开具发票时。
Customer Churn
客户流失
A customer is considered churned if they have not had an active subscription
for 90 consecutive days. Paused subscriptions do not count as churn.
如果客户连续90天没有活跃订阅,则视为流失。暂停的订阅不算作流失。
Cohort Analysis
cohort分析
Customer cohorts are defined by their first purchase month. Use the
field to group customers into cohorts.
first_order_date
After adding wiki content, re-ingest:
```bash
ktx ingest --source wiki客户群组根据其首次购买月份定义。使用字段将客户分组为不同群组。
first_order_date
添加知识库内容后,重新提取:
```bash
ktx ingest --source wikiQuerying from TypeScript/Node
从TypeScript/Node查询
typescript
import { execSync } from 'child_process';
// Search semantic layer
const metrics = JSON.parse(
execSync('ktx sl "revenue" --json', { encoding: 'utf-8' })
);
console.log('Available revenue metrics:', metrics);
// Search wiki
const wikiResults = JSON.parse(
execSync('ktx wiki "churn" --json', { encoding: 'utf-8' })
);
console.log('Wiki pages about churn:', wikiResults);typescript
import { execSync } from 'child_process';
// 搜索语义层
const metrics = JSON.parse(
execSync('ktx sl "revenue" --json', { encoding: 'utf-8' })
);
console.log('可用收入指标:', metrics);
// 搜索知识库
const wikiResults = JSON.parse(
execSync('ktx wiki "churn" --json', { encoding: 'utf-8' })
);
console.log('关于流失的知识库页面:', wikiResults);Programmatic Context Building
程序化构建上下文
typescript
import { KtxProject } from '@kaelio/ktx';
async function buildContext() {
const project = await KtxProject.load('/path/to/project');
// Ingest all sources
await project.ingest({ force: false });
// Query semantic layer
const revenueMetrics = await project.semanticLayer.search('revenue');
console.log('Found metrics:', revenueMetrics.map(m => m.name));
}
buildContext();typescript
import { KtxProject } from '@kaelio/ktx';
async function buildContext() {
const project = await KtxProject.load('/path/to/project');
// 提取所有源
await project.ingest({ force: false });
// 查询语义层
const revenueMetrics = await project.semanticLayer.search('revenue');
console.log('找到的指标:', revenueMetrics.map(m => m.name));
}
buildContext();Integrating with dbt
与dbt集成
Ensure your dbt project has a compiled :
manifest.jsonbash
cd /path/to/dbt-project
dbt compileConfigure in :
ktx.yamlyaml
context_sources:
dbt_main:
type: dbt
manifest_path: ../dbt-project/target/manifest.json
include_models: true
include_sources: true
include_metrics: trueThen ingest:
bash
ktx ingest --source dbt_mainktx will:
- Extract model descriptions and column definitions
- Import dbt metrics as semantic-layer metrics
- Detect relationships from and foreign keys
ref() - Flag contradictions between dbt docs and warehouse metadata
确保你的dbt项目已编译出:
manifest.jsonbash
cd /path/to/dbt-project
dbt compile在中配置:
ktx.yamlyaml
context_sources:
dbt_main:
type: dbt
manifest_path: ../dbt-project/target/manifest.json
include_models: true
include_sources: true
include_metrics: true然后提取:
bash
ktx ingest --source dbt_mainktx会:
- 提取模型描述和列定义
- 将dbt指标导入为语义层指标
- 从和外键检测关系
ref() - 标记dbt文档与数据仓库元数据之间的矛盾
Common Patterns
常见模式
Multi-Warehouse Setup
多数据仓库设置
yaml
databases:
production:
type: snowflake
account: xy12345.us-east-1
database: PROD_DB
warehouse: COMPUTE_WH
staging:
type: postgres
host: staging-db.internal
database: staging_analyticsIngest separately:
bash
ktx ingest --connection production
ktx ingest --connection stagingyaml
databases:
production:
type: snowflake
account: xy12345.us-east-1
database: PROD_DB
warehouse: COMPUTE_WH
staging:
type: postgres
host: staging-db.internal
database: staging_analytics分别提取:
bash
ktx ingest --connection production
ktx ingest --connection stagingCustom Semantic Entities
自定义语义实体
Define entities (e.g., tables or views) in :
semantic-layer/warehouse/entities.yamlyaml
entities:
- name: customer
description: Customer dimension table
table: public.customers
primary_key: customer_id
columns:
- name: customer_id
type: integer
- name: email
type: string
- name: plan_type
type: string
- name: created_at
type: timestamp
- name: subscription
description: Active and historical subscriptions
table: public.subscriptions
primary_key: subscription_id
foreign_keys:
- column: customer_id
references_entity: customer
references_column: customer_id在中定义实体(如表或视图):
semantic-layer/warehouse/entities.yamlyaml
entities:
- name: customer
description: Customer dimension table
table: public.customers
primary_key: customer_id
columns:
- name: customer_id
type: integer
- name: email
type: string
- name: plan_type
type: string
- name: created_at
type: timestamp
- name: subscription
description: Active and historical subscriptions
table: public.subscriptions
primary_key: subscription_id
foreign_keys:
- column: customer_id
references_entity: customer
references_column: customer_idSearching with Filters
带筛选的搜索
bash
undefinedbash
undefinedSearch only metrics (not dimensions or entities)
仅搜索指标(不包括维度或实体)
ktx sl "conversion" --type metric
ktx sl "conversion" --type metric
Search with semantic similarity threshold
带语义相似度阈值的搜索
ktx sl "customer value" --threshold 0.7
ktx sl "customer value" --threshold 0.7
Full-text search in wiki
知识库全文搜索
ktx wiki "refund AND policy"
undefinedktx wiki "refund AND policy"
undefinedTroubleshooting
故障排除
MCP Server Not Starting
MCP服务器无法启动
Symptom: exits immediately or shows connection errors.
ktx mcp startSolutions:
bash
undefined症状:立即退出或显示连接错误。
ktx mcp start解决方案:
bash
undefinedCheck project status first
先检查项目状态
ktx status
ktx status
Ensure LLM and embeddings are configured
确保LLM和嵌入模型已配置
ktx setup
ktx setup
Verify project directory
验证项目目录
ktx mcp start --project-dir /path/to/project
ktx mcp start --project-dir /path/to/project
Check for port conflicts (default 3000)
检查端口冲突(默认3000)
lsof -i :3000
lsof -i :3000
Start on different port
使用不同端口启动
ktx mcp start --port 3001
undefinedktx mcp start --port 3001
undefinedIngestion Fails for Database
数据库提取失败
Symptom: fails with authentication or timeout errors.
ktx ingest --connection warehouseSolutions:
bash
undefined症状:因认证或超时错误失败。
ktx ingest --connection warehouse解决方案:
bash
undefinedVerify credentials in .ktx/secrets.json
验证.ktx/secrets.json中的凭据
cat .ktx/secrets.json
cat .ktx/secrets.json
Test connection manually
手动测试连接
psql -h db.example.com -U your_user -d analytics
psql -h db.example.com -U your_user -d analytics
Check firewall / network access
检查防火墙/网络访问
telnet db.example.com 5432
telnet db.example.com 5432
Use read-only user for safety
使用只读用户以确保安全
ktx never writes, but explicit read-only is best practice
ktx默认不会写入数据,但显式只读是最佳实践
undefinedundefinedNo Results from Search
搜索无结果
Symptom: returns empty results.
ktx sl "revenue"Solutions:
bash
undefined症状:返回空结果。
ktx sl "revenue"解决方案:
bash
undefinedEnsure context is built
确保上下文已构建
ktx status
ktx status
Re-ingest if needed
必要时重新提取
ktx ingest --force
ktx ingest --force
Check semantic-layer directory
检查semantic-layer目录
ls semantic-layer/warehouse/
ls semantic-layer/warehouse/
Verify embeddings are configured
验证嵌入模型已配置
ktx status --verbose
undefinedktx status --verbose
undefineddbt Manifest Not Found
dbt清单未找到
Symptom: reports missing manifest.
ktx ingest --source dbt_mainSolutions:
bash
undefined症状:报告缺少清单。
ktx ingest --source dbt_main解决方案:
bash
undefinedCompile dbt project first
先编译dbt项目
cd /path/to/dbt-project
dbt compile
cd /path/to/dbt-project
dbt compile
Verify manifest path in ktx.yaml
验证ktx.yaml中的清单路径
cat ktx.yaml | grep manifest_path
cat ktx.yaml | grep manifest_path
Use absolute path if needed
必要时使用绝对路径
context_sources:
dbt_main:
type: dbt
manifest_path: /absolute/path/to/target/manifest.json
undefinedcontext_sources:
dbt_main:
type: dbt
manifest_path: /absolute/path/to/target/manifest.json
undefinedContradictions Flagged
标记矛盾
Symptom: ktx reports contradictions between dbt docs and warehouse metadata.
Expected behavior: ktx flags these for human review. Check:
bash
undefined症状:ktx报告dbt文档与数据仓库元数据之间存在矛盾。
预期行为:ktx会标记这些矛盾供人工审核。检查:
bash
undefinedReview flagged contradictions
查看标记的矛盾
cat raw-sources/warehouse/contradictions.json
cat raw-sources/warehouse/contradictions.json
Common causes:
常见原因:
- dbt docs out of sync with warehouse
- dbt文档与数据仓库不同步
- Recent schema changes not yet in dbt
- 最近的架构变更未同步到dbt
- Typos in column names or descriptions
- 列名或描述中的拼写错误
Resolve by updating dbt docs or warehouse schema, then re-ingest.
通过更新dbt文档或数据仓库架构解决问题,然后重新提取。Agent Can't Access ktx
Agent无法访问ktx
Symptom: Agent says "ktx not found" or doesn't see semantic layer.
Solutions:
bash
undefined症状:Agent显示“ktx未找到”或无法看到语义层。
解决方案:
bash
undefinedEnsure MCP server is running
确保MCP服务器正在运行
ktx mcp start
ktx mcp start
Check agent integration status
检查Agent集成状态
ktx status | grep "Agent integration"
ktx status | grep "Agent integration"
Re-run setup to refresh integration
重新运行setup以刷新集成
ktx setup
ktx setup
Verify agent is configured to use MCP
验证Agent已配置为使用MCP
(check agent settings for MCP server configuration)
(检查Agent设置中的MCP服务器配置)
undefinedundefinedAdvanced Usage
高级用法
Custom LLM Configuration
自定义LLM配置
yaml
llm:
provider: ai-gateway
endpoint: https://gateway.example.com/v1
model: custom-model-v2
max_tokens: 4096
temperature: 0.1yaml
llm:
provider: ai-gateway
endpoint: https://gateway.example.com/v1
model: custom-model-v2
max_tokens: 4096
temperature: 0.1Scripting with ktx
使用ktx编写脚本
bash
#!/bin/bashbash
#!/bin/bashDaily context refresh script
每日上下文刷新脚本
export KTX_PROJECT_DIR=/path/to/project
export KTX_PROJECT_DIR=/path/to/project
Pull latest dbt changes
拉取最新dbt变更
cd /path/to/dbt-project && git pull && dbt compile
cd /path/to/dbt-project && git pull && dbt compile
Rebuild ktx context
重建ktx上下文
cd $KTX_PROJECT_DIR
ktx ingest --force
cd $KTX_PROJECT_DIR
ktx ingest --force
Restart MCP server
重启MCP服务器
pkill -f "ktx mcp" || true
ktx mcp start --daemon
undefinedpkill -f "ktx mcp" || true
ktx mcp start --daemon
undefinedFiltering Ingestion
筛选提取内容
yaml
databases:
warehouse:
type: postgres
# ... connection details ...
include_schemas:
- public
- analytics
exclude_tables:
- tmp_*
- staging_*
sample_size: 1000 # rows to sample per tableyaml
databases:
warehouse:
type: postgres
# ... 连接详情 ...
include_schemas:
- public
- analytics
exclude_tables:
- tmp_*
- staging_*
sample_size: 1000 # 每个表要采样的行数Key Resources
关键资源
- Documentation: https://docs.kaelio.com/ktx/docs/
- CLI Reference: https://docs.kaelio.com/ktx/docs/cli-reference/ktx
- Agent Setup Guide: https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart
- Slack Community: https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ
- GitHub Issues: https://github.com/Kaelio/ktx/issues
- 文档:https://docs.kaelio.com/ktx/docs/
- CLI参考:https://docs.kaelio.com/ktx/docs/cli-reference/ktx
- Agent设置指南:https://docs.kaelio.com/ktx/docs/ai-resources/agent-quickstart
- Slack社区:https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ
- GitHub问题:https://github.com/Kaelio/ktx/issues
Security Notes
安全说明
- ktx connections are read-only by design
- Credentials stored in (git-ignored)
.ktx/secrets.json - No data sent to hosted services - only to your configured LLM provider
- Use environment variables for CI/CD: ,
ANTHROPIC_API_KEY, etc.OPENAI_API_KEY
- ktx连接默认是只读的
- 凭据存储在中(已Git忽略)
.ktx/secrets.json - 不会将数据发送到托管服务——仅发送到你配置的LLM提供商
- 在CI/CD中使用环境变量:、
ANTHROPIC_API_KEY等OPENAI_API_KEY