# Lakebase Setup for Agent Memory
**Note:** This template does not include memory by default. Use this skill if you want to add memory capabilities to your agent. For pre-configured memory templates, see:

- `agent-langgraph-short-term-memory`: conversation history within a session
- `agent-langgraph-long-term-memory`: user facts that persist across sessions
## Overview
Lakebase provides persistent storage for agent memory:

- **Short-term memory:** conversation history within a thread (`AsyncCheckpointSaver`)
- **Long-term memory:** user facts across sessions (`AsyncDatabricksStore`)
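To make the two memory types concrete, here is a pure-Python sketch of their shapes. This is illustrative only: in the real agent, thread history is persisted by `AsyncCheckpointSaver` and user facts by `AsyncDatabricksStore`, as configured in the steps below.

```python
# Illustrative in-memory model of the two memory types. The real agent
# persists these via AsyncCheckpointSaver / AsyncDatabricksStore.

class AgentMemoryModel:
    def __init__(self):
        self.threads = {}      # short-term: thread_id -> list of messages
        self.user_facts = {}   # long-term: (user_id, key) -> fact

    def add_message(self, thread_id: str, message: str) -> None:
        """Append a message to a thread's conversation history."""
        self.threads.setdefault(thread_id, []).append(message)

    def remember(self, user_id: str, key: str, fact: str) -> None:
        """Store a user fact that outlives any single thread."""
        self.user_facts[(user_id, key)] = fact

    def recall(self, user_id: str, key: str):
        """Look up a previously stored user fact, or None."""
        return self.user_facts.get((user_id, key))

mem = AgentMemoryModel()
mem.add_message("thread-1", "Hi, I'm Ada")
mem.remember("ada", "name", "Ada")
print(mem.recall("ada", "name"))  # → Ada
```

The key distinction: short-term memory is keyed by thread, long-term memory by user, so facts survive across threads.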
## Complete Setup Workflow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ 1. Add dependency → 2. Get instance → 3. Configure DAB + app.yaml           │
│ 4. Configure .env → 5. Initialize tables → 6. Deploy + Run                  │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Step 1: Add Memory Dependency
Add the memory extra to your `pyproject.toml`:

```toml
dependencies = [
    "databricks-langchain[memory]",
    # ... other dependencies
]
```

Then sync dependencies:

```bash
uv sync
```

## Step 2: Create or Get Lakebase Instance
### Option A: Create New Instance (via Databricks UI)

- Go to your Databricks workspace
- Navigate to **Compute → Lakebase**
- Click **Create Instance**
- Note the instance name

### Option B: Use Existing Instance

If you have an existing instance, note its name for the next step.
## Step 3: Configure databricks.yml (Lakebase Resource)

Add the `database` Lakebase resource to your app in `databricks.yml`:

```yaml
resources:
  apps:
    agent_langgraph:
      name: "your-app-name"
      source_code_path: ./
      resources:
        # ... other resources (experiment, UC functions, etc.) ...
        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'
```

**Important:**

- The `instance_name` value must match the `LAKEBASE_INSTANCE_NAME` reference in `app.yaml`
- Using the `database` resource type automatically grants the app's service principal access to Lakebase
## Update app.yaml (Environment Variables)

Update `app.yaml` with the Lakebase instance name:

```yaml
env:
  # ... other env vars ...
  # Lakebase instance name - must match instance_name in databricks.yml database resource
  # Note: Use 'value' (not 'valueFrom') because AsyncDatabricksStore needs the instance name,
  # not the full connection string that valueFrom would provide
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"
  # Static values for embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```

**Important:**

- The `LAKEBASE_INSTANCE_NAME` value must match the `instance_name` in your `databricks.yml` `database` resource
- The `database` resource handles permissions; `app.yaml` provides the instance name to your code
- Don't use `valueFrom` for Lakebase: it provides the connection string, not the instance name
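Because the instance name must be identical in both files, drift between them is a common failure mode. Here is a minimal, illustrative consistency check using only the standard library; the inline snippets and the `my-lakebase` name are hypothetical stand-ins for your real files.

```python
import re

# Illustrative: extract instance_name from a databricks.yml snippet and
# LAKEBASE_INSTANCE_NAME from an app.yaml snippet, then verify they match.
# Plain regex is used so no YAML library is required.

DATABRICKS_YML = """
        - name: 'database'
          database:
            instance_name: 'my-lakebase'
            database_name: 'postgres'
"""

APP_YAML = """
  - name: LAKEBASE_INSTANCE_NAME
    value: "my-lakebase"
"""

def extract(pattern: str, text: str) -> str:
    """Return the first capture group of pattern in text, or ''."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""

dab_name = extract(r"instance_name:\s*['\"]([^'\"]+)['\"]", DATABRICKS_YML)
app_name = extract(r"LAKEBASE_INSTANCE_NAME\s*\n\s*value:\s*['\"]([^'\"]+)['\"]", APP_YAML)
print(dab_name == app_name)  # → True
```

In practice you would read the two files from disk instead of inline strings; the matching rule is the same.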
## Step 4: Configure .env (Local Development)

For local development, add to `.env`:

```bash
# Lakebase configuration for long-term memory
LAKEBASE_INSTANCE_NAME=<your-instance-name>
EMBEDDING_ENDPOINT=databricks-gte-large-en
EMBEDDING_DIMS=1024
```

**Important:** `EMBEDDING_DIMS` must match the embedding endpoint:

| Endpoint | Dimensions |
|----------|------------|
| `databricks-gte-large-en` | 1024 |
| `databricks-bge-large-en` | 1024 |

> **Note:** `.env` is only for local development. When deployed, the app gets `LAKEBASE_INSTANCE_NAME` from the `value` set in `app.yaml`.
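As a local sanity check, you can verify that the two variables agree before starting the app. The endpoint-to-dimension pairs below come from the table above; the helper itself is illustrative.

```python
import os

# Known endpoint -> embedding dimension pairs (from the table above)
ENDPOINT_DIMS = {
    "databricks-gte-large-en": 1024,
    "databricks-bge-large-en": 1024,
}

def check_embedding_config(endpoint: str, dims: int) -> bool:
    """Return True if dims matches the known dimension for the endpoint."""
    expected = ENDPOINT_DIMS.get(endpoint)
    return expected is not None and expected == dims

# Read the same variables your .env provides (defaults for demonstration)
endpoint = os.environ.get("EMBEDDING_ENDPOINT", "databricks-gte-large-en")
dims = int(os.environ.get("EMBEDDING_DIMS", "1024"))
print(check_embedding_config(endpoint, dims))
```

A mismatch here would otherwise surface later as a confusing store error, so failing fast locally is cheaper.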
---

## Step 5: Initialize Store Tables (CRITICAL - First Time Only)
Before deploying, you must initialize the Lakebase tables. The `AsyncDatabricksStore` creates tables on first use, but you need to do this locally first:

```python
# Run this script locally BEFORE first deployment
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup_store():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        print("Setting up store tables...")
        await store.setup()  # Creates required tables
        print("Store tables created!")
        # Verify with a test write/read
        await store.aput(("test", "init"), "test_key", {"value": "test_value"})
        results = await store.asearch(("test", "init"), query="test", limit=1)
        print(f"Test successful: {results}")

asyncio.run(setup_store())
```

Run with:

```bash
uv run python -c "$(cat <<'EOF'
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        await store.setup()
        print("Tables created!")

asyncio.run(setup())
EOF
)"
```

This creates these tables in the `public` schema:

- `store`: key-value storage for memories
- `store_vectors`: vector embeddings for semantic search
- `store_migrations`: schema migration tracking
- `vector_migrations`: vector schema migration tracking
## Step 6: Deploy and Run Your App

**IMPORTANT:** Always run both `deploy` AND `run` commands:

```bash
# Deploy resources and upload files
databricks bundle deploy

# Start/restart the app with new code (REQUIRED!)
databricks bundle run agent_langgraph
```

> **Note:** `bundle deploy` only uploads files and configures resources. `bundle run` is required to actually start the app with the new code.

---

## Complete Example: databricks.yml with Lakebase
```yaml
bundle:
  name: agent_langgraph

resources:
  experiments:
    agent_langgraph_experiment:
      name: /Users/${workspace.current_user.userName}/${bundle.name}-${bundle.target}

  apps:
    agent_langgraph:
      name: "my-agent-app"
      description: "Agent with long-term memory"
      source_code_path: ./
      resources:
        - name: 'experiment'
          experiment:
            experiment_id: "${resources.experiments.agent_langgraph_experiment.id}"
            permission: 'CAN_MANAGE'
        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

targets:
  dev:
    mode: development
    default: true
```

## Complete Example: app.yaml
```yaml
command: ["uv", "run", "start-app"]
env:
  - name: MLFLOW_TRACKING_URI
    value: "databricks"
  - name: MLFLOW_REGISTRY_URI
    value: "databricks-uc"
  - name: API_PROXY
    value: "http://localhost:8000/invocations"
  - name: CHAT_APP_PORT
    value: "3000"
  - name: CHAT_PROXY_TIMEOUT_SECONDS
    value: "300"
  # Reference experiment resource from databricks.yml
  - name: MLFLOW_EXPERIMENT_ID
    valueFrom: "experiment"
  # Lakebase instance name (must match instance_name in databricks.yml)
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"
  # Embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```

## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| "embedding_dims is required when embedding_endpoint is specified" | Missing `embedding_dims` | Add `embedding_dims=1024` to `AsyncDatabricksStore` |
| "relation 'store' does not exist" | Tables not initialized | Run the setup script locally first (Step 5) |
| "Unable to resolve Lakebase instance 'None'" | Missing env var in deployed app | Add `LAKEBASE_INSTANCE_NAME` to `app.yaml` |
| "Unable to resolve Lakebase instance '...database.cloud.databricks.com'" | Used `valueFrom` instead of `value` | Use `value` for Lakebase in `app.yaml` |
| "permission denied for table store" | Missing grants | The `database` resource in `databricks.yml` grants access; redeploy with it configured |
| "Failed to connect to Lakebase" | Wrong instance name | Verify instance name in databricks.yml and .env |
| Connection pool errors on exit | Python cleanup race | Ignore |
| App not updated after deploy | Forgot to run `bundle run` | Run `databricks bundle run agent_langgraph` after deploy |
| `valueFrom` not resolving | Resource name mismatch | Ensure the env var's `valueFrom` matches the resource `name` in `databricks.yml` |
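The table above can also be sketched as a simple lookup from error-message substrings to fixes. This is illustrative only; the messages are abbreviated forms of the ones in the table.

```python
# Illustrative: map error-message substrings from the table above to fixes.
FIXES = {
    "embedding_dims is required": "Pass embedding_dims to AsyncDatabricksStore",
    "relation 'store' does not exist": "Run the local setup script (Step 5)",
    "Unable to resolve Lakebase instance 'None'": "Set LAKEBASE_INSTANCE_NAME in app.yaml",
    "permission denied for table store": "Configure the 'database' resource in databricks.yml",
}

def suggest_fix(error_message: str) -> str:
    """Return the suggested fix for the first matching known error."""
    for needle, fix in FIXES.items():
        if needle in error_message:
            return fix
    return "Unknown error; check the troubleshooting table"

print(suggest_fix("psycopg error: relation 'store' does not exist"))
```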
## Quick Reference: LakebaseClient API

For manual permission management (usually not needed with the DAB `database` resource):

```python
from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

client = LakebaseClient(instance_name="...")

# Create role (must do first)
client.create_role(identity_name, "SERVICE_PRINCIPAL")

# Grant schema (note: schemas is a list, grantee not role)
client.grant_schema(
    grantee="...",
    schemas=["public"],
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant tables (note: tables includes schema prefix)
client.grant_table(
    grantee="...",
    tables=["public.store"],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...],
)

# Execute raw SQL
client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'")
```
## Service Principal Identifiers
When granting permissions manually, note that Databricks apps have multiple identifiers:

| Field | Format |
|---|---|
| `service_principal_id` | Numeric ID |
| `service_principal_client_id` | UUID |
| `service_principal_name` | String name |

Get all identifiers:

```bash
databricks apps get <app-name> --output json | jq '{
  id: .service_principal_id,
  client_id: .service_principal_client_id,
  name: .service_principal_name
}'
```

**Which to use:**

- `LakebaseClient.create_role()`: use `service_principal_client_id` (UUID) or `service_principal_name`
- Raw SQL grants: use `service_principal_client_id` (UUID)
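The selection rule above can be captured in a small helper. This is illustrative: the dict keys mirror the jq output fields shown earlier, and the sample values are made up.

```python
# Illustrative helper: pick the right identifier for each operation.
# Keys mirror the `databricks apps get ... | jq` output above.
def identity_for(operation: str, sp: dict) -> str:
    if operation == "create_role":
        # create_role accepts the UUID client_id or the string name
        return sp.get("client_id") or sp["name"]
    # Raw SQL grants should use the UUID
    return sp["client_id"]

sp = {"id": 12345, "client_id": "11111111-2222-3333-4444-555555555555", "name": "app-sp"}
print(identity_for("create_role", sp))
print(identity_for("sql_grant", sp))
```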
## Next Steps

- Add memory to agent code: see the agent-memory skill
- Test locally: see the run-locally skill
- Deploy: see the deploy skill