lakebase-setup


Lakebase Setup for Agent Memory


Note: This template does not include memory by default. Use this skill if you want to add memory capabilities to your agent. For pre-configured memory templates, see:
  • agent-langgraph-short-term-memory
    - Conversation history within a session
  • agent-langgraph-long-term-memory
    - User facts that persist across sessions

Overview


Lakebase provides persistent storage for agent memory:
  • Short-term memory: Conversation history within a thread (`AsyncCheckpointSaver`)
  • Long-term memory: User facts across sessions (`AsyncDatabricksStore`)

Complete Setup Workflow


┌─────────────────────────────────────────────────────────────────────────────┐
│  1. Add dependency  →  2. Get instance  →  3. Configure DAB + app.yaml     │
│  4. Configure .env  →  5. Initialize tables  →  6. Deploy + Run      │
└─────────────────────────────────────────────────────────────────────────────┘


Step 1: Add Memory Dependency


Add the `memory` extra to your `pyproject.toml`:

```toml
dependencies = [
    "databricks-langchain[memory]",
    # ... other dependencies
]
```

Then sync dependencies:

```bash
uv sync
```


Step 2: Create or Get Lakebase Instance


Option A: Create New Instance (via Databricks UI)


  1. Go to your Databricks workspace
  2. Navigate to **Compute** → **Lakebase**
  3. Click **Create Instance**
  4. Note the instance name

Option B: Use Existing Instance


If you have an existing instance, note its name for the next step.


Step 3: Configure databricks.yml (Lakebase Resource)


Add the Lakebase `database` resource to your app in `databricks.yml`:

```yaml
resources:
  apps:
    agent_langgraph:
      name: "your-app-name"
      source_code_path: ./

      resources:
        # ... other resources (experiment, UC functions, etc.) ...

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'
```

Important:
  • The `instance_name` must match the `LAKEBASE_INSTANCE_NAME` value in `app.yaml`
  • Using the `database` resource type automatically grants the app's service principal access to Lakebase

Update app.yaml (Environment Variables)


Update `app.yaml` with the Lakebase instance name:

```yaml
env:
  # ... other env vars ...

  # Lakebase instance name - must match instance_name in databricks.yml database resource
  # Note: Use 'value' (not 'valueFrom') because AsyncDatabricksStore needs the instance name,
  # not the full connection string that valueFrom would provide
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"

  # Static values for embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```

Important:
  • The `LAKEBASE_INSTANCE_NAME` value must match the `instance_name` in your `databricks.yml` database resource
  • The `database` resource handles permissions; `app.yaml` provides the instance name to your code
  • Don't use `valueFrom` for Lakebase - it provides the connection string, not the instance name


Step 4: Configure .env (Local Development)


For local development, add to `.env`:

```bash
# Lakebase configuration for long-term memory
LAKEBASE_INSTANCE_NAME=<your-instance-name>
EMBEDDING_ENDPOINT=databricks-gte-large-en
EMBEDDING_DIMS=1024
```
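How `.env` gets loaded depends on your template; projects commonly use `python-dotenv`. For a quick standalone script, the file format is simple enough to parse with the standard library — a minimal sketch, not the template's loader:

```python
import os
from pathlib import Path

def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ (skips comments and blanks)."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over .env entries
        os.environ.setdefault(key.strip(), value.strip())
```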

**Important:** `embedding_dims` must match the embedding endpoint:

| Endpoint | Dimensions |
|----------|------------|
| `databricks-gte-large-en` | 1024 |
| `databricks-bge-large-en` | 1024 |

> **Note:** `.env` is only for local development. When deployed, the app gets `LAKEBASE_INSTANCE_NAME` from the `value` entry in `app.yaml`.
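Since a dimension mismatch only surfaces once embeddings are written or queried, a small startup check can catch it early. A sketch covering the two endpoints in the table above — the mapping dict is illustrative, not an official registry:

```python
import os

# Known embedding dimensions for the endpoints listed above (illustrative mapping)
KNOWN_DIMS = {
    "databricks-gte-large-en": 1024,
    "databricks-bge-large-en": 1024,
}

def check_embedding_config(endpoint: str, dims: int) -> None:
    """Raise early if EMBEDDING_DIMS disagrees with a known endpoint's dimensions."""
    expected = KNOWN_DIMS.get(endpoint)
    if expected is not None and dims != expected:
        raise ValueError(
            f"EMBEDDING_DIMS={dims} does not match {endpoint} (expected {expected})"
        )

check_embedding_config(
    os.environ.get("EMBEDDING_ENDPOINT", "databricks-gte-large-en"),
    int(os.environ.get("EMBEDDING_DIMS", "1024")),
)
```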

---

Step 5: Initialize Store Tables (CRITICAL - First Time Only)


Before deploying, you must initialize the Lakebase tables. The `AsyncDatabricksStore` creates tables on first use, but you need to do this locally first:

```python
# Run this script locally BEFORE first deployment
import asyncio

from databricks_langchain import AsyncDatabricksStore

async def setup_store():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        print("Setting up store tables...")
        await store.setup()  # Creates required tables
        print("Store tables created!")

        # Verify with a test write/read
        await store.aput(("test", "init"), "test_key", {"value": "test_value"})
        results = await store.asearch(("test", "init"), query="test", limit=1)
        print(f"Test successful: {results}")

asyncio.run(setup_store())
```

Run with:

```bash
uv run python -c "$(cat <<'EOF'
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        await store.setup()
        print("Tables created!")

asyncio.run(setup())
EOF
)"
```

This creates these tables in the `public` schema:
  • `store` - Key-value storage for memories
  • `store_vectors` - Vector embeddings for semantic search
  • `store_migrations` - Schema migration tracking
  • `vector_migrations` - Vector schema migration tracking
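To confirm the tables were created, you can run a quick query from any Postgres client connected to the instance (for example via `client.execute()` from the Quick Reference section):

```sql
-- Should list: store, store_migrations, store_vectors, vector_migrations
SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;
```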


Step 6: Deploy and Run Your App


IMPORTANT: Always run both `deploy` AND `run` commands:

```bash
# Deploy resources and upload files
databricks bundle deploy

# Start/restart the app with new code (REQUIRED!)
databricks bundle run agent_langgraph
```

> **Note:** `bundle deploy` only uploads files and configures resources. `bundle run` is required to actually start the app with the new code.

---

Complete Example: databricks.yml with Lakebase


```yaml
bundle:
  name: agent_langgraph

resources:
  experiments:
    agent_langgraph_experiment:
      name: /Users/${workspace.current_user.userName}/${bundle.name}-${bundle.target}

  apps:
    agent_langgraph:
      name: "my-agent-app"
      description: "Agent with long-term memory"
      source_code_path: ./

      resources:
        - name: 'experiment'
          experiment:
            experiment_id: "${resources.experiments.agent_langgraph_experiment.id}"
            permission: 'CAN_MANAGE'

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

targets:
  dev:
    mode: development
    default: true
```

Complete Example: app.yaml


```yaml
command: ["uv", "run", "start-app"]

env:
  - name: MLFLOW_TRACKING_URI
    value: "databricks"
  - name: MLFLOW_REGISTRY_URI
    value: "databricks-uc"
  - name: API_PROXY
    value: "http://localhost:8000/invocations"
  - name: CHAT_APP_PORT
    value: "3000"
  - name: CHAT_PROXY_TIMEOUT_SECONDS
    value: "300"
  # Reference experiment resource from databricks.yml
  - name: MLFLOW_EXPERIMENT_ID
    valueFrom: "experiment"
  # Lakebase instance name (must match instance_name in databricks.yml)
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"
  # Embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```


Troubleshooting


| Issue | Cause | Solution |
|-------|-------|----------|
| "embedding_dims is required when embedding_endpoint is specified" | Missing `embedding_dims` parameter | Add `embedding_dims=1024` to AsyncDatabricksStore |
| "relation 'store' does not exist" | Tables not initialized | Run `await store.setup()` locally first (Step 5) |
| "Unable to resolve Lakebase instance 'None'" | Missing env var in deployed app | Add `LAKEBASE_INSTANCE_NAME` value to app.yaml |
| "Unable to resolve Lakebase instance '...database.cloud.databricks.com'" | Used valueFrom instead of value | Use `value: "<instance-name>"`, not `valueFrom`, for Lakebase |
| "permission denied for table store" | Missing grants | The `database` resource in DAB should handle this; verify the resource is configured |
| "Failed to connect to Lakebase" | Wrong instance name | Verify instance name in databricks.yml and .env |
| Connection pool errors on exit | Python cleanup race | Ignore `PythonFinalizationError` - it's harmless |
| App not updated after deploy | Forgot to run bundle | Run `databricks bundle run agent_langgraph` after deploy |
| valueFrom not resolving | Resource name mismatch | Ensure `valueFrom` value matches `name` in databricks.yml resources |


Quick Reference: LakebaseClient API


For manual permission management (usually not needed with the DAB `database` resource):

```python
from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

client = LakebaseClient(instance_name="...")

# Create role (must do first)
client.create_role(identity_name, "SERVICE_PRINCIPAL")

# Grant schema (note: schemas is a list, grantee not role)
client.grant_schema(
    grantee="...",
    schemas=["public"],
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant tables (note: tables includes schema prefix)
client.grant_table(
    grantee="...",
    tables=["public.store"],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...],
)

# Execute raw SQL
client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'")
```

Service Principal Identifiers


When granting permissions manually, note that Databricks apps have multiple identifiers:

| Field | Format | Example |
|-------|--------|---------|
| `service_principal_id` | Numeric ID | `1234567890123456` |
| `service_principal_client_id` | UUID | `a1b2c3d4-e5f6-7890-abcd-ef1234567890` |
| `service_principal_name` | String name | `my-app-service-principal` |

Get all identifiers:

```bash
databricks apps get <app-name> --output json | jq '{
  id: .service_principal_id,
  client_id: .service_principal_client_id,
  name: .service_principal_name
}'
```

Which to use:
  • `LakebaseClient.create_role()` - Use `service_principal_client_id` (UUID) or `service_principal_name`
  • Raw SQL grants - Use `service_principal_client_id` (UUID)
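If you are scripting grants rather than piping through `jq`, the JSON from the command above can be picked apart with the standard library. The sample payload below is fabricated for illustration (same placeholder values as the table above):

```python
import json

# Payload shaped like `databricks apps get <app-name> --output json`
# (values are placeholders, not real identifiers)
payload = json.loads("""
{
  "service_principal_id": 1234567890123456,
  "service_principal_client_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "service_principal_name": "my-app-service-principal"
}
""")

# create_role() and raw SQL grants want the UUID client ID
client_id = payload["service_principal_client_id"]
print(client_id)
```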


Next Steps


  • Add memory to agent code: see agent-memory skill
  • Test locally: see run-locally skill
  • Deploy: see deploy skill