lakebase-setup


Lakebase Setup for Agent Memory


Note: This template does not include memory by default. Use this skill if you want to add memory capabilities to your agent. For pre-configured memory templates, see:
  • agent-langgraph-short-term-memory
    - Conversation history within a session
  • agent-langgraph-long-term-memory
    - User facts that persist across sessions

Overview


Lakebase provides persistent storage for agent memory:
  • Short-term memory: Conversation history within a thread (`AsyncCheckpointSaver`)
  • Long-term memory: User facts across sessions (`AsyncDatabricksStore`)

Complete Setup Workflow


┌─────────────────────────────────────────────────────────────────────────────┐
│  1. Add dependency  →  2. Get instance  →  3. Configure DAB + app.yaml     │
│  4. Configure .env  →  5. Initialize tables  →  6. Deploy + Run      │
└─────────────────────────────────────────────────────────────────────────────┘


Step 1: Add Memory Dependency


Add the `memory` extra to your `pyproject.toml`:

```toml
dependencies = [
    "databricks-langchain[memory]",
    # ... other dependencies
]
```

Then sync dependencies:

```bash
uv sync
```


Step 2: Create or Get Lakebase Instance


Option A: Create New Instance (via Databricks UI)


  1. Go to your Databricks workspace
  2. Navigate to **Compute** → **Lakebase**
  3. Click **Create Instance**
  4. Note the instance name

Option B: Use Existing Instance


If you have an existing instance, note its name for the next step.


Step 3: Configure databricks.yml (Lakebase Resource)


Add the Lakebase `database` resource to your app in `databricks.yml`:

```yaml
resources:
  apps:
    agent_langgraph:
      name: "your-app-name"
      source_code_path: ./

      resources:
        # ... other resources (experiment, UC functions, etc.) ...

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'
```

Important:
  • The `instance_name` must match the `LAKEBASE_INSTANCE_NAME` value in `app.yaml`
  • Using the `database` resource type automatically grants the app's service principal access to Lakebase

Update app.yaml (Environment Variables)


Update `app.yaml` with the Lakebase instance name:

```yaml
env:
  # ... other env vars ...

  # Lakebase instance name - must match instance_name in databricks.yml database resource
  # Note: Use 'value' (not 'valueFrom') because AsyncDatabricksStore needs the instance name,
  # not the full connection string that valueFrom would provide
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"

  # Static values for embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```

Important:
  • The `LAKEBASE_INSTANCE_NAME` value must match the `instance_name` in your `databricks.yml` database resource
  • The `database` resource handles permissions; `app.yaml` provides the instance name to your code
  • Don't use `valueFrom` for Lakebase - it provides the connection string, not the instance name


Step 4: Configure .env (Local Development)


For local development, add to `.env`:

```bash
# Lakebase configuration for long-term memory
LAKEBASE_INSTANCE_NAME=<your-instance-name>
EMBEDDING_ENDPOINT=databricks-gte-large-en
EMBEDDING_DIMS=1024
```
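How `.env` gets loaded depends on your template; projects commonly use `python-dotenv`. For a quick standalone script, the file format is simple enough to parse with the standard library — a minimal sketch, not the template's loader:

```python
import os
from pathlib import Path

def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ (skips comments and blanks)."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: real environment variables win over .env entries
        os.environ.setdefault(key.strip(), value.strip())
```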

**Important:** `embedding_dims` must match the embedding endpoint:

| Endpoint | Dimensions |
|----------|------------|
| `databricks-gte-large-en` | 1024 |
| `databricks-bge-large-en` | 1024 |

> **Note:** `.env` is only for local development. When deployed, the app gets `LAKEBASE_INSTANCE_NAME` from the `value` entry in `app.yaml`.
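Since a dimension mismatch only surfaces once embeddings are written or queried, a small startup check can catch it early. A sketch covering the two endpoints in the table above — the mapping dict is illustrative, not an official registry:

```python
import os

# Known embedding dimensions for the endpoints listed above (illustrative mapping)
KNOWN_DIMS = {
    "databricks-gte-large-en": 1024,
    "databricks-bge-large-en": 1024,
}

def check_embedding_config(endpoint: str, dims: int) -> None:
    """Raise early if EMBEDDING_DIMS disagrees with a known endpoint's dimensions."""
    expected = KNOWN_DIMS.get(endpoint)
    if expected is not None and dims != expected:
        raise ValueError(
            f"EMBEDDING_DIMS={dims} does not match {endpoint} (expected {expected})"
        )

check_embedding_config(
    os.environ.get("EMBEDDING_ENDPOINT", "databricks-gte-large-en"),
    int(os.environ.get("EMBEDDING_DIMS", "1024")),
)
```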

---

Step 5: Initialize Store Tables (CRITICAL - First Time Only)


Before deploying, you must initialize the Lakebase tables. The `AsyncDatabricksStore` creates tables on first use, but you need to do this locally first:

```python
# Run this script locally BEFORE first deployment
import asyncio

from databricks_langchain import AsyncDatabricksStore

async def setup_store():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        print("Setting up store tables...")
        await store.setup()  # Creates required tables
        print("Store tables created!")

        # Verify with a test write/read
        await store.aput(("test", "init"), "test_key", {"value": "test_value"})
        results = await store.asearch(("test", "init"), query="test", limit=1)
        print(f"Test successful: {results}")

asyncio.run(setup_store())
```

Run with:

```bash
uv run python -c "$(cat <<'EOF'
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        await store.setup()
        print("Tables created!")

asyncio.run(setup())
EOF
)"
```

This creates these tables in the `public` schema:
  • `store` - Key-value storage for memories
  • `store_vectors` - Vector embeddings for semantic search
  • `store_migrations` - Schema migration tracking
  • `vector_migrations` - Vector schema migration tracking
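To confirm the tables were created, you can run a quick query from any Postgres client connected to the instance (for example via `client.execute()` from the Quick Reference section):

```sql
-- Should list: store, store_migrations, store_vectors, vector_migrations
SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;
```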


Step 6: Deploy and Run Your App


IMPORTANT: Always run both `deploy` AND `run` commands:

```bash
# Deploy resources and upload files
databricks bundle deploy

# Start/restart the app with new code (REQUIRED!)
databricks bundle run agent_langgraph
```

> **Note:** `bundle deploy` only uploads files and configures resources. `bundle run` is required to actually start the app with the new code.

---

Complete Example: databricks.yml with Lakebase


```yaml
bundle:
  name: agent_langgraph

resources:
  experiments:
    agent_langgraph_experiment:
      name: /Users/${workspace.current_user.userName}/${bundle.name}-${bundle.target}

  apps:
    agent_langgraph:
      name: "my-agent-app"
      description: "Agent with long-term memory"
      source_code_path: ./

      resources:
        - name: 'experiment'
          experiment:
            experiment_id: "${resources.experiments.agent_langgraph_experiment.id}"
            permission: 'CAN_MANAGE'

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

targets:
  dev:
    mode: development
    default: true
```

Complete Example: app.yaml


```yaml
command: ["uv", "run", "start-app"]

env:
  - name: MLFLOW_TRACKING_URI
    value: "databricks"
  - name: MLFLOW_REGISTRY_URI
    value: "databricks-uc"
  - name: API_PROXY
    value: "http://localhost:8000/invocations"
  - name: CHAT_APP_PORT
    value: "3000"
  - name: CHAT_PROXY_TIMEOUT_SECONDS
    value: "300"
  # Reference experiment resource from databricks.yml
  - name: MLFLOW_EXPERIMENT_ID
    valueFrom: "experiment"
  # Lakebase instance name (must match instance_name in databricks.yml)
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"
  # Embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"
```


Troubleshooting


| Issue | Cause | Solution |
|-------|-------|----------|
| "embedding_dims is required when embedding_endpoint is specified" | Missing `embedding_dims` parameter | Add `embedding_dims=1024` to AsyncDatabricksStore |
| "relation 'store' does not exist" | Tables not initialized | Run `await store.setup()` locally first (Step 5) |
| "Unable to resolve Lakebase instance 'None'" | Missing env var in deployed app | Add `LAKEBASE_INSTANCE_NAME` value to app.yaml |
| "Unable to resolve Lakebase instance '...database.cloud.databricks.com'" | Used valueFrom instead of value | Use `value: "<instance-name>"`, not `valueFrom`, for Lakebase |
| "permission denied for table store" | Missing grants | The `database` resource in DAB should handle this; verify the resource is configured |
| "Failed to connect to Lakebase" | Wrong instance name | Verify instance name in databricks.yml and .env |
| Connection pool errors on exit | Python cleanup race | Ignore `PythonFinalizationError` - it's harmless |
| App not updated after deploy | Forgot to run bundle | Run `databricks bundle run agent_langgraph` after deploy |
| valueFrom not resolving | Resource name mismatch | Ensure `valueFrom` value matches `name` in databricks.yml resources |


Quick Reference: LakebaseClient API


For manual permission management (usually not needed with the DAB `database` resource):

```python
from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

client = LakebaseClient(instance_name="...")

# Create role (must do first)
client.create_role(identity_name, "SERVICE_PRINCIPAL")

# Grant schema (note: schemas is a list, grantee not role)
client.grant_schema(
    grantee="...",
    schemas=["public"],
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant tables (note: tables includes schema prefix)
client.grant_table(
    grantee="...",
    tables=["public.store"],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...],
)

# Execute raw SQL
client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'")
```

Service Principal Identifiers


When granting permissions manually, note that Databricks apps have multiple identifiers:

| Field | Format | Example |
|-------|--------|---------|
| `service_principal_id` | Numeric ID | `1234567890123456` |
| `service_principal_client_id` | UUID | `a1b2c3d4-e5f6-7890-abcd-ef1234567890` |
| `service_principal_name` | String name | `my-app-service-principal` |

Get all identifiers:

```bash
databricks apps get <app-name> --output json | jq '{
  id: .service_principal_id,
  client_id: .service_principal_client_id,
  name: .service_principal_name
}'
```

Which to use:
  • `LakebaseClient.create_role()` - Use `service_principal_client_id` (UUID) or `service_principal_name`
  • Raw SQL grants - Use `service_principal_client_id` (UUID)
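If you are scripting grants rather than piping through `jq`, the JSON from the command above can be picked apart with the standard library. The sample payload below is fabricated for illustration (same placeholder values as the table above):

```python
import json

# Payload shaped like `databricks apps get <app-name> --output json`
# (values are placeholders, not real identifiers)
payload = json.loads("""
{
  "service_principal_id": 1234567890123456,
  "service_principal_client_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "service_principal_name": "my-app-service-principal"
}
""")

# create_role() and raw SQL grants want the UUID client ID
client_id = payload["service_principal_client_id"]
print(client_id)
```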


Next Steps


  • Add memory to agent code: see agent-memory skill
  • Test locally: see run-locally skill
  • Deploy: see deploy skill