lakebase-autoscale

Lakebase Autoscaling

Patterns and best practices for using Lakebase Autoscaling, the next-generation managed PostgreSQL on Databricks with autoscaling compute, branching, scale-to-zero, and instant restore.

When to Use

Use this skill when:
  • Building applications that need a PostgreSQL database with autoscaling compute
  • Working with database branching for dev/test/staging workflows
  • Adding persistent state to applications with scale-to-zero cost savings
  • Implementing reverse ETL from Delta Lake to an operational database via synced tables
  • Managing Lakebase Autoscaling projects, branches, computes, or credentials

Overview

Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service for OLTP workloads. It provides autoscaling compute, Git-like branching, scale-to-zero, and instant point-in-time restore.
| Feature | Description |
| --- | --- |
| Autoscaling Compute | 0.5-112 CU with 2 GB RAM per CU; scales dynamically based on load |
| Scale-to-Zero | Compute suspends after a configurable inactivity timeout |
| Branching | Create isolated database environments (like Git branches) for dev/test |
| Instant Restore | Point-in-time restore from any moment within the configured window (up to 35 days) |
| OAuth Authentication | Token-based auth via the Databricks SDK (1-hour expiry) |
| Reverse ETL | Sync data from Delta tables to PostgreSQL via synced tables |

Available Regions (AWS): us-east-1, us-east-2, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, ap-southeast-2
Available Regions (Azure Beta): eastus2, westeurope, westus
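When validating deployment configs, the region lists above can be encoded as a simple lookup. A minimal sketch -- the sets are a snapshot of this doc and will drift as Databricks adds regions, so treat them as illustrative, not an API:

```python
# Snapshot of the supported-region lists from this doc (not an official API).
AWS_REGIONS = {
    "us-east-1", "us-east-2", "eu-central-1", "eu-west-1",
    "eu-west-2", "ap-south-1", "ap-southeast-1", "ap-southeast-2",
}
AZURE_BETA_REGIONS = {"eastus2", "westeurope", "westus"}

def is_supported_region(region: str) -> bool:
    """True if the region appears in either list above."""
    return region in AWS_REGIONS or region in AZURE_BETA_REGIONS
```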

Project Hierarchy

Understanding the hierarchy is essential for working with Lakebase Autoscaling:
Project (top-level container)
  └── Branch(es) (isolated database environments)
        ├── Compute (primary R/W endpoint)
        ├── Read Replica(s) (optional, read-only)
        ├── Role(s) (Postgres roles)
        └── Database(s) (Postgres databases)
              └── Schema(s)
| Object | Description |
| --- | --- |
| Project | Top-level container. Created via `w.postgres.create_project()`. |
| Branch | Isolated database environment with copy-on-write storage. Default branch is `production`. |
| Compute | Postgres server powering a branch. Configurable CU sizing and autoscaling. |
| Database | Standard Postgres database within a branch. Default is `databricks_postgres`. |
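Every API call in this skill addresses resources by their hierarchical path. A small helper (illustrative, not part of the SDK) avoids hand-assembling these strings:

```python
def branch_name(project_id: str, branch_id: str) -> str:
    """Build projects/{id}/branches/{id}."""
    return f"projects/{project_id}/branches/{branch_id}"

def endpoint_name(project_id: str, branch_id: str, endpoint_id: str) -> str:
    """Build projects/{id}/branches/{id}/endpoints/{id}."""
    return f"{branch_name(project_id, branch_id)}/endpoints/{endpoint_id}"
```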

Quick Start

Create a project and connect:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Project, ProjectSpec

w = WorkspaceClient()

# Create a project (long-running operation)
operation = w.postgres.create_project(
    project=Project(
        spec=ProjectSpec(
            display_name="My Application",
            pg_version="17"
        )
    ),
    project_id="my-app"
)
result = operation.wait()
print(f"Created project: {result.name}")
```

Common Patterns

Generate OAuth Token

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Generate a database credential for connecting (optionally scoped to an endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)
token = cred.token  # Use as the password in the connection string; expires after 1 hour
```
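Because tokens expire after one hour, any long-lived process needs refresh logic. A minimal sketch: `fetch_token` stands in for a zero-arg wrapper around `generate_database_credential(...).token`; the 1-hour lifetime matches this doc, while the 5-minute safety margin is an arbitrary assumption:

```python
import time

class TokenCache:
    """Cache a short-lived credential, refreshing shortly before expiry.

    fetch_token is any zero-arg callable returning a fresh token string
    (e.g. a wrapper around w.postgres.generate_database_credential).
    """

    def __init__(self, fetch_token, lifetime_s=3600, margin_s=300):
        self._fetch = fetch_token
        self._lifetime = lifetime_s
        self._margin = margin_s
        self._token = None
        self._issued_at = 0.0

    def get(self) -> str:
        # Refresh when no token yet, or when it is within margin_s of expiry
        now = time.monotonic()
        if self._token is None or now - self._issued_at > self._lifetime - self._margin:
            self._token = self._fetch()
            self._issued_at = time.monotonic()
        return self._token
```

Call `cache.get()` immediately before each new connection so pooled reconnects always carry a live token.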

Connect from Notebook

```python
import psycopg
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get endpoint details
endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
host = endpoint.status.hosts.host

# Generate token (scoped to endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)

# Connect using psycopg3
conn_string = (
    f"host={host} "
    f"dbname=databricks_postgres "
    f"user={w.current_user.me().user_name} "
    f"password={cred.token} "
    f"sslmode=require"
)
with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())
```
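The string assembly above can be factored into a helper that also covers the macOS DNS workaround from Common Issues. The `socket`-based lookup stands in for `dig`; the helper itself is an illustrative sketch, not SDK behavior:

```python
import socket

def build_conn_string(host: str, user: str, password: str,
                      dbname: str = "databricks_postgres",
                      resolve: bool = False) -> str:
    """Assemble a libpq connection string with sslmode=require.

    With resolve=True the hostname is resolved up front and passed as
    hostaddr, sidestepping the macOS DNS resolution issue.
    """
    parts = [f"host={host}", f"dbname={dbname}", f"user={user}",
             f"password={password}", "sslmode=require"]
    if resolve:
        # Rough equivalent of `dig +short <host>`: take the first IPv4 address
        addr = socket.getaddrinfo(host, 5432, family=socket.AF_INET)[0][4][0]
        parts.insert(1, f"hostaddr={addr}")
    return " ".join(parts)
```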

Create a Branch for Development

```python
from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

# Create a dev branch with 7-day expiration
branch = w.postgres.create_branch(
    parent="projects/my-app",
    branch=Branch(
        spec=BranchSpec(
            source_branch="projects/my-app/branches/production",
            ttl=Duration(seconds=604800)  # 7 days
        )
    ),
    branch_id="development"
).wait()
print(f"Branch created: {branch.name}")
```

Resize Compute (Autoscaling)

```python
from databricks.sdk.service.postgres import Endpoint, EndpointSpec, FieldMask

# Update compute to autoscale between 2-8 CU
w.postgres.update_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary",
    endpoint=Endpoint(
        name="projects/my-app/branches/production/endpoints/ep-primary",
        spec=EndpointSpec(
            autoscaling_limit_min_cu=2.0,
            autoscaling_limit_max_cu=8.0
        )
    ),
    update_mask=FieldMask(field_mask=[
        "spec.autoscaling_limit_min_cu",
        "spec.autoscaling_limit_max_cu"
    ])
).wait()
```
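Per the Common Issues section, CU values must fall in 0.5-112 and the span between min and max may not exceed 8 CU. A pre-flight check lets a caller fail fast with a clearer message; this is an illustrative sketch (the service enforces the limits server-side regardless):

```python
def validate_autoscale_range(min_cu: float, max_cu: float) -> None:
    """Raise ValueError if the range violates the documented limits:
    0.5 <= min <= max <= 112, and max - min <= 8 CU."""
    if not (0.5 <= min_cu <= max_cu <= 112):
        raise ValueError("CU values must satisfy 0.5 <= min <= max <= 112")
    if max_cu - min_cu > 8:
        raise ValueError("autoscaling span may not exceed 8 CU (8-16 is valid, 0.5-32 is not)")
```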

MCP Tools

The following MCP tools are available for managing Lakebase infrastructure. Use `type="autoscale"` for Lakebase Autoscaling.

Database (Project) Management

| Tool | Description |
| --- | --- |
| `create_or_update_lakebase_database` | Create or update a database. Finds by name, creates if new, updates if existing. Use `type="autoscale"`, `display_name`, and `pg_version` params. A new project auto-creates a production branch, a default compute, and a `databricks_postgres` database. |
| `get_lakebase_database` | Get database details (including branches and endpoints) or list all. Pass `name` to get one; omit it to list all. Use `type="autoscale"` to filter. |
| `delete_lakebase_database` | Delete a project and all its branches, computes, and data. Use `type="autoscale"`. |

Branch Management

| Tool | Description |
| --- | --- |
| `create_or_update_lakebase_branch` | Create or update a branch with its compute endpoint. Params: `project_name`, `branch_id`, `source_branch`, `ttl_seconds`, `is_protected`, plus compute params (`autoscaling_limit_min_cu`, `autoscaling_limit_max_cu`, `scale_to_zero_seconds`). |
| `delete_lakebase_branch` | Delete a branch and its compute endpoints. |

Credentials

| Tool | Description |
| --- | --- |
| `generate_lakebase_credential` | Generate an OAuth token for PostgreSQL connections (1-hour expiry). Pass the `endpoint` resource name for autoscale. |

Reference Files

  • projects.md - Project management patterns and settings
  • branches.md - Branching workflows, protection, and expiration
  • computes.md - Compute sizing, autoscaling, and scale-to-zero
  • connection-patterns.md - Connection patterns for different use cases
  • reverse-etl.md - Synced tables from Delta Lake to Lakebase

CLI Quick Reference

```bash
# Create a project
databricks postgres create-project \
  --project-id my-app \
  --json '{"spec": {"display_name": "My App", "pg_version": "17"}}'

# List projects
databricks postgres list-projects

# Get project details
databricks postgres get-project projects/my-app

# Create a branch
databricks postgres create-branch projects/my-app development \
  --json '{"spec": {"source_branch": "projects/my-app/branches/production", "no_expiry": true}}'

# List branches
databricks postgres list-branches projects/my-app

# Get endpoint details
databricks postgres get-endpoint projects/my-app/branches/production/endpoints/ep-primary

# Delete a project
databricks postgres delete-project projects/my-app
```

Key Differences from Lakebase Provisioned

| Aspect | Provisioned | Autoscaling |
| --- | --- | --- |
| SDK module | `w.database` | `w.postgres` |
| Top-level resource | Instance | Project |
| Capacity | CU_1, CU_2, CU_4, CU_8 (16 GB/CU) | 0.5-112 CU (2 GB/CU) |
| Branching | Not supported | Full branching support |
| Scale-to-zero | Not supported | Configurable timeout |
| Operations | Synchronous | Long-running operations (LRO) |
| Read replicas | Readable secondaries | Dedicated read-only endpoints |
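The capacity rows translate directly into memory sizing. A sketch of the arithmetic -- the GB-per-CU constants come from the table above and are approximate, not an official formula:

```python
# Approximate RAM per CU, from the capacity comparison table (assumption).
RAM_GB_PER_CU = {"provisioned": 16, "autoscale": 2}

def ram_gb(cu: float, kind: str = "autoscale") -> float:
    """Approximate RAM for a compute of the given CU size."""
    return cu * RAM_GB_PER_CU[kind]
```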

Common Issues

| Issue | Solution |
| --- | --- |
| Token expired during a long query | Implement a token refresh loop; tokens expire after 1 hour |
| Connection refused after scale-to-zero | Compute wakes automatically on connection; reactivation takes a few hundred ms; implement retry logic |
| DNS resolution fails on macOS | Use the `dig` command to resolve the hostname and pass `hostaddr` to psycopg |
| Branch deletion blocked | Delete child branches first; branches with children cannot be deleted |
| Autoscaling range too wide | Max minus min cannot exceed 8 CU (e.g., 8-16 CU is valid; 0.5-32 CU is not) |
| SSL required error | Always use `sslmode=require` in the connection string |
| Update mask required | All update operations require an `update_mask` specifying the fields to modify |
| Connection closed after 24h idle | All connections have a 24-hour idle timeout and a 3-day max lifetime; implement retry logic |
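Two of the rows above (scale-to-zero wake-up, 24-hour idle timeout) share the same remedy: retry the connection. A minimal backoff sketch; `connect` is any zero-arg connection factory, and in practice you would pass `retry_on=(psycopg.OperationalError,)` -- the names here are illustrative:

```python
import time

def with_retry(connect, retry_on=(OSError,), attempts=5, base_delay=0.2):
    """Retry a zero-arg connection factory with exponential backoff.

    Covers the first-connection failure while a scale-to-zero compute
    wakes, and connections dropped by the idle timeout.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on as exc:
            last_exc = exc
            time.sleep(base_delay * 2 ** attempt)  # 0.2s, 0.4s, 0.8s, ...
    raise last_exc
```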

Current Limitations

These features are NOT yet supported in Lakebase Autoscaling:
  • High availability with readable secondaries (use read replicas instead)
  • Databricks Apps UI integration (Apps can connect manually via credentials)
  • Feature Store integration
  • Stateful AI agents (LangChain memory)
  • Postgres-to-Delta sync (only Delta-to-Postgres reverse ETL)
  • Custom billing tags and serverless budget policies
  • Direct migration from Lakebase Provisioned (use pg_dump/pg_restore or reverse ETL)

SDK Version Requirements

  • Databricks SDK for Python: >= 0.81.0 (for the `w.postgres` module)
  • psycopg: 3.x (supports the `hostaddr` parameter for the DNS workaround)
  • SQLAlchemy: 2.x with the `postgresql+psycopg` driver

```python
%pip install -U "databricks-sdk>=0.81.0" "psycopg[binary]>=3.0" sqlalchemy
```

Notes

  • Compute Units in Autoscaling provide ~2 GB RAM each (vs 16 GB in Provisioned).
  • Resource naming follows hierarchical paths: `projects/{id}/branches/{id}/endpoints/{id}`.
  • All create/update/delete operations are long-running -- use `.wait()` in the SDK.
  • Tokens are short-lived (1 hour) -- production apps MUST implement token refresh.
  • Postgres versions 16 and 17 are supported.