Lakebase Autoscaling
Patterns and best practices for using Lakebase Autoscaling, the next-generation managed PostgreSQL on Databricks with autoscaling compute, branching, scale-to-zero, and instant restore.
When to Use
适用场景
Use this skill when:
- Building applications that need a PostgreSQL database with autoscaling compute
- Working with database branching for dev/test/staging workflows
- Adding persistent state to applications with scale-to-zero cost savings
- Implementing reverse ETL from Delta Lake to an operational database via synced tables
- Managing Lakebase Autoscaling projects, branches, computes, or credentials
Overview
Lakebase Autoscaling is Databricks' next-generation managed PostgreSQL service for OLTP workloads. It provides autoscaling compute, Git-like branching, scale-to-zero, and instant point-in-time restore.
| Feature | Description |
|---|---|
| Autoscaling Compute | 0.5-112 CU with 2 GB RAM per CU; scales dynamically based on load |
| Scale-to-Zero | Compute suspends after configurable inactivity timeout |
| Branching | Create isolated database environments (like Git branches) for dev/test |
| Instant Restore | Point-in-time restore from any moment within the configured window (up to 35 days) |
| OAuth Authentication | Token-based auth via Databricks SDK (1-hour expiry) |
| Reverse ETL | Sync data from Delta tables to PostgreSQL via synced tables |
Available Regions (AWS): us-east-1, us-east-2, eu-central-1, eu-west-1, eu-west-2, ap-south-1, ap-southeast-1, ap-southeast-2
Available Regions (Azure Beta): eastus2, westeurope, westus
Project Hierarchy
Understanding the hierarchy is essential for working with Lakebase Autoscaling:
```
Project (top-level container)
└── Branch(es) (isolated database environments)
    ├── Compute (primary R/W endpoint)
    ├── Read Replica(s) (optional, read-only)
    ├── Role(s) (Postgres roles)
    └── Database(s) (Postgres databases)
        └── Schema(s)
```

| Object | Description |
|---|---|
| Project | Top-level container. Created via `create_project`. |
| Branch | Isolated database environment with copy-on-write storage. Default branch is `production`. |
| Compute | Postgres server powering a branch. Configurable CU sizing and autoscaling. |
| Database | Standard Postgres database within a branch. Default is `databricks_postgres`. |
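The hierarchy above also dictates how resources are addressed: every object is identified by a slash-delimited path. A trivial sanity-check helper (the function is our own, not part of the SDK):

```python
def endpoint_path(project_id: str, branch_id: str, endpoint_id: str) -> str:
    """Compose the fully qualified resource name for a compute endpoint."""
    return f"projects/{project_id}/branches/{branch_id}/endpoints/{endpoint_id}"

# Matches the names used throughout the examples below
print(endpoint_path("my-app", "production", "ep-primary"))
# projects/my-app/branches/production/endpoints/ep-primary
```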
Quick Start
Create a project and connect:
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Project, ProjectSpec

w = WorkspaceClient()

# Create a project (long-running operation)
operation = w.postgres.create_project(
    project=Project(
        spec=ProjectSpec(
            display_name="My Application",
            pg_version="17"
        )
    ),
    project_id="my-app"
)
result = operation.wait()
print(f"Created project: {result.name}")
```
Common Patterns
Generate OAuth Token
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Generate database credential for connecting (optionally scoped to an endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)
token = cred.token  # Use as password in connection string; expires after 1 hour
```
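For long-running processes, it is convenient to wrap credential generation in a small cache that re-mints the token shortly before the 1-hour expiry. A sketch (the class name and the 5-minute refresh margin are our own choices, not SDK features):

```python
import time

class TokenCache:
    """Cache the OAuth database credential and re-mint it before the 1-hour expiry."""

    def __init__(self, w, endpoint, margin_seconds=300):
        self.w = w                    # authenticated WorkspaceClient
        self.endpoint = endpoint      # full endpoint resource name
        self.margin = margin_seconds  # refresh this long before expiry
        self._token = None
        self._fetched_at = 0.0

    def get(self):
        # Re-mint once we are within `margin` of the 3600 s token lifetime
        if self._token is None or time.time() - self._fetched_at > 3600 - self.margin:
            cred = self.w.postgres.generate_database_credential(endpoint=self.endpoint)
            self._token = cred.token
            self._fetched_at = time.time()
        return self._token
```

Call `get()` whenever a new connection is opened; it only hits the credential API when the cached token is close to expiring.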
Connect from Notebook
```python
import psycopg
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get endpoint details
endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
host = endpoint.status.hosts.host

# Generate token (scoped to endpoint)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)

# Connect using psycopg3
conn_string = (
    f"host={host} "
    f"dbname=databricks_postgres "
    f"user={w.current_user.me().user_name} "
    f"password={cred.token} "
    f"sslmode=require"
)
with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())
```
Create a Branch for Development
```python
from databricks.sdk.service.postgres import Branch, BranchSpec, Duration

# Create a dev branch with 7-day expiration
branch = w.postgres.create_branch(
    parent="projects/my-app",
    branch=Branch(
        spec=BranchSpec(
            source_branch="projects/my-app/branches/production",
            ttl=Duration(seconds=604800)  # 7 days
        )
    ),
    branch_id="development"
).wait()
print(f"Branch created: {branch.name}")
```
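The 604800 above is simply 7 days expressed in seconds; deriving TTLs with `timedelta` keeps the number readable:

```python
from datetime import timedelta

# 7 days in seconds, as expected by Duration(seconds=...)
seven_days = int(timedelta(days=7).total_seconds())
print(seven_days)  # 604800
```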
Resize Compute (Autoscaling)
```python
from databricks.sdk.service.postgres import Endpoint, EndpointSpec, FieldMask

# Update compute to autoscale between 2-8 CU
w.postgres.update_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary",
    endpoint=Endpoint(
        name="projects/my-app/branches/production/endpoints/ep-primary",
        spec=EndpointSpec(
            autoscaling_limit_min_cu=2.0,
            autoscaling_limit_max_cu=8.0
        )
    ),
    update_mask=FieldMask(field_mask=[
        "spec.autoscaling_limit_min_cu",
        "spec.autoscaling_limit_max_cu"
    ])
).wait()
```
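Since the service rejects autoscaling ranges wider than 8 CU (see Common Issues below), a pre-flight check before calling `update_endpoint` can save a failed round trip. The validator is our own helper, not an SDK API:

```python
def validate_cu_range(min_cu: float, max_cu: float) -> None:
    """Raise if an autoscaling range would be rejected by Lakebase Autoscaling."""
    if not (0.5 <= min_cu <= max_cu <= 112):
        raise ValueError("CU values must satisfy 0.5 <= min <= max <= 112")
    if max_cu - min_cu > 8:
        raise ValueError(
            f"Range spread is {max_cu - min_cu} CU; the maximum allowed spread is 8"
        )

validate_cu_range(2.0, 8.0)   # OK: spread of 6 CU
validate_cu_range(8.0, 16.0)  # OK: spread of 8 CU
```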
MCP Tools
The following MCP tools are available for managing Lakebase infrastructure. Use `type="autoscale"` for Lakebase Autoscaling.

Database (Project) Management
| Tool | Description |
|---|---|
| Create or update a database. Finds by name, creates if new, updates if existing. Use |
| Get database details (including branches and endpoints) or list all. Pass |
| Delete a project and all its branches, computes, and data. Use |
Branch Management
| Tool | Description |
|---|---|
| Create or update a branch with its compute endpoint. Params: |
| Delete a branch and its compute endpoints. |
Credentials
| Tool | Description |
|---|---|
| Generate OAuth token for PostgreSQL connections (1-hour expiry). Pass `type="autoscale"` for the Autoscale version. |
Reference Files
- projects.md - Project management patterns and settings
- branches.md - Branching workflows, protection, and expiration
- computes.md - Compute sizing, autoscaling, and scale-to-zero
- connection-patterns.md - Connection patterns for different use cases
- reverse-etl.md - Synced tables from Delta Lake to Lakebase
CLI Quick Reference
```bash
# Create a project
databricks postgres create-project \
  --project-id my-app \
  --json '{"spec": {"display_name": "My App", "pg_version": "17"}}'

# List projects
databricks postgres list-projects

# Get project details
databricks postgres get-project projects/my-app

# Create a branch
databricks postgres create-branch projects/my-app development \
  --json '{"spec": {"source_branch": "projects/my-app/branches/production", "no_expiry": true}}'

# List branches
databricks postgres list-branches projects/my-app

# Get endpoint details
databricks postgres get-endpoint projects/my-app/branches/production/endpoints/ep-primary

# Delete a project
databricks postgres delete-project projects/my-app
```
Key Differences from Lakebase Provisioned
| Aspect | Provisioned | Autoscaling |
|---|---|---|
| SDK module | `w.database` | `w.postgres` |
| Top-level resource | Instance | Project |
| Capacity | CU_1, CU_2, CU_4, CU_8 (16 GB/CU) | 0.5-112 CU (2 GB/CU) |
| Branching | Not supported | Full branching support |
| Scale-to-zero | Not supported | Configurable timeout |
| Operations | Synchronous | Long-running operations (LRO) |
| Read replicas | Readable secondaries | Dedicated read-only endpoints |
Common Issues
| Issue | Solution |
|---|---|
| Token expired during long query | Implement token refresh loop; tokens expire after 1 hour |
| Connection refused after scale-to-zero | Compute wakes automatically on connection; reactivation takes a few hundred ms; implement retry logic |
| DNS resolution fails on macOS | Use the psycopg `hostaddr` parameter to bypass hostname resolution |
| Branch deletion blocked | Delete child branches first; cannot delete branches with children |
| Autoscaling range too wide | Max - min cannot exceed 8 CU (e.g., 8-16 CU is valid, 0.5-32 CU is not) |
| SSL required error | Always use `sslmode=require` in the connection string |
| Update mask required | All update operations require an `update_mask` (`FieldMask`) listing the fields to change |
| Connection closed after 24h idle | All connections have a 24-hour idle timeout and 3-day max lifetime; implement retry logic |
Current Limitations
These features are NOT yet supported in Lakebase Autoscaling:
- High availability with readable secondaries (use read replicas instead)
- Databricks Apps UI integration (Apps can connect manually via credentials)
- Feature Store integration
- Stateful AI agents (LangChain memory)
- Postgres-to-Delta sync (only Delta-to-Postgres reverse ETL)
- Custom billing tags and serverless budget policies
- Direct migration from Lakebase Provisioned (use pg_dump/pg_restore or reverse ETL)
SDK Version Requirements
- Databricks SDK for Python: >= 0.81.0 (for the `w.postgres` module)
- psycopg: 3.x (supports the `hostaddr` parameter for the DNS workaround)
- SQLAlchemy: 2.x with the `postgresql+psycopg` driver

```python
%pip install -U "databricks-sdk>=0.81.0" "psycopg[binary]>=3.0" sqlalchemy
```

Notes
- Compute Units in Autoscaling provide ~2 GB RAM each (vs 16 GB in Provisioned).
- Resource naming follows hierarchical paths: `projects/{id}/branches/{id}/endpoints/{id}`.
- All create/update/delete operations are long-running -- use `.wait()` in the SDK.
- Tokens are short-lived (1 hour) -- production apps MUST implement token refresh.
- Postgres versions 16 and 17 are supported.