vtex-io-masterdata-strategy

Master Data Strategy

When this skill applies

Use this skill when deciding whether Master Data v2 is the right mechanism for custom data in a VTEX IO app.

Modeling reviews, wishlists, forms, or custom app records
Choosing entity boundaries
Planning schema indexing and lifecycle
Reviewing long-term Master Data design

Do not use this skill for:

low-level client usage details
runtime or route structure
app settings schemas
frontend UI behavior

Decision rules

Use this skill once Master Data is a serious candidate storage mechanism. For the broader choice between Master Data, VBase, VTEX core APIs, and external stores, use `vtex-io-data-access-patterns`.
Use Master Data for structured custom data that needs validation, indexing, and query support.
Use the `masterdata` builder when this app introduces a new business entity, owns the data model, and wants the schema to be created and versioned as part of the app contract.
Prefer using only the Master Data client when the entity and schema already exist and are shared or centrally managed, and this app only needs to read or write records without redefining the schema itself.
For stable schemas that the app owns but should not be recreated or updated on every app version, keep the schema definition in code and use the Master Data client in a controlled setup path to create or update the schema only when needed.
Remember that Master Data entities are account-scoped. Changing a shared entity or schema affects every app in that account that depends on it, so prefer client-only consumption when the schema is centrally managed.
Keep entity boundaries intentional and aligned with the business concept being stored.
Index fields that are actually used for filtering and search.
Plan schema lifecycle explicitly to avoid schema sprawl.
Consider data volume and retention from the start. If the dataset will grow unbounded and there is no retention or archival strategy, Master Data is likely not the right storage mechanism.
Do not treat Master Data as an unbounded dumping ground for arbitrary payloads.
Do not use Master Data as an unbounded log or event store for high-volume append-only data. Prefer dedicated logging or storage mechanisms when the main need is raw history rather than structured queries.
Do not store secrets, credentials, or global app configuration in Master Data. Use app settings or configuration apps instead.
Do not generate one entity or schema per account, workspace, or feature flag. Keep a stable entity name and distinguish tenants or environments through record fields when necessary.
Be careful when tying schema evolution directly to app versioning through the `masterdata` builder. Frequent schema changes coupled to app releases can generate excessive schema updates, indexing changes, and long-term schema sprawl.
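
The ownership rules above can be condensed into a small decision helper. This is only a sketch for design reviews: `SchemaContext`, its field names, and `chooseSchemaStrategy` are hypothetical and not part of any VTEX API.

```typescript
// Hypothetical helper that mirrors the decision rules above; all names are illustrative.
type SchemaStrategy = "masterdata-builder" | "client-only" | "schema-in-code";

interface SchemaContext {
  appOwnsSchema: boolean; // this app introduces the entity and owns the data model
  sharedOrCentrallyManaged: boolean; // other apps or a central team depend on the schema
  versionedWithApp: boolean; // the schema should roll out as part of the app contract
}

function chooseSchemaStrategy(ctx: SchemaContext): SchemaStrategy {
  // Shared or centrally managed schemas are consumed, never redefined by this app.
  if (ctx.sharedOrCentrallyManaged || !ctx.appOwnsSchema) {
    return "client-only";
  }
  // An owned schema that ships with the app contract fits the builder.
  if (ctx.versionedWithApp) {
    return "masterdata-builder";
  }
  // An owned, stable schema applied only when needed stays in code.
  return "schema-in-code";
}
```

For example, an app that owns a stable reviews schema but does not want it reapplied on every release would resolve to `"schema-in-code"`.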

Choosing between the `masterdata` builder and the Master Data client

There are three main ways for a VTEX IO app to work with Master Data:

Owning the schema via the `masterdata` builder:
- The app declares entities and schemas under `masterdata/` in the repository.
- Schema fields, validation, and indexing evolve together with the app code.
- Use this when the app is the primary owner of the data model, schema changes are relatively infrequent, and the schema should be rolled out as part of the app contract.
Consuming an existing schema via the Master Data client only:
- The app uses a Master Data client, but does not declare entities or schemas through the `masterdata` builder.
- The app assumes a stable schema managed elsewhere and only reads or writes records that follow that contract.
- Use this when the entity is shared across multiple apps or managed centrally, and this app should not redefine or fragment the schema across environments.
Owning a stable schema definition in code and applying it through the client:
- The app keeps a stable schema definition in code instead of `masterdata/` builder files.
- A controlled setup path checks whether the schema exists and creates or updates it only when needed.
- Use this when the app truly owns the schema, but should not couple schema rollout to every app version or every release pipeline step.
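
For the third option, the "create or update only when needed" check can be kept as a pure function, separate from whatever client call fetches or saves the schema. The sketch below assumes the setup path has already loaded the current schema body (or `null` if none exists); `planSchemaAction` and `canonical` are hypothetical names, and no specific Master Data client method is implied.

```typescript
// Hypothetical planning step for a controlled setup path.
type SchemaAction = "create" | "update" | "skip";

// Serialize with sorted keys so property order alone never triggers an update.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonical(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Decide what the setup path should do with the schema definition kept in code.
function planSchemaAction(current: object | null, desired: object): SchemaAction {
  if (current === null) {
    return "create";
  }
  return canonical(current) === canonical(desired) ? "skip" : "update";
}
```

Keeping the comparison pure means the setup path writes to Master Data only when the plan is `"create"` or `"update"`, which avoids reapplying an unchanged schema on every release.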

Hard constraints

Constraint: Master Data entities must have explicit schema boundaries

Each entity MUST represent a clear business concept and have a schema that matches its intended usage.

Why this matters

Weak entity boundaries create confusing queries, poor indexing choices, and schema drift.

Detection

If one entity mixes unrelated concepts or stores many unrelated record shapes, STOP and split the design.

Correct

json

{
  "title": "review-schema-v1",
  "type": "object",
  "properties": {
    "productId": { "type": "string" },
    "userId": { "type": "string" },
    "rating": { "type": "number" },
    "approved": { "type": "boolean" }
  },
  "required": ["productId", "userId", "rating"],
  "v-indexed": ["productId", "userId", "approved"]
}

Wrong

json

{
  "title": "everything-schema",
  "type": "object"
}
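
The detection rule can be partly automated in a review script. The following is a minimal sketch, assuming schemas shaped like the examples above; `EntitySchema` and `boundaryIssues` are hypothetical names, and the checks are illustrative rather than a complete validator.

```typescript
// Hypothetical review check for schema boundaries; not part of any VTEX tooling.
interface EntitySchema {
  title?: string;
  type?: string;
  properties?: Record<string, { type: string }>;
  required?: string[];
  "v-indexed"?: string[];
}

function boundaryIssues(schema: EntitySchema): string[] {
  const issues: string[] = [];
  const declared = Object.keys(schema.properties || {});

  // A schema with no declared properties accepts any record shape.
  if (declared.length === 0) {
    issues.push("schema declares no properties");
  }
  // Required and indexed fields should refer to declared properties.
  for (const field of schema.required || []) {
    if (declared.indexOf(field) === -1) {
      issues.push("required field is not declared: " + field);
    }
  }
  for (const field of schema["v-indexed"] || []) {
    if (declared.indexOf(field) === -1) {
      issues.push("indexed field is not declared: " + field);
    }
  }
  return issues;
}
```

Run against the two examples above, the review schema yields no issues, while `everything-schema` is flagged for declaring no properties.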

Constraint: Indexed fields must match real query behavior

Fields used in filters or lookups MUST be indexed intentionally.

Why this matters

Missing indexes lead to poor query behavior and unnecessary operational risk.

Detection

If queries depend on fields that are not covered by the indexing strategy, STOP and align the schema with the access patterns.

Correct

json

{
  "v-indexed": ["productId", "approved"]
}

Wrong

json

{
  "v-indexed": []
}
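
One way to keep indexing aligned with access patterns is to list the fields each query filters on and compare them with `v-indexed`. The helper below is a hedged sketch; `unindexedQueryFields` is a hypothetical name, and the list of queried fields is assumed to be maintained by the app team.

```typescript
// Hypothetical check: which filter fields are missing from the schema's v-indexed list?
function unindexedQueryFields(indexed: string[], queried: string[]): string[] {
  return queried.filter((field) => indexed.indexOf(field) === -1);
}

// Example: the schema indexes productId and approved, but a query also filters by userId.
const missingIndexes = unindexedQueryFields(
  ["productId", "approved"],
  ["productId", "userId"]
);
// missingIndexes is ["userId"], so either index userId or change the query.
```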

Constraint: Schema lifecycle must be managed explicitly

Master Data schema evolution MUST be planned with cleanup and versioning in mind.

Why this matters

Unmanaged schema growth creates long-term operational pain and can run into platform limits.

Detection

If schema versions are added with no lifecycle or cleanup plan, STOP and define that plan.

Correct

text

review-schema-v1 -> review-schema-v2 with cleanup plan

Wrong

text

review-schema-v1, v2, v3, v4, v5 with no cleanup strategy

Remember that changing indexed fields or field types can affect how existing documents are indexed and queried. When schema evolution is coupled to frequent app version changes, this risk increases.
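
A cleanup plan can start from something as simple as listing which schema versions are candidates for retirement. The sketch below assumes schema names end in `-v<number>`, as in the examples above; `schemasToRetire` and the `keep` parameter are hypothetical, and actually deleting a schema is left to whatever process the team has agreed on.

```typescript
// Hypothetical cleanup helper: keep the newest `keep` versions, list the rest for retirement.
function schemasToRetire(names: string[], keep: number): string[] {
  const versioned: { name: string; version: number }[] = [];
  for (const name of names) {
    const match = /-v(\d+)$/.exec(name);
    // Names without a version suffix are ignored rather than guessed at.
    if (match) {
      versioned.push({ name, version: Number(match[1]) });
    }
  }
  versioned.sort((a, b) => b.version - a.version); // newest first
  return versioned.slice(Math.max(0, keep)).map((entry) => entry.name);
}
```

With `review-schema-v1` through `review-schema-v5` and `keep` set to 2, the candidates are v3, v2, and v1, newest first.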

Constraint: Entity and schema names must remain stable across environments

Entity names and schema identifiers MUST remain stable across accounts, workspaces, and environments. Do not encode account names, workspaces, or rollout flags into the entity or schema name itself.

Why this matters

Per-account or per-workspace schema naming leads to schema sprawl, harder lifecycle management, and operational limits that are difficult to clean up later.

Detection

If the design proposes one entity or schema per workspace, per account, or per environment, STOP and redesign around stable names with scoped fields or records instead.

Correct

text

review-schema-v1
RV

Wrong

text

review-schema-brazil-master
RV_US_MASTER

Using one clearly managed schema for development and one for production can be acceptable when there is a deliberate plan to keep them synchronized. Avoid generating schema names per workspace, per account, or per feature flag.
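
Scoping through record fields instead of names might look like the sketch below. The `tenant` and `environment` fields, the `Review` shape, and `scopedReviewDocument` are assumptions made for illustration; only the stable `RV` entity name comes from the examples above.

```typescript
// Stable entity name taken from the example above; it never varies by tenant or workspace.
const REVIEW_ENTITY = "RV";

// Hypothetical record and scope shapes.
interface Review {
  productId: string;
  userId: string;
  rating: number;
}

interface Scope {
  tenant: string; // e.g. a store or country identifier
  environment: string; // e.g. "production" or "development"
}

// Put the scope into the document instead of encoding it in the entity or schema name.
function scopedReviewDocument(review: Review, scope: Scope) {
  return {
    entity: REVIEW_ENTITY,
    document: {
      productId: review.productId,
      userId: review.userId,
      rating: review.rating,
      tenant: scope.tenant,
      environment: scope.environment,
    },
  };
}
```

If queries filter by these scope fields, they would also need to appear in the schema's `v-indexed` list.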

Preferred pattern

Common failure modes

Creating entities that are too broad.
Querying on fields that are not indexed.
Accumulating schema versions with no lifecycle plan.
Using Master Data as a high-volume log or event sink without a retention or archival strategy.
Storing configuration, secrets, or cross-app shared settings in Master Data instead of using configuration-specific mechanisms.
Generating per-account or per-workspace entities such as `RV_storeA_master` instead of using a stable entity like `RV` with scoped record fields.
Relying on the `masterdata` builder for frequent schema changes tied to every app version, causing excessive schema updates and indexing side effects over time.

Review checklist

Is Master Data the right storage mechanism for this use case?
Should this app own the schema through the `masterdata` builder, or just consume an existing stable schema through the client?
Would a stable schema in code plus a controlled setup path be safer than coupling schema rollout to every app version?
Does each entity represent a clear business concept?
Are entity and schema names stable across workspaces and accounts?
Are filtered fields indexed intentionally?
Is there a schema lifecycle plan?
If different schemas are used for development and production, is there a clear plan to keep them synchronized without creating schema sprawl?

