Master Data storage strategy
When this skill applies
Use this skill before creating any new Master Data entity or when auditing existing usage. It helps you answer:
- Is Master Data the right storage for this data, or would Catalog, OMS, VBase, or an external database serve better?
- How should I design the JSON Schema for performance and security?
- Which fields should I index (v-indexed), and which should I not?
- Should I enable or disable caching (v-cache)?
- Do I need triggers (v-triggers), or is an event-driven IO approach better?
- How do I plan for capacity and lifecycle of schemas and documents?
Do not use this skill for:
- VTEX IO app integration patterns (MasterDataClient, masterdata builder, CRUD in code) — use vtex-io-masterdata
- Performance patterns for IO services (LRU, VBase caching layers) — use vtex-io-application-performance
Decision rules
When to use Master Data
Master Data is a good fit when all of the following are true:
- Document-oriented access — Your data is naturally key-value or document-shaped (JSON documents with variable schemas). You query by indexed fields and retrieve full or partial documents.
- Platform-integrated — You benefit from VTEX-native features: v-triggers for automated workflows, v-security for per-field public access, v-indexed for search/filter, and the masterdata builder for schema-as-code.
- Moderate volume — Your entity will hold thousands to low millions of documents. MD handles this well with proper indexing.
- Not on the purchase critical path — MD is not optimized for sub-10ms latency. Synchronous MD reads in checkout/cart/payment flows risk conversion if MD is slow.
- No better native fit — The data doesn't belong in Catalog (product/SKU attributes), OMS (order data), CL/AD (customer profiles/addresses), or VBase (app-specific cache/state).
When NOT to use Master Data
| Data type | Better storage | Why |
|---|---|---|
| Product attributes, specifications | Catalog (specifications, unstructured specs) | Native indexing, search integration, catalog APIs |
| Order data, order history | OMS (via OMS APIs + BFF cache) | Single source of truth; duplicating to MD creates drift |
| Customer profiles, addresses | CL/AD native entities | Platform-managed, already indexed and cached |
| App-specific cache or temp state | VBase | Designed for per-app ephemeral storage, no schema overhead |
| Application logs, debug traces | A logging service (outside Master Data) | Logs belong in structured logging infrastructure, not a database |
| High-throughput time-series data | External database (SQL, NoSQL, time-series DB) | MD is not designed for millions of writes/day |
| Relational data with joins | External SQL database | MD has no join support; denormalize or use a relational DB |
| Data requiring strong consistency | External database | MD is eventually consistent for indexed fields |
Schema design principles
- One entity per concept — Don't mix unrelated data in a single entity. Each entity should represent a clear business concept (e.g. product reviews).
- Index what you query — Only fields in v-indexed can be used in where clauses. But don't over-index: each indexed field increases write latency and storage because the index is updated on every document change.
- Minimal v-default-fields — Return only the fields most consumers need by default. Large default payloads waste bandwidth.
- v-cache matches the workload — Leave true (default) for read-heavy entities. Set to false for entities with high write frequency where consumers need immediate consistency after writes.
- v-security is explicit — Set allowGetAll: false unless unauthenticated list access is intentional. Use publicRead, publicWrite, and publicFilter only for fields that must be accessible without authentication.
VTEX schema extensions (v-* fields) — reference
Master Data v2 extends standard JSON Schema with v-* properties that control indexing, caching, security, defaults, triggers, and schema inheritance. These are VTEX-specific; standard JSON Schema validators ignore them.
v-indexed
Array of field names that Master Data creates secondary indexes for.
- Only indexed fields can appear in where clauses for search and scroll. Queries on non-indexed fields trigger full document scans that time out on large datasets.
- Each index is updated on every document write. Over-indexing increases write latency and storage cost proportionally.
- When to index: fields used in where filters or sort expressions. When not to index: large text fields (e.g. free-form notes or body text), fields never queried, or fields only read by document ID (indexing adds no benefit for lookups by ID).
json
{ "v-indexed": ["email", "status", "createdAt"] }
v-cache
Boolean (default true). Controls whether Master Data caches GET responses for individual documents.
- true (default) — Read-heavy entities benefit from caching. Most entities should leave this as default.
- false — Use for entities with high write frequency where consumers need fresh reads immediately after writes (e.g. real-time counters, configuration flags, session-like state).
v-default-fields
Array of field names returned when the caller does not specify a _fields parameter in the API request.
- Keep this minimal — only the fields most consumers need by default.
- Reduces payload size for common queries.
json
{ "v-default-fields": ["email", "status", "score", "createdAt"] }
v-security
Object controlling unauthenticated (public) access to fields. By default, all fields require authentication.
| Property | Type | Description |
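|---|---|---|
| allowGetAll | boolean | If true, unauthenticated users can list all documents. Default false; keep it off unless intentional. |
| publicRead | array of strings | Fields readable without authentication |
| publicWrite | array of strings | Fields writable without authentication |
| publicFilter | array of strings | Fields usable in where clauses without authentication (must also be in v-indexed) |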
json
{
"v-security": {
"allowGetAll": false,
"publicRead": ["status", "displayName", "rating"],
"publicWrite": [],
"publicFilter": ["status"]
}
}
Never include PII (email, phone, addresses), internal IDs, or business-sensitive data in publicRead or publicFilter.
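The table above notes that publicFilter fields must also appear in v-indexed. A minimal sketch of that check follows, assuming the schema has been loaded as a Python dict; the function name `unindexed_public_filters` and the sample schema are hypothetical and not part of any VTEX tooling.

```python
def unindexed_public_filters(schema: dict) -> list:
    """Return publicFilter fields that are missing from v-indexed."""
    indexed = set(schema.get("v-indexed", []))
    security = schema.get("v-security", {})
    return [f for f in security.get("publicFilter", []) if f not in indexed]


# Hypothetical schema fragment: "rating" is filterable publicly but not indexed.
schema = {
    "v-indexed": ["status", "createdAt"],
    "v-security": {
        "allowGetAll": False,
        "publicRead": ["status", "displayName", "rating"],
        "publicWrite": [],
        "publicFilter": ["status", "rating"],
    },
}

print(unindexed_public_filters(schema))
```

An empty result means every publicly filterable field is backed by an index.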
v-triggers
Array of trigger objects that define automated actions executed when documents are created or updated and meet specified conditions.
| Property | Type | Description |
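|---|---|---|
| name | string | Unique trigger name |
| active | boolean | Enable/disable the trigger |
| condition | string | where-style filter (e.g. "status=new", "status=pending AND priority>3") |
| action.type | string | email, http (webhook), or save |
| action.provider | string | Email provider name (for email type) |
| action.uri | string | Webhook URL (for http type) |
| action.method | string | HTTP method for webhook (for http type) |
| retry.times | integer | Retry count on failure |
| retry.delay | object | Delay between retries (e.g. { "addMinutes": 5 }) |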
json
{
"v-triggers": [
{
"name": "notify-on-creation",
"active": true,
"condition": "status=new",
"action": {
"type": "email",
"provider": "default",
"subject": "New record: {{title}}",
"to": ["admin@mystore.com"],
"body": "Record {{id}} created by {{author}}"
},
"retry": { "times": 3, "delay": { "addMinutes": 5 } }
},
{
"name": "webhook-on-approval",
"active": true,
"condition": "approved=true",
"action": {
"type": "http",
"uri": "https://my-integration.example.com/webhook",
"method": "POST",
"headers": { "X-Custom-Header": "value" }
},
"retry": { "times": 2, "delay": { "addMinutes": 10 } }
}
]
}
v-canonicalto
URL pointing to another schema in the same entity for schema inheritance. The current schema inherits properties and constraints from the target.
json
{
"v-canonicalto": "https://{host}/api/dataentities/{entity}/schemas/{base-schema}"
}
additionalProperties
Standard JSON Schema property, but worth noting: set to false to reject fields not declared in properties. By default Master Data preserves extra fields without validation.
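To illustrate what rejecting undeclared fields means, here is a minimal sketch that lists the top-level document fields a schema does not declare. It is not how Master Data validates documents; `undeclared_fields` and the sample data are hypothetical, and nested objects are ignored.

```python
def undeclared_fields(document: dict, schema: dict) -> list:
    """Return top-level document keys that the schema's properties do not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(key for key in document if key not in declared)


schema = {
    "properties": {
        "email": {"type": "string"},
        "status": {"type": "string"},
    },
    "additionalProperties": False,
}
document = {"email": "a@example.com", "status": "new", "debugInfo": "x"}

# With additionalProperties set to false, these are the fields that would be rejected.
print(undeclared_fields(document, schema))
```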
Hard constraints
Constraint: Index only fields used in where clauses or sort expressions
Every field in v-indexed creates a secondary index that is updated on every document write. Indexing fields that are never queried wastes write throughput and storage.
Why this matters — Over-indexing a high-write entity (e.g. indexing 15 fields when only 3 are queried) can double or triple write latency. On entities with millions of documents, unnecessary indexes also increase storage costs.
Detection — Compare v-indexed fields with actual where clauses in the codebase. Any indexed field not referenced in a where or sort is likely unnecessary.
Correct — Index only the fields you filter or sort on.
json
{
"properties": {
"email": { "type": "string" },
"status": { "type": "string" },
"score": { "type": "integer" },
"notes": { "type": "string" },
"createdAt": { "type": "string", "format": "date-time" }
},
"v-indexed": ["email", "status", "createdAt"]
}
Wrong — Indexing every field "just in case."
json
{
"properties": {
"email": { "type": "string" },
"status": { "type": "string" },
"score": { "type": "integer" },
"notes": { "type": "string" },
"createdAt": { "type": "string", "format": "date-time" }
},
"v-indexed": ["email", "status", "score", "notes", "createdAt"]
}
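The detection step for this constraint can be partly automated. The sketch below assumes you have already collected the where clauses and sort fields used in your codebase as plain strings; the regex is a naive approximation of where-clause syntax, and `unused_indexes` is a hypothetical helper name.

```python
import re

# Naive pattern: an identifier immediately followed by a comparison operator.
FIELD_BEFORE_OPERATOR = re.compile(r"([A-Za-z_][\w.]*)\s*(?:<>|>=|<=|=|>|<)")


def unused_indexes(schema: dict, where_clauses: list, sort_fields: list) -> list:
    """Return v-indexed fields that no collected where clause or sort references."""
    used = set(sort_fields)
    for clause in where_clauses:
        used.update(FIELD_BEFORE_OPERATOR.findall(clause))
    return [field for field in schema.get("v-indexed", []) if field not in used]


schema = {"v-indexed": ["email", "status", "score", "notes", "createdAt"]}
where_clauses = ["email=a@b.com", "status=pending AND createdAt>2024-01-01"]
sort_fields = ["createdAt"]

print(unused_indexes(schema, where_clauses, sort_fields))
```

Treat the result as a list of candidates to review, since clauses built dynamically at runtime will not show up in a static scan.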
Constraint: Do not expose sensitive fields via v-security publicRead
The publicRead array makes fields accessible without any authentication. Never include PII (email, phone, addresses), internal IDs, or business-sensitive data in this list.
Why this matters — Public fields are accessible to anyone with the entity name and a document ID or search query. Exposing PII violates data protection regulations and creates security vulnerabilities.
Detection — Check publicRead and publicFilter for fields containing user data, internal references, or anything that should require authentication.
Correct — Expose only non-sensitive, display-oriented fields.
json
{
"v-security": {
"allowGetAll": false,
"publicRead": ["status", "displayName", "rating"],
"publicWrite": [],
"publicFilter": ["status"]
}
}
Wrong — Exposing PII and internal fields publicly.
json
{
"v-security": {
"allowGetAll": true,
"publicRead": ["email", "phone", "cpf", "internalScore", "organizationId"],
"publicWrite": ["email"],
"publicFilter": ["email", "phone"]
}
}
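A rough way to automate the detection step is to flag public fields whose names look sensitive. This is a sketch under assumptions: the hint list below is illustrative only and must be adapted to your own field naming, and `sensitive_public_fields` is a hypothetical helper.

```python
# Assumed name fragments that suggest PII or internal data; adjust per project.
SENSITIVE_HINTS = ("email", "phone", "cpf", "address", "internal", "organizationid")
PUBLIC_LISTS = ("publicRead", "publicWrite", "publicFilter")


def sensitive_public_fields(schema: dict) -> dict:
    """Map each v-security list to the fields in it whose names look sensitive."""
    security = schema.get("v-security", {})
    flagged = {}
    for key in PUBLIC_LISTS:
        hits = [
            field
            for field in security.get(key, [])
            if any(hint in field.lower() for hint in SENSITIVE_HINTS)
        ]
        if hits:
            flagged[key] = hits
    return flagged


wrong = {
    "v-security": {
        "allowGetAll": True,
        "publicRead": ["email", "phone", "cpf", "internalScore", "organizationId"],
        "publicWrite": ["email"],
        "publicFilter": ["email", "phone"],
    }
}

print(sensitive_public_fields(wrong))
```

Name matching only catches obvious cases, so a manual review of every public field is still needed.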
Constraint: Respect the 60-schema-per-entity limit
Master Data v2 entities have a hard limit of 60 schemas. The masterdata builder creates a new schema per app version linked or installed. Once the limit is reached, new versions fail to deploy.
Why this matters — During active development with frequent link and publish cycles, schemas accumulate quickly. Hitting the limit blocks deployment until old schemas are manually deleted.
Detection — Apps with many link/publish cycles. Check schema count via GET /api/dataentities/{entity}/schemas.
Correct — Periodically clean up unused schemas. Automate cleanup in CI/CD.
bash
# List schemas to identify stale ones
curl "https://{account}.vtexcommercestable.com.br/api/dataentities/{entity}/schemas" \
-H "X-VTEX-API-AppKey: {key}" -H "X-VTEX-API-AppToken: {token}"
# Delete unused schemas
curl -X DELETE "https://{account}.vtexcommercestable.com.br/api/dataentities/{entity}/schemas/{old-schema}" \
-H "X-VTEX-API-AppKey: {key}" -H "X-VTEX-API-AppToken: {token}"
Wrong — Never cleaning up schemas during development until the limit blocks deployment.
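For the automated cleanup mentioned above, the decision of which schemas to delete can be kept separate from the HTTP calls. The sketch below assumes you already have the schema names from the list endpoint and know which ones are still in use; `cleanup_plan`, the headroom of 10, and the schema names are hypothetical.

```python
SCHEMA_LIMIT = 60  # per-entity limit stated in this constraint


def cleanup_plan(schema_names: list, in_use: set, headroom: int = 10) -> list:
    """Return schemas to delete once fewer than `headroom` free slots remain."""
    free_slots = SCHEMA_LIMIT - len(schema_names)
    if free_slots >= headroom:
        return []  # plenty of room, nothing to clean up yet
    return [name for name in schema_names if name not in in_use]


# Hypothetical entity with 55 accumulated schemas, only the latest in use.
names = [f"review-v{i}" for i in range(55)]
to_delete = cleanup_plan(names, in_use={"review-v54"})
print(len(to_delete))
```

Each returned name would then be passed to the DELETE request shown in the curl example.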
Preferred pattern
Complete schema example with all VTEX extensions
json
{
"$schema": "http://json-schema.org/schema#",
"title": "product-review-v1",
"type": "object",
"properties": {
"productId": { "type": "string" },
"author": { "type": "string" },
"email": { "type": "string", "format": "email" },
"rating": { "type": "integer", "minimum": 1, "maximum": 5 },
"title": { "type": "string", "maxLength": 200 },
"text": { "type": "string", "maxLength": 5000 },
"approved": { "type": "boolean" },
"createdAt": { "type": "string", "format": "date-time" }
},
"required": ["productId", "rating", "title", "text"],
"v-indexed": ["productId", "approved", "rating", "createdAt"],
"v-default-fields": [
"productId",
"author",
"rating",
"title",
"approved",
"createdAt"
],
"v-cache": true,
"v-security": {
"allowGetAll": false,
"publicRead": [
"productId",
"author",
"rating",
"title",
"text",
"approved"
],
"publicWrite": [],
"publicFilter": ["productId", "approved", "rating"]
},
"v-triggers": [
{
"name": "notify-moderator",
"active": true,
"condition": "approved=false",
"action": {
"type": "email",
"provider": "default",
"subject": "New review pending moderation",
"to": ["moderator@mystore.com"],
"body": "Review for product {{productId}} by {{author}}: {{title}}"
},
"retry": {
"times": 3,
"delay": { "addMinutes": 5 }
}
}
]
}
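A schema like the one above is easy to get subtly wrong when a field is renamed in properties but not in the lists that reference it. As a minimal sketch, the check below reports names in required, v-indexed, and v-default-fields that properties does not declare; `undeclared_references` is a hypothetical helper, and the check is an assumption about what is worth linting rather than a Master Data rule.

```python
REFERENCE_KEYS = ("required", "v-indexed", "v-default-fields")


def undeclared_references(schema: dict) -> dict:
    """Map each reference list to the field names missing from properties."""
    declared = set(schema.get("properties", {}))
    problems = {}
    for key in REFERENCE_KEYS:
        missing = [field for field in schema.get(key, []) if field not in declared]
        if missing:
            problems[key] = missing
    return problems


# Reduced, hypothetical variant of the review schema with "approved" left undeclared.
schema = {
    "properties": {
        "productId": {"type": "string"},
        "rating": {"type": "integer"},
        "title": {"type": "string"},
    },
    "required": ["productId", "rating"],
    "v-indexed": ["productId", "approved"],
    "v-default-fields": ["productId", "title"],
}

print(undeclared_references(schema))
```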
Triggers: when to use and when not to
Use triggers when:
- You need email notifications on document changes (e.g. moderation alerts)
- You need to call an external webhook when a document meets a condition
- The action is simple, fire-and-forget, and doesn't need complex error handling
Do NOT use triggers when:
- You need complex orchestration, retries with backoff, or error recovery — use IO events instead
- You need sub-second response to changes — triggers have built-in delay
- The action modifies other MD entities in a chain — risk of cascading trigger loops
- You need conditional logic more complex than a where-style filter
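The cascading-loop risk in the list above can be checked before deployment if you record which entities each entity's triggers write to. The sketch below is a plain depth-first cycle search over such a map; the map itself and the name `find_trigger_cycle` are hypothetical, since Master Data does not expose this graph directly.

```python
from typing import Optional


def find_trigger_cycle(writes: dict) -> Optional[list]:
    """Return one cycle of entities as a path, or None if the map has no cycle."""
    visiting = []  # entities on the current DFS path, in order
    done = set()   # entities fully explored without finding a cycle

    def visit(entity):
        if entity in done:
            return None
        if entity in visiting:
            return visiting[visiting.index(entity):] + [entity]
        visiting.append(entity)
        for target in writes.get(entity, []):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(entity)
        return None

    for entity in writes:
        cycle = visit(entity)
        if cycle:
            return cycle
    return None


# Hypothetical map: entity A's trigger writes to B, and B's trigger writes back to A.
print(find_trigger_cycle({"A": ["B"], "B": ["A"]}))
```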
Document counting without full fetch
Use the REST-Range header to get document counts efficiently:
bash
# Count documents without fetching them
curl "https://{account}.vtexcommercestable.com.br/api/dataentities/{entity}/search?_fields=id" \
-H "REST-Range: resources=0-0" \
-H "X-VTEX-API-AppKey: {key}" -H "X-VTEX-API-AppToken: {token}"
# Response header: REST-Content-Range: resources 0-0/12345
# The number after "/" is the total document count
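If the count is consumed in code rather than read by eye, the response header can be parsed as follows. This is a small sketch that assumes the header value has exactly the format shown in the comment above; `total_from_content_range` is a hypothetical helper name.

```python
import re


def total_from_content_range(header: str) -> int:
    """Extract the total document count from a REST-Content-Range header value."""
    match = re.fullmatch(r"\s*resources\s+(\d+)-(\d+)/(\d+)\s*", header)
    if match is None:
        raise ValueError(f"unexpected REST-Content-Range value: {header!r}")
    return int(match.group(3))


print(total_from_content_range("resources 0-0/12345"))
```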
Search vs Scroll
| Use | When | Max page size |
|---|---|---|
| search | Bounded result sets, UI pagination, known small size | 100 per page |
| scroll | Large exports, bulk operations, unbounded iteration | Configurable batch |
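For search pagination, the REST-Range header value can be derived from a page number. The sketch below enforces the 100-per-page cap from the table; it assumes the range is written as start-end with end equal to start plus page size, which you should confirm against the API reference before relying on it. `rest_range` is a hypothetical helper.

```python
MAX_PAGE_SIZE = 100  # search page cap from the table above


def rest_range(page: int, page_size: int = MAX_PAGE_SIZE) -> str:
    """Build a REST-Range header value for a 1-based page number."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    start = (page - 1) * page_size
    # Assumption: the end bound is start + page_size; verify the exact convention.
    return f"resources={start}-{start + page_size}"


print(rest_range(1))
print(rest_range(3, page_size=50))
```

Rejecting oversized pages in the helper surfaces the limit at the call site instead of as an API error.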
Common failure modes
- Over-indexing — Indexing 10+ fields on a high-write entity. Every write updates all indexes, increasing latency and storage.
- Missing indexes — Querying on non-indexed fields triggers full scans. Works in dev with 100 docs, times out in production with 100k.
- v-cache: false by default — Disabling cache on read-heavy entities forces every GET to hit the database. Only disable for high-write entities.
- allowGetAll: true with PII — Unauthenticated users can list all documents including sensitive data.
- Schema accumulation — 60 schemas from development cycles blocks production deployments.
- Trigger chains — Trigger A modifies entity B, which has a trigger that modifies entity A — infinite loop.
- MD as a log store — Entities growing unboundedly with traffic volume. Use a logging service instead.
- MD on critical path — Synchronous MD read in checkout with no timeout or fallback.
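The last failure mode, a synchronous read with no timeout or fallback, can be mitigated by bounding the wait and returning a default. The sketch below uses a thread pool to do that; `read_with_fallback`, the pool size, and the timings are hypothetical, and the fetch callable stands in for whatever client call actually reads from Master Data.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)  # shared pool; size is an arbitrary choice


def read_with_fallback(fetch, timeout_s: float, fallback):
    """Run fetch() but give up after timeout_s seconds and return fallback instead."""
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # no effect if already running; the result is simply ignored
        return fallback
    except Exception:
        return fallback  # a failed read should not break the purchase flow


def slow_read():
    time.sleep(0.3)  # stands in for a slow Master Data response
    return "live"


print(read_with_fallback(slow_read, timeout_s=0.05, fallback="cached"))
```

The fallback keeps checkout responsive, at the cost of occasionally serving a default or stale value.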
Review checklist
Related skills
- vtex-io-masterdata — IO app integration: MasterDataClient, masterdata builder, CRUD patterns
- vtex-io-application-performance — Caching layers and BFF patterns when exposing MD data
- architecture-well-architected-commerce — Cross-cutting storage and architecture principles
Reference
- Working with JSON Schemas in Master Data v2 — v-indexed, v-cache, v-security, v-triggers configuration
- Master Data v2 Basics — Core concepts and data model
- Master Data Schema Lifecycle — Schema versioning and the 60-schema limit
- Setting Up Triggers on Master Data v2 — Trigger configuration and patterns
- Master Data v2 API Reference — Complete API specification
- Master Data v2 Document Saving Flow — Validation, indexing, and trigger execution order