doris-architecture-advisor

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Apache Doris Architecture Advisor

Apache Doris架构顾问

Workload-aware architecture design for Apache Doris. 8 decision rules, 3 worked examples. Complements
doris-best-practices
with sizing-first workflow.

面向Apache Doris的工作负载感知架构设计。 8条决策规则，3个实践示例。通过以规模规划为优先的工作流，对
doris-best-practices
进行补充。

Workflow

工作流

Follow these 5 steps in order:

DDL validation — The
```
doris-best-practices
```
skill handles DDL correctness. Its Pre-Flight Checklist and DDL Gotchas apply to every CREATE TABLE. This advisor focuses on architecture decisions (which model, which partition strategy, which indexes), not DDL syntax. Always calculate explicit bucket counts. If volume is unknown, choose a conservative default: 3 for small dimensions, 8 for medium tables, 16-32 for large daily fact tables.
Classify workload — Read
```
references/decision-workload-classification.md
```
. Match user's scenario to one or more of the 6 workload types. Composite workloads (e.g., IoT = time-series + device state + logs + dashboards) decompose into multiple sub-tables.
Size the cluster — Read
```
references/decision-sizing-matrix.md
```
. Estimate write throughput, query QPS, latency target, and hot data volume. Output sizing as total vCPU and total cache only — never break down into per-node specs (in cloud / storage-compute mode, node count is typically managed by the platform). Also read
```
references/decision-deployment-mode.md
```
if user hasn't specified cloud vs on-prem.

Design architecture — Based on workload classification, read the relevant decision rules:

Workload signal	Read these rules
Append-only events, logs, time-series	`decision-data-model-selection` , `decision-time-series-design` , `decision-ingestion-strategy`
Updates, CDC, device state tracking	`decision-data-model-selection` , `decision-mutable-state` , `decision-ingestion-strategy`
Semi-structured / multi-protocol JSON	`decision-data-model-selection` (VARIANT section)
Dashboards, pre-aggregated metrics	`decision-query-acceleration`
Point query API, high-concurrency lookups	`decision-query-acceleration` (point query section)
Text search, log search, full-text	`decision-query-acceleration` (index section)
Vector / embedding search	`decision-query-acceleration` (vector section)
Warehouse layering (ODS/DWD/DWS/ADS)	`decision-workload-classification` (layering section), `decision-data-model-selection`
Multi-department / workload isolation	`decision-workload-classification` (isolation section)
Hot/cold tiering with data lake	`decision-workload-classification` (lakehouse section), `decision-deployment-mode`

Output the architecture design: data flow diagram, table-per-sub-workload mapping, and the key design decisions (model, partition strategy, bucket key, indexes, compression, ingestion method) for each table.

Generate DDL — Produce CREATE TABLE statements applying ALL constraints from step 1. Calculate explicit bucket counts with the formula in
```
decision-time-series-design.md
```
; use the fallback counts above when inputs are incomplete. For each table, cite the best-practices rule that drove the decision.

按顺序遵循以下5个步骤：

DDL验证 —
```
doris-best-practices
```
技能负责处理DDL的正确性。其预检查清单和DDL注意事项适用于所有CREATE TABLE语句。本顾问专注于架构决策（选择哪种模型、哪种分区策略、哪种索引），而非DDL语法。务必计算明确的分桶数量。如果数据量未知，选择保守的默认值：小型维度表设为3，中型表设为8，大型日事实表设为16-32。
工作负载分类 — 阅读
```
references/decision-workload-classification.md
```
。将用户的场景匹配到6种工作负载类型中的一种或多种。复合工作负载（例如：物联网 = 时间序列 + 设备状态 + 日志 + 仪表盘）需分解为多个子表。
集群规模规划 — 阅读
```
references/decision-sizing-matrix.md
```
。估算写入吞吐量、查询QPS、延迟目标和热数据量。仅输出总vCPU和总缓存的规模建议——切勿拆分为单节点规格（在云/存算分离模式下，节点数量通常由平台管理）。如果用户未指定云部署还是本地部署，还需阅读
```
references/decision-deployment-mode.md
```
。

架构设计 — 根据工作负载分类，阅读相关的决策规则：

工作负载信号	阅读以下规则
仅追加事件、日志、时间序列	`decision-data-model-selection` , `decision-time-series-design` , `decision-ingestion-strategy`
更新、CDC、设备状态追踪	`decision-data-model-selection` , `decision-mutable-state` , `decision-ingestion-strategy`
半结构化/多协议JSON	`decision-data-model-selection` （VARIANT章节）
仪表盘、预聚合指标	`decision-query-acceleration`
点查询API、高并发查询	`decision-query-acceleration` （点查询章节）
文本搜索、日志搜索、全文检索	`decision-query-acceleration` （索引章节）
向量/嵌入搜索	`decision-query-acceleration` （向量章节）
仓库分层（ODS/DWD/DWS/ADS）	`decision-workload-classification` （分层章节）, `decision-data-model-selection`
多部门/工作负载隔离	`decision-workload-classification` （隔离章节）
冷热分层搭配数据湖	`decision-workload-classification` （湖仓章节）, `decision-deployment-mode`

输出架构设计：数据流图、每个子工作负载对应的表映射，以及每张表的关键设计决策（模型、分区策略、分桶键、索引、压缩方式、数据摄入方法）。

生成DDL — 生成应用步骤1中所有约束的CREATE TABLE语句。使用
```
decision-time-series-design.md
```
中的公式计算明确的分桶数量；当输入信息不完整时，使用上述的备用数量。对于每张表，注明驱动该决策的最佳实践规则。

Output Structure

输出结构

Responses should include these sections (adapt formatting to conversation):

Workload Summary — Classification, write rate, QPS, latency target, hot data volume
Sizing Recommendation — Warehouse tier, storage estimate, cache strategy
Architecture Overview — Data flow from sources → ingestion → Apache Doris → applications
Table Designs — CREATE TABLE with inline comments citing decision rules

Rules Checked — For each table, list the rules applied with exact file paths so users can look up the rule for troubleshooting. Format:

Per [rule-name](doris-best-practices/references/rule-name.md)

. Example:

Table: sensor_readings
Rules Applied:
- [schema-model-choose-for-workload](doris-best-practices/references/schema-model-choose-for-workload.md) — DUPLICATE for append-only
- [schema-bucket-target-size](doris-best-practices/references/schema-bucket-target-size.md) — 10 buckets (21 GB / 2 GB)
- [schema-props-compression](doris-best-practices/references/schema-props-compression.md) — ZSTD for IoT data

Decision Provenance — Each recommendation tagged:
```
official
```
(from Doris docs),
```
derived
```
(logical inference), or
```
field
```
(experience heuristic with disclaimer)

回复应包含以下部分（可根据对话调整格式）：

工作负载摘要 — 分类、写入速率、QPS、延迟目标、热数据量
规模建议 — 仓库层级、存储估算、缓存策略
架构概述 — 从数据源 → 数据摄入 → Apache Doris → 应用的数据流
表设计 — 带有内联注释的CREATE TABLE语句，注释需引用决策规则

已检查规则 — 对于每张表，列出应用的规则及精确文件路径，方便用户查阅规则进行故障排查。格式：

Per [规则名称](doris-best-practices/references/规则名称.md)

。示例：

Table: sensor_readings
Rules Applied:
- [schema-model-choose-for-workload](doris-best-practices/references/schema-model-choose-for-workload.md) — 仅追加场景使用DUPLICATE模型
- [schema-bucket-target-size](doris-best-practices/references/schema-bucket-target-size.md) — 10个分桶（21 GB / 2 GB）
- [schema-props-compression](doris-best-practices/references/schema-props-compression.md) — IoT数据使用ZSTD压缩

决策来源 — 每个建议需标记：
```
official
```
（来自Doris官方文档）、
```
derived
```
（逻辑推导）或
```
field
```
（经验启发，附免责声明）

Worked Examples

实践示例

For complete input → output examples, read:

```
references/example-iot-sensor-platform.md
```
— IoT: 50K sensors, composite workload, 4 tables
```
references/example-log-observability.md
```
— Logs + traces + metrics, inverted index, ZSTD

references/example-cdc-operational-sync.md

— MySQL CDC, UNIQUE MoW, sequence column

```
references/example-securities-analytics.md
```
— Securities firm: ODS→DWD→DWS→ADS layering, customer 360, compliance, lakehouse, workload isolation
```
references/example-retail-fashion.md
```
— Retail/fashion: omnichannel inventory, wide+tall table for user profiling, BITMAP segmentation, multi-brand isolation, peak season scaling
```
references/example-logistics-courier.md
```
— Logistics/courier: AGGREGATE for parcel status (MIN/MAX/REPLACE), vehicle GPS with GIS + cooldown_ttl, sorting center KPIs, platform consolidation (Presto+Kudu+ES+HBase→Apache Doris)
```
references/example-web3-exchange.md
```
— Web3/crypto: multi-chain VARIANT schema, custody monitoring, TVL/token async MVs, AML risk detection, wallet profiling, session analysis
```
references/example-payment-fintech.md
```
— Payment/fintech: partial column update for tx lifecycle, acquiring row-column hybrid (100+ cols), merchant reconciliation, risk engine, log platform replacing ES, Lambda→unified architecture
```
references/example-gaming.md
```
— Gaming: retention/funnel analysis, player profiling BITMAP, NL2SQL Agentic analytics via MCP, anti-cheat anomaly detection, lakehouse for offline data
```
references/example-adtech-marketing.md
```
— AdTech/marketing: dual-path RTB serving + analytics, DSP/ADX, creative analysis with VARIANT + vector, cross-border multi-region, replacing Redis+MySQL+HBase+Hive

完整的输入→输出示例，请阅读：

```
references/example-iot-sensor-platform.md
```
— 物联网：50K传感器，复合工作负载，4张表
```
references/example-log-observability.md
```
— 日志+链路追踪+指标，倒排索引，ZSTD压缩

references/example-cdc-operational-sync.md

— MySQL CDC，UNIQUE MoW，序列列

```
references/example-securities-analytics.md
```
— 证券公司：ODS→DWD→DWS→ADS分层，客户360视图，合规性，湖仓，工作负载隔离
```
references/example-retail-fashion.md
```
— 零售/时尚：全渠道库存，用户画像宽表，BITMAP分段，多品牌隔离，旺季扩容
```
references/example-logistics-courier.md
```
— 物流/快递：包裹状态使用AGGREGATE模型（MIN/MAX/REPLACE），车辆GPS搭配GIS + cooldown_ttl，分拣中心KPI，平台整合（Presto+Kudu+ES+HBase→Apache Doris）
```
references/example-web3-exchange.md
```
— Web3/加密货币：多链VARIANT schema，托管监控，TVL/代币异步MV，AML风险检测，钱包画像，会话分析
```
references/example-payment-fintech.md
```
— 支付/金融科技：交易生命周期的部分列更新，收单业务行列混合（100+列），商户对账，风险引擎，日志平台替换ES，Lambda→统一架构
```
references/example-gaming.md
```
— 游戏：留存/漏斗分析，玩家画像BITMAP，通过MCP实现NL2SQL Agentic分析，反作弊异常检测，离线数据湖仓
```
references/example-adtech-marketing.md
```
— 广告技术/营销：RTB服务+分析双路径，DSP/ADX，创意分析搭配VARIANT + 向量，跨境多区域，替换Redis+MySQL+HBase+Hive

When NOT to Use This Skill

不适用场景

Reviewing existing DDL → use
```
doris-best-practices
```
instead
Optimizing a slow query → use
```
doris-best-practices
```
query rules
CLI / connection setup → use
```
doris-best-practices
```
"Connection & CLI" section

审核现有DDL → 改用
```
doris-best-practices
```
优化慢查询 → 使用
```
doris-best-practices
```
的查询规则
CLI / 连接设置 → 使用
```
doris-best-practices
```
的“连接与CLI”章节