doris-architecture-advisor
Apache Doris Architecture Advisor
Workload-aware architecture design for Apache Doris. 8 decision rules, 3 worked examples. Complements `doris-best-practices` with a sizing-first workflow.
Workflow
Follow these 5 steps in order:
1. DDL validation — The `doris-best-practices` skill handles DDL correctness. Its Pre-Flight Checklist and DDL Gotchas apply to every CREATE TABLE. This advisor focuses on architecture decisions (which model, which partition strategy, which indexes), not DDL syntax. Always calculate explicit bucket counts. If volume is unknown, choose a conservative default: 3 for small dimensions, 8 for medium tables, 16-32 for large daily fact tables.
2. Classify workload — Read `references/decision-workload-classification.md`. Match the user's scenario to one or more of the 6 workload types. Composite workloads (e.g., IoT = time-series + device state + logs + dashboards) decompose into multiple sub-tables.
3. Size the cluster — Read `references/decision-sizing-matrix.md`. Estimate write throughput, query QPS, latency target, and hot data volume. Output sizing as total vCPU and total cache only — never break down into per-node specs (in cloud / storage-compute mode, node count is typically managed by the platform). Also read `references/decision-deployment-mode.md` if the user hasn't specified cloud vs on-prem.
4. Design architecture — Based on workload classification, read the relevant decision rules:

   | Workload signal | Read these rules |
   | --- | --- |
   | Append-only events, logs, time-series | `decision-data-model-selection`, `decision-time-series-design`, `decision-ingestion-strategy` |
   | Updates, CDC, device state tracking | `decision-data-model-selection`, `decision-mutable-state`, `decision-ingestion-strategy` |
   | Semi-structured / multi-protocol JSON | `decision-data-model-selection` (VARIANT section) |
   | Dashboards, pre-aggregated metrics | `decision-query-acceleration` |
   | Point query API, high-concurrency lookups | `decision-query-acceleration` (point query section) |
   | Text search, log search, full-text | `decision-query-acceleration` (index section) |
   | Vector / embedding search | `decision-query-acceleration` (vector section) |
   | Warehouse layering (ODS/DWD/DWS/ADS) | `decision-workload-classification` (layering section), `decision-data-model-selection` |
   | Multi-department / workload isolation | `decision-workload-classification` (isolation section) |
   | Hot/cold tiering with data lake | `decision-workload-classification` (lakehouse section), `decision-deployment-mode` |

   Output the architecture design: data flow diagram, table-per-sub-workload mapping, and the key design decisions (model, partition strategy, bucket key, indexes, compression, ingestion method) for each table.
5. Generate DDL — Produce CREATE TABLE statements applying ALL constraints from step 1. Calculate explicit bucket counts with the formula in `decision-time-series-design.md`; use the fallback counts above when inputs are incomplete. For each table, cite the best-practices rule that drove the decision.
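
The bucket-count guidance in steps 1 and 5 can be sketched in Python. This is only an illustration, not the formula from `decision-time-series-design.md`, which is not reproduced here. It assumes the count is the data size divided by a 2 GB target bucket size, rounded down, which is inferred from the "10 buckets (21 GB / 2 GB)" annotation in the Rules Checked example. The function name `bucket_count`, the table-class labels, and the choice of 16 from the 16-32 range are all hypothetical.

```python
from typing import Optional

# Fallback counts quoted in step 1. For large daily fact tables the text gives
# a 16-32 range; using the lower bound here is an assumption.
FALLBACK_BUCKETS = {
    "small_dimension": 3,
    "medium": 8,
    "large_daily_fact": 16,
}


def bucket_count(data_gb: Optional[float], table_class: str,
                 target_bucket_gb: float = 2.0) -> int:
    """Return an explicit bucket count, or a fallback when volume is unknown.

    Assumed formula: floor(data_gb / target_bucket_gb), never below 1.
    """
    if data_gb is None:
        # Volume unknown: use the conservative default for this table class.
        return FALLBACK_BUCKETS[table_class]
    return max(1, int(data_gb // target_bucket_gb))


print(bucket_count(21.0, "large_daily_fact"))  # 10, matching 21 GB / 2 GB
print(bucket_count(None, "medium"))            # 8, the fallback for medium tables
```

Whatever the real formula is, the point of the workflow is that the DDL always carries an explicit number, so the fallback path returns a concrete count rather than leaving bucketing unspecified.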
Output Structure
Responses should include these sections (adapt formatting to conversation):
- Workload Summary — Classification, write rate, QPS, latency target, hot data volume
- Sizing Recommendation — Warehouse tier, storage estimate, cache strategy
- Architecture Overview — Data flow from sources → ingestion → Apache Doris → applications
- Table Designs — CREATE TABLE with inline comments citing decision rules
- Rules Checked — For each table, list the rules applied with exact file paths so users can look up the rule for troubleshooting. Format: `Per [rule-name](doris-best-practices/references/rule-name.md)`. Example:
  Table: sensor_readings
  Rules Applied:
  - [schema-model-choose-for-workload](doris-best-practices/references/schema-model-choose-for-workload.md) — DUPLICATE for append-only
  - [schema-bucket-target-size](doris-best-practices/references/schema-bucket-target-size.md) — 10 buckets (21 GB / 2 GB)
  - [schema-props-compression](doris-best-practices/references/schema-props-compression.md) — ZSTD for IoT data
- Decision Provenance — Each recommendation tagged: `official` (from Doris docs), `derived` (logical inference), or `field` (experience heuristic with disclaimer)
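
The Rules Checked format lends itself to mechanical generation. The sketch below shows one way to build the link lines and attach a provenance tag. The helper names (`rule_link`, `format_rules_checked`, `tag_provenance`) are hypothetical, and rendering the tag in square brackets is an assumption: the text names the three labels but not how they should appear.

```python
RULE_DIR = "doris-best-practices/references"    # path prefix used in the example
PROVENANCE = ("official", "derived", "field")   # the three tags named in the text
SEP = " \u2014 "                                # em dash separator, as in the example


def rule_link(rule_name):
    """Markdown link to a rule file under RULE_DIR."""
    return f"[{rule_name}]({RULE_DIR}/{rule_name}.md)"


def format_rules_checked(table, rules):
    """Render one table's entry; rules is a list of (rule_name, note) pairs."""
    lines = [f"Table: {table}", "Rules Applied:"]
    lines += [f"- {rule_link(name)}{SEP}{note}" for name, note in rules]
    return "\n".join(lines)


def tag_provenance(recommendation, tag):
    """Append a provenance tag; the bracket rendering is an assumption."""
    if tag not in PROVENANCE:
        raise ValueError(f"unknown provenance tag: {tag}")
    return f"{recommendation} [{tag}]"


print(format_rules_checked("sensor_readings", [
    ("schema-model-choose-for-workload", "DUPLICATE for append-only"),
    ("schema-bucket-target-size", "10 buckets (21 GB / 2 GB)"),
]))
```

Building the path from the rule name keeps the link text and the file path in sync, which is what makes the entries usable for troubleshooting.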
Worked Examples
For complete input → output examples, read:
- `references/example-iot-sensor-platform.md` — IoT: 50K sensors, composite workload, 4 tables
- `references/example-log-observability.md` — Logs + traces + metrics, inverted index, ZSTD
- `references/example-cdc-operational-sync.md` — MySQL CDC, UNIQUE MoW, sequence column
- `references/example-securities-analytics.md` — Securities firm: ODS→DWD→DWS→ADS layering, customer 360, compliance, lakehouse, workload isolation
- `references/example-retail-fashion.md` — Retail/fashion: omnichannel inventory, wide+tall table for user profiling, BITMAP segmentation, multi-brand isolation, peak season scaling
- `references/example-logistics-courier.md` — Logistics/courier: AGGREGATE for parcel status (MIN/MAX/REPLACE), vehicle GPS with GIS + cooldown_ttl, sorting center KPIs, platform consolidation (Presto+Kudu+ES+HBase→Apache Doris)
- `references/example-web3-exchange.md` — Web3/crypto: multi-chain VARIANT schema, custody monitoring, TVL/token async MVs, AML risk detection, wallet profiling, session analysis
- `references/example-payment-fintech.md` — Payment/fintech: partial column update for tx lifecycle, acquiring row-column hybrid (100+ cols), merchant reconciliation, risk engine, log platform replacing ES, Lambda→unified architecture
- `references/example-gaming.md` — Gaming: retention/funnel analysis, player profiling BITMAP, NL2SQL Agentic analytics via MCP, anti-cheat anomaly detection, lakehouse for offline data
- `references/example-adtech-marketing.md` — AdTech/marketing: dual-path RTB serving + analytics, DSP/ADX, creative analysis with VARIANT + vector, cross-border multi-region, replacing Redis+MySQL+HBase+Hive
When NOT to Use This Skill
- Reviewing existing DDL → use `doris-best-practices` instead
- Optimizing a slow query → use `doris-best-practices` query rules
- CLI / connection setup → use `doris-best-practices` "Connection & CLI" section