cap-apps-performance

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Performance Optimization - Critical Best Practices

性能优化 - 关键最佳实践

NEVER Fetch Full Datasets

绝对不要获取完整数据集

CRITICAL RULE: Never fetch entire datasets, especially large ones (100k+ rows). Always use targeted queries with only the columns needed for each visualization.
关键规则: 绝对不要获取完整数据集,尤其是大型数据集(10万行以上)。始终使用针对性查询,仅获取每个可视化所需的列。

Performance Impact

性能影响

typescript
// ❌ TERRIBLE - Fetches 300k rows × 30 columns = 9M data points
const allData = await new Query()
  .select(['col1', 'col2', 'col3', ...]) // All columns
  .fetch('dataset');

// ✅ GOOD - Fetches 300k rows × 2 columns = 600k data points (93% reduction)
const totals = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
// Then aggregate client-side

// ✅ BEST - Server-side aggregation, returns only aggregated results
const totals = await new Query()
  .groupBy('Merchant Category', { 'Total Amount (USD)': 'sum' })
  .fetch('dataset');
// Returns only ~10-20 rows (one per category)
typescript
// ❌ 极差 - 获取30万行×30列=900万个数据点
const allData = await new Query()
  .select(['col1', 'col2', 'col3', ...]) // 所有列
  .fetch('dataset');

// ✅ 良好 - 获取30万行×2列=60万个数据点(减少93%)
const totals = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
// 然后在客户端进行聚合

// ✅ 最优 - 服务端聚合,仅返回聚合结果
const totals = await new Query()
  .groupBy('Merchant Category', { 'Total Amount (USD)': 'sum' })
  .fetch('dataset');
// 仅返回约10-20行(每个类别一行)

Query Optimization Strategy

查询优化策略

1. Use Server-Side Aggregation When Possible

1. 尽可能使用服务端聚合

Priority order:
  1. Best:
    .groupBy()
    with aggregations (server-side)
  2. Good:
    .select()
    with only needed columns, then aggregate client-side
  3. Acceptable:
    .select()
    with multiple columns if aggregation isn't possible
  4. Never: Fetch all columns or entire dataset
typescript
// ✅ BEST - Server aggregates, returns minimal data
const salesByRegion = await new Query()
  .groupBy('region', { 'Sales_Amount': 'sum', 'Order_Count': 'count' })
  .fetch('sales');

// ✅ GOOD - Only fetch columns needed, aggregate client-side
const salesData = await new Query()
  .select(['region', 'Sales_Amount'])
  .fetch('sales');
const totalSales = salesData.reduce((sum, row) => sum + row.Sales_Amount, 0);

// ❌ BAD - Fetches all columns
const allData = await new Query().fetch('sales');
优先级顺序:
  1. 最优: 带聚合操作的
    .groupBy()
    (服务端)
  2. 良好: 仅选择所需列的
    .select()
    ,然后在客户端聚合
  3. 可接受: 如果无法聚合,使用
    .select()
    选择多列
  4. 绝对不要: 获取所有列或完整数据集
typescript
// ✅ 最优 - 服务端聚合,返回最少数据
const salesByRegion = await new Query()
  .groupBy('region', { 'Sales_Amount': 'sum', 'Order_Count': 'count' })
  .fetch('sales');

// ✅ 良好 - 仅获取所需列,在客户端聚合
const salesData = await new Query()
  .select(['region', 'Sales_Amount'])
  .fetch('sales');
const totalSales = salesData.reduce((sum, row) => sum + row.Sales_Amount, 0);

// ❌ 糟糕 - 获取所有列
const allData = await new Query().fetch('sales');

2. One Query Per Visualization

2. 每个可视化对应一个查询

Each visualization should have its own optimized query:
typescript
// ✅ CORRECT - Separate optimized queries
const quickStats = await fetchQuickStats();      // Only Account Key, Account Status
const riskMetrics = await fetchRiskMetrics();    // Only Transactions, KYC Status, etc.
const categories = await fetchCategories();      // Grouped by Merchant Category

// ❌ WRONG - One query for everything
const allData = await fetchAllData();
const quickStats = calculateFromAll(allData);
const riskMetrics = calculateFromAll(allData);
const categories = calculateFromAll(allData);
每个可视化都应有自己的优化查询:
typescript
// ✅ 正确 - 独立的优化查询
const quickStats = await fetchQuickStats();      // 仅获取Account Key、Account Status
const riskMetrics = await fetchRiskMetrics();    // 仅获取Transactions、KYC Status等
const categories = await fetchCategories();      // 按Merchant Category分组

// ❌ 错误 - 一个查询获取所有数据
const allData = await fetchAllData();
const quickStats = calculateFromAll(allData);
const riskMetrics = calculateFromAll(allData);
const categories = calculateFromAll(allData);

3. Column Selection Strategy

3. 列选择策略

Always specify columns explicitly:
  • Only select columns needed for the specific calculation
  • Don't use
    .select()
    without arguments (fetches all columns)
  • For aggregations, only select the grouping column + aggregated columns
typescript
// ✅ CORRECT - Only 2 columns
const data = await new Query()
  .select(['Account Key', 'Account Status'])
  .fetch('dataset');

// ❌ WRONG - Fetches all columns
const data = await new Query().fetch('dataset');
始终明确指定列:
  • 仅选择特定计算所需的列
  • 不要不带参数使用
    .select()
    (会获取所有列)
  • 对于聚合操作,仅选择分组列和聚合列
typescript
// ✅ 正确 - 仅2列
const data = await new Query()
  .select(['Account Key', 'Account Status'])
  .fetch('dataset');

// ❌ 错误 - 获取所有列
const data = await new Query().fetch('dataset');

Common Patterns

常见模式

Pattern 1: Totals Without Grouping

模式1:无分组的总计

When you need totals but no grouping column:
typescript
// Option A: Select only needed columns, sum client-side
const totalsData = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
const totalTransactions = totalsData.reduce((sum, row) => sum + (row.Transactions || 0), 0);
const totalVolume = totalsData.reduce((sum, row) => sum + (row['Total Amount (USD)'] || 0), 0);

// Option B: Use a constant grouping column (if dataset has one)
// Not always possible, but more efficient if available
当你需要总计但不需要分组列时:
typescript
// 选项A:仅选择所需列,在客户端求和
const totalsData = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
const totalTransactions = totalsData.reduce((sum, row) => sum + (row.Transactions || 0), 0);
const totalVolume = totalsData.reduce((sum, row) => sum + (row['Total Amount (USD)'] || 0), 0);

// 选项B:使用常量分组列(如果数据集有)
// 并非总是可行,但如果可用则效率更高

Pattern 2: Counts with Filters

模式2:带过滤的计数

typescript
// ✅ CORRECT - Only fetch the column needed for counting
const kycData = await new Query()
  .select(['KYC Status'])
  .fetch('dataset');
const approvedCount = kycData.filter(row => row['KYC Status'] === 'APPROVED').length;

// Better if possible: Use server-side count aggregation
// (See Query API limitations below)
typescript
// ✅ 正确 - 仅获取计数所需的列
const kycData = await new Query()
  .select(['KYC Status'])
  .fetch('dataset');
const approvedCount = kycData.filter(row => row['KYC Status'] === 'APPROVED').length;

// 如果可能更好:使用服务端计数聚合
// (见下文Query API限制)

Pattern 3: Unique Values

模式3:唯一值

typescript
// ✅ CORRECT - Only fetch the column needed
const statesData = await new Query()
  .select(['State'])
  .fetch('dataset');
const uniqueStates = new Set(statesData.map(row => row.State)).size;
typescript
// ✅ 正确 - 仅获取所需列
const statesData = await new Query()
  .select(['State'])
  .fetch('dataset');
const uniqueStates = new Set(statesData.map(row => row.State)).size;

Query API Limitations

Query API限制

.aggregate()
Doesn't Work

.aggregate()
无法工作

CRITICAL: The
.aggregate()
method shown in some documentation does not work in practice. It causes error:
DA0057: An alias list was provided but it could not be parsed
.
typescript
// ❌ DOES NOT WORK - Causes DA0057 error
const totals = await new Query()
  .aggregate({ 'Transactions': 'sum', 'Total Amount (USD)': 'sum' })
  .fetch('dataset');

// ✅ WORKAROUND - Use .groupBy() with a grouping column
const totals = await new Query()
  .groupBy('some_column', { 'Transactions': 'sum', 'Total Amount (USD)': 'sum' })
  .fetch('dataset');

// ✅ ALTERNATIVE - Select columns and aggregate client-side
const totals = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
// Then sum client-side
关键提示: 某些文档中展示的
.aggregate()
方法在实际中无法工作。会导致错误:
DA0057: An alias list was provided but it could not be parsed
typescript
// ❌ 无法工作 - 导致DA0057错误
const totals = await new Query()
  .aggregate({ 'Transactions': 'sum', 'Total Amount (USD)': 'sum' })
  .fetch('dataset');

// ✅ 替代方案 - 使用带分组列的.groupBy()
const totals = await new Query()
  .groupBy('some_column', { 'Transactions': 'sum', 'Total Amount (USD)': 'sum' })
  .fetch('dataset');

// ✅ 另一种替代方案 - 选择列并在客户端聚合
const totals = await new Query()
  .select(['Transactions', 'Total Amount (USD)'])
  .fetch('dataset');
// 然后在客户端求和

.groupBy()
Requires a Grouping Column

.groupBy()
需要分组列

You cannot use
.groupBy()
with only aggregations - you must provide a grouping column:
typescript
// ❌ DOES NOT WORK - No grouping column
.groupBy({ 'Transactions': 'sum' })

// ✅ WORKS - Has grouping column
.groupBy('region', { 'Transactions': 'sum' })

// ✅ WORKS - Multiple groupBy calls
.groupBy('region')
.groupBy({ 'Transactions': 'sum' })
你不能仅使用聚合操作而不提供分组列来调用
.groupBy()
typescript
// ❌ 无法工作 - 无分组列
.groupBy({ 'Transactions': 'sum' })

// ✅ 可行 - 有分组列
.groupBy('region', { 'Transactions': 'sum' })

// ✅ 可行 - 多次调用groupBy
.groupBy('region')
.groupBy({ 'Transactions': 'sum' })

Performance Monitoring

性能监控

Always check:
  1. Network tab - How much data is being transferred?
  2. Console - Any warnings about large queries?
  3. Response times - Are queries taking too long?
If a query is slow or returns too much data:
  • Reduce columns selected
  • Add filters to reduce rows
  • Use server-side aggregation instead of client-side
  • Consider pagination for large result sets
始终检查:
  1. 网络标签页 - 传输了多少数据?
  2. 控制台 - 有没有关于大型查询的警告?
  3. 响应时间 - 查询是否耗时过长?
如果查询缓慢或返回数据过多:
  • 减少选择的列数
  • 添加过滤条件以减少行数
  • 使用服务端聚合替代客户端聚合
  • 考虑对大型结果集使用分页