trigger-cost-savings

Trigger.dev Cost Savings Analysis

Analyze task runs and configurations to find cost reduction opportunities.

Prerequisites: MCP Tools

This skill requires the Trigger.dev MCP server to analyze live run data.

Check MCP availability

Before analysis, verify these MCP tools are available:
  • list_runs: list runs with filters (status, task, time period, machine size)
  • get_run_details: get run logs, duration, and status
  • get_current_worker: get registered tasks and their configurations
If these tools are not available, instruct the user:
To analyze your runs, you need the Trigger.dev MCP server installed.

Run this command to install it:

  npx trigger.dev@latest install-mcp

This launches an interactive wizard that configures the MCP server for your AI client.
Do NOT proceed with run analysis without MCP tools. You can still review source code for static issues (see Static Analysis below).
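
The availability check above can be sketched as a small helper. This is a minimal illustration, not part of the Trigger.dev MCP server: the tool names come from the list above, while `missingMcpTools`, `canRunAnalysis`, and the way the available tool names are obtained are assumptions that depend on your AI client.

```typescript
// Tool names taken from the list above.
const REQUIRED_TOOLS: string[] = ["list_runs", "get_run_details", "get_current_worker"];

// Returns the required tools that are absent from `availableTools`.
// How the available tool names are collected is client-specific (assumed here).
function missingMcpTools(availableTools: string[]): string[] {
  return REQUIRED_TOOLS.filter((name) => availableTools.indexOf(name) === -1);
}

// Run analysis (Step 2) should only start when nothing is missing.
// Static analysis of source code can proceed either way.
function canRunAnalysis(availableTools: string[]): boolean {
  return missingMcpTools(availableTools).length === 0;
}
```

If `missingMcpTools` returns a non-empty list, show the install instructions above and limit the review to static analysis.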

Load latest cost reduction documentation

Before giving recommendations, fetch the latest guidance:
WebFetch: https://trigger.dev/docs/how-to-reduce-your-spend
Use the fetched content to ensure recommendations are current. If the fetch fails, fall back to the reference documentation in references/cost-reduction.md.

Analysis Workflow

Step 1: Static Analysis (source code)

Scan task files in the project for these issues:
  1. Oversized machines: tasks using large-1x or large-2x without clear need
  2. Missing maxDuration: tasks without execution time limits (runaway cost risk)
  3. Excessive retries: maxAttempts > 5 without AbortTaskRunError for known failures
  4. Missing debounce: high-frequency triggers without debounce configuration
  5. Missing idempotency: payment/critical tasks without idempotency keys
  6. Polling instead of waits: setTimeout/setInterval/sleep loops instead of wait.for()
  7. Short waits: wait.for() with < 5 seconds (not checkpointed, wastes compute)
  8. Sequential instead of batch: multiple triggerAndWait() calls that could use batchTriggerAndWait()
  9. Over-scheduled crons: schedules running more frequently than necessary
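
A few of these checks can be approximated with plain text matching. The sketch below is a rough heuristic under stated assumptions, not a real parser and not a Trigger.dev tool: `scanTaskSource`, the `Finding` shape, and the rule names are hypothetical, and it covers only items 1, 2, 3, and 6. It will miss configuration that is split across files or built dynamically.

```typescript
// Hypothetical finding shape used only for this illustration.
interface Finding {
  rule: string;
  detail: string;
}

// Regex-based scan of one task file's source text.
function scanTaskSource(source: string): Finding[] {
  const findings: Finding[] = [];

  // Item 1: large machine presets deserve a second look.
  const preset = source.match(/preset\s*:\s*["'](large-[12]x)["']/);
  if (preset) {
    findings.push({ rule: "oversized-machine", detail: `uses ${preset[1]}; confirm it is needed` });
  }

  // Item 2: no maxDuration anywhere in the file.
  if (!/maxDuration\s*:/.test(source)) {
    findings.push({ rule: "missing-max-duration", detail: "no maxDuration found" });
  }

  // Item 3: more than 5 attempts and no AbortTaskRunError in the file.
  const attempts = source.match(/maxAttempts\s*:\s*(\d+)/);
  if (attempts && Number(attempts[1]) > 5 && source.indexOf("AbortTaskRunError") === -1) {
    findings.push({ rule: "excessive-retries", detail: `maxAttempts is ${attempts[1]}` });
  }

  // Item 6: timer-based polling instead of wait.for().
  if (/\bset(Timeout|Interval)\s*\(/.test(source)) {
    findings.push({ rule: "polling", detail: "setTimeout/setInterval call found" });
  }

  return findings;
}
```

Treat each finding as a prompt for manual review rather than a confirmed problem; a large preset or a timer call can be legitimate.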

Step 2: Run Analysis (requires MCP tools)

Use MCP tools to analyze actual usage patterns:

2a. Identify expensive tasks

list_runs with filters:
- period: "30d" or "7d"
- Sort by duration or cost
- Check across different task IDs
Look for:
  • Tasks with high total compute time (duration x run count)
  • Tasks with high failure rates (wasted retries)
  • Tasks running on large machines with short durations (over-provisioned)
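
Once run data has been collected, ranking tasks by total compute time and failure rate is a simple aggregation. The sketch below assumes a simplified, hypothetical `RunSummary` record; the actual fields returned by list_runs may differ and would need to be mapped into this shape first.

```typescript
// Hypothetical, simplified run record (not the MCP tool's actual response shape).
interface RunSummary {
  taskId: string;
  durationMs: number;
  status: string; // e.g. "COMPLETED", "FAILED", "CRASHED"
}

interface TaskStats {
  taskId: string;
  runCount: number;
  totalComputeMs: number;
  failureRate: number; // failed or crashed runs divided by all runs
}

// Groups runs by task and sorts tasks by total compute time, highest first.
function rankTasksByCompute(runs: RunSummary[]): TaskStats[] {
  const byTask: Record<string, { count: number; totalMs: number; failed: number }> = {};
  for (const run of runs) {
    const entry = byTask[run.taskId] || (byTask[run.taskId] = { count: 0, totalMs: 0, failed: 0 });
    entry.count += 1;
    entry.totalMs += run.durationMs;
    if (run.status === "FAILED" || run.status === "CRASHED") {
      entry.failed += 1;
    }
  }
  return Object.keys(byTask)
    .map((taskId) => {
      const e = byTask[taskId]!;
      return { taskId, runCount: e.count, totalComputeMs: e.totalMs, failureRate: e.failed / e.count };
    })
    .sort((a, b) => b.totalComputeMs - a.totalComputeMs);
}
```

Tasks at the top of the ranking are the first candidates for right-sizing, and a high `failureRate` points to the failure analysis in step 2b.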

2b. Analyze failure patterns

list_runs with status: "FAILED" or "CRASHED"
For high-failure tasks:
  • Check if failures are retryable (transient) vs permanent
  • Suggest AbortTaskRunError for known non-retryable errors
  • Calculate wasted compute from failed retries
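
One way to estimate the wasted compute is to count every attempt after the first on a run that failed for a permanent reason, since aborting on the first attempt would have avoided the rest. The sketch below is an assumption-laden illustration: the `FailedRun` shape is hypothetical, and the `permanent` flag is a judgment you make after reading the run's error, not a field the MCP tools are known to return.

```typescript
// Hypothetical summary of a failed run, filled in by the reviewer.
interface FailedRun {
  taskId: string;
  attempts: number;     // total attempts made, including the first
  avgAttemptMs: number; // average duration of one attempt
  permanent: boolean;   // true when retrying could never have succeeded
}

// Sums the compute spent on retries that could not have helped.
function avoidableRetryMs(runs: FailedRun[]): number {
  let total = 0;
  for (const run of runs) {
    if (run.permanent && run.attempts > 1) {
      total += (run.attempts - 1) * run.avgAttemptMs;
    }
  }
  return total;
}
```

Retries on transient failures are excluded on purpose, because those attempts had a real chance of succeeding.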

2c. Check machine utilization

get_run_details for sample runs of each task
Compare actual resource usage against machine preset:
  • If a task on large-2x consistently runs in < 1 second, it's over-provisioned
  • If tasks are I/O-bound (API calls, DB queries), they likely don't need large machines

2d. Review schedule frequency

get_current_worker to list scheduled tasks and their cron patterns
Flag schedules that may be too frequent for their purpose.

Step 3: Generate Recommendations

Present findings as a prioritized list with estimated impact:

Cost Optimization Report

High Impact

  1. Right-size process-images machine: currently large-2x, average run 2s. Switching to small-2x could reduce this task's cost by ~8x (16x vs. 2x relative cost).
    machine: { preset: "small-2x" }  // was "large-2x"

Medium Impact

  1. Add debounce to sync-user-data: 847 runs/day, often triggered in bursts.
    debounce: { key: `user-${userId}`, delay: "5s" }

Low Impact / Best Practices

  1. Add maxDuration to generate-report: no timeout configured.
    maxDuration: 300  // 5 minutes

Machine Preset Costs (relative)

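Larger machines cost proportionally more per second of compute:

| Preset | vCPU | RAM | Relative Cost |
|---|---|---|---|
| micro | 0.25 | 0.25 GB | 0.25x |
| small-1x | 0.5 | 0.5 GB | 1x (baseline) |
| small-2x | 1 | 1 GB | 2x |
| medium-1x | 1 | 2 GB | 2x |
| medium-2x | 2 | 4 GB | 4x |
| large-1x | 4 | 8 GB | 8x |
| large-2x | 8 | 16 GB | 16x |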
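
The relative costs make it easy to estimate the effect of a preset change. The sketch below copies the multipliers from the table and assumes the task's duration stays the same on the smaller machine, which may not hold for CPU-bound work. `RELATIVE_COST` and `savingsFactor` are illustrative names, not part of any Trigger.dev API.

```typescript
// Relative per-second cost multipliers, copied from the table above.
const RELATIVE_COST: Record<string, number> = {
  "micro": 0.25,
  "small-1x": 1,
  "small-2x": 2,
  "medium-1x": 2,
  "medium-2x": 4,
  "large-1x": 8,
  "large-2x": 16,
};

// How many times cheaper a run becomes when moving between presets,
// assuming the run takes the same time on both (an assumption).
function savingsFactor(fromPreset: string, toPreset: string): number {
  const fromCost = RELATIVE_COST[fromPreset];
  const toCost = RELATIVE_COST[toPreset];
  if (fromCost === undefined || toCost === undefined) {
    throw new Error(`unknown preset: ${fromPreset} or ${toPreset}`);
  }
  return fromCost / toCost;
}
```

For example, `savingsFactor("large-2x", "small-2x")` is 16 / 2 = 8, and moving from large-2x to the small-1x baseline is a factor of 16.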

Key Principles

  • Waits > 5 seconds are free: checkpointed, no compute charge
  • Start small, scale up: default small-1x is right for most tasks
  • I/O-bound tasks don't need big machines: API calls, DB queries wait on network
  • Debounce saves the most on high-frequency tasks: consolidates bursts into single runs
  • Idempotency prevents duplicate work: especially important for expensive operations
  • AbortTaskRunError stops wasteful retries: don't retry permanent failures
See references/cost-reduction.md for detailed strategies with code examples.