trigger-cost-savings


Trigger.dev Cost Savings Analysis

Analyze task runs and configurations to find cost reduction opportunities.

Prerequisites: MCP Tools

This skill requires the Trigger.dev MCP server to analyze live run data.

Check MCP availability

Before analysis, verify these MCP tools are available:

- `list_runs`: list runs with filters (status, task, time period, machine size)
- `get_run_details`: get run logs, duration, and status
- `get_current_worker`: get registered tasks and their configurations

If these tools are not available, instruct the user:

To analyze your runs, you need the Trigger.dev MCP server installed.

Run this command to install it:

  npx trigger.dev@latest install-mcp

This launches an interactive wizard that configures the MCP server for your AI client.

Do NOT proceed with run analysis without MCP tools. You can still review source code for static issues (see Static Analysis below).
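
The availability check above can be sketched as a simple set difference. This is a minimal illustration, not part of the MCP server: `missingMcpTools` and its `availableTools` argument are hypothetical names, and how you obtain the list of available tool names depends on your AI client.

```typescript
// Tool names this skill depends on, taken from the list above.
const REQUIRED_TOOLS = ["list_runs", "get_run_details", "get_current_worker"] as const;

// Returns the required tools that are absent from `availableTools`.
// `availableTools` is a hypothetical input: the tool names your MCP client reports.
function missingMcpTools(availableTools: readonly string[]): string[] {
  const available = new Set<string>(availableTools);
  return REQUIRED_TOOLS.filter((tool) => !available.has(tool));
}

const missing = missingMcpTools(["list_runs", "get_run_details"]);
if (missing.length > 0) {
  // Run analysis should stop here; static analysis can still proceed.
  console.log(`Missing MCP tools: ${missing.join(", ")}`);
  console.log("Install with: npx trigger.dev@latest install-mcp");
}
```

If the returned list is non-empty, skip Step 2 and limit the review to static analysis.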


Load latest cost reduction documentation

Before giving recommendations, fetch the latest guidance:

WebFetch: https://trigger.dev/docs/how-to-reduce-your-spend

Use the fetched content to ensure recommendations are current. If the fetch fails, fall back to the reference documentation in `references/cost-reduction.md`.
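
The fetch-then-fall-back behavior can be sketched as below. This is an assumption-laden outline: `loadGuidance`, `fetchLive`, and `readFallback` are hypothetical names, and the two loaders are injected so the sketch does not depend on any particular fetch tool or file API.

```typescript
// A loader is any async function that returns document text.
type Loader = () => Promise<string>;

interface Guidance {
  source: "live" | "fallback";
  text: string;
}

// Tries the live docs first; on any error or an empty result,
// uses the local reference copy instead.
async function loadGuidance(fetchLive: Loader, readFallback: Loader): Promise<Guidance> {
  try {
    const text = await fetchLive();
    if (text.trim().length > 0) {
      return { source: "live", text };
    }
  } catch {
    // Fall through to the local reference copy.
  }
  return { source: "fallback", text: await readFallback() };
}
```

Recording which source was used makes it easy to tell the user when recommendations are based on the local copy rather than the current docs.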


Analysis Workflow

Step 1: Static Analysis (source code)

Scan task files in the project for these issues:

- Oversized machines: tasks using `large-1x` or `large-2x` without clear need
- Missing `maxDuration`: tasks without execution time limits (runaway cost risk)
- Excessive retries: `maxAttempts` > 5 without `AbortTaskRunError` for known failures
- Missing debounce: high-frequency triggers without debounce configuration
- Missing idempotency: payment/critical tasks without idempotency keys
- Polling instead of waits: `setTimeout`/`setInterval`/sleep loops instead of `wait.for()`
- Short waits: `wait.for()` with < 5 seconds (not checkpointed, wastes compute)
- Sequential instead of batch: multiple `triggerAndWait()` calls that could use `batchTriggerAndWait()`
- Over-scheduled crons: schedules running more frequently than necessary
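
A few of the checks above lend themselves to a quick text scan. The sketch below is a rough heuristic under stated assumptions: `scanTaskSource`, the `Finding` shape, and the rule names are hypothetical, and the regular expressions only approximate what a real parser would find, so treat hits as prompts for manual review.

```typescript
interface Finding {
  rule: string;
  detail: string;
}

// Heuristic text scan of one task file. The regexes are illustrative, not a parser.
function scanTaskSource(source: string): Finding[] {
  const findings: Finding[] = [];

  // Oversized machines: a quoted large-1x or large-2x preset.
  const machine = source.match(/["'](large-[12]x)["']/);
  if (machine) {
    findings.push({ rule: "oversized-machine", detail: `uses ${machine[1]}` });
  }

  // Missing maxDuration: no such key anywhere in the file.
  if (!/\bmaxDuration\s*:/.test(source)) {
    findings.push({ rule: "missing-maxDuration", detail: "no execution time limit" });
  }

  // Excessive retries: maxAttempts > 5 and no AbortTaskRunError in the file.
  const attempts = source.match(/\bmaxAttempts\s*:\s*(\d+)/);
  if (attempts && Number(attempts[1]) > 5 && !source.includes("AbortTaskRunError")) {
    findings.push({ rule: "excessive-retries", detail: `maxAttempts is ${attempts[1]}` });
  }

  // Polling: timer calls that might be replaceable with wait.for().
  if (/\bset(Timeout|Interval)\s*\(/.test(source)) {
    findings.push({ rule: "polling", detail: "timer call found; consider wait.for()" });
  }

  return findings;
}
```

Checks such as missing debounce, missing idempotency, and over-scheduled crons depend on how a task is triggered and what it does, so they still need a read of the calling code.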


Step 2: Run Analysis (requires MCP tools)

Use MCP tools to analyze actual usage patterns:

2a. Identify expensive tasks

list_runs with filters:
- period: "30d" or "7d"
- Sort by duration or cost
- Check across different task IDs

Look for:

- Tasks with high total compute time (duration × run count)
- Tasks with high failure rates (wasted retries)
- Tasks running on large machines with short durations (over-provisioned)
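
Ranking tasks by total compute time can be sketched as a small aggregation. The `RunSummary` fields (`taskId`, `durationMs`, `status`) are assumptions made for illustration; the actual shape of the `list_runs` response may differ, so map it to this shape first.

```typescript
// Hypothetical shape for one run, mapped from whatever list_runs returns.
interface RunSummary {
  taskId: string;
  durationMs: number;
  status: string;
}

interface TaskTotals {
  taskId: string;
  runCount: number;
  totalMs: number;
  failureRate: number;
}

// Groups runs by task and sorts by total compute time, highest first.
function rankTasksByCompute(runs: RunSummary[]): TaskTotals[] {
  const byTask = new Map<string, { runCount: number; totalMs: number; failed: number }>();
  for (const run of runs) {
    const entry = byTask.get(run.taskId) ?? { runCount: 0, totalMs: 0, failed: 0 };
    entry.runCount += 1;
    entry.totalMs += run.durationMs;
    if (run.status === "FAILED" || run.status === "CRASHED") {
      entry.failed += 1;
    }
    byTask.set(run.taskId, entry);
  }
  return Array.from(byTask.entries())
    .map(([taskId, e]) => ({
      taskId,
      runCount: e.runCount,
      totalMs: e.totalMs,
      failureRate: e.failed / e.runCount,
    }))
    .sort((a, b) => b.totalMs - a.totalMs);
}
```

The top few entries are usually where right-sizing or retry changes pay off most, since total time is duration multiplied by run count.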


2b. Analyze failure patterns

list_runs with status: "FAILED" or "CRASHED"

For high-failure tasks:

- Check whether failures are retryable (transient) or permanent
- Suggest `AbortTaskRunError` for known non-retryable errors
- Calculate wasted compute from failed retries
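
One way to estimate wasted compute is to sum the time spent on retry attempts for runs that failed with a permanent error. Everything here is a sketch: the `FailedRun` shape, the per-attempt durations, and the error patterns are hypothetical and would need to be adapted to the data `get_run_details` actually provides.

```typescript
// Hypothetical shape for one failed run.
interface FailedRun {
  taskId: string;
  errorMessage: string;
  attemptDurationsMs: number[]; // one entry per attempt, in order
}

// Hypothetical patterns for errors that will never succeed on retry.
const PERMANENT_ERROR_PATTERNS: RegExp[] = [/invalid payload/i, /not found/i, /unauthorized/i];

function isPermanentFailure(message: string): boolean {
  return PERMANENT_ERROR_PATTERNS.some((pattern) => pattern.test(message));
}

// Compute spent on attempts after the first, for runs that failed permanently.
// The first attempt is unavoidable; later ones are what AbortTaskRunError would save.
function wastedRetryMs(runs: FailedRun[]): number {
  return runs
    .filter((run) => isPermanentFailure(run.errorMessage))
    .reduce(
      (total, run) => total + run.attemptDurationsMs.slice(1).reduce((a, b) => a + b, 0),
      0,
    );
}
```

Transient failures (timeouts, rate limits) are excluded on purpose, since retrying them is usually the right behavior.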


2c. Check machine utilization

get_run_details for sample runs of each task

Compare actual resource usage against machine preset:

- If a task on `large-2x` consistently runs in < 1 second, it's over-provisioned
- If tasks are I/O-bound (API calls, DB queries), they likely don't need large machines
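
The first rule above can be sketched as a predicate over sampled run durations. The `TaskUsage` shape and the function name are hypothetical, and "consistently" is interpreted here as every sampled run finishing under the threshold, which is one reasonable reading rather than a fixed definition.

```typescript
// Hypothetical summary of sampled runs for one task.
interface TaskUsage {
  taskId: string;
  machine: string;
  durationsMs: number[];
}

const LARGE_PRESETS = new Set<string>(["large-1x", "large-2x"]);

// Flags a task on a large preset when every sampled run finished under the threshold.
function isOverProvisioned(usage: TaskUsage, thresholdMs = 1000): boolean {
  if (!LARGE_PRESETS.has(usage.machine) || usage.durationsMs.length === 0) {
    return false;
  }
  return usage.durationsMs.every((duration) => duration < thresholdMs);
}
```

Short duration alone does not prove a task is I/O-bound, so confirm from the task's code before recommending a smaller preset.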


2d. Review schedule frequency

get_current_worker to list scheduled tasks and their cron patterns

Flag schedules that may be too frequent for their purpose.
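
To judge whether a schedule is too frequent, it helps to convert the cron pattern into runs per day. The sketch below is a simplified estimator with hypothetical function names: it assumes standard five-field cron, handles only `*`, `*/n`, single values, ranges, and comma lists in the minute and hour fields, and returns `undefined` when the day or month fields are restricted.

```typescript
// Counts how many values in [min, max] a single cron field matches.
// Supports "*", "*/n", "a", "a/n", "a-b", "a-b/n", and comma-separated lists of those.
function countFieldValues(field: string, min: number, max: number): number {
  const matched = new Set<number>();
  for (const part of field.split(",")) {
    const pieces = part.split("/");
    const range = pieces[0] ?? "*";
    const stepText = pieces[1];
    const step = stepText ? Number(stepText) : 1;
    if (!(step > 0)) {
      throw new Error(`Invalid step in cron field: ${field}`);
    }
    let start = min;
    let end = max;
    if (range !== "*") {
      const bounds = range.split("-").map(Number);
      start = bounds[0] ?? min;
      // "a" alone matches one value; "a/n" runs from a to the field maximum.
      end = bounds[1] ?? (stepText ? max : start);
    }
    for (let value = start; value <= end; value += step) {
      matched.add(value);
    }
  }
  return matched.size;
}

// Estimates runs per day for schedules that fire every day.
function estimateRunsPerDay(cron: string): number | undefined {
  const fields = cron.trim().split(/\s+/);
  if (fields.length !== 5) return undefined;
  const [minute, hour, dayOfMonth, month, dayOfWeek] =
    fields as [string, string, string, string, string];
  if (dayOfMonth !== "*" || month !== "*" || dayOfWeek !== "*") return undefined;
  return countFieldValues(minute, 0, 59) * countFieldValues(hour, 0, 23);
}
```

For example, `*/5 * * * *` works out to 288 runs per day, which makes it easier to ask whether the task's purpose really needs that frequency.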


Step 3: Generate Recommendations

Present findings as a prioritized list with estimated impact:


Cost Optimization Report

High Impact

Right-size `process-images` machine: currently `large-2x`, average run 2s. Switching to `small-2x` could reduce this task's cost by ~8x (16x vs 2x relative cost).

```ts
machine: { preset: "small-2x" }  // was "large-2x"
```


Medium Impact

Add debounce to `sync-user-data`: 847 runs/day, often triggered in bursts.

debounce: { key: `user-${userId}`, delay: "5s" }


Low Impact / Best Practices

Add `maxDuration` to `generate-report`: no timeout configured.

maxDuration: 300  // 5 minutes


Machine Preset Costs (relative)

Larger machines cost proportionally more per second of compute:

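| Preset | vCPU | RAM | Relative Cost |
| --- | --- | --- | --- |
| micro | 0.25 | 0.25 GB | 0.25x |
| small-1x | 0.5 | 0.5 GB | 1x (baseline) |
| small-2x | 1 | 1 GB | 2x |
| medium-1x | 1 | 2 GB | 2x |
| medium-2x | 2 | 4 GB | 4x |
| large-1x | 4 | 8 GB | 8x |
| large-2x | 8 | 16 GB | 16x |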
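
The savings from switching presets follow directly from the relative costs in the table. A minimal sketch, with the multipliers copied from the table and `costReductionFactor` as a hypothetical helper name:

```typescript
// Relative per-second cost multipliers, copied from the table above.
const RELATIVE_COST: Record<string, number> = {
  micro: 0.25,
  "small-1x": 1,
  "small-2x": 2,
  "medium-1x": 2,
  "medium-2x": 4,
  "large-1x": 8,
  "large-2x": 16,
};

// How many times cheaper a run becomes when moving between presets,
// assuming the run duration stays the same on the smaller machine.
function costReductionFactor(from: string, to: string): number {
  const fromCost = RELATIVE_COST[from];
  const toCost = RELATIVE_COST[to];
  if (fromCost === undefined || toCost === undefined) {
    throw new Error(`Unknown preset: ${from} or ${to}`);
  }
  return fromCost / toCost;
}

console.log(costReductionFactor("large-2x", "small-2x")); // 8
console.log(costReductionFactor("large-2x", "small-1x")); // 16
```

The same-duration assumption matters: a CPU-bound task may run longer on a smaller machine, which reduces the real saving.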


Key Principles

- Waits > 5 seconds are free: checkpointed, no compute charge
- Start small, scale up: the default `small-1x` is right for most tasks
- I/O-bound tasks don't need big machines: API calls and DB queries wait on the network
- Debounce saves the most on high-frequency tasks: it consolidates bursts into single runs
- Idempotency prevents duplicate work: especially important for expensive operations
- `AbortTaskRunError` stops wasteful retries: don't retry permanent failures

See `references/cost-reduction.md` for detailed strategies with code examples.
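
The first principle can be illustrated with a rough billing model. This is a sketch only: the `Segment` type and `billableMs` are hypothetical, and since the text does not say how a wait of exactly 5 seconds is treated, this model assumes only waits strictly longer than 5 seconds are checkpointed.

```typescript
// A run modelled as alternating stretches of compute and waiting.
type Segment = { kind: "compute"; ms: number } | { kind: "wait"; ms: number };

// Assumption: only waits strictly longer than 5 seconds are checkpointed.
const CHECKPOINT_THRESHOLD_MS = 5000;

// Sums the time that would be charged: all compute, plus waits too short to checkpoint.
function billableMs(segments: Segment[]): number {
  return segments.reduce((total, segment) => {
    const free = segment.kind === "wait" && segment.ms > CHECKPOINT_THRESHOLD_MS;
    return free ? total : total + segment.ms;
  }, 0);
}

const run: Segment[] = [
  { kind: "compute", ms: 200 },
  { kind: "wait", ms: 60000 }, // checkpointed, not charged
  { kind: "compute", ms: 300 },
  { kind: "wait", ms: 2000 }, // too short to checkpoint, charged
];
console.log(billableMs(run)); // 2500
```

Under this model a 60-second wait costs nothing while a 2-second wait is charged in full, which is why short waits are flagged during static analysis.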
