automate-this
Automate This
Analyze a screen recording of a manual process and build working automation for it.
The user records themselves doing something repetitive or tedious, hands you the video file, and you figure out what they're doing, why, and how to script it away.
Prerequisites Check
Before analyzing any recording, verify the required tools are available. Run these checks silently and only surface problems:
bash
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version 2>/dev/null | head -1 || echo "NO_FFMPEG"
command -v whisper >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || echo "NO_WHISPER"
- ffmpeg is required. If missing, tell the user: `brew install ffmpeg` (macOS) or the equivalent for their OS.
- Whisper is optional. Only needed if the recording has narration. If missing AND the recording has an audio track, suggest: `pip install openai-whisper` or `brew install whisper-cpp`. If the user declines, proceed with visual analysis only.
Phase 1: Extract Content from the Recording
Given a video file path (typically on `~/Desktop/`), extract both visual frames and audio:
Frame Extraction
Extract frames at one frame every 2 seconds. This balances coverage with context window limits.
bash
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
mkdir -p "$WORK_DIR/frames"
ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=0.5" -q:v 2 -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
ls "$WORK_DIR/frames/" | wc -l
Use `$WORK_DIR` for all subsequent temp file paths in the session. The per-run directory with mode 0700 ensures extracted frames are only readable by the current user.
If the recording is longer than 5 minutes (more than 150 frames), increase the interval to one frame every 4 seconds to stay within context limits. Tell the user you're sampling less frequently for longer recordings.
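The interval rule above can be sketched as a small helper. This is a minimal sketch, not part of the original workflow: the function names `video_duration_seconds` and `pick_fps_filter` are illustrative, and it assumes the duration is read with ffprobe's `format=duration` entry.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the original skill.

# Print the video duration in whole seconds (requires ffprobe).
video_duration_seconds() {
  local dur
  dur=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  printf '%s\n' "${dur%.*}"   # drop the fractional part
}

# Map a duration in seconds to the ffmpeg fps filter described above:
# one frame every 2 s up to 5 minutes, one frame every 4 s beyond that.
pick_fps_filter() {
  local seconds=$1
  if [ "$seconds" -gt 300 ]; then
    echo "fps=0.25"
  else
    echo "fps=0.5"
  fi
}

# Usage (commented out so sourcing this block has no side effects):
# filter=$(pick_fps_filter "$(video_duration_seconds "<VIDEO_PATH>")")
# ffmpeg -y -i "<VIDEO_PATH>" -vf "$filter" -q:v 2 -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
```

Choosing the filter before extraction avoids extracting at the dense rate first and then discarding frames.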
Audio Extraction and Transcription
Check if the video has an audio track:
bash
ffprobe -i "<VIDEO_PATH>" -show_streams -select_streams a -loglevel error | head -5
If audio exists:
bash
ffmpeg -y -i "<VIDEO_PATH>" -ac 1 -ar 16000 -loglevel warning "$WORK_DIR/audio.wav"
# Use whichever whisper binary is available
if command -v whisper >/dev/null 2>&1; then
  whisper "$WORK_DIR/audio.wav" --model small --language en --output_format txt --output_dir "$WORK_DIR/"
  cat "$WORK_DIR/audio.txt"
elif command -v whisper-cpp >/dev/null 2>&1; then
  whisper-cpp -m "$(brew --prefix 2>/dev/null)/share/whisper-cpp/models/ggml-small.bin" -l en -f "$WORK_DIR/audio.wav" -otxt -of "$WORK_DIR/audio"
  cat "$WORK_DIR/audio.txt"
else
  echo "NO_WHISPER"
fi
If neither whisper binary is available and the recording has audio, inform the user they're missing narration context and ask if they want to install Whisper (`pip install openai-whisper` or `brew install whisper-cpp`) or proceed with visual-only analysis.
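The audio decision described in this section reduces to two yes/no facts: whether the file has an audio stream, and whether a Whisper binary exists. A minimal sketch of that branching follows; the function names and the three result labels are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and result labels are illustrative.

# Succeeds when the file has at least one audio stream (requires ffprobe).
# ffprobe prints one "audio" line per audio stream; empty output means none.
has_audio_stream() {
  [ -n "$(ffprobe -v error -select_streams a \
          -show_entries stream=codec_type -of csv=p=0 "$1")" ]
}

# Succeeds when either Whisper binary is on PATH.
has_whisper() {
  command -v whisper >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1
}

# Turn the two facts ("yes"/"no") into the next action.
audio_plan() {
  local audio=$1 whisper=$2
  if [ "$audio" != yes ]; then
    echo "visual_only"     # nothing to transcribe
  elif [ "$whisper" = yes ]; then
    echo "transcribe"
  else
    echo "ask_user"        # offer to install Whisper or continue without it
  fi
}
```

Keeping the decision in a pure function makes it easy to check each branch without a real video file.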
Phase 2: Reconstruct the Process
Analyze the extracted frames (and transcript, if available) to build a structured understanding of what the user did. Work through the frames sequentially and identify:
- Applications used — Which apps appear in the recording? (browser, terminal, Finder, mail client, spreadsheet, IDE, etc.)
- Sequence of actions — What did the user do, in order? Click-by-click, step-by-step.
- Data flow — What information moved between steps? (copied text, downloaded files, form inputs, etc.)
- Decision points — Were there moments where the user paused, checked something, or made a choice?
- Repetition patterns — Did the user do the same thing multiple times with different inputs?
- Pain points — Where did the process look slow, error-prone, or tedious? The narration often reveals this directly ("I hate this part," "this always takes forever," "I have to do this for every single one").
Present this reconstruction to the user as a numbered step list and ask them to confirm it's accurate before proposing automation. This is critical — a wrong understanding leads to useless automation.
Format:
Here's what I see you doing in this recording:
1. Open Chrome and navigate to [specific URL]
2. Log in with credentials
3. Click through to the reporting dashboard
4. Download a CSV export
5. Open the CSV in Excel
6. Filter rows where column B is "pending"
7. Copy those rows into a new spreadsheet
8. Email the new spreadsheet to [recipient]
You repeated steps 3-8 three times for different report types.
[If narration was present]: You mentioned that the export step is the slowest
part and that you do this every Monday morning.
Does this match what you were doing? Anything I got wrong or missed?
Do NOT proceed to Phase 3 until the user confirms the reconstruction is accurate.
Phase 3: Environment Fingerprint
Before proposing automation, understand what the user actually has to work with. Run these checks:
bash
echo "=== OS ===" && uname -a
echo "=== Shell ===" && echo $SHELL
echo "=== Python ===" && { command -v python3 && python3 --version 2>&1; } || echo "not installed"
echo "=== Node ===" && { command -v node && node --version 2>&1; } || echo "not installed"
echo "=== Homebrew ===" && { command -v brew && echo "installed"; } || echo "not installed"
echo "=== Common Tools ===" && for cmd in curl jq playwright selenium osascript automator crontab; do command -v $cmd >/dev/null 2>&1 && echo "$cmd: yes" || echo "$cmd: no"; done
Use this to constrain proposals to tools the user already has. Never propose automation that requires installing five new things unless the simpler path genuinely doesn't work.
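One way to apply the fingerprint is to pick the scripting runtime from what is already installed. The sketch below is an assumption about how that choice might look; the preference order (python3, then node, then bash) and the function name `pick_runtime` are not from the original.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the first available runtime.
# The preference order is an assumption, not a rule from the workflow.
pick_runtime() {
  local candidate
  for candidate in python3 node bash; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "sh"   # POSIX shell as the last resort
}

# Usage: runtime=$(pick_runtime)
```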
Phase 4: Propose Automation
Based on the reconstructed process and the user's environment, propose automation at up to three tiers. Not every process needs three tiers — use judgment.
Tier Structure
Tier 1 — Quick Win (under 5 minutes to set up)
The smallest useful automation. A shell alias, a one-liner, a keyboard shortcut, an AppleScript snippet. Automates the single most painful step, not the whole process.
Tier 2 — Script (under 30 minutes to set up)
A standalone script (bash, Python, or Node — whichever the user has) that automates the full process end-to-end. Handles common errors. Can be run manually when needed.
Tier 3 — Full Automation (under 2 hours to set up)
The script from Tier 2, plus: scheduled execution (cron, launchd, or GitHub Actions), logging, error notifications, and any necessary integration scaffolding (API keys, auth tokens, etc.).
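For the scheduled-execution part of Tier 3, a cron entry with logging might look like the sketch below. The schedule (Mondays at 09:00) mirrors the "every Monday morning" example used in this document; the function name and both paths are hypothetical placeholders, and paths containing spaces would need extra quoting.

```bash
#!/usr/bin/env bash
# Print a crontab line that runs a script every Monday at 09:00 and
# appends stdout and stderr to a log file.
# Field order: minute hour day-of-month month day-of-week (1 = Monday).
cron_entry() {
  local script=$1 log=$2
  printf '0 9 * * 1 %s >> %s 2>&1\n' "$script" "$log"
}

# To install (commented out; this rewrites the user's crontab):
# (crontab -l 2>/dev/null; cron_entry "$HOME/bin/weekly-report.sh" "$HOME/weekly-report.log") | crontab -
```

Printing the entry first gives the user a chance to review it before anything is written to their crontab.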
Proposal Format
For each tier, provide:
Tier [N]: [Name]
What it automates: [Which steps from the reconstruction]
What stays manual: [Which steps still need a human]
Time savings: [Estimated time saved per run, based on the recording length and repetition count]
Prerequisites: [Anything needed that isn't already installed — ideally nothing]
How it works:
[2-3 sentence plain-English explanation]
The code:
[Complete, working, commented code — not pseudocode]
How to test it:
[Exact steps to verify it works, starting with a dry run if possible]
How to undo:
[How to reverse any changes if something goes wrong]
Application-Specific Automation Strategies
Use these strategies based on which applications appear in the recording:
Browser-based workflows:
- First choice: Check if the website has a public API. API calls are generally far more reliable than browser automation because they don't break when the page layout changes. Search for API documentation.
- Second choice: `curl` or `wget` for simple HTTP requests with known endpoints.
- Third choice: Playwright or Selenium for workflows that require clicking through UI. Prefer Playwright; it's faster and less flaky.
- Look for patterns: if the user is downloading the same report from a dashboard repeatedly, it's almost certainly available via API or direct URL with query parameters.
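When a report is reachable by direct URL, the repeated download can be sketched as a loop over report types. Everything specific here is an assumption: the query parameter names (`type`, `format`), the function names, and any base URL passed in are hypothetical and must be replaced with what the real site uses.

```bash
#!/usr/bin/env bash
# Hypothetical endpoint shape: <base>?type=<report>&format=csv

build_export_url() {
  local base=$1 report_type=$2
  printf '%s?type=%s&format=csv\n' "$base" "$report_type"
}

# Download one CSV per report type. DRY_RUN defaults to 1, so nothing is
# fetched until the user opts in with DRY_RUN=0.
download_reports() {
  local base=$1 t
  shift
  for t in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would fetch: $(build_export_url "$base" "$t") -> $t.csv"
    else
      # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
      curl -fsSL --retry 3 -o "$t.csv" "$(build_export_url "$base" "$t")"
    fi
  done
}

# Usage: DRY_RUN=0 download_reports "https://example.com/export" sales refunds
```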
Spreadsheet and data workflows:
- Python with pandas for data filtering, transformation, and aggregation.
- If the user is doing simple column operations in Excel, a 5-line Python script replaces the entire manual process.
- `csvkit` for quick command-line CSV manipulation without writing code.
- If the output needs to stay in Excel format, use openpyxl.
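As an illustration of how small such a filter can be, the "keep rows where column B is pending" step from the earlier example can be sketched with awk. This swaps in awk rather than pandas or csvkit, and it handles CSV naively: it assumes no quoted fields containing commas. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Keep the header row plus rows whose 2nd column (column B) equals a value.
# Naive CSV handling: quoted fields with embedded commas are not supported.
filter_column_b() {
  local value=$1 input=$2
  awk -F',' -v want="$value" 'NR == 1 || $2 == want' "$input"
}

# Usage: filter_column_b pending export.csv > pending.csv
```

For files with quoted fields, a CSV-aware tool is the safer choice.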
Email workflows:
- macOS: `osascript` can control Mail.app to send emails with attachments.
- Cross-platform: Python `smtplib` for sending, `imaplib` for reading.
- If the email follows a template, generate the body from a template file with variable substitution.
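Template substitution for an email body can be sketched with a shell here-document. This keeps the template inline instead of in a separate file, which is a simplification; the wording of the message and the function name are hypothetical.

```bash
#!/usr/bin/env bash
# Render an email body by substituting variables into an inline template.
render_email_body() {
  local recipient=$1 report_date=$2
  cat <<EOF
Hi $recipient,

Attached is the report for $report_date.
EOF
}

# Usage: render_email_body "Dana" "2024-01-08" > body.txt
```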
File management workflows:
- Shell scripts for move/copy/rename patterns.
- `find` + `xargs` for batch operations.
- `fswatch` or `watchman` for triggered-on-change automation.
- If the user is organizing files into folders by date or type, that's a 3-line shell script.
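Organizing files by type can be sketched as follows. It is slightly longer than three lines because it adds a dry-run default; the function name is hypothetical, and grouping by file extension is an assumption about what "type" means.

```bash
#!/usr/bin/env bash
# Move each regular file in a directory into a subfolder named after its
# extension. Files without an extension are left in place.
# DRY_RUN defaults to 1, so the first run only prints the plan.
organize_by_extension() {
  local dir=$1 file name ext
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    name=${file##*/}
    case $name in
      *.?*) ext=${name##*.} ;;   # text after the last dot
      *)    continue ;;          # no extension: skip
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would move $name -> $ext/"
    else
      mkdir -p "$dir/$ext" && mv -- "$file" "$dir/$ext/"
    fi
  done
}

# Usage: organize_by_extension ~/Downloads            # preview
#        DRY_RUN=0 organize_by_extension ~/Downloads  # actually move
```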
Terminal/CLI workflows:
- Shell aliases for frequently typed commands.
- Shell functions for multi-step sequences.
- Makefiles for project-specific task sets.
- If the user ran the same command with different arguments, that's a loop.
macOS-specific workflows:
- AppleScript/JXA for controlling native apps (Mail, Calendar, Finder, Preview, etc.).
- Shortcuts.app for simple multi-app workflows that don't need code.
- `automator` for file-based workflows.
- `launchd` plist files for scheduled tasks (prefer over cron on macOS).
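A launchd job for a weekly task can be sketched by generating the plist from a shell function. The label, script path, and log path are hypothetical placeholders, and the Monday 09:00 schedule is only an example.

```bash
#!/usr/bin/env bash
# Print a launchd property list that runs a script every Monday at 09:00.
# In StartCalendarInterval, Weekday 1 is Monday (0 and 7 are Sunday).
launchd_plist() {
  local label=$1 script=$2 log=$3
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>$label</string>
  <key>ProgramArguments</key>
  <array><string>$script</string></array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Weekday</key><integer>1</integer>
    <key>Hour</key><integer>9</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
  <key>StandardOutPath</key><string>$log</string>
  <key>StandardErrorPath</key><string>$log</string>
</dict>
</plist>
EOF
}

# Usage (commented out; writes into the user's LaunchAgents folder):
# launchd_plist com.example.weekly-report "$HOME/bin/weekly-report.sh" "$HOME/weekly-report.log" \
#   > "$HOME/Library/LaunchAgents/com.example.weekly-report.plist"
# launchctl load "$HOME/Library/LaunchAgents/com.example.weekly-report.plist"
```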
Cross-application workflows (data moves between apps):
- Identify the data transfer points. Each transfer is an automation opportunity.
- Clipboard-based transfers in the recording suggest the apps don't talk to each other — look for APIs, file-based handoffs, or direct integrations instead.
- If the user copies from App A and pastes into App B, the automation should read from A's data source and write to B's input format directly.
Making Proposals Targeted
Apply these principles to every proposal:
- Automate the bottleneck first. The narration and timing in the recording reveal which step is actually painful. A 30-second automation of the worst step beats a 2-hour automation of the whole process.
- Match the user's skill level. If the recording shows someone comfortable in a terminal, propose shell scripts. If it shows someone navigating GUIs, propose something with a simple trigger (double-click a script, run a Shortcut, or type one command).
- Estimate real time savings. Take the recording duration and multiply by how often they do it. "This recording is 4 minutes. You said you do this every workday. That's about 17 hours per year. Tier 1 cuts it to 30 seconds each time, so you get about 15 hours back."
- Handle the 80% case. The first version of the automation should cover the common path perfectly. Edge cases can be handled in Tier 3 or flagged for manual intervention.
- Preserve human checkpoints. If the recording shows the user reviewing or approving something mid-process, keep that as a manual step. Don't automate judgment calls.
- Propose dry runs. Every script should have a mode where it shows what it would do without doing it: `--dry-run` flags, preview output, or confirmation prompts before destructive actions.
- Account for auth and secrets. If the process involves logging in or using credentials, never hardcode them. Use environment variables, keychain access (macOS `security` command), or prompt for them at runtime.
- Consider failure modes. What happens if the website is down? If the file doesn't exist? If the format changes? Good proposals mention this and handle it.
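The time-savings estimate is simple arithmetic and can be sketched as below. The figure of 260 runs per year is an assumption (roughly one run per workday) that the user should confirm, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hours per year spent on a task: seconds per run * runs per year / 3600.
yearly_hours() {
  local seconds_per_run=$1 runs_per_year=$2
  awk -v s="$seconds_per_run" -v n="$runs_per_year" \
      'BEGIN { printf "%.1f\n", s * n / 3600 }'
}

# A 4-minute task done on 260 workdays:
# yearly_hours 240 260   # prints 17.3
# The same task cut to 30 seconds per run:
# yearly_hours 30 260    # prints 2.2
# Time saved (210 seconds per run):
# yearly_hours 210 260   # prints 15.2
```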
Phase 5: Build and Test
When the user picks a tier:
- Write the complete automation code to a file (suggest a sensible location: the user's project directory if one exists, or `~/Desktop/` otherwise).
- Walk through a dry run or test with the user watching.
- If the test works, show how to run it for real.
- If it fails, diagnose and fix — don't give up after one attempt.
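A script skeleton that supports the dry-run walkthrough might look like the sketch below. The variable `REPORT_API_TOKEN`, the keychain service name `report-api`, and the `mkdir` step are hypothetical placeholders standing in for whatever the real automation needs.

```bash
#!/usr/bin/env bash
# Sketch of a script skeleton with a --dry-run flag.

DRY_RUN=0

# Run a command, or only print it when dry-run mode is on.
run_step() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

main() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "--dry-run" ]; then DRY_RUN=1; fi
  done

  # Read the credential from the environment instead of hardcoding it.
  # On macOS it could instead come from the keychain, for example:
  #   REPORT_API_TOKEN=$(security find-generic-password -s report-api -w)
  if [ -z "${REPORT_API_TOKEN:-}" ]; then
    echo "REPORT_API_TOKEN is not set" >&2
    return 1
  fi

  run_step mkdir -p "$HOME/reports"
}

# In the real script, finish with: main "$@"
```

Routing every side effect through `run_step` means the dry run and the real run share one code path, so what the user sees in the preview is what will execute.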
Cleanup
After analysis is complete (regardless of outcome), clean up extracted frames and audio:
bash
rm -rf "$WORK_DIR"
Tell the user you're cleaning up temporary files so they know nothing is left behind.
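If the whole analysis runs inside a single shell process, cleanup can be made automatic with an exit trap. This is a sketch under that assumption; when each command runs in its own shell, the trap would fire too early and the explicit `rm -rf` above is the right tool. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Create the per-run directory and remove it when the shell exits,
# including exits caused by errors.
setup_work_dir() {
  WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX") || return 1
  trap 'rm -rf "$WORK_DIR"' EXIT
}

# Usage:
# setup_work_dir
# mkdir -p "$WORK_DIR/frames"
```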