macpilot-screenshot-ocr

MacPilot Screenshot & OCR

Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.

When to Use

Use this skill when:
  • You need to capture what's currently on screen
  • You need to extract text from an image file
  • You need to read text from a specific area of the screen
  • You need to capture a specific app window
  • You need to verify the visual state of an application
  • You need to capture screen recordings

Screenshot Commands

Full Screen

bash
macpilot screenshot --json                           # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json      # Capture to specific path
macpilot screenshot --with-permissions --json        # Use CGWindowListCreateImage directly

Specific Region

bash
macpilot screenshot --region 100,200,800,600 --json

Region format: x,y,width,height (from top-left corner)
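
When a region is known by its two corners rather than by width and height, the `--region` value can be derived with a little arithmetic. The sketch below is a hypothetical helper, not part of MacPilot; the function name `region_from_corners` is an assumption made for illustration.

```shell
# Hypothetical helper (not a MacPilot command): build an x,y,width,height
# string from a top-left corner (x1,y1) and a bottom-right corner (x2,y2).
region_from_corners() {
  local x1="$1" y1="$2" x2="$3" y2="$4"
  # width and height are the differences between the two corners
  printf '%s,%s,%s,%s\n' "$x1" "$y1" "$((x2 - x1))" "$((y2 - y1))"
}

region_from_corners 100 200 900 800   # prints 100,200,800,600
```

The result can then be passed along as `macpilot screenshot --region "$(region_from_corners 100 200 900 800)" --json`.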

Specific Window

bash
macpilot screenshot --window "Safari" --json         # Capture Safari window
macpilot screenshot --window "Finder" --json         # Capture Finder window

All Windows

bash
macpilot screenshot --all-windows --json             # Each window separately

Specific Display

bash
macpilot screenshot --display 1 --json               # Second display (0-indexed)

Format Options

bash
macpilot screenshot --format png ~/Desktop/shot.png  # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg  # JPEG (smaller files)

OCR Commands

Extract Text from Image File

bash
macpilot ocr scan /path/to/image.png --json
macpilot ocr scan ~/Desktop/screenshot.png --json

Extract Text from Screen Region

bash
macpilot ocr scan 100 200 800 600 --json

Arguments: x y width height (captures region then OCRs it)
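
Note that `screenshot --region` takes a comma-separated value while `ocr scan` takes four space-separated arguments. If the same region is reused for both, a small conversion avoids retyping it. This is a minimal sketch; `region_to_scan_args` is a hypothetical name, not a MacPilot feature.

```shell
# Hypothetical helper: turn "x,y,width,height" (the --region form)
# into "x y width height" (the ocr scan form).
region_to_scan_args() {
  printf '%s\n' "$1" | tr ',' ' '
}

region_to_scan_args 100,200,800,600   # prints 100 200 800 600

# Possible usage (left unquoted on purpose so the four numbers split
# into separate arguments):
#   macpilot ocr scan $(region_to_scan_args 100,200,800,600) --json
```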

Multi-Language OCR

bash
macpilot ocr scan image.png --language en-US --json       # English
macpilot ocr scan image.png --language ja --json           # Japanese
macpilot ocr scan image.png --language zh-Hans --json      # Simplified Chinese
macpilot ocr scan image.png --language de --json           # German
macpilot ocr scan image.png --language fr --json           # French

OCR Click (Find and Click Text on Screen)

bash
macpilot ocr click "Submit" --json                    # Find text on screen and click it
macpilot ocr click "OK" --app Finder --json           # Click text in specific app
macpilot ocr click "Accept" --timeout 10 --json       # Retry until text appears (10s)

OCR click takes a screenshot, runs OCR, finds the matching text (case-insensitive), and clicks at its center coordinates. Use --timeout to poll and retry when waiting for text to appear.
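
For commands that have no `--timeout` option of their own, the same poll-and-retry idea can be approximated in the shell. The function below is an illustrative sketch only: `retry_until` is a hypothetical name, the one-second interval is an arbitrary choice, and MacPilot's internal retry logic may well differ.

```shell
# Generic polling sketch: rerun a command once per second until it
# succeeds or the timeout (in whole seconds) has elapsed.
# Returns 0 on success, 1 if the command never succeeded in time.
retry_until() {
  local timeout_s="$1" elapsed=0
  shift
  while ! "$@"; do
    # Give up once the time budget is used up
    [ "$elapsed" -ge "$timeout_s" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Possible usage: wait up to 10 seconds for a file to exist
#   retry_until 10 test -f ~/Desktop/screen.png
```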

Screen Recording (ScreenCaptureKit)

Start Recording

bash
macpilot screen record start --output ~/Desktop/recording.mov --json
macpilot screen record start --output rec.mov --region 0,0,1920,1080 --json  # Region
macpilot screen record start --output rec.mov --window Safari --json          # Window
macpilot screen record start --output rec.mov --display 1 --json              # Display
macpilot screen record start --output rec.mov --audio --json                  # With audio
macpilot screen record start --output rec.mov --quality high --fps 60 --json  # Quality

Control Recording

bash
macpilot screen record stop --json         # Stop and save
macpilot screen record status --json       # Check if recording
macpilot screen record pause --json        # Pause recording
macpilot screen record resume --json       # Resume recording

Quality options: low (1 Mbps), medium (5 Mbps, default), high (10 Mbps). FPS default: 30.
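
The bitrates above give a rough way to estimate file size before recording: megabits per second times duration, divided by 8. The sketch below assumes the listed bitrate covers video only and ignores audio and container overhead, so treat the result as a ballpark figure. `estimate_recording_mb` is a hypothetical helper name.

```shell
# Rough size estimate in megabytes for a recording, based on the
# documented quality bitrates. Uses integer arithmetic, so the
# result is rounded down.
estimate_recording_mb() {
  local quality="$1" seconds="$2" mbps
  case "$quality" in
    low)    mbps=1 ;;
    medium) mbps=5 ;;
    high)   mbps=10 ;;
    *) echo "unknown quality: $quality" >&2; return 1 ;;
  esac
  # Mbps * seconds = megabits; divide by 8 bits per byte for megabytes
  echo "$((mbps * seconds / 8))"
}

estimate_recording_mb high 60   # prints 75
```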

Display Information

bash
macpilot display-info --json

Returns: all displays with resolution, position, scale factor

Workflow Patterns

Capture and OCR in One Flow

bash
# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json
# Extract text from it
macpilot ocr scan ~/tmp/capture.png --json
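
The two steps can be wrapped in one function so the OCR pass only runs if the capture succeeded. This is a sketch under assumptions: `capture_and_ocr` and the `MACPILOT` override variable are hypothetical names introduced here, and the function assumes `macpilot` exits non-zero on failure. It also creates the output directory first, since a path such as `~/tmp` may not exist.

```shell
# Hypothetical wrapper: capture a region to a file, then OCR that file.
# Set MACPILOT=echo for a dry run that prints the commands instead of
# running them (MACPILOT is an assumption of this sketch, not a
# MacPilot setting).
capture_and_ocr() {
  local region="$1" out="$2"
  local bin="${MACPILOT:-macpilot}"
  # Make sure the target directory exists before capturing
  mkdir -p "$(dirname "$out")" || return 1
  # Stop here if the capture fails
  "$bin" screenshot --region "$region" "$out" --json || return 1
  "$bin" ocr scan "$out" --json
}

# Possible usage:
#   capture_and_ocr 0,0,1920,1080 "$HOME/tmp/capture.png"
```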

Quick Screen Region OCR

bash
# Directly OCR a screen region without saving
macpilot ocr scan 200 100 600 400 --json

Find and Click Text (No Coordinate Math)

bash
# Instead of screenshot > OCR > parse > click, just:
macpilot ocr click "Submit" --json
macpilot ocr click "Next" --timeout 5 --json          # Wait up to 5s for text to appear

Verify UI State

bash
# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json
# Read the image to verify content
macpilot ocr scan ~/tmp/safari.png --json

Record an Automation

bash
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop

Tips

  • Screen Recording permission must be granted to MacPilot.app in System Settings
  • PNG format is best for screenshots with text (lossless); JPEG for photos
  • OCR works best on high-contrast text; increase screenshot region size if text is small
  • Use display-info to get screen dimensions before capturing specific regions
  • The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
  • On Retina displays, coordinates are in logical points (not physical pixels)
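
Because regions are given in logical points, the matching area inside a saved image may be larger by the display's scale factor. The sketch below shows that conversion. It assumes an integer scale factor (for example 2 on many Retina displays) and that the saved screenshot is at native pixel resolution, which this document does not confirm. `points_to_pixels` is a hypothetical helper name.

```shell
# Hypothetical helper: scale an x,y,width,height region from logical
# points to physical pixels using an integer scale factor.
points_to_pixels() {
  local region="$1" scale="$2"
  local x y w h
  # Split the comma-separated region into its four components
  IFS=, read -r x y w h <<EOF
$region
EOF
  printf '%s,%s,%s,%s\n' \
    "$((x * scale))" "$((y * scale))" "$((w * scale))" "$((h * scale))"
}

points_to_pixels 100,200,800,600 2   # prints 200,400,1600,1200
```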