macpilot-screenshot-ocr
MacPilot Screenshot & OCR
Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.
When to Use
Use this skill when:
- You need to capture what's currently on screen
- You need to extract text from an image file
- You need to read text from a specific area of the screen
- You need to capture a specific app window
- You need to verify visual state of an application
- You need to capture screen recordings
Screenshot Commands
Full Screen
bash
macpilot screenshot --json # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json # Capture to specific path
macpilot screenshot --with-permissions --json # Use CGWindowListCreateImage directly
Specific Region
bash
macpilot screenshot --region 100,200,800,600 --json
Region format: x,y,width,height (from top-left corner)
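Regions are often easier to describe by two opposite corners than by an origin plus a size. The helper below is a minimal sketch of that conversion; `region_from_corners` is a hypothetical name, not part of MacPilot, and it only does integer arithmetic to build the x,y,width,height string.

```shell
# Convert two corner points into the x,y,width,height string used by --region.
# Corners may be given in either order; all values are integers.
region_from_corners() {
  x1=$1 y1=$2 x2=$3 y2=$4
  # Swap so that (x1,y1) is always the top-left corner.
  if [ "$x1" -gt "$x2" ]; then t=$x1; x1=$x2; x2=$t; fi
  if [ "$y1" -gt "$y2" ]; then t=$y1; y1=$y2; y2=$t; fi
  echo "$x1,$y1,$((x2 - x1)),$((y2 - y1))"
}

region_from_corners 100 200 900 800   # prints 100,200,800,600

# Usage with the command shown above:
#   macpilot screenshot --region "$(region_from_corners 100 200 900 800)" --json
```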
Specific Window
bash
macpilot screenshot --window "Safari" --json # Capture Safari window
macpilot screenshot --window "Finder" --json # Capture Finder window
All Windows
bash
macpilot screenshot --all-windows --json # Each window separately
Specific Display
bash
macpilot screenshot --display 1 --json # Second display (0-indexed)
Format Options
bash
macpilot screenshot --format png ~/Desktop/shot.png # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg # JPEG (smaller files)
OCR Commands
Extract Text from Image File
bash
macpilot ocr scan /path/to/image.png --json
macpilot ocr scan ~/Desktop/screenshot.png --json
Extract Text from Screen Region
bash
macpilot ocr scan 100 200 800 600 --json
Arguments: x y width height (captures region then OCRs it)
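Note that ocr scan takes the region as four space-separated arguments, while screenshot --region takes one comma-separated string. If a script keeps regions in the comma form, a small conversion like the following sketch avoids maintaining both; `region_to_ocr_args` is a hypothetical helper name.

```shell
# Turn "x,y,width,height" (screenshot --region style) into "x y width height".
region_to_ocr_args() {
  echo "$1" | tr ',' ' '
}

region_to_ocr_args "100,200,800,600"   # prints 100 200 800 600

# Usage (the expansion is left unquoted on purpose so it splits into 4 arguments):
#   macpilot ocr scan $(region_to_ocr_args "100,200,800,600") --json
```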
Multi-Language OCR
bash
macpilot ocr scan image.png --language en-US --json # English
macpilot ocr scan image.png --language ja --json # Japanese
macpilot ocr scan image.png --language zh-Hans --json # Simplified Chinese
macpilot ocr scan image.png --language de --json # German
macpilot ocr scan image.png --language fr --json # French
OCR Click (Find and Click Text on Screen)
bash
macpilot ocr click "Submit" --json # Find text on screen and click it
macpilot ocr click "OK" --app Finder --json # Click text in specific app
macpilot ocr click "Accept" --timeout 10 --json # Retry until text appears (10s)
OCR click takes a screenshot, runs OCR, finds the matching text (case-insensitive), and clicks at its center coordinates. Use --timeout to poll and retry when waiting for text to appear.
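When a dialog's button label is not known in advance, one option is to try several candidate labels in turn. The sketch below assumes that `macpilot ocr click` exits with a non-zero status when the text is not found; the source does not state this, so check it before relying on the pattern. `click_first_match` is a hypothetical helper name.

```shell
# Try each label in order and stop at the first one that ocr click accepts.
# ASSUMPTION: `macpilot ocr click` exits non-zero when the text is not found.
click_first_match() {
  for label in "$@"; do
    if macpilot ocr click "$label" --json >/dev/null 2>&1; then
      echo "$label"   # report which label was clicked
      return 0
    fi
  done
  return 1            # none of the labels matched
}

# Usage: click_first_match "OK" "Continue" "Accept"
```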
Screen Recording (ScreenCaptureKit)
Start Recording
bash
macpilot screen record start --output ~/Desktop/recording.mov --json
macpilot screen record start --output rec.mov --region 0,0,1920,1080 --json # Region
macpilot screen record start --output rec.mov --window Safari --json # Window
macpilot screen record start --output rec.mov --display 1 --json # Display
macpilot screen record start --output rec.mov --audio --json # With audio
macpilot screen record start --output rec.mov --quality high --fps 60 --json # Quality
Control Recording
bash
macpilot screen record stop --json # Stop and save
macpilot screen record status --json # Check if recording
macpilot screen record pause --json # Pause recording
macpilot screen record resume --json # Resume recording
Quality options: low (1 Mbps), medium (5 Mbps, default), high (10 Mbps). FPS default: 30.
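The bitrates above give a rough way to estimate file size before recording: megabits per second times duration, divided by 8, yields megabytes. The sketch below uses only the three quality values listed; it treats the bitrate as an average for video only, which is an assumption, and it ignores audio and container overhead. `estimate_recording_mb` is a hypothetical helper name.

```shell
# Rough size estimate in megabytes: bitrate (Mbps) * seconds / 8.
# Integer division, so the result is rounded down.
estimate_recording_mb() {
  case "$1" in
    low)    mbps=1 ;;
    medium) mbps=5 ;;
    high)   mbps=10 ;;
    *) echo "unknown quality: $1" >&2; return 1 ;;
  esac
  echo $(( mbps * $2 / 8 ))
}

estimate_recording_mb high 60   # prints 75 (10 Mbps for 60 s)
```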
Display Information
bash
macpilot display-info --json
Returns: all displays with resolution, position, scale factor
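Once a display's resolution is known, region coordinates can be derived from it instead of being hard-coded. The following sketch computes a region of a given size centered on a display. It takes the width and height as plain arguments, since the exact JSON field names in the display-info output are not documented here, and it assumes the display's origin is at 0,0. `centered_region` is a hypothetical helper name.

```shell
# Print an x,y,width,height region of the given size centered on a display.
# Arguments: display_width display_height region_width region_height
# (all in logical points; assumes the display origin is 0,0).
centered_region() {
  echo "$(( ($1 - $3) / 2 )),$(( ($2 - $4) / 2 )),$3,$4"
}

centered_region 1920 1080 800 600   # prints 560,240,800,600

# Usage:
#   macpilot screenshot --region "$(centered_region 1920 1080 800 600)" --json
```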
Workflow Patterns
Capture and OCR in One Flow
bash
# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json
# Extract text from it
macpilot ocr scan ~/tmp/capture.png --json
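The two steps above can be wrapped in one function so the intermediate file lives in a fresh temporary directory and is removed afterwards. This is a minimal sketch using only the commands and flags already shown; `capture_and_ocr` is a hypothetical name.

```shell
# Capture a region to a throwaway file, OCR it, then delete the file.
# Prints the OCR command's JSON output; returns non-zero if either step fails.
capture_and_ocr() {
  region=$1
  dir=$(mktemp -d) || return 1
  macpilot screenshot --region "$region" "$dir/capture.png" --json >/dev/null &&
    macpilot ocr scan "$dir/capture.png" --json
  status=$?
  rm -rf "$dir"
  return "$status"
}

# Usage: capture_and_ocr 0,0,1920,1080
```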
Quick Screen Region OCR
bash
# Directly OCR a screen region without saving
macpilot ocr scan 200 100 600 400 --json
Find and Click Text (No Coordinate Math)
bash
# Instead of screenshot > OCR > parse > click, just:
macpilot ocr click "Submit" --json
macpilot ocr click "Next" --timeout 5 --json # Wait up to 5s for text to appear
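For multi-step dialogs, the same command can be chained so each click waits for its label and the sequence stops at the first step that fails. As a sketch, this assumes a failed or timed-out `ocr click` returns a non-zero exit status, which the source does not confirm. `click_sequence` is a hypothetical helper name.

```shell
# Click each label in order, waiting up to <timeout> seconds for each one.
# ASSUMPTION: a failed or timed-out `ocr click` returns a non-zero exit status.
click_sequence() {
  timeout=$1; shift
  for label in "$@"; do
    macpilot ocr click "$label" --timeout "$timeout" --json >/dev/null || {
      echo "gave up at: $label" >&2
      return 1
    }
  done
}

# Usage: click_sequence 5 "Next" "Next" "Finish"
```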
Verify UI State
bash
# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json
# Read the image to verify content
macpilot ocr scan ~/tmp/safari.png --json
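To turn this check into a pass/fail test, the OCR output can be searched for an expected string. The sketch below deliberately avoids parsing the JSON, because its structure is not documented here. It assumes the recognized text appears verbatim somewhere in the raw output, which may not hold for text that JSON would escape (quotes, some non-ASCII characters). `window_shows_text` is a hypothetical helper name.

```shell
# Succeed when the OCR output for an app's window contains the expected text.
# ASSUMPTION: recognized text appears verbatim in the --json output, so a
# case-insensitive fixed-string search on the raw output is sufficient.
window_shows_text() {
  app=$1 expected=$2
  dir=$(mktemp -d) || return 1
  macpilot screenshot --window "$app" "$dir/window.png" --json >/dev/null &&
    macpilot ocr scan "$dir/window.png" --json | grep -qiF -- "$expected"
  status=$?
  rm -rf "$dir"
  return "$status"
}

# Usage: window_shows_text "Safari" "Example Domain" && echo "page loaded"
```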
Record an Automation
bash
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop
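If one of the steps between start and stop fails, the recording may be left running. A wrapper along the following lines runs an arbitrary command while recording and always issues the stop afterwards. It is a sketch built from the start and stop commands shown above; `record_while` and `demo-steps.sh` are hypothetical names, and it does not handle interruption by Ctrl-C.

```shell
# Record the screen while running a command, and always stop the recording
# afterwards, even if the command fails. Returns the command's exit status.
record_while() {
  out=$1; shift
  macpilot screen record start --output "$out" || return 1
  "$@"
  status=$?
  macpilot screen record stop
  return "$status"
}

# Usage (demo-steps.sh is a hypothetical script containing the automation steps):
#   record_while ~/Desktop/demo.mov sh ./demo-steps.sh
```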
Tips
- Screen Recording permission must be granted to MacPilot.app in System Settings
- PNG format is best for screenshots with text (lossless); JPEG for photos
- OCR works best on high-contrast text; increase screenshot region size if text is small
- Use display-info to get screen dimensions before capturing specific regions
- The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
- On Retina displays, coordinates are in logical points (not physical pixels)
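The last tip matters when positions are measured in an image editor: on a display with scale factor 2, a saved screenshot typically has twice as many pixels per side as the logical coordinates the commands expect. The sketch below divides a pixel-based region by the scale factor reported by display-info. It assumes an integer scale factor and that the screenshot was saved at full pixel resolution, which is usual on macOS but not confirmed for MacPilot. `pixels_to_points` is a hypothetical helper name.

```shell
# Convert an x,y,width,height region measured in physical pixels into logical
# points by dividing each value by the scale factor (integer division).
pixels_to_points() {
  scale=$2
  # Split the comma-separated region into the positional parameters $1..$4.
  set -- $(echo "$1" | tr ',' ' ')
  echo "$(( $1 / scale )),$(( $2 / scale )),$(( $3 / scale )),$(( $4 / scale ))"
}

pixels_to_points 200,400,1600,1200 2   # prints 100,200,800,600
```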