excel-automation

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Excel Automation

Excel自动化

Create professional Excel files, parse complex financial models, and control Excel on macOS.
创建专业格式的Excel文件、解析复杂财务模型并在macOS上控制Excel。

Quick Start

快速开始

bash
undefined
bash
undefined

Create a formatted Excel report

Create a formatted Excel report

uv run --with openpyxl scripts/create_formatted_excel.py output.xlsx
uv run --with openpyxl scripts/create_formatted_excel.py output.xlsx

Parse a complex xlsm that openpyxl can't handle

Parse a complex xlsm that openpyxl can't handle

uv run scripts/parse_complex_excel.py model.xlsm # List sheets uv run scripts/parse_complex_excel.py model.xlsm "DCF" # Extract a sheet uv run scripts/parse_complex_excel.py model.xlsm --fix # Fix corrupted names
uv run scripts/parse_complex_excel.py model.xlsm # List sheets uv run scripts/parse_complex_excel.py model.xlsm "DCF" # Extract a sheet uv run scripts/parse_complex_excel.py model.xlsm --fix # Fix corrupted names

Control Excel via AppleScript (with timeout to prevent hangs)

Control Excel via AppleScript (with timeout to prevent hangs)

timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
undefined
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
undefined

Overview

概述

Three capabilities:
CapabilityToolWhen to Use
Create formatted Excel
openpyxl
Reports, mockups, dashboards
Parse complex xlsm/xlsx
zipfile
+
xml.etree
Financial models, VBA workbooks, >1MB files
Control Excel windowAppleScript (
osascript
)
Zoom, scroll, select cells programmatically
三大功能:
功能工具适用场景
创建格式化Excel文件
openpyxl
报表、原型、仪表盘
解析复杂xlsm/xlsx文件
zipfile
+
xml.etree
财务模型、VBA工作簿、大于1MB的文件
控制Excel窗口AppleScript (
osascript
)
程序化调整缩放、滚动、选择单元格

Tool Selection Decision Tree

工具选择决策树

Is the file simple (data export, no VBA, <1MB)?
├─ YES → openpyxl or pandas
└─ NO
   ├─ Is it .xlsm or from investment bank / >1MB?
   │   └─ YES → zipfile + xml.etree.ElementTree (stdlib)
   └─ Is it truly .xls (BIFF format)?
       └─ YES → xlrd
Signals of "complex" Excel: file >1MB,
.xlsm
extension, from investment bank/broker, contains VBA macros.
IMPORTANT: Always run
file <path>
first — extensions lie. A
.xls
file may actually be a ZIP-based xlsx.
Is the file simple (data export, no VBA, <1MB)?
├─ YES → openpyxl or pandas
└─ NO
   ├─ Is it .xlsm or from investment bank / >1MB?
   │   └─ YES → zipfile + xml.etree.ElementTree (stdlib)
   └─ Is it truly .xls (BIFF format)?
       └─ YES → xlrd
“复杂”Excel的特征:文件大小超过1MB、扩展名为
.xlsm
、来自投资银行/券商、包含VBA宏。
重要提示:始终先运行
file <path>
命令——文件扩展名不可信。
.xls
文件实际上可能是基于ZIP的xlsx格式。

Creating Excel Files (openpyxl)

创建Excel文件(openpyxl)

Professional Color Convention (Investment Banking Standard)

专业配色规范(投资银行标准)

ColorRGB CodeMeaning
Blue
0000FF
User input / assumption
Black
000000
Calculated value
Green
008000
Cross-sheet reference
White on dark blue
FFFFFF
on
4472C4
Section headers
Dark blue text
1F4E79
Title
颜色RGB代码含义
蓝色
0000FF
用户输入/假设值
黑色
000000
计算值
绿色
008000
跨工作表引用
深蓝底白字
FFFFFF
on
4472C4
章节标题
深蓝色文本
1F4E79
主标题

Core Formatting Patterns

核心格式模板

python
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
python
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment

Fonts

Fonts

BLUE_FONT = Font(color="0000FF", size=10, name="Calibri") BLACK_FONT_BOLD = Font(color="000000", size=10, name="Calibri", bold=True) GREEN_FONT = Font(color="008000", size=10, name="Calibri") HEADER_FONT = Font(color="FFFFFF", size=12, name="Calibri", bold=True)
BLUE_FONT = Font(color="0000FF", size=10, name="Calibri") BLACK_FONT_BOLD = Font(color="000000", size=10, name="Calibri", bold=True) GREEN_FONT = Font(color="008000", size=10, name="Calibri") HEADER_FONT = Font(color="FFFFFF", size=12, name="Calibri", bold=True)

Fills

Fills

DARK_BLUE_FILL = PatternFill("solid", fgColor="4472C4") LIGHT_BLUE_FILL = PatternFill("solid", fgColor="D9E1F2") INPUT_GREEN_FILL = PatternFill("solid", fgColor="E2EFDA") LIGHT_GRAY_FILL = PatternFill("solid", fgColor="F2F2F2")
DARK_BLUE_FILL = PatternFill("solid", fgColor="4472C4") LIGHT_BLUE_FILL = PatternFill("solid", fgColor="D9E1F2") INPUT_GREEN_FILL = PatternFill("solid", fgColor="E2EFDA") LIGHT_GRAY_FILL = PatternFill("solid", fgColor="F2F2F2")

Borders

Borders

THIN_BORDER = Border(bottom=Side(style="thin", color="B2B2B2")) BOTTOM_DOUBLE = Border(bottom=Side(style="double", color="000000"))
undefined
THIN_BORDER = Border(bottom=Side(style="thin", color="B2B2B2")) BOTTOM_DOUBLE = Border(bottom=Side(style="double", color="000000"))
undefined

Number Format Codes

数字格式代码

FormatCodeExample
Currency
'$#,##0'
$1,234
Currency with decimals
'$#,##0.00'
$1,234.56
Percentage
'0.0%'
12.3%
Percentage (2 decimal)
'0.00%'
12.34%
Number with commas
'#,##0'
1,234
Multiplier
'0.0x'
1.5x
格式代码示例
货币
'$#,##0'
$1,234
带小数的货币
'$#,##0.00'
$1,234.56
百分比
'0.0%'
$12.3%
两位小数的百分比
'0.00%'
$12.34%
带千分位的数字
'#,##0'
1,234
倍数
'0.0x'
1.5x

Conditional Formatting (Sensitivity Tables)

条件格式(敏感性分析表)

Red-to-green gradient for sensitivity analysis:
python
from openpyxl.formatting.rule import ColorScaleRule

rule = ColorScaleRule(
    start_type="min", start_color="F8696B",   # Red (low)
    mid_type="percentile", mid_value=50, mid_color="FFEB84",  # Yellow (mid)
    end_type="max", end_color="63BE7B"         # Green (high)
)
ws.conditional_formatting.add(f"B2:F6", rule)
用于敏感性分析的红-黄-绿渐变:
python
from openpyxl.formatting.rule import ColorScaleRule

rule = ColorScaleRule(
    start_type="min", start_color="F8696B",   # Red (low)
    mid_type="percentile", mid_value=50, mid_color="FFEB84",  # Yellow (mid)
    end_type="max", end_color="63BE7B"         # Green (high)
)
ws.conditional_formatting.add(f"B2:F6", rule)

Execution

执行命令

bash
uv run --with openpyxl scripts/create_formatted_excel.py
Full template script: See
scripts/create_formatted_excel.py
bash
uv run --with openpyxl scripts/create_formatted_excel.py
完整模板脚本:参见
scripts/create_formatted_excel.py

Parsing Complex Excel (zipfile + xml)

解析复杂Excel文件(zipfile + xml)

When openpyxl fails on complex xlsm files (corrupted DefinedNames, complex VBA), use stdlib directly.
当openpyxl无法处理复杂xlsm文件(如损坏的DefinedNames、复杂VBA)时,直接使用标准库。

XLSX Internal ZIP Structure

XLSX内部ZIP结构

file.xlsx (ZIP archive)
├── [Content_Types].xml
├── xl/
│   ├── workbook.xml          ← Sheet names + order
│   ├── sharedStrings.xml     ← All text values (lookup table)
│   ├── worksheets/
│   │   ├── sheet1.xml        ← Cell data for sheet 1
│   │   ├── sheet2.xml        ← Cell data for sheet 2
│   │   └── ...
│   └── _rels/
│       └── workbook.xml.rels ← Maps rId → sheetN.xml
└── _rels/.rels
file.xlsx (ZIP archive)
├── [Content_Types].xml
├── xl/
│   ├── workbook.xml          ← Sheet names + order
│   ├── sharedStrings.xml     ← All text values (lookup table)
│   ├── worksheets/
│   │   ├── sheet1.xml        ← Cell data for sheet 1
│   │   ├── sheet2.xml        ← Cell data for sheet 2
│   │   └── ...
│   └── _rels/
│       └── workbook.xml.rels ← Maps rId → sheetN.xml
└── _rels/.rels

Sheet Name Resolution (Two-Step)

工作表名称解析(两步法)

Sheet names in
workbook.xml
link to physical files via
_rels/workbook.xml.rels
:
python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
RELS_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'

def get_sheet_path(zf, sheet_name):
    """Resolve sheet name to physical XML file path inside ZIP."""
    # Step 1: workbook.xml → find rId for the sheet name
    wb_xml = ET.fromstring(zf.read('xl/workbook.xml'))
    sheets = wb_xml.findall(f'.//{{{MAIN_NS}}}sheet')
    rid = None
    for s in sheets:
        if s.get('name') == sheet_name:
            rid = s.get(f'{{{REL_NS}}}id')
            break
    if not rid:
        raise ValueError(f"Sheet '{sheet_name}' not found")

    # Step 2: workbook.xml.rels → map rId to file path
    rels_xml = ET.fromstring(zf.read('xl/_rels/workbook.xml.rels'))
    for rel in rels_xml.findall(f'{{{RELS_NS}}}Relationship'):
        if rel.get('Id') == rid:
            return 'xl/' + rel.get('Target')

    raise ValueError(f"No file mapping for {rid}")
工作表名称在
workbook.xml
中通过
_rels/workbook.xml.rels
关联到物理文件:
python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
RELS_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'

def get_sheet_path(zf, sheet_name):
    """Resolve sheet name to physical XML file path inside ZIP."""
    # Step 1: workbook.xml → find rId for the sheet name
    wb_xml = ET.fromstring(zf.read('xl/workbook.xml'))
    sheets = wb_xml.findall(f'.//{{{MAIN_NS}}}sheet')
    rid = None
    for s in sheets:
        if s.get('name') == sheet_name:
            rid = s.get(f'{{{REL_NS}}}id')
            break
    if not rid:
        raise ValueError(f"Sheet '{sheet_name}' not found")

    # Step 2: workbook.xml.rels → map rId to file path
    rels_xml = ET.fromstring(zf.read('xl/_rels/workbook.xml.rels'))
    for rel in rels_xml.findall(f'{{{RELS_NS}}}Relationship'):
        if rel.get('Id') == rid:
            return 'xl/' + rel.get('Target')

    raise ValueError(f"No file mapping for {rid}")

Cell Data Extraction

单元格数据提取

python
def extract_cells(zf, sheet_path):
    """Extract all cell values from a sheet XML."""
    # Build shared strings lookup
    shared = []
    try:
        ss_xml = ET.fromstring(zf.read('xl/sharedStrings.xml'))
        for si in ss_xml.findall(f'{{{MAIN_NS}}}si'):
            texts = si.itertext()
            shared.append(''.join(texts))
    except KeyError:
        pass  # No shared strings

    # Parse sheet cells
    sheet_xml = ET.fromstring(zf.read(sheet_path))
    rows = sheet_xml.findall(f'.//{{{MAIN_NS}}}row')

    data = {}
    for row in rows:
        for cell in row.findall(f'{{{MAIN_NS}}}c'):
            ref = cell.get('r')         # e.g., "A1"
            cell_type = cell.get('t')   # "s" = shared string, None = number
            val_el = cell.find(f'{{{MAIN_NS}}}v')

            if val_el is not None and val_el.text:
                if cell_type == 's':
                    data[ref] = shared[int(val_el.text)]
                else:
                    try:
                        data[ref] = float(val_el.text)
                    except ValueError:
                        data[ref] = val_el.text
    return data
python
def extract_cells(zf, sheet_path):
    """Extract all cell values from a sheet XML."""
    # Build shared strings lookup
    shared = []
    try:
        ss_xml = ET.fromstring(zf.read('xl/sharedStrings.xml'))
        for si in ss_xml.findall(f'{{{MAIN_NS}}}si'):
            texts = si.itertext()
            shared.append(''.join(texts))
    except KeyError:
        pass  # No shared strings

    # Parse sheet cells
    sheet_xml = ET.fromstring(zf.read(sheet_path))
    rows = sheet_xml.findall(f'.//{{{MAIN_NS}}}row')

    data = {}
    for row in rows:
        for cell in row.findall(f'{{{MAIN_NS}}}c'):
            ref = cell.get('r')         # e.g., "A1"
            cell_type = cell.get('t')   # "s" = shared string, None = number
            val_el = cell.find(f'{{{MAIN_NS}}}v')

            if val_el is not None and val_el.text:
                if cell_type == 's':
                    data[ref] = shared[int(val_el.text)]
                else:
                    try:
                        data[ref] = float(val_el.text)
                    except ValueError:
                        data[ref] = val_el.text
    return data

Fixing Corrupted DefinedNames

修复损坏的DefinedNames

Investment bank xlsm files often have corrupted
<definedName>
entries containing "Formula removed":
python
def fix_defined_names(zf_in_path, zf_out_path):
    """Remove corrupted DefinedNames and repackage."""
    import shutil, tempfile
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        with zipfile.ZipFile(zf_in_path, 'r') as zf:
            zf.extractall(tmp)

        wb_xml_path = tmp / 'xl' / 'workbook.xml'
        tree = ET.parse(wb_xml_path)
        root = tree.getroot()

        ns = {'main': MAIN_NS}
        defined_names = root.find('.//main:definedNames', ns)
        if defined_names is not None:
            for name in list(defined_names):
                if name.text and "Formula removed" in name.text:
                    defined_names.remove(name)

        tree.write(wb_xml_path, encoding='utf-8', xml_declaration=True)

        with zipfile.ZipFile(zf_out_path, 'w', zipfile.ZIP_DEFLATED) as zf:
            for fp in tmp.rglob('*'):
                if fp.is_file():
                    zf.write(fp, fp.relative_to(tmp))
Full template script: See
scripts/parse_complex_excel.py
投资银行的xlsm文件通常包含损坏的
<definedName>
条目,内容为“Formula removed”:
python
def fix_defined_names(zf_in_path, zf_out_path):
    """Remove corrupted DefinedNames and repackage."""
    import shutil, tempfile
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        with zipfile.ZipFile(zf_in_path, 'r') as zf:
            zf.extractall(tmp)

        wb_xml_path = tmp / 'xl' / 'workbook.xml'
        tree = ET.parse(wb_xml_path)
        root = tree.getroot()

        ns = {'main': MAIN_NS}
        defined_names = root.find('.//main:definedNames', ns)
        if defined_names is not None:
            for name in list(defined_names):
                if name.text and "Formula removed" in name.text:
                    defined_names.remove(name)

        tree.write(wb_xml_path, encoding='utf-8', xml_declaration=True)

        with zipfile.ZipFile(zf_out_path, 'w', zipfile.ZIP_DEFLATED) as zf:
            for fp in tmp.rglob('*'):
                if fp.is_file():
                    zf.write(fp, fp.relative_to(tmp))
完整模板脚本:参见
scripts/parse_complex_excel.py

Controlling Excel on macOS (AppleScript)

在macOS上控制Excel(AppleScript)

All commands verified on macOS with Microsoft Excel.
所有命令均已在macOS系统的Microsoft Excel上验证。

Verified Commands

已验证命令

bash
undefined
bash
undefined

Activate Excel (bring to front)

Activate Excel (bring to front)

osascript -e 'tell application "Microsoft Excel" to activate'
osascript -e 'tell application "Microsoft Excel" to activate'

Open a file

Open a file

osascript -e 'tell application "Microsoft Excel" to open POSIX file "/path/to/file.xlsx"'
osascript -e 'tell application "Microsoft Excel" to open POSIX file "/path/to/file.xlsx"'

Set zoom level (percentage)

Set zoom level (percentage)

osascript -e 'tell application "Microsoft Excel" set zoom of active window to 120 end tell'
osascript -e 'tell application "Microsoft Excel" set zoom of active window to 120 end tell'

Scroll to specific row

Scroll to specific row

osascript -e 'tell application "Microsoft Excel" set scroll row of active window to 45 end tell'
osascript -e 'tell application "Microsoft Excel" set scroll row of active window to 45 end tell'

Scroll to specific column

Scroll to specific column

osascript -e 'tell application "Microsoft Excel" set scroll column of active window to 3 end tell'
osascript -e 'tell application "Microsoft Excel" set scroll column of active window to 3 end tell'

Select a cell range

Select a cell range

osascript -e 'tell application "Microsoft Excel" select range "A1" of active sheet end tell'
osascript -e 'tell application "Microsoft Excel" select range "A1" of active sheet end tell'

Select a specific sheet by name

Select a specific sheet by name

osascript -e 'tell application "Microsoft Excel" activate object sheet "DCF" of active workbook end tell'
undefined
osascript -e 'tell application "Microsoft Excel" activate object sheet "DCF" of active workbook end tell'
undefined

Timing and Timeout

计时与超时设置

Always add
sleep 1
between AppleScript commands and subsequent operations (e.g., screenshot) to allow UI rendering.
IMPORTANT:
osascript
will hang indefinitely if Excel is not running or not responding. Always wrap with
timeout
:
bash
undefined
在AppleScript命令和后续操作(如截图)之间始终添加
sleep 1
,以等待UI渲染完成。
重要提示:如果Excel未运行或无响应,
osascript
会无限期挂起。务必用
timeout
命令包裹:
bash
undefined

Safe pattern: 5-second timeout

Safe pattern: 5-second timeout

timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'
timeout 5 osascript -e 'tell application "Microsoft Excel" to activate'

Check exit code: 124 = timed out

Check exit code: 124 = timed out

if [ $? -eq 124 ]; then echo "Excel not responding — is it running?" fi
undefined
if [ $? -eq 124 ]; then echo "Excel not responding — is it running?" fi
undefined

Common Mistakes

常见错误

MistakeCorrection
openpyxl fails on complex xlsm → try monkey-patchingSwitch to
zipfile
+
xml.etree
immediately
Count Chinese characters with
wc -c
Use
wc -m
(chars, not bytes; Chinese = 3 bytes/char)
Trust file extensionRun
file <path>
first to confirm actual format
openpyxl
load_workbook
hangs on large xlsm
Use
zipfile
for targeted extraction instead of loading entire workbook
错误操作修正方法
openpyxl处理复杂xlsm失败 → 尝试猴子补丁立即切换为
zipfile
+
xml.etree
方案
wc -c
统计中文字符数
使用
wc -m
(统计字符数而非字节数;中文字符=3字节/字符)
信任文件扩展名先运行
file <path>
命令确认实际格式
openpyxl的
load_workbook
在处理大型xlsm时挂起
使用
zipfile
进行针对性提取,而非加载整个工作簿

Important Notes

重要注意事项

  • Execute Python scripts with
    uv run --with openpyxl
    (never use system Python)
  • LibreOffice (
    soffice --headless
    ) can convert formats and recalculate formulas
  • Detailed formatting reference: See
    references/formatting-reference.md
  • 使用
    uv run --with openpyxl
    执行Python脚本(切勿使用系统Python)
  • LibreOffice (
    soffice --headless
    )可用于格式转换和公式重计算
  • 详细格式参考:参见
    references/formatting-reference.md