xlsx

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Requirements for Outputs

输出要求

All Excel files

所有Excel文件要求

Zero Formula Errors

零公式错误

  • Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
  • 所有交付的Excel模型必须不存在任何公式错误(#REF!、#DIV/0!、#VALUE!、#N/A、#NAME?)

Preserve Existing Templates (when updating templates)

保留现有模板(更新模板时适用)

  • Study and EXACTLY match existing format, style, and conventions when modifying files
  • Never impose standardized formatting on files with established patterns
  • Existing template conventions ALWAYS override these guidelines
  • 修改文件时需研究并完全匹配现有格式、样式和规范
  • 不得强制给已有既定格式的文件套用标准化格式
  • 现有模板规范的优先级始终高于本指南要求

Financial models

财务模型要求

Color Coding Standards

颜色编码标准

Unless otherwise stated by the user or existing template
除非用户或现有模板另有说明,否则遵循以下规范

Industry-Standard Color Conventions

行业标准颜色规范

  • Blue text (RGB: 0,0,255): Hardcoded inputs, and numbers users will change for scenarios
  • Black text (RGB: 0,0,0): ALL formulas and calculations
  • Green text (RGB: 0,128,0): Links pulling from other worksheets within same workbook
  • Red text (RGB: 255,0,0): External links to other files
  • Yellow background (RGB: 255,255,0): Key assumptions needing attention or cells that need to be updated
  • 蓝色文字(RGB: 0,0,255):硬编码输入值,以及用户需要根据场景修改的数字
  • 黑色文字(RGB: 0,0,0):所有公式和计算逻辑
  • 绿色文字(RGB: 0,128,0):同一工作簿内其他工作表的链接引用
  • 红色文字(RGB: 255,0,0):指向其他文件的外部链接
  • 黄色背景(RGB: 255,255,0):需要注意的关键假设,或需要更新的单元格

Number Formatting Standards

数字格式标准

Required Format Rules

强制格式规则

  • Years: Format as text strings (e.g., "2024" not "2,024")
  • Currency: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
  • Zeros: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
  • Percentages: Default to 0.0% format (one decimal)
  • Multiples: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
  • Negative numbers: Use parentheses (123) not minus -123
  • 年份:格式设置为文本字符串(例如"2024"而非"2,024")
  • 货币:使用$#,##0格式;始终在表头注明单位(例如"Revenue ($mm)")
  • 零值:通过数字格式将所有零值显示为"-",百分比也适用该规则(例如格式设置为"$#,##0;($#,##0);-")
  • 百分比:默认使用0.0%格式(保留1位小数)
  • 倍数:估值倍数(EV/EBITDA、P/E)格式设置为0.0x
  • 负数:使用括号(123)表示,而非减号-123

Formula Construction Rules

公式编写规则

Assumptions Placement

假设值放置要求

  • Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
  • Use cell references instead of hardcoded values in formulas
  • Example: Use =B5*(1+$B$6) instead of =B5*1.05
  • 所有假设值(增长率、利润率、倍数等)都需放在单独的假设单元格中
  • 公式中使用单元格引用,而非硬编码数值
  • 示例:使用=B5*(1+$B$6)而非=B5*1.05

Formula Error Prevention

公式错误预防

  • Verify all cell references are correct
  • Check for off-by-one errors in ranges
  • Ensure consistent formulas across all projection periods
  • Test with edge cases (zero values, negative numbers)
  • Verify no unintended circular references
  • 验证所有单元格引用正确
  • 检查范围是否存在差一错误
  • 确保所有预测周期使用的公式一致
  • 使用边界值(零值、负数)测试
  • 确认不存在意外的循环引用

Documentation Requirements for Hardcodes

硬编码值的文档要求

  • Comment or in cells beside (if end of table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
  • Examples:
    • "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
    • "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
    • "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
    • "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
  • 在单元格旁添加注释(如果位于表格末尾),格式为:"来源: [系统/文档], [日期], [具体引用位置], [适用时附上URL]"
  • 示例:
    • "来源: 公司10-K报告, 2024财年, 第45页, 收入附注, [SEC EDGAR链接]"
    • "来源: 公司10-Q报告, 2025年第二季度, 附件99.1, [SEC EDGAR链接]"
    • "来源: Bloomberg Terminal, 2025/8/15, AAPL US Equity"
    • "来源: FactSet, 2025/8/20, 一致预期筛选结果"

XLSX creation, editing, and analysis

XLSX创建、编辑与分析

Overview

概述

A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
用户可能会要求你创建、编辑或分析.xlsx文件的内容,针对不同任务有不同的工具和工作流可供使用。

Important Requirements

重要要求

LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the
recalc.py
script. The script automatically configures LibreOffice on first run
重新计算公式需要使用LibreOffice:你可以默认环境已安装LibreOffice,可通过
recalc.py
脚本重新计算公式值,脚本首次运行时会自动配置LibreOffice

Reading and analyzing data

读取与分析数据

Data analysis with pandas

使用pandas进行数据分析

For data analysis, visualization, and basic operations, use pandas which provides powerful data manipulation capabilities:
python
import pandas as pd
进行数据分析、可视化和基础操作时,使用pandas,它提供了强大的数据处理能力:
python
import pandas as pd

Read Excel

读取Excel

df = pd.read_excel('file.xlsx') # Default: first sheet all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
df = pd.read_excel('file.xlsx') # 默认读取第一个工作表 all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # 读取所有工作表,返回字典格式

Analyze

分析

df.head() # Preview data df.info() # Column info df.describe() # Statistics
df.head() # 预览数据 df.info() # 查看列信息 df.describe() # 生成统计指标

Write Excel

写入Excel

df.to_excel('output.xlsx', index=False)
undefined
df.to_excel('output.xlsx', index=False)
undefined

Excel File Workflows

Excel文件工作流

CRITICAL: Use Formulas, Not Hardcoded Values

关键规则:使用公式而非硬编码值

Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
始终使用Excel公式,不要在Python中计算后硬编码结果,这可以保证电子表格保持动态可更新的特性。

❌ WRONG - Hardcoding Calculated Values

❌ 错误用法:硬编码计算结果

python
undefined
python
undefined

Bad: Calculating in Python and hardcoding result

错误:在Python中计算后硬编码结果

total = df['Sales'].sum() sheet['B10'] = total # Hardcodes 5000
total = df['Sales'].sum() sheet['B10'] = total # 硬编码为5000

Bad: Computing growth rate in Python

错误:在Python中计算增长率

growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue'] sheet['C5'] = growth # Hardcodes 0.15
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue'] sheet['C5'] = growth # 硬编码为0.15

Bad: Python calculation for average

错误:在Python中计算平均值

avg = sum(values) / len(values) sheet['D20'] = avg # Hardcodes 42.5
undefined
avg = sum(values) / len(values) sheet['D20'] = avg # 硬编码为42.5
undefined

✅ CORRECT - Using Excel Formulas

✅ 正确用法:使用Excel公式

python
undefined
python
undefined

Good: Let Excel calculate the sum

正确:让Excel计算总和

sheet['B10'] = '=SUM(B2:B9)'
sheet['B10'] = '=SUM(B2:B9)'

Good: Growth rate as Excel formula

正确:增长率使用Excel公式实现

sheet['C5'] = '=(C4-C2)/C2'
sheet['C5'] = '=(C4-C2)/C2'

Good: Average using Excel function

正确:使用Excel函数计算平均值

sheet['D20'] = '=AVERAGE(D2:D19)'

This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
sheet['D20'] = '=AVERAGE(D2:D19)'

该规则适用于所有计算:总和、百分比、比率、差值等。电子表格应当能够在源数据变更时自动重新计算。

Common Workflow

通用工作流

  1. Choose tool: pandas for data, openpyxl for formulas/formatting
  2. Create/Load: Create new workbook or load existing file
  3. Modify: Add/edit data, formulas, and formatting
  4. Save: Write to file
  5. Recalculate formulas (MANDATORY IF USING FORMULAS): Use the recalc.py script
    bash
    python recalc.py output.xlsx
  6. Verify and fix any errors:
    • The script returns JSON with error details
    • If
      status
      is
      errors_found
      , check
      error_summary
      for specific error types and locations
    • Fix the identified errors and recalculate again
    • Common errors to fix:
      • #REF!
        : Invalid cell references
      • #DIV/0!
        : Division by zero
      • #VALUE!
        : Wrong data type in formula
      • #NAME?
        : Unrecognized formula name
  1. 选择工具:处理数据用pandas,处理公式/格式用openpyxl
  2. 创建/加载:新建工作簿或加载现有文件
  3. 修改:添加/编辑数据、公式和格式
  4. 保存:写入到文件
  5. 重新计算公式(使用公式时必须执行):使用recalc.py脚本
    bash
    python recalc.py output.xlsx
  6. 验证并修复所有错误
    • 脚本会返回包含错误详情的JSON
    • 如果
      status
      errors_found
      ,查看
      error_summary
      获取具体错误类型和位置
    • 修复识别到的错误后再次重新计算
    • 常见需要修复的错误:
      • #REF!
        :无效的单元格引用
      • #DIV/0!
        :除零错误
      • #VALUE!
        :公式中数据类型错误
      • #NAME?
        :无法识别的公式名称

Creating new Excel files

创建新Excel文件

python
undefined
python
undefined

Using openpyxl for formulas and formatting

使用openpyxl处理公式和格式

from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook() sheet = wb.active
from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook() sheet = wb.active

Add data

添加数据

sheet['A1'] = 'Hello' sheet['B1'] = 'World' sheet.append(['Row', 'of', 'data'])
sheet['A1'] = 'Hello' sheet['B1'] = 'World' sheet.append(['Row', 'of', 'data'])

Add formula

添加公式

sheet['B2'] = '=SUM(A1:A10)'
sheet['B2'] = '=SUM(A1:A10)'

Formatting

设置格式

sheet['A1'].font = Font(bold=True, color='FF0000') sheet['A1'].fill = PatternFill('solid', start_color='FFFF00') sheet['A1'].alignment = Alignment(horizontal='center')
sheet['A1'].font = Font(bold=True, color='FF0000') sheet['A1'].fill = PatternFill('solid', start_color='FFFF00') sheet['A1'].alignment = Alignment(horizontal='center')

Column width

设置列宽

sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
undefined
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
undefined

Editing existing Excel files

编辑现有Excel文件

python
undefined
python
undefined

Using openpyxl to preserve formulas and formatting

使用openpyxl保留公式和格式

from openpyxl import load_workbook
from openpyxl import load_workbook

Load existing file

加载现有文件

wb = load_workbook('existing.xlsx') sheet = wb.active # or wb['SheetName'] for specific sheet
wb = load_workbook('existing.xlsx') sheet = wb.active # 也可以用wb['SheetName']指定特定工作表

Working with multiple sheets

处理多个工作表

for sheet_name in wb.sheetnames: sheet = wb[sheet_name] print(f"Sheet: {sheet_name}")
for sheet_name in wb.sheetnames: sheet = wb[sheet_name] print(f"工作表: {sheet_name}")

Modify cells

修改单元格

sheet['A1'] = 'New Value' sheet.insert_rows(2) # Insert row at position 2 sheet.delete_cols(3) # Delete column 3
sheet['A1'] = 'New Value' sheet.insert_rows(2) # 在第2行位置插入行 sheet.delete_cols(3) # 删除第3列

Add new sheet

添加新工作表

new_sheet = wb.create_sheet('NewSheet') new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
undefined
new_sheet = wb.create_sheet('NewSheet') new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
undefined

Recalculating formulas

重新计算公式

Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided
recalc.py
script to recalculate formulas:
bash
python recalc.py <excel_file> [timeout_seconds]
Example:
bash
python recalc.py output.xlsx 30
The script:
  • Automatically sets up LibreOffice macro on first run
  • Recalculates all formulas in all sheets
  • Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
  • Returns JSON with detailed error locations and counts
  • Works on both Linux and macOS
通过openpyxl创建或修改的Excel文件仅包含字符串格式的公式,没有计算结果。使用提供的
recalc.py
脚本重新计算公式:
bash
python recalc.py <excel_file> [timeout_seconds]
示例:
bash
python recalc.py output.xlsx 30
该脚本:
  • 首次运行时自动设置LibreOffice宏
  • 重新计算所有工作表中的所有公式
  • 扫描所有单元格查找Excel错误(#REF!、#DIV/0!等)
  • 返回包含详细错误位置和数量的JSON
  • 兼容Linux和macOS系统

Formula Verification Checklist

公式验证检查清单

Quick checks to ensure formulas work correctly:
快速检查确保公式正常运行:

Essential Verification

基础验证

  • Test 2-3 sample references: Verify they pull correct values before building full model
  • Column mapping: Confirm Excel columns match (e.g., column 64 = BL, not BK)
  • Row offset: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6)
  • 测试2-3个示例引用:在构建完整模型前验证引用能获取正确值
  • 列映射:确认Excel列匹配(例如第64列是BL而非BK)
  • 行偏移:注意Excel行从1开始索引(DataFrame第5行对应Excel第6行)

Common Pitfalls

常见陷阱

  • NaN handling: Check for null values with
    pd.notna()
  • Far-right columns: FY data often in columns 50+
  • Multiple matches: Search all occurrences, not just first
  • Division by zero: Check denominators before using
    /
    in formulas (#DIV/0!)
  • Wrong references: Verify all cell references point to intended cells (#REF!)
  • Cross-sheet references: Use correct format (Sheet1!A1) for linking sheets
  • NaN处理:使用
    pd.notna()
    检查空值
  • 最右列:财年数据通常位于第50列之后
  • 多个匹配项:搜索所有匹配结果,而非仅第一个
  • 除零错误:在公式中使用
    /
    前检查分母(避免#DIV/0!)
  • 引用错误:验证所有单元格引用指向目标单元格(避免#REF!)
  • 跨工作表引用:链接不同工作表时使用正确格式(Sheet1!A1)

Formula Testing Strategy

公式测试策略

  • Start small: Test formulas on 2-3 cells before applying broadly
  • Verify dependencies: Check all cells referenced in formulas exist
  • Test edge cases: Include zero, negative, and very large values
  • 从小规模开始:先在2-3个单元格上测试公式,再大范围应用
  • 验证依赖项:检查公式引用的所有单元格都存在
  • 测试边界值:包含零值、负数和极大值测试

Interpreting recalc.py Output

解读recalc.py输出

The script returns JSON with error details:
json
{
  "status": "success",  // or "errors_found"
  "total_errors": 0,  // Total error count
  "total_formulas": 42,  // Number of formulas in file
  "error_summary": {  // Only present if errors found
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}
脚本返回包含错误详情的JSON:
json
{
  "status": "success",  // 或者 "errors_found"
  "total_errors": 0,  // 总错误数
  "total_formulas": 42,  // 文件中的公式数量
  "error_summary": {  // 仅当发现错误时存在
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}

Best Practices

最佳实践

Library Selection

库选择

  • pandas: Best for data analysis, bulk operations, and simple data export
  • openpyxl: Best for complex formatting, formulas, and Excel-specific features
  • pandas:最适合数据分析、批量操作和简单数据导出
  • openpyxl:最适合复杂格式设置、公式处理和Excel专属功能

Working with openpyxl

使用openpyxl的注意事项

  • Cell indices are 1-based (row=1, column=1 refers to cell A1)
  • Use
    data_only=True
    to read calculated values:
    load_workbook('file.xlsx', data_only=True)
  • Warning: If opened with
    data_only=True
    and saved, formulas are replaced with values and permanently lost
  • For large files: Use
    read_only=True
    for reading or
    write_only=True
    for writing
  • Formulas are preserved but not evaluated - use recalc.py to update values
  • 单元格索引从1开始(row=1, column=1对应单元格A1)
  • 使用
    data_only=True
    读取计算结果:
    load_workbook('file.xlsx', data_only=True)
  • 警告:如果使用
    data_only=True
    打开文件并保存,公式会被替换为值并永久丢失
  • 处理大文件:读取时使用
    read_only=True
    ,写入时使用
    write_only=True
  • 公式会被保留但不会被计算,使用recalc.py更新值

Working with pandas

使用pandas的注意事项

  • Specify data types to avoid inference issues:
    pd.read_excel('file.xlsx', dtype={'id': str})
  • For large files, read specific columns:
    pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
  • Handle dates properly:
    pd.read_excel('file.xlsx', parse_dates=['date_column'])
  • 指定数据类型避免自动推断错误:
    pd.read_excel('file.xlsx', dtype={'id': str})
  • 处理大文件时读取指定列:
    pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
  • 正确处理日期:
    pd.read_excel('file.xlsx', parse_dates=['date_column'])

Code Style Guidelines

代码风格指南

IMPORTANT: When generating Python code for Excel operations:
  • Write minimal, concise Python code without unnecessary comments
  • Avoid verbose variable names and redundant operations
  • Avoid unnecessary print statements
For Excel files themselves:
  • Add comments to cells with complex formulas or important assumptions
  • Document data sources for hardcoded values
  • Include notes for key calculations and model sections
重要提示:生成Excel操作的Python代码时:
  • 编写简洁的Python代码,不要添加不必要的注释
  • 避免过长的变量名和冗余操作
  • 避免不必要的print语句
针对Excel文件本身的要求
  • 给包含复杂公式或重要假设的单元格添加注释
  • 为硬编码值标注数据来源
  • 为关键计算和模型模块添加说明