xlsx
Requirements for Outputs
All Excel files
Zero Formula Errors
- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
Preserve Existing Templates (when updating templates)
- Study and EXACTLY match existing format, style, and conventions when modifying files
- Never impose standardized formatting on files with established patterns
- Existing template conventions ALWAYS override these guidelines
Financial models
Color Coding Standards
Unless otherwise stated by the user or existing template
Industry-Standard Color Conventions
- Blue text (RGB: 0,0,255): Hardcoded inputs, and numbers users will change for scenarios
- Black text (RGB: 0,0,0): ALL formulas and calculations
- Green text (RGB: 0,128,0): Links pulling from other worksheets within same workbook
- Red text (RGB: 255,0,0): External links to other files
- Yellow background (RGB: 255,255,0): Key assumptions needing attention or cells that need to be updated
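The text-color conventions above can be encoded as a small lookup. The sketch below is a heuristic under stated assumptions: the helper name `text_color_for` is hypothetical, any formula containing `[` is treated as an external link, and any formula containing `!` is treated as a cross-sheet link. A real model may need finer rules.

```python
# Hex colors (RRGGBB) matching the RGB values listed above.
BLUE = "0000FF"   # hardcoded inputs
BLACK = "000000"  # formulas and calculations
GREEN = "008000"  # links to other worksheets in the same workbook
RED = "FF0000"    # external links to other files


def text_color_for(value):
    """Pick a font color for a cell value (hypothetical helper, heuristic only)."""
    if isinstance(value, str) and value.startswith("="):
        if "[" in value:   # e.g. =[Budget.xlsx]Sheet1!A1
            return RED
        if "!" in value:   # e.g. =Assumptions!B6
            return GREEN
        return BLACK
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return BLUE        # a typed-in number is a hardcoded input
    return None            # labels and blanks: leave the existing color alone


print(text_color_for(1000))
print(text_color_for("=B5*(1+$B$6)"))
```

With openpyxl, the returned string can be passed as the `color` argument of `Font`, as in the formatting example later in this guide.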
Number Formatting Standards
Required Format Rules
- Years: Format as text strings (e.g., "2024" not "2,024")
- Currency: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
- Zeros: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
- Percentages: Default to 0.0% format (one decimal)
- Multiples: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
- Negative numbers: Use parentheses (123) not minus -123
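One way to apply these rules with openpyxl is to set each cell's `number_format`. The snippet below is a minimal sketch: the cell positions and sample values are illustrative assumptions, and the percentage and multiple codes extend the currency pattern above by analogy (the `x` suffix is quoted so it is treated as literal text).

```python
from openpyxl import Workbook

# Three-part format codes: positive;negative;zero. The zero part shows "-".
CURRENCY = '$#,##0;($#,##0);-'
PERCENT = '0.0%;(0.0%);-'
MULTIPLE = '0.0"x";(0.0"x");-'

wb = Workbook()
sheet = wb.active
sheet['A1'] = 'Revenue ($mm)'  # units stated in the header
sheet['B1'] = '2024'           # year stored as text, so no thousands separator
sheet['B2'] = -1250            # illustrative value, displays in parentheses
sheet['B3'] = 0.153            # illustrative value, one-decimal percentage
sheet['B4'] = 8.5              # illustrative value, valuation multiple

sheet['B2'].number_format = CURRENCY
sheet['B3'].number_format = PERCENT
sheet['B4'].number_format = MULTIPLE
```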
Formula Construction Rules
Assumptions Placement
- Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
- Use cell references instead of hardcoded values in formulas
- Example: Use =B5*(1+$B$6) instead of =B5*1.05
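As an illustration of the rule above, the sketch below generates projection formulas that all point at one absolute assumption cell. The function name `growth_formulas` is hypothetical, and for brevity it only handles columns A through Z.

```python
import string


def growth_formulas(row, base_col, n_periods, rate_cell="$B$6"):
    """Return {cell: formula}, each growing the prior column by rate_cell."""
    cols = string.ascii_uppercase
    start = cols.index(base_col)
    if start + n_periods >= len(cols):
        raise ValueError("this sketch only handles columns A-Z")
    formulas = {}
    for i in range(1, n_periods + 1):
        prev_col, cur_col = cols[start + i - 1], cols[start + i]
        # The growth rate is referenced, never typed into the formula.
        formulas[f"{cur_col}{row}"] = f"={prev_col}{row}*(1+{rate_cell})"
    return formulas


for cell, formula in growth_formulas(5, "B", 3).items():
    print(cell, formula)
```

Because every period uses the same pattern, changing the single value in the assumption cell updates the whole projection.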
Formula Error Prevention
- Verify all cell references are correct
- Check for off-by-one errors in ranges
- Ensure consistent formulas across all projection periods
- Test with edge cases (zero values, negative numbers)
- Verify no unintended circular references
Documentation Requirements for Hardcodes
- Record the source in a cell comment, or in the cell beside the value if it sits at the end of a table. Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
- Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
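A source note can be attached with openpyxl's `Comment` class. This is a minimal sketch: the cell positions, the value `1000`, and the author name are placeholders, and the note text reuses the first example above.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment

SOURCE = "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"

wb = Workbook()
sheet = wb.active
sheet['A2'] = 'Revenue ($mm)'
sheet['B2'] = 1000  # placeholder hardcoded input

# Option 1: attach the source as a cell comment.
sheet['B2'].comment = Comment(SOURCE, 'Analyst')

# Option 2: put the source in the adjacent cell at the end of the table.
sheet['C2'] = SOURCE
```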
XLSX creation, editing, and analysis
Overview
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
Important Requirements
LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the recalc.py script. The script automatically configures LibreOffice on first run.
Reading and analyzing data
Data analysis with pandas
For data analysis, visualization, and basic operations, use pandas, which provides powerful data manipulation capabilities:
python
import pandas as pd

# Read Excel
df = pd.read_excel('file.xlsx')  # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict

# Analyze
df.head()      # Preview data
df.info()      # Column info
df.describe()  # Statistics

# Write Excel
df.to_excel('output.xlsx', index=False)

Excel File Workflows
CRITICAL: Use Formulas, Not Hardcoded Values
Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
❌ WRONG - Hardcoding Calculated Values
python
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total  # Hardcodes 5000

# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth  # Hardcodes 0.15

# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg  # Hardcodes 42.5

✅ CORRECT - Using Excel Formulas
python
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'

# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'

# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'

This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
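As a sketch of this rule applied end to end, the snippet below writes raw rows with openpyxl and then adds the total as a formula whose range is derived from the number of rows written. The function name `write_sales_with_total` and the sample figures are hypothetical.

```python
from openpyxl import Workbook


def write_sales_with_total(sales):
    """Write (label, amount) rows, then a SUM formula under the last row."""
    wb = Workbook()
    sheet = wb.active
    sheet.append(['Month', 'Sales'])
    for month, amount in sales:
        sheet.append([month, amount])
    last_row = sheet.max_row  # last data row; the header is row 1
    total_row = last_row + 1
    sheet.cell(row=total_row, column=1, value='Total')
    # The total stays a formula, so it updates when the inputs change.
    sheet.cell(row=total_row, column=2, value=f'=SUM(B2:B{last_row})')
    return wb


wb = write_sales_with_total([('Jan', 100), ('Feb', 150), ('Mar', 125)])
print(wb.active['B5'].value)
```

The workbook still needs to be saved and recalculated before the total shows a value.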
Common Workflow
- Choose tool: pandas for data, openpyxl for formulas/formatting
- Create/Load: Create new workbook or load existing file
- Modify: Add/edit data, formulas, and formatting
- Save: Write to file
- Recalculate formulas (MANDATORY IF USING FORMULAS): Use the recalc.py script
bash
python recalc.py output.xlsx
- Verify and fix any errors:
  - The script returns JSON with error details
  - If status is errors_found, check error_summary for specific error types and locations
  - Fix the identified errors and recalculate again
  - Common errors to fix:
    - #REF!: Invalid cell references
    - #DIV/0!: Division by zero
    - #VALUE!: Wrong data type in formula
    - #NAME?: Unrecognized formula name
Creating new Excel files
python
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
sheet = wb.active

# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])

# Add formula
sheet['B2'] = '=SUM(A1:A10)'

# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')

# Column width
sheet.column_dimensions['A'].width = 20

wb.save('output.xlsx')

Editing existing Excel files
python
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook

# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active  # or wb['SheetName'] for specific sheet

# Working with multiple sheets
for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    print(f"Sheet: {sheet_name}")

# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2)  # Insert row at position 2
sheet.delete_cols(3)  # Delete column 3

# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'

wb.save('modified.xlsx')

Recalculating formulas
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided recalc.py script to recalculate formulas:
bash
python recalc.py <excel_file> [timeout_seconds]
Example:
bash
python recalc.py output.xlsx 30
The script:
- Automatically sets up LibreOffice macro on first run
- Recalculates all formulas in all sheets
- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
- Returns JSON with detailed error locations and counts
- Works on both Linux and macOS
Formula Verification Checklist
Quick checks to ensure formulas work correctly:
Essential Verification
- Test 2-3 sample references: Verify they pull correct values before building full model
- Column mapping: Confirm Excel columns match (e.g., column 64 = BL, not BK)
- Row offset: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6; add one more row when the sheet's header occupies Excel row 1)
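The column and row mappings above are easy to get wrong by one, so a quick check can help. The sketch below uses two hypothetical helpers, `column_letter` and `excel_row`; openpyxl ships its own converter, `openpyxl.utils.get_column_letter`, which could be used instead of the first.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def excel_row(df_position, header_rows=1):
    """Map a 0-based DataFrame row position to a 1-based Excel row."""
    return df_position + 1 + header_rows


print(column_letter(64))             # BL
print(excel_row(5, header_rows=0))   # 6
print(excel_row(5))                  # 7, because the header takes Excel row 1
```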
Common Pitfalls
- NaN handling: Check for null values with pd.notna()
- Far-right columns: FY data often in columns 50+
- Multiple matches: Search all occurrences, not just first
- Division by zero: Check denominators before using / in formulas (#DIV/0!)
- Wrong references: Verify all cell references point to intended cells (#REF!)
- Cross-sheet references: Use correct format (Sheet1!A1) for linking sheets
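Two of these pitfalls, division by zero and cross-sheet references, can be handled when the formula string is built. The helpers below, `sheet_ref` and `safe_divide`, are hypothetical names for a minimal sketch. Sheet names that look like cell addresses (for example a sheet called A1) would also need quoting and are not handled here.

```python
import re


def sheet_ref(sheet_name, cell):
    """Build a cross-sheet reference, quoting the sheet name when needed."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", sheet_name):
        return f"{sheet_name}!{cell}"
    escaped = sheet_name.replace("'", "''")  # apostrophes are doubled
    return f"'{escaped}'!{cell}"


def safe_divide(numerator, denominator, fallback="0"):
    """Return a formula that avoids #DIV/0! by testing the denominator."""
    return f"=IF({denominator}=0,{fallback},{numerator}/{denominator})"


print(sheet_ref("Sheet1", "A1"))
print(sheet_ref("Income Statement", "B5"))
print(safe_divide("B2", "C2"))
```

Returning 0 as the fallback is a design choice; a model may prefer an empty string or another marker so that a missing ratio is not mistaken for a real zero.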
Formula Testing Strategy
- Start small: Test formulas on 2-3 cells before applying broadly
- Verify dependencies: Check all cells referenced in formulas exist
- Test edge cases: Include zero, negative, and very large values
Interpreting recalc.py Output
The script returns JSON with error details:
json
{
  "status": "success",        // or "errors_found"
  "total_errors": 0,          // Total error count
  "total_formulas": 42,       // Number of formulas in file
  "error_summary": {          // Only present if errors found
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}
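A short helper can turn this output into a fix list. The sketch below assumes the script's JSON text has already been captured as a string and that the real output is plain JSON without the `//` annotations shown above. The function name `summarize_recalc` is hypothetical.

```python
import json


def summarize_recalc(output_text):
    """Return one 'error at location' string per reported formula error."""
    result = json.loads(output_text)
    if result.get("status") != "errors_found":
        return []
    lines = []
    # error_summary is only present when errors were found.
    for error_type, detail in result.get("error_summary", {}).items():
        for location in detail["locations"]:
            lines.append(f"{error_type} at {location}")
    return lines


sample = """{"status": "errors_found", "total_errors": 2, "total_formulas": 42,
 "error_summary": {"#REF!": {"count": 2,
                             "locations": ["Sheet1!B5", "Sheet1!C10"]}}}"""

for line in summarize_recalc(sample):
    print(line)
```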
Best Practices
Library Selection
- pandas: Best for data analysis, bulk operations, and simple data export
- openpyxl: Best for complex formatting, formulas, and Excel-specific features
Working with openpyxl
- Cell indices are 1-based (row=1, column=1 refers to cell A1)
- Use data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
- Warning: If opened with data_only=True and saved, formulas are replaced with values and permanently lost
- For large files: Use read_only=True for reading or write_only=True for writing
- Formulas are preserved but not evaluated - use recalc.py to update values
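The difference between the two loading modes can be seen in a small round trip. This sketch writes a workbook to a temporary directory (the file name is arbitrary) and reads the same formula cell both ways. Because openpyxl does not evaluate formulas, the `data_only=True` read has no cached value to return until the file has been recalculated.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.xlsx')

    wb = Workbook()
    sheet = wb.active
    sheet['A1'] = 2
    sheet['A2'] = '=A1*10'
    wb.save(path)

    # Default load: the formula string is returned.
    formula_view = load_workbook(path).active['A2'].value
    # data_only load: the cached result is returned, if one was stored.
    cached_view = load_workbook(path, data_only=True).active['A2'].value

print(formula_view)  # the formula text
print(cached_view)   # None, since nothing has calculated the file yet
```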
Working with pandas
- Specify data types to avoid inference issues: pd.read_excel('file.xlsx', dtype={'id': str})
- For large files, read specific columns: pd.read_excel('file.xlsx', usecols=['A', 'C', 'E']) (a list of strings selects columns by header name; pass a single string such as 'A,C,E' to select by Excel column letter)
- Handle dates properly: pd.read_excel('file.xlsx', parse_dates=['date_column'])
Code Style Guidelines
IMPORTANT: When generating Python code for Excel operations:
- Write minimal, concise Python code without unnecessary comments
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
For Excel files themselves:
- Add comments to cells with complex formulas or important assumptions
- Document data sources for hardcoded values
- Include notes for key calculations and model sections