xlsx

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

XLSX Processing Skill

XLSX处理技能

Overview

概述

Comprehensive Excel manipulation using pandas for data analysis and openpyxl for formulas, formatting, and Excel-specific features.
借助pandas进行数据分析,同时利用openpyxl处理公式、格式设置及Excel专属特性,实现全面的Excel操作。

Quick Start

快速开始

python
import pandas as pd
from openpyxl import Workbook
python
import pandas as pd
from openpyxl import Workbook

Read with pandas

Read with pandas

df = pd.read_excel("data.xlsx") print(df.head())
df = pd.read_excel("data.xlsx") print(df.head())

Create with openpyxl

Create with openpyxl

wb = Workbook() ws = wb.active ws["A1"] = "Hello" ws["B1"] = "World" wb.save("output.xlsx")
undefined
wb = Workbook() ws = wb.active ws["A1"] = "Hello" ws["B1"] = "World" wb.save("output.xlsx")
undefined

When to Use

适用场景

  • Reading and analyzing Excel data with pandas
  • Creating formatted spreadsheets programmatically
  • Building financial models with formulas
  • Generating reports with charts and graphs
  • Automating data entry and updates
  • Converting between Excel and other formats
  • Batch processing multiple spreadsheets
  • Creating templates for repeated use
  • 用pandas读取和分析Excel数据
  • 以编程方式创建带格式的电子表格
  • 搭建带公式的财务模型
  • 生成带图表的报表
  • 自动化数据录入与更新
  • 实现Excel与其他格式之间的转换
  • 批量处理多个电子表格
  • 创建可重复使用的模板

Requirements for All Output

所有输出的要求

  • Zero formula errors mandatory
  • Preserve existing templates when updating
  • Always use Excel formulas instead of calculating values in Python and hardcoding them
  • Follow industry-standard color coding for financial models
  • 必须确保公式无错误
  • 更新时保留现有模板
  • 始终使用Excel公式,而非在Python中计算后硬编码值
  • 遵循财务建模的行业标准颜色编码

Financial Model Color Standards

财务模型颜色标准

  • Blue: Input cells (hardcoded values)
  • Black: Formula cells (calculated values)
  • Green: Links to other worksheets
  • Red: Links to external files
  • 蓝色:输入单元格(硬编码值)
  • 黑色:公式单元格(计算值)
  • 绿色:链接至其他工作表
  • 红色:链接至外部文件

Number Formatting Rules

数字格式规则

  • Years displayed as text (no commas)
  • Currency with appropriate units (K, M, B)
  • Zeros displayed as "-"
  • 年份以文本格式显示(无逗号)
  • 货币使用合适的单位(K、M、B)
  • 零值显示为“-”

Reading Excel Files

读取Excel文件

With Pandas

使用Pandas

python
import pandas as pd
python
import pandas as pd

Read entire file

Read entire file

df = pd.read_excel("data.xlsx")
df = pd.read_excel("data.xlsx")

Read specific sheet

Read specific sheet

df = pd.read_excel("data.xlsx", sheet_name="Sheet2")
df = pd.read_excel("data.xlsx", sheet_name="Sheet2")

Read with options

Read with options

df = pd.read_excel( "data.xlsx", sheet_name=0, header=0, usecols="A:D", skiprows=2 )
df = pd.read_excel( "data.xlsx", sheet_name=0, header=0, usecols="A:D", skiprows=2 )

Read all sheets

Read all sheets

dfs = pd.read_excel("data.xlsx", sheet_name=None) for sheet_name, df in dfs.items(): print(f"Sheet: {sheet_name}") print(df.head())
undefined
dfs = pd.read_excel("data.xlsx", sheet_name=None) for sheet_name, df in dfs.items(): print(f"Sheet: {sheet_name}") print(df.head())
undefined

With Openpyxl

使用Openpyxl

python
from openpyxl import load_workbook

wb = load_workbook("data.xlsx")
ws = wb.active
python
from openpyxl import load_workbook

wb = load_workbook("data.xlsx")
ws = wb.active

Read cell values

Read cell values

for row in ws.iter_rows(min_row=1, max_row=10, values_only=True): print(row)
for row in ws.iter_rows(min_row=1, max_row=10, values_only=True): print(row)

Read specific cell

Read specific cell

value = ws["A1"].value
undefined
value = ws["A1"].value
undefined

Creating Excel Files

创建Excel文件

With Pandas

使用Pandas

python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Sales': [1000, 1500, 1200],
    'Region': ['East', 'West', 'North']
})
python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Sales': [1000, 1500, 1200],
    'Region': ['East', 'West', 'North']
})

Simple export

Simple export

df.to_excel("output.xlsx", index=False)
df.to_excel("output.xlsx", index=False)

Multiple sheets

Multiple sheets

with pd.ExcelWriter("output.xlsx") as writer: df.to_excel(writer, sheet_name="Sales", index=False) df.to_excel(writer, sheet_name="Backup", index=False)
undefined
with pd.ExcelWriter("output.xlsx") as writer: df.to_excel(writer, sheet_name="Sales", index=False) df.to_excel(writer, sheet_name="Backup", index=False)
undefined

With Openpyxl (Formulas)

使用Openpyxl(带公式)

python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.title = "Financial Model"
python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter

wb = Workbook()
ws = wb.active
ws.title = "Financial Model"

Headers

Headers

headers = ["Item", "Q1", "Q2", "Q3", "Q4", "Total"] for col, header in enumerate(headers, 1): cell = ws.cell(row=1, column=col, value=header) cell.font = Font(bold=True) cell.alignment = Alignment(horizontal="center")
headers = ["Item", "Q1", "Q2", "Q3", "Q4", "Total"] for col, header in enumerate(headers, 1): cell = ws.cell(row=1, column=col, value=header) cell.font = Font(bold=True) cell.alignment = Alignment(horizontal="center")

Data with formulas

Data with formulas

data = [ ("Revenue", 1000, 1200, 1100, 1400), ("Expenses", 800, 900, 850, 1000), ]
for row, (item, *values) in enumerate(data, 2): ws.cell(row=row, column=1, value=item) for col, value in enumerate(values, 2): ws.cell(row=row, column=col, value=value) # Total formula ws.cell(row=row, column=6, value=f"=SUM(B{row}:E{row})")
data = [ ("Revenue", 1000, 1200, 1100, 1400), ("Expenses", 800, 900, 850, 1000), ]
for row, (item, *values) in enumerate(data, 2): ws.cell(row=row, column=1, value=item) for col, value in enumerate(values, 2): ws.cell(row=row, column=col, value=value) # Total formula ws.cell(row=row, column=6, value=f"=SUM(B{row}:E{row})")

Profit row with formula

Profit row with formula

ws.cell(row=4, column=1, value="Profit") for col in range(2, 7): col_letter = get_column_letter(col) ws.cell(row=4, column=col, value=f"={col_letter}2-{col_letter}3")
wb.save("financial_model.xlsx")
undefined
ws.cell(row=4, column=1, value="Profit") for col in range(2, 7): col_letter = get_column_letter(col) ws.cell(row=4, column=col, value=f"={col_letter}2-{col_letter}3")
wb.save("financial_model.xlsx")
undefined

Formatting

格式设置

Cell Styles

单元格样式

python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.styles.numbers import FORMAT_CURRENCY_USD_SIMPLE

wb = Workbook()
ws = wb.active
python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.styles.numbers import FORMAT_CURRENCY_USD_SIMPLE

wb = Workbook()
ws = wb.active

Font styling

Font styling

ws["A1"] = "Styled Cell" ws["A1"].font = Font( name="Arial", size=14, bold=True, italic=False, color="2E74B5" )
ws["A1"] = "Styled Cell" ws["A1"].font = Font( name="Arial", size=14, bold=True, italic=False, color="2E74B5" )

Fill color

Fill color

ws["A2"] = "Blue Background" ws["A2"].fill = PatternFill( start_color="0000FF", end_color="0000FF", fill_type="solid" )
ws["A2"] = "Blue Background" ws["A2"].fill = PatternFill( start_color="0000FF", end_color="0000FF", fill_type="solid" )

Number format

Number format

ws["A3"] = 1234567.89 ws["A3"].number_format = FORMAT_CURRENCY_USD_SIMPLE
ws["A3"] = 1234567.89 ws["A3"].number_format = FORMAT_CURRENCY_USD_SIMPLE

Alignment

Alignment

ws["A4"] = "Centered" ws["A4"].alignment = Alignment( horizontal="center", vertical="center" )
ws["A4"] = "Centered" ws["A4"].alignment = Alignment( horizontal="center", vertical="center" )

Borders

Borders

thin_border = Border( left=Side(style="thin"), right=Side(style="thin"), top=Side(style="thin"), bottom=Side(style="thin") ) ws["A5"].border = thin_border
wb.save("formatted.xlsx")
undefined
thin_border = Border( left=Side(style="thin"), right=Side(style="thin"), top=Side(style="thin"), bottom=Side(style="thin") ) ws["A5"].border = thin_border
wb.save("formatted.xlsx")
undefined

Column Width and Row Height

列宽与行高

python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

Set column width

Set column width

ws.column_dimensions["A"].width = 20 ws.column_dimensions["B"].width = 15
ws.column_dimensions["A"].width = 20 ws.column_dimensions["B"].width = 15

Set row height

Set row height

ws.row_dimensions[1].height = 30
ws.row_dimensions[1].height = 30

Auto-fit (approximate)

Auto-fit (approximate)

for col in ws.columns: max_length = 0 column = col[0].column_letter for cell in col: if cell.value: max_length = max(max_length, len(str(cell.value))) ws.column_dimensions[column].width = max_length + 2
wb.save("sized.xlsx")
undefined
for col in ws.columns: max_length = 0 column = col[0].column_letter for cell in col: if cell.value: max_length = max(max_length, len(str(cell.value))) ws.column_dimensions[column].width = max_length + 2
wb.save("sized.xlsx")
undefined

Charts

图表

python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active
python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active

Data

Data

data = [ ["Month", "Sales"], ["Jan", 100], ["Feb", 120], ["Mar", 140], ["Apr", 110], ] for row in data: ws.append(row)
data = [ ["Month", "Sales"], ["Jan", 100], ["Feb", 120], ["Mar", 140], ["Apr", 110], ] for row in data: ws.append(row)

Create chart

Create chart

chart = BarChart() chart.type = "col" chart.title = "Monthly Sales" chart.y_axis.title = "Sales ($)" chart.x_axis.title = "Month"
data = Reference(ws, min_col=2, min_row=1, max_row=5) cats = Reference(ws, min_col=1, min_row=2, max_row=5) chart.add_data(data, titles_from_data=True) chart.set_categories(cats) chart.shape = 4 # Rectangle
ws.add_chart(chart, "D2") wb.save("with_chart.xlsx")
undefined
chart = BarChart() chart.type = "col" chart.title = "Monthly Sales" chart.y_axis.title = "Sales ($)" chart.x_axis.title = "Month"
data = Reference(ws, min_col=2, min_row=1, max_row=5) cats = Reference(ws, min_col=1, min_row=2, max_row=5) chart.add_data(data, titles_from_data=True) chart.set_categories(cats) chart.shape = 4 # Rectangle
ws.add_chart(chart, "D2") wb.save("with_chart.xlsx")
undefined

Formula Verification

公式验证

Check for Errors

检查错误

python
from openpyxl import load_workbook

wb = load_workbook("model.xlsx", data_only=False)
ws = wb.active

errors = ["#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NULL!", "#NUM!"]

for row in ws.iter_rows():
    for cell in row:
        if cell.value and isinstance(cell.value, str):
            for error in errors:
                if error in str(cell.value):
                    print(f"Error {error} in cell {cell.coordinate}")
python
from openpyxl import load_workbook

wb = load_workbook("model.xlsx", data_only=False)
ws = wb.active

errors = ["#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NULL!", "#NUM!"]

for row in ws.iter_rows():
    for cell in row:
        if cell.value and isinstance(cell.value, str):
            for error in errors:
                if error in str(cell.value):
                    print(f"Error {error} in cell {cell.coordinate}")

Validate Formulas

验证公式

Before saving, always:
  1. Check cell references are correct
  2. Avoid off-by-one errors
  3. Test edge cases (empty cells, zeros)
  4. Verify formula logic
保存前,务必:
  1. 检查单元格引用是否正确
  2. 避免差一错误
  3. 测试边缘情况(空单元格、零值)
  4. 验证公式逻辑

Execution Checklist

执行检查清单

  • Choose appropriate tool (pandas vs openpyxl)
  • Verify input file exists and is valid
  • Test formulas with sample data
  • Apply consistent formatting
  • Validate all formulas produce correct results
  • Check for Excel errors (#REF!, #DIV/0!, etc.)
  • Verify charts display correctly
  • Test in Excel/LibreOffice before delivery
  • 选择合适的工具(pandas vs openpyxl)
  • 验证输入文件存在且有效
  • 用示例数据测试公式
  • 应用统一的格式设置
  • 验证所有公式生成正确结果
  • 检查是否存在Excel错误(#REF!、#DIV/0!等)
  • 验证图表显示正常
  • 交付前在Excel/LibreOffice中测试

Error Handling

错误处理

Common Errors

常见错误

Error: InvalidFileException
  • Cause: File is not a valid .xlsx (possibly .xls)
  • Solution: Convert to .xlsx or use xlrd for .xls files
Error: Circular reference
  • Cause: Formula references itself
  • Solution: Review formula logic and break the cycle
Error: #REF! in formulas
  • Cause: Cell reference is invalid (deleted row/column)
  • Solution: Use named ranges or validate references
Error: Memory issues with large files
  • Cause: Loading entire file into memory
  • Solution: Use
    read_only=True
    or
    write_only=True
    mode
错误:InvalidFileException
  • 原因:文件不是有效的.xlsx格式(可能是.xls)
  • 解决方案:转换为.xlsx格式,或使用xlrd处理.xls文件
错误:循环引用
  • 原因:公式引用自身
  • 解决方案:检查公式逻辑并打破循环
错误:公式中出现#REF!
  • 原因:单元格引用无效(已删除行/列)
  • 解决方案:使用命名范围或验证引用
错误:处理大文件时出现内存问题
  • 原因:将整个文件加载到内存中
  • 解决方案:使用
    read_only=True
    write_only=True
    模式

Metrics

指标

MetricTypical Value
Read speed (pandas)~50,000 rows/second
Write speed (pandas)~30,000 rows/second
Formula cells~10,000 cells/second
Chart creation~5 charts/second
Memory usage~100MB per 100K rows
指标典型值
读取速度(pandas)~50,000行/秒
写入速度(pandas)~30,000行/秒
公式单元格处理速度~10,000单元格/秒
图表创建速度~5个图表/秒
内存占用每10万行约100MB

Workflow

工作流程

  1. Choose appropriate tool (pandas or openpyxl)
  2. Create or load workbook
  3. Modify as needed
  4. Save file
  5. For formula-based files: Run formula recalculation
  6. Verify and fix errors
  1. 选择合适的工具(pandas或openpyxl)
  2. 创建或加载工作簿
  3. 根据需要修改内容
  4. 保存文件
  5. 对于含公式的文件:运行公式重计算
  6. 验证并修复错误

Quick Reference

快速参考

TaskTool
Data analysispandas
Simple read/writepandas
Formulasopenpyxl
Formattingopenpyxl
Chartsopenpyxl
Pivot tablesUse Excel or xlwings
任务工具
数据分析pandas
简单读写pandas
公式处理openpyxl
格式设置openpyxl
图表创建openpyxl
数据透视表使用Excel或xlwings

Dependencies

依赖项

bash
pip install pandas openpyxl xlrd
Optional:
  • xlwings (Windows/Mac Excel automation)
  • xlsxwriter (alternative writer)

bash
pip install pandas openpyxl xlrd
可选依赖:
  • xlwings(Windows/Mac Excel自动化)
  • xlsxwriter(替代写入工具)

Version History

版本历史

  • 1.1.0 (2026-01-02): Added Quick Start, When to Use, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
  • 1.0.0 (2024-10-15): Initial release with pandas, openpyxl, financial model standards
  • 1.1.0(2026-01-02):新增快速开始、适用场景、执行检查清单、错误处理、指标章节;更新前置内容,添加版本、分类、相关技能信息
  • 1.0.0(2024-10-15):初始版本,包含pandas、openpyxl及财务模型标准相关内容