xlsx
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseXLSX Processing Skill
XLSX处理技能
Overview
概述
Comprehensive Excel manipulation using pandas for data analysis and openpyxl for formulas, formatting, and Excel-specific features.
借助pandas进行数据分析,同时利用openpyxl处理公式、格式设置及Excel专属特性,实现全面的Excel操作。
Quick Start
快速开始
python
import pandas as pd
from openpyxl import Workbookpython
import pandas as pd
from openpyxl import WorkbookRead with pandas
Read with pandas
df = pd.read_excel("data.xlsx")
print(df.head())
df = pd.read_excel("data.xlsx")
print(df.head())
Create with openpyxl
Create with openpyxl
wb = Workbook()
ws = wb.active
ws["A1"] = "Hello"
ws["B1"] = "World"
wb.save("output.xlsx")
undefinedwb = Workbook()
ws = wb.active
ws["A1"] = "Hello"
ws["B1"] = "World"
wb.save("output.xlsx")
undefinedWhen to Use
适用场景
- Reading and analyzing Excel data with pandas
- Creating formatted spreadsheets programmatically
- Building financial models with formulas
- Generating reports with charts and graphs
- Automating data entry and updates
- Converting between Excel and other formats
- Batch processing multiple spreadsheets
- Creating templates for repeated use
- 用pandas读取和分析Excel数据
- 以编程方式创建带格式的电子表格
- 搭建带公式的财务模型
- 生成带图表的报表
- 自动化数据录入与更新
- 实现Excel与其他格式之间的转换
- 批量处理多个电子表格
- 创建可重复使用的模板
Requirements for All Output
所有输出的要求
- Zero formula errors mandatory
- Preserve existing templates when updating
- Always use Excel formulas instead of calculating values in Python and hardcoding them
- Follow industry-standard color coding for financial models
- 必须确保公式无错误
- 更新时保留现有模板
- 始终使用Excel公式,而非在Python中计算后硬编码值
- 遵循财务建模的行业标准颜色编码
Financial Model Color Standards
财务模型颜色标准
- Blue: Input cells (hardcoded values)
- Black: Formula cells (calculated values)
- Green: Links to other worksheets
- Red: Links to external files
- 蓝色:输入单元格(硬编码值)
- 黑色:公式单元格(计算值)
- 绿色:链接至其他工作表
- 红色:链接至外部文件
Number Formatting Rules
数字格式规则
- Years displayed as text (no commas)
- Currency with appropriate units (K, M, B)
- Zeros displayed as "-"
- 年份以文本格式显示(无逗号)
- 货币使用合适的单位(K、M、B)
- 零值显示为“-”
Reading Excel Files
读取Excel文件
With Pandas
使用Pandas
python
import pandas as pdpython
import pandas as pdRead entire file
Read entire file
df = pd.read_excel("data.xlsx")
df = pd.read_excel("data.xlsx")
Read specific sheet
Read specific sheet
df = pd.read_excel("data.xlsx", sheet_name="Sheet2")
df = pd.read_excel("data.xlsx", sheet_name="Sheet2")
Read with options
Read with options
df = pd.read_excel(
"data.xlsx",
sheet_name=0,
header=0,
usecols="A:D",
skiprows=2
)
df = pd.read_excel(
"data.xlsx",
sheet_name=0,
header=0,
usecols="A:D",
skiprows=2
)
Read all sheets
Read all sheets
dfs = pd.read_excel("data.xlsx", sheet_name=None)
for sheet_name, df in dfs.items():
print(f"Sheet: {sheet_name}")
print(df.head())
undefineddfs = pd.read_excel("data.xlsx", sheet_name=None)
for sheet_name, df in dfs.items():
print(f"Sheet: {sheet_name}")
print(df.head())
undefinedWith Openpyxl
使用Openpyxl
python
from openpyxl import load_workbook
wb = load_workbook("data.xlsx")
ws = wb.activepython
from openpyxl import load_workbook
wb = load_workbook("data.xlsx")
ws = wb.activeRead cell values
Read cell values
for row in ws.iter_rows(min_row=1, max_row=10, values_only=True):
print(row)
for row in ws.iter_rows(min_row=1, max_row=10, values_only=True):
print(row)
Read specific cell
Read specific cell
value = ws["A1"].value
undefinedvalue = ws["A1"].value
undefinedCreating Excel Files
创建Excel文件
With Pandas
使用Pandas
python
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Sales': [1000, 1500, 1200],
'Region': ['East', 'West', 'North']
})python
import pandas as pd
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie'],
'Sales': [1000, 1500, 1200],
'Region': ['East', 'West', 'North']
})Simple export
Simple export
df.to_excel("output.xlsx", index=False)
df.to_excel("output.xlsx", index=False)
Multiple sheets
Multiple sheets
with pd.ExcelWriter("output.xlsx") as writer:
df.to_excel(writer, sheet_name="Sales", index=False)
df.to_excel(writer, sheet_name="Backup", index=False)
undefinedwith pd.ExcelWriter("output.xlsx") as writer:
df.to_excel(writer, sheet_name="Sales", index=False)
df.to_excel(writer, sheet_name="Backup", index=False)
undefinedWith Openpyxl (Formulas)
使用Openpyxl(带公式)
python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter
wb = Workbook()
ws = wb.active
ws.title = "Financial Model"python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter
wb = Workbook()
ws = wb.active
ws.title = "Financial Model"Headers
Headers
headers = ["Item", "Q1", "Q2", "Q3", "Q4", "Total"]
for col, header in enumerate(headers, 1):
cell = ws.cell(row=1, column=col, value=header)
cell.font = Font(bold=True)
cell.alignment = Alignment(horizontal="center")
headers = ["Item", "Q1", "Q2", "Q3", "Q4", "Total"]
for col, header in enumerate(headers, 1):
cell = ws.cell(row=1, column=col, value=header)
cell.font = Font(bold=True)
cell.alignment = Alignment(horizontal="center")
Data with formulas
Data with formulas
data = [
("Revenue", 1000, 1200, 1100, 1400),
("Expenses", 800, 900, 850, 1000),
]
for row, (item, *values) in enumerate(data, 2):
ws.cell(row=row, column=1, value=item)
for col, value in enumerate(values, 2):
ws.cell(row=row, column=col, value=value)
# Total formula
ws.cell(row=row, column=6, value=f"=SUM(B{row}:E{row})")
data = [
("Revenue", 1000, 1200, 1100, 1400),
("Expenses", 800, 900, 850, 1000),
]
for row, (item, *values) in enumerate(data, 2):
ws.cell(row=row, column=1, value=item)
for col, value in enumerate(values, 2):
ws.cell(row=row, column=col, value=value)
# Total formula
ws.cell(row=row, column=6, value=f"=SUM(B{row}:E{row})")
Profit row with formula
Profit row with formula
ws.cell(row=4, column=1, value="Profit")
for col in range(2, 7):
col_letter = get_column_letter(col)
ws.cell(row=4, column=col, value=f"={col_letter}2-{col_letter}3")
wb.save("financial_model.xlsx")
undefinedws.cell(row=4, column=1, value="Profit")
for col in range(2, 7):
col_letter = get_column_letter(col)
ws.cell(row=4, column=col, value=f"={col_letter}2-{col_letter}3")
wb.save("financial_model.xlsx")
undefinedFormatting
格式设置
Cell Styles
单元格样式
python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.styles.numbers import FORMAT_CURRENCY_USD_SIMPLE
wb = Workbook()
ws = wb.activepython
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment
from openpyxl.styles.numbers import FORMAT_CURRENCY_USD_SIMPLE
wb = Workbook()
ws = wb.activeFont styling
Font styling
ws["A1"] = "Styled Cell"
ws["A1"].font = Font(
name="Arial",
size=14,
bold=True,
italic=False,
color="2E74B5"
)
ws["A1"] = "Styled Cell"
ws["A1"].font = Font(
name="Arial",
size=14,
bold=True,
italic=False,
color="2E74B5"
)
Fill color
Fill color
ws["A2"] = "Blue Background"
ws["A2"].fill = PatternFill(
start_color="0000FF",
end_color="0000FF",
fill_type="solid"
)
ws["A2"] = "Blue Background"
ws["A2"].fill = PatternFill(
start_color="0000FF",
end_color="0000FF",
fill_type="solid"
)
Number format
Number format
ws["A3"] = 1234567.89
ws["A3"].number_format = FORMAT_CURRENCY_USD_SIMPLE
ws["A3"] = 1234567.89
ws["A3"].number_format = FORMAT_CURRENCY_USD_SIMPLE
Alignment
Alignment
ws["A4"] = "Centered"
ws["A4"].alignment = Alignment(
horizontal="center",
vertical="center"
)
ws["A4"] = "Centered"
ws["A4"].alignment = Alignment(
horizontal="center",
vertical="center"
)
Borders
Borders
thin_border = Border(
left=Side(style="thin"),
right=Side(style="thin"),
top=Side(style="thin"),
bottom=Side(style="thin")
)
ws["A5"].border = thin_border
wb.save("formatted.xlsx")
undefinedthin_border = Border(
left=Side(style="thin"),
right=Side(style="thin"),
top=Side(style="thin"),
bottom=Side(style="thin")
)
ws["A5"].border = thin_border
wb.save("formatted.xlsx")
undefinedColumn Width and Row Height
列宽与行高
python
from openpyxl import Workbook
wb = Workbook()
ws = wb.activepython
from openpyxl import Workbook
wb = Workbook()
ws = wb.activeSet column width
Set column width
ws.column_dimensions["A"].width = 20
ws.column_dimensions["B"].width = 15
ws.column_dimensions["A"].width = 20
ws.column_dimensions["B"].width = 15
Set row height
Set row height
ws.row_dimensions[1].height = 30
ws.row_dimensions[1].height = 30
Auto-fit (approximate)
Auto-fit (approximate)
for col in ws.columns:
max_length = 0
column = col[0].column_letter
for cell in col:
if cell.value:
max_length = max(max_length, len(str(cell.value)))
ws.column_dimensions[column].width = max_length + 2
wb.save("sized.xlsx")
undefinedfor col in ws.columns:
max_length = 0
column = col[0].column_letter
for cell in col:
if cell.value:
max_length = max(max_length, len(str(cell.value)))
ws.column_dimensions[column].width = max_length + 2
wb.save("sized.xlsx")
undefinedCharts
图表
python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.activepython
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.activeData
Data
data = [
["Month", "Sales"],
["Jan", 100],
["Feb", 120],
["Mar", 140],
["Apr", 110],
]
for row in data:
ws.append(row)
data = [
["Month", "Sales"],
["Jan", 100],
["Feb", 120],
["Mar", 140],
["Apr", 110],
]
for row in data:
ws.append(row)
Create chart
Create chart
chart = BarChart()
chart.type = "col"
chart.title = "Monthly Sales"
chart.y_axis.title = "Sales ($)"
chart.x_axis.title = "Month"
data = Reference(ws, min_col=2, min_row=1, max_row=5)
cats = Reference(ws, min_col=1, min_row=2, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.shape = 4 # Rectangle
ws.add_chart(chart, "D2")
wb.save("with_chart.xlsx")
undefinedchart = BarChart()
chart.type = "col"
chart.title = "Monthly Sales"
chart.y_axis.title = "Sales ($)"
chart.x_axis.title = "Month"
data = Reference(ws, min_col=2, min_row=1, max_row=5)
cats = Reference(ws, min_col=1, min_row=2, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.shape = 4 # Rectangle
ws.add_chart(chart, "D2")
wb.save("with_chart.xlsx")
undefinedFormula Verification
公式验证
Check for Errors
检查错误
python
from openpyxl import load_workbook
wb = load_workbook("model.xlsx", data_only=False)
ws = wb.active
errors = ["#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NULL!", "#NUM!"]
for row in ws.iter_rows():
for cell in row:
if cell.value and isinstance(cell.value, str):
for error in errors:
if error in str(cell.value):
print(f"Error {error} in cell {cell.coordinate}")python
from openpyxl import load_workbook
wb = load_workbook("model.xlsx", data_only=False)
ws = wb.active
errors = ["#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NULL!", "#NUM!"]
for row in ws.iter_rows():
for cell in row:
if cell.value and isinstance(cell.value, str):
for error in errors:
if error in str(cell.value):
print(f"Error {error} in cell {cell.coordinate}")Validate Formulas
验证公式
Before saving, always:
- Check cell references are correct
- Avoid off-by-one errors
- Test edge cases (empty cells, zeros)
- Verify formula logic
保存前,务必:
- 检查单元格引用是否正确
- 避免差一错误
- 测试边缘情况(空单元格、零值)
- 验证公式逻辑
Execution Checklist
执行检查清单
- Choose appropriate tool (pandas vs openpyxl)
- Verify input file exists and is valid
- Test formulas with sample data
- Apply consistent formatting
- Validate all formulas produce correct results
- Check for Excel errors (#REF!, #DIV/0!, etc.)
- Verify charts display correctly
- Test in Excel/LibreOffice before delivery
- 选择合适的工具(pandas vs openpyxl)
- 验证输入文件存在且有效
- 用示例数据测试公式
- 应用统一的格式设置
- 验证所有公式生成正确结果
- 检查是否存在Excel错误(#REF!、#DIV/0!等)
- 验证图表显示正常
- 交付前在Excel/LibreOffice中测试
Error Handling
错误处理
Common Errors
常见错误
Error: InvalidFileException
- Cause: File is not a valid .xlsx (possibly .xls)
- Solution: Convert to .xlsx or use xlrd for .xls files
Error: Circular reference
- Cause: Formula references itself
- Solution: Review formula logic and break the cycle
Error: #REF! in formulas
- Cause: Cell reference is invalid (deleted row/column)
- Solution: Use named ranges or validate references
Error: Memory issues with large files
- Cause: Loading entire file into memory
- Solution: Use or
read_only=Truemodewrite_only=True
错误:InvalidFileException
- 原因:文件不是有效的.xlsx格式(可能是.xls)
- 解决方案:转换为.xlsx格式,或使用xlrd处理.xls文件
错误:循环引用
- 原因:公式引用自身
- 解决方案:检查公式逻辑并打破循环
错误:公式中出现#REF!
- 原因:单元格引用无效(已删除行/列)
- 解决方案:使用命名范围或验证引用
错误:处理大文件时出现内存问题
- 原因:将整个文件加载到内存中
- 解决方案:使用或
read_only=True模式write_only=True
Metrics
指标
| Metric | Typical Value |
|---|---|
| Read speed (pandas) | ~50,000 rows/second |
| Write speed (pandas) | ~30,000 rows/second |
| Formula cells | ~10,000 cells/second |
| Chart creation | ~5 charts/second |
| Memory usage | ~100MB per 100K rows |
| 指标 | 典型值 |
|---|---|
| 读取速度(pandas) | ~50,000行/秒 |
| 写入速度(pandas) | ~30,000行/秒 |
| 公式单元格处理速度 | ~10,000单元格/秒 |
| 图表创建速度 | ~5个图表/秒 |
| 内存占用 | 每10万行约100MB |
Workflow
工作流程
- Choose appropriate tool (pandas or openpyxl)
- Create or load workbook
- Modify as needed
- Save file
- For formula-based files: Run formula recalculation
- Verify and fix errors
- 选择合适的工具(pandas或openpyxl)
- 创建或加载工作簿
- 根据需要修改内容
- 保存文件
- 对于含公式的文件:运行公式重计算
- 验证并修复错误
Quick Reference
快速参考
| Task | Tool |
|---|---|
| Data analysis | pandas |
| Simple read/write | pandas |
| Formulas | openpyxl |
| Formatting | openpyxl |
| Charts | openpyxl |
| Pivot tables | Use Excel or xlwings |
| 任务 | 工具 |
|---|---|
| 数据分析 | pandas |
| 简单读写 | pandas |
| 公式处理 | openpyxl |
| 格式设置 | openpyxl |
| 图表创建 | openpyxl |
| 数据透视表 | 使用Excel或xlwings |
Dependencies
依赖项
bash
pip install pandas openpyxl xlrdOptional:
- xlwings (Windows/Mac Excel automation)
- xlsxwriter (alternative writer)
bash
pip install pandas openpyxl xlrd可选依赖:
- xlwings(Windows/Mac Excel自动化)
- xlsxwriter(替代写入工具)
Version History
版本历史
- 1.1.0 (2026-01-02): Added Quick Start, When to Use, Execution Checklist, Error Handling, Metrics sections; updated frontmatter with version, category, related_skills
- 1.0.0 (2024-10-15): Initial release with pandas, openpyxl, financial model standards
- 1.1.0(2026-01-02):新增快速开始、适用场景、执行检查清单、错误处理、指标章节;更新前置内容,添加版本、分类、相关技能信息
- 1.0.0(2024-10-15):初始版本,包含pandas、openpyxl及财务模型标准相关内容