Output Requirements
All Excel Files
Professional Fonts
- Unless specified otherwise by the user, use a consistent professional font (e.g., Arial, Times New Roman) for all deliverables
Zero Formula Errors
- Each Excel model must be delivered with zero formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
Preserve Existing Templates (When Updating)
- When modifying files, research and fully match existing formatting, styles, and conventions
- Do not impose standardized formatting on files with established patterns
- Existing template conventions always override these guidelines
Financial Modeling
Color Coding Standards
Unless specified otherwise by the user or existing templates
Industry Standard Color Conventions
- Blue Text (RGB: 0,0,255): Hard-coded inputs and numbers users will change for scenarios
- Black Text (RGB: 0,0,0): All formulas and calculations
- Green Text (RGB: 0,128,0): Links extracted from other worksheets within the same workbook
- Red Text (RGB: 255,0,0): External links to other files
- Yellow Background (RGB: 255,255,0): Key assumptions requiring attention or cells needing updates
Number Format Standards
Mandatory Format Rules
- Years: Formatted as text strings (e.g., "2024" instead of "2,024")
- Currency: Use $#,##0 format; always specify units in headers ("Revenue (USD Millions)")
- Zeros: Use number formatting to display all zeros as "-", including percentages (e.g., "$#,##0;($#,##0);-")
- Percentages: Default to 0.0% format (one decimal place)
- Multiples: Formatted as 0.0x for valuation multiples (EV/EBITDA, P/E)
- Negative Numbers: Use parentheses (123) instead of hyphens -123
Formula Construction Rules
Assumption Placement
- Place all assumptions (growth rates, profit margins, multiples, etc.) in separate assumption cells
- Use cell references in formulas instead of hard-coded values
- Example: Use =B5*(1+$B$6) instead of =B5*1.05
Formula Error Prevention
- Verify all cell references are correct
- Check for off-by-one errors in ranges
- Ensure formulas are consistent across all forecast periods
- Test with edge cases (zero values, negative numbers)
- Verify there are no accidental circular references
Documentation Requirements for Hard-Coded Values
- Add comments next to cells (if at the end of a table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL (if applicable)]"
- Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Statement, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, August 15, 2025, AAPL US Equity"
- "Source: FactSet, August 20, 2025, Consensus Estimates Screen"
XLSX Creation, Editing, and Analysis
Overview
Users may ask you to create, edit, or analyze the content of .xlsx files. You have different tools and workflows for different tasks.
Key Requirements
LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the
script. This script automatically configures LibreOffice on first run, including in sandbox environments with restricted Unix sockets (handled by
scripts/office/soffice.py
)
Reading and Analyzing Data
Data Analysis with pandas
For data analysis, visualization, and basic operations, use pandas, which provides powerful data processing capabilities:
python
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first worksheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All worksheets as dictionary
# Analysis
df.head() # Preview data
df.info() # Column information
df.describe() # Statistical information
# Write Excel
df.to_excel('output.xlsx', index=False)
Excel File Workflow
Critical: Use Formulas, Not Hard-Coded Values
Always use Excel formulas instead of calculating values in Python and hard-coding them. This ensures the spreadsheet remains dynamic and updatable.
❌ Incorrect - Hard-Coded Calculated Values
python
# Bad: Calculate in Python and hard-code the result
total = df['Sales'].sum()
sheet['B10'] = total # Hard-codes 5000
# Bad: Calculate growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hard-codes 0.15
# Bad: Calculate average in Python
avg = sum(values) / len(values)
sheet['D20'] = avg # Hard-codes 42.5
✅ Correct - Use Excel Formulas
python
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Calculate average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to all calculations—totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
Common Workflows
- Select Tool: Use pandas for data, openpyxl for formulas/formatting
- Create/Load: Create a new workbook or load an existing file
- Modify: Add/edit data, formulas, and formatting
- Save: Write to file
- Recalculate Formulas (Mandatory if using formulas): Use the scripts/recalc.py script
bash
python scripts/recalc.py output.xlsx
- Verify and Fix Any Errors:
- The script returns JSON with error details
- If is , check for specific error types and locations
- Fix identified errors and recalculate
- Common errors to fix:
- : Invalid cell reference
- : Division by zero
- : Incorrect data type in formula
- : Unrecognized formula name
Creating a New Excel File
python
# Use openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Format
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
Editing an Existing Excel File
python
# Use openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific worksheets
# Process multiple worksheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
print(f"Worksheet: {sheet_name}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new worksheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Recalculating Formulas
Excel files created or modified by openpyxl contain formulas as strings but no calculated values. Use the provided
script to recalculate formulas:
bash
python scripts/recalc.py <excel_file> [timeout_seconds]
Example:
bash
python scripts/recalc.py output.xlsx 30
The script:
- Automatically sets up LibreOffice macros on first run
- Recalculates all formulas in all worksheets
- Scans all cells for Excel errors (#REF!, #DIV/0!, etc.)
- Returns JSON with detailed error locations and counts
- Works on both Linux and macOS
Formula Verification Checklist
Quick checks to ensure formulas work correctly:
Basic Verification
Common Pitfalls
Formula Testing Strategy
Interpreting scripts/recalc.py Output
The script returns JSON with error details:
json
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors are found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
Best Practices
Library Selection
- pandas: Best for data analysis, bulk operations, and simple data export
- openpyxl: Best for complex formatting, formulas, and Excel-specific features
Using openpyxl
- Cell indexing starts at 1 (row=1, column=1 refers to cell A1)
- Use to read calculated values:
load_workbook('file.xlsx', data_only=True)
- Warning: If you open and save with , formulas will be replaced with values and lost permanently
- For large files: Use for reading or for writing
- Formulas are preserved but not evaluated - use scripts/recalc.py to update values
Using pandas
- Specify data types to avoid inference issues:
pd.read_excel('file.xlsx', dtype={'id': str})
- For large files, read specific columns:
pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
- Handle dates correctly:
pd.read_excel('file.xlsx', parse_dates=['date_column'])
Code Style Guidelines
Important: When generating Python code for Excel operations:
- Write minimal, concise Python code with no unnecessary comments
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
For Excel Files Themselves:
- Add comments to cells with complex formulas or important assumptions
- Document data sources for hard-coded values
- Include comments for key calculations and model sections