Tax Filing Skill
Prepare federal and state income tax returns: read source documents, compute taxes, fill official PDF forms.
Year-agnostic — always look up current-year brackets, deductions, and credits. Never reuse prior-year values.
Folder Structure
Organize all work into subfolders of the working directory:
```
working_dir/
  source/                ← user's source documents (W-2, 1099s, prior return, CSVs)
  work/                  ← ALL intermediate files (extracted data, field maps, computations)
    tax_data.txt         ← extracted figures from source docs
    computations.txt     ← all tax math (federal, state, capital gains)
    f1040_fields.json    ← field discovery dumps
    f8949_fields.json
    f1040sd_fields.json
    ca540_fields.json
    expected_*.json      ← verification expected values
  forms/                 ← blank downloaded PDF forms
    f1040_blank.pdf
    f8949_blank.pdf
    f1040sd_blank.pdf
    ca540_blank.pdf
  output/                ← final filled PDFs + fill script
    fill_YEAR.py         ← the fill script
    f1040_filled.pdf
    f8949_filled.pdf
    f1040sd_filled.pdf
    ca540_filled.pdf
```
Create these folders at the start. Keep the working directory clean — no loose files.
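Creating the folders can be scripted. A minimal sketch in Python; `create_workspace` is a hypothetical helper name, not one of the skill's scripts:

```python
from pathlib import Path

# The four subfolders from the Folder Structure section.
SUBFOLDERS = ("source", "work", "forms", "output")


def create_workspace(working_dir: str) -> list[Path]:
    """Create source/, work/, forms/ and output/ under working_dir.

    Safe to call repeatedly: existing folders are left untouched.
    """
    root = Path(working_dir)
    folders = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        folders.append(folder)
    return folders
```

Call `create_workspace(".")` once at the start of a session.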
Context Budget Rules
These rules prevent context blowouts that cause compaction:
- NEVER read PDFs with the Read tool. Each page becomes roughly 200–250KB of base64 image data (a 9-page return is about 2 MB). Extract text instead:
```bash
python3 -c "
import pdfplumber
with pdfplumber.open('source/document.pdf') as pdf:
    for p in pdf.pages:
        print(p.extract_text())
"
```
- NEVER read the same document twice. Save extracted figures to work/tax_data.txt on first read.
- Run field discovery ONCE per form as a bulk JSON dump to work/ (one *_fields.json per form). Do NOT use repeatedly.
- Save all computed values to work/computations.txt so they survive compaction.
Workflow
Step 1: Gather Source Documents
Ask the user what documents they have. Read files from source/ (move them there if needed). Use pdfplumber for PDFs and the Read tool for CSVs.
Save all extracted figures to work/tax_data.txt immediately, one section per document with every relevant number.
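One way to enforce "one section per document, read once" is to check the data file before appending. A minimal sketch; `record_document` and the `## name` / `label: value` section layout are assumptions for illustration, not a format the skill prescribes:

```python
from pathlib import Path


def record_document(data_file: str, doc_name: str, figures: dict[str, str]) -> bool:
    """Append one '## doc_name' section of figures to data_file.

    Returns False and writes nothing if doc_name was already recorded,
    so the same source document is never extracted and saved twice.
    """
    path = Path(data_file)
    header = f"## {doc_name}"
    existing = path.read_text().splitlines() if path.exists() else []
    if header in existing:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(header + "\n")
        for label, value in figures.items():
            f.write(f"{label}: {value}\n")  # keep exact cents as printed on the form
        f.write("\n")
    return True
```

For example, `record_document("work/tax_data.txt", "W-2 Employer A", {"Box 1 wages": "...", "Box 2 federal withholding": "..."})`, with the real figures from the document.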
Step 2: Confirm Filing Details — MANDATORY
You MUST ask the user every one of these questions and WAIT for answers before proceeding. Do NOT skip this step even if you think you know the answers from memory or source documents. Tax returns are legal documents.
- Filing status (Single, MFJ, MFS, HOH, QSS)
- Dependents (number, names)
- State of residence
- Standard vs. itemized deduction preference
- Digital asset / cryptocurrency transactions (Yes/No) — stock trades are NOT digital assets
- Health coverage status (for CA)
- Any estimated tax payments made
- Any other credits or adjustments
Do NOT proceed to Step 3 until the user has answered. "Same as last year" counts as confirmation.
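The gate can be made mechanical by tracking each answer under a key and refusing to continue while any is missing. A minimal sketch; the key names are hypothetical labels for the questions above:

```python
FILING_STATUSES = {"Single", "MFJ", "MFS", "HOH", "QSS"}

# One key per mandatory Step 2 question (names are illustrative).
REQUIRED_ANSWERS = (
    "filing_status", "dependents", "state", "deduction_preference",
    "digital_assets", "health_coverage", "estimated_payments", "other_credits",
)


def unanswered(answers: dict) -> list[str]:
    """Return the Step 2 questions that still lack an explicit user answer.

    0, False and [] count as answers ("no dependents", "no crypto");
    None and "" do not.
    """
    missing = [key for key in REQUIRED_ANSWERS if answers.get(key) in (None, "")]
    status = answers.get("filing_status")
    if status not in (None, "") and status not in FILING_STATUSES:
        missing.append("filing_status")  # answered, but not a valid status
    return missing
```

Proceed to Step 3 only when `unanswered(answers) == []`.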
Step 3: Look Up Year-Specific Values
Research from IRS.gov and FTB.ca.gov:
- Federal tax brackets, standard deduction, and qualified dividends and capital gain (QDCG) 0%/15%/20% thresholds
- State tax brackets, standard deduction, personal exemption credit
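Both federal and CA brackets are progressive: each rate applies only to the slice of income inside its bracket. A minimal sketch of that calculation. The bracket table below is made up for illustration and is NOT real IRS or FTB data; always load the current year's published values:

```python
from decimal import Decimal
from typing import Optional

# Made-up example brackets: (upper bound, rate); None means no upper bound.
TOY_BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def bracket_tax(taxable: Decimal,
                brackets: list[tuple[Optional[Decimal], Decimal]]) -> Decimal:
    """Apply each rate only to the income that falls inside its bracket."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in brackets:
        if taxable <= lower:
            break  # no income reaches this bracket
        top = taxable if upper is None else min(taxable, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax
```

With the toy table, 50,000 of taxable income is taxed 1,000 + 6,000 + 3,000 = 10,000. Note that the IRS Tax Table (used below $100,000) works on $50 income ranges, so its figure can differ from this exact computation by a few dollars.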
Step 4: Compute Federal Return
- Gross Income: W-2 wages (1a) + interest (2b) + dividends (3b) + capital gain/loss (7)
- Adjustments → AGI (Line 11)
- Deductions → Taxable Income (Line 15)
- Tax: use QDCG worksheet if qualified dividends/capital gains exist
- Credits, other taxes → Total Tax (Line 24)
- Payments (withholding, estimated) → Refund/Owed
- If refund: collect direct deposit info (routing, account, type)
Save all line values to work/computations.txt.
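The QDCG worksheet stacks ordinary income first, then fills the 0%, 15% and 20% bands with qualified dividends and net capital gain. A condensed sketch modeled on the Qualified Dividends and Capital Gain Tax Worksheet in the Form 1040 instructions; the thresholds and the `ordinary_tax` function are inputs you supply from Step 3, and the worksheet's exact line layout should be checked against the current year's instructions:

```python
from decimal import Decimal
from typing import Callable

ZERO = Decimal("0")


def qdcg_tax(
    taxable_income: Decimal,       # 1040 Line 15
    qualified_dividends: Decimal,  # 1040 Line 3a
    net_cap_gain: Decimal,         # smaller of Schedule D Line 15 or 16; 0 if either is a loss
    zero_rate_top: Decimal,        # this year's 0% threshold for the filing status
    fifteen_rate_top: Decimal,     # this year's 15% threshold (20% applies above it)
    ordinary_tax: Callable[[Decimal], Decimal],  # regular bracket tax for the filing status
) -> Decimal:
    preferential = qualified_dividends + max(net_cap_gain, ZERO)
    ordinary = max(taxable_income - preferential, ZERO)
    # Preferential income that fits under the 0% threshold, on top of ordinary income.
    at_zero = max(min(taxable_income, zero_rate_top) - ordinary, ZERO)
    pref_taxed = min(taxable_income, preferential)
    # Room left under the 15% threshold after ordinary income and the 0% slice.
    room_15 = max(min(taxable_income, fifteen_rate_top) - ordinary - at_zero, ZERO)
    at_fifteen = min(pref_taxed - at_zero, room_15)
    at_twenty = pref_taxed - at_zero - at_fifteen
    worksheet_tax = (at_fifteen * Decimal("0.15")
                     + at_twenty * Decimal("0.20")
                     + ordinary_tax(ordinary))
    # The worksheet result is never more than tax on all income at ordinary rates.
    return min(worksheet_tax, ordinary_tax(taxable_income))
```

Passing the regular bracket function as `ordinary_tax` keeps the worksheet independent of the filing status, which also makes it reusable for the Single comparison in Step 11.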
Step 5: Compute Capital Gains (if applicable)
- Form 8949: individual transactions (Part I short-term, Part II long-term)
- Schedule D: totals, $3,000 loss limitation ($1,500 if MFS), carryover calculation
- Schedule D Line 16 (if a gain) or Line 21 (if a loss, limited to $3,000) → 1040 Line 7
Rounding rule: Form 8949 and Schedule D must use exact cents matching the 1099-B / 1099-DA source documents. Only Form 1040 rounds to the nearest whole dollar (Line 7 = Schedule D Line 16 rounded if a gain, or Line 21 rounded if a loss). Do NOT round amounts on 8949 or Schedule D.
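Netting, the loss limit and the carryover can be sketched together. This follows the structure of the Capital Loss Carryover Worksheet in the Schedule D instructions (lines 1–13), applied to this year's figures to get the carryover into next year; verify line references against the current instructions. Amounts stay in exact cents, per the rounding rule above:

```python
from decimal import Decimal

ZERO = Decimal("0")


def schedule_d_loss(st_net: Decimal, lt_net: Decimal, taxable_income: Decimal,
                    loss_limit: Decimal = Decimal("3000")) -> dict:
    """st_net = Schedule D Line 7, lt_net = Line 15 (prior carryovers already
    included), taxable_income = 1040 Line 15 (negative if a loss).
    loss_limit is 3000, or 1500 for MFS.
    """
    line16 = st_net + lt_net
    if line16 >= ZERO:
        return {"line16": line16, "line21": ZERO, "st_carryover": ZERO, "lt_carryover": ZERO}
    deductible = min(-line16, loss_limit)  # Line 21, as a positive amount
    # Worksheet lines 1-4: the loss actually used this year.
    w4 = min(deductible, max(taxable_income + deductible, ZERO))
    st_loss = max(-st_net, ZERO)  # worksheet line 5
    st_carry = ZERO
    if st_loss > ZERO:  # lines 5-8: ST loss is absorbed first
        st_carry = max(st_loss - (w4 + max(lt_net, ZERO)), ZERO)
    lt_carry = ZERO
    if lt_net < ZERO:  # lines 9-13
        lt_carry = max(-lt_net - (max(st_net, ZERO) + max(w4 - st_loss, ZERO)), ZERO)
    return {"line16": line16, "line21": -deductible,
            "st_carryover": st_carry, "lt_carryover": lt_carry}
```

For instance, a 5,000.00 short-term loss and a 1,000.00 long-term gain give Line 21 = -3,000.00 and a 1,000.00 short-term carryover.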
Step 6: Compute State Return (CA Form 540)
- Federal AGI → CA adjustments → CA taxable income
- Tax from brackets − exemption credits → total tax
- Withholding → Refund/Owed
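The CA chain is brackets, then exemption credits, then withholding. A minimal sketch under stated assumptions: the bracket table is made up (NOT real FTB values), the exemption credits are treated as nonrefundable (they can reduce tax to zero but not below), and phase-outs and other CA credits or taxes are ignored:

```python
from decimal import Decimal
from typing import Optional

# Made-up example brackets: (upper bound, rate); None means no upper bound.
TOY_CA_BRACKETS = [
    (Decimal("10000"), Decimal("0.01")),
    (Decimal("30000"), Decimal("0.02")),
    (None, Decimal("0.04")),
]


def ca_tax_summary(ca_taxable: Decimal,
                   brackets: list[tuple[Optional[Decimal], Decimal]],
                   exemption_credits: Decimal,
                   withholding: Decimal) -> dict:
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in brackets:
        top = ca_taxable if upper is None else min(ca_taxable, upper)
        if top > lower:
            tax += (top - lower) * rate  # only the slice inside this bracket
        if upper is None or ca_taxable <= upper:
            break
        lower = upper
    total_tax = max(tax - exemption_credits, Decimal("0"))  # credits cannot go below zero
    balance = withholding - total_tax
    return {"tax": tax, "total_tax": total_tax,
            "refund": max(balance, Decimal("0")), "owed": max(-balance, Decimal("0"))}
```

Round the final CA 540 amounts to whole dollars when entering them on the form.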
Step 7: Download Blank PDF Forms
IRS: Use https://www.irs.gov/pub/irs-prior/ for prior-year forms (https://www.irs.gov/pub/irs-pdf/ always holds the current year):
https://www.irs.gov/pub/irs-prior/f1040--YEAR.pdf
https://www.irs.gov/pub/irs-prior/f8949--YEAR.pdf
https://www.irs.gov/pub/irs-prior/f1040sd--YEAR.pdf
Verify each download starts with the %PDF- header (not an HTML error page).
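The URL choice and the header check can be combined in one helper. A minimal sketch using only the standard library; the function names and the User-Agent value are assumptions, not part of the skill's scripts:

```python
import urllib.request


def irs_form_url(form: str, tax_year: int, latest_posted_year: int) -> str:
    """Prior years live under irs-prior/ with a --YEAR suffix; current year under irs-pdf/."""
    if tax_year == latest_posted_year:
        return f"https://www.irs.gov/pub/irs-pdf/{form}.pdf"
    return f"https://www.irs.gov/pub/irs-prior/{form}--{tax_year}.pdf"


def looks_like_pdf(path: str) -> bool:
    """True if the file starts with the %PDF- magic bytes (an HTML error page will not)."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def download_form(form: str, tax_year: int, latest_posted_year: int, dest: str) -> None:
    url = irs_form_url(form, tax_year, latest_posted_year)
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    if not looks_like_pdf(dest):
        raise ValueError(f"{url} did not return a PDF (HTML error page?)")
```

For example, `download_form("f8949", YEAR, LATEST, "forms/f8949_blank.pdf")`, where YEAR and LATEST are the tax year being filed and the newest year posted on irs-pdf/.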
Step 8: Discover Field Names & Fill Forms
Discovery: run scripts/discover_fields.py with --compact, ONCE per form:
```bash
python scripts/discover_fields.py forms/f1040_blank.pdf --compact > work/f1040_fields.json
python scripts/discover_fields.py forms/f8949_blank.pdf --compact > work/f8949_fields.json
python scripts/discover_fields.py forms/f1040sd_blank.pdf --compact > work/f1040sd_fields.json
python scripts/discover_fields.py forms/ca540_blank.pdf --compact > work/ca540_fields.json
```
The output is a minimal {field_name: description} mapping: each field name is paired with its tooltip/speak description, so you can map line numbers to field names directly without manual inspection. Radio buttons include their option values (e.g. {"/2": "Single", "/1": "MFJ"}).
Do NOT use
repeatedly or
(which dumps raw metadata and wastes context).
HARD FAIL: If discovery returns 0 human-readable descriptions, STOP. Do not guess field names.
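Once a dump exists, the hard-fail check and the line-to-field lookup can both run against the saved JSON, with no re-discovery needed. A minimal sketch that assumes the {field_name: description} shape described above; the helper names and the sample field names in the test are hypothetical:

```python
import json


def load_field_map(path: str) -> dict:
    """Load a discovery dump and HARD FAIL if nothing has a readable description."""
    with open(path) as f:
        fields = json.load(f)
    # A description that is empty or just repeats the field name is not human-readable.
    readable = [name for name, desc in fields.items() if desc and desc != name]
    if not readable:
        raise SystemExit(f"HARD FAIL: {path} has 0 human-readable descriptions; do not guess")
    return fields


def find_fields(fields: dict, phrase: str) -> list[str]:
    """Return field names whose description (or radio option labels) contain phrase."""
    phrase = phrase.lower()
    hits = []
    for name, desc in fields.items():
        text = " ".join(map(str, desc.values())) if isinstance(desc, dict) else str(desc)
        if phrase in text.lower():
            hits.append(name)
    return hits
```

For example, `find_fields(load_field_map("work/f1040_fields.json"), "adjusted gross income")` lists the candidate fields for Line 11.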
Fill Script
- — appends to text field keys. Required for IRS forms.
- fill_irs_pdf(in, out, fields, checkboxes, radio_values): IRS forms. Use radio_values for filing status, yes/no, and checking/savings.
- fill_pdf(in, out, fields, checkboxes): CA forms. Matches by chain + keys.
Step 9: Verify
```bash
python scripts/verify_filled.py output/f1040_filled.pdf work/expected_f1040.json
```
Fix any failures, re-run fill script.
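The core of a verification pass is a field-by-field comparison of filled values against the expected JSON. A minimal sketch of that comparison; it is not the logic of verify_filled.py, and how you read the filled values (for example with pypdf's `PdfReader.get_form_text_fields()`) is left to you. Only whitespace and thousands separators are normalized, so "240" vs "240.00" still fails, in keeping with the exact-cents rule:

```python
def _norm(value) -> str:
    """Ignore surrounding whitespace and thousands separators only."""
    return str(value).strip().replace(",", "")


def compare_fields(filled: dict, expected: dict) -> list[str]:
    """Return one failure message per expected field that is missing or different."""
    failures = []
    for name, want in expected.items():
        got = filled.get(name)
        if got is None:
            failures.append(f"{name}: missing (expected {want!r})")
        elif _norm(got) != _norm(want):
            failures.append(f"{name}: got {got!r}, expected {want!r}")
    return failures
```

An empty list means every expected field matched; otherwise fix the mapping or values and re-run the fill script.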
Step 10: Present Results
Show a summary table, verification checklist, capital loss carryover (if any), then:
- Sign your returns — unsigned returns are rejected
- Payment instructions (if owed): IRS Direct Pay, FTB Web Pay, deadline April 15 (the next business day if April 15 falls on a weekend or legal holiday)
- Direct deposit — recommend it for refunds; ask for bank info if not provided
- Filing options — e-file (Free File, CalFile) or mailing addresses
Step 11: MFJ vs Single Comparison (if Married Filing Jointly)
After completing the MFJ return, compute what each spouse would owe if they filed Single instead. This is a hypothetical: a married couple cannot file Single (their real alternative is MFS), but the comparison shows how much being married changes their combined tax.
For each spouse, compute a hypothetical Single return:
- Income: Use only that spouse's W-2, 1099-INT, 1099-DIV, and 1099-B/1099-DA
- Standard deduction: Single amount (typically half of MFJ)
- QBI deduction: Based on that spouse's 199A dividends only
- Tax: Use Single brackets and QDCG worksheet with Single 0%/15%/20% thresholds
- Credits: Foreign tax credit only if that spouse paid foreign tax
- Additional Medicare Tax: Use the $200K Single threshold (not $250K MFJ)
- Withholding: That spouse's W-2 Box 2 only
Present a side-by-side comparison table:
| | MFJ (actual) | Both Single | Difference |
|---|---|---|---|
| Combined tax | | | |
| Combined withheld | | | |
| Combined owed | | | |
Include key takeaways, especially the Additional Medicare Tax threshold difference ($250K combined for MFJ vs $200K per spouse when Single), which can be a major driver of the MFJ vs Single gap for high-earning couples.
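The Additional Medicare Tax piece of the comparison is simple to isolate: 0.9% of Medicare wages (W-2 Box 5) above the threshold, computed on Form 8959. A minimal sketch using the thresholds stated above; the helper names are hypothetical, and self-employment income is ignored:

```python
from decimal import Decimal

AMT_RATE = Decimal("0.009")  # Additional Medicare Tax rate (0.9%)


def additional_medicare_tax(medicare_wages: Decimal, threshold: Decimal) -> Decimal:
    return max(medicare_wages - threshold, Decimal("0")) * AMT_RATE


def amt_comparison(wages_a: Decimal, wages_b: Decimal,
                   mfj_threshold: Decimal = Decimal("250000"),
                   single_threshold: Decimal = Decimal("200000")) -> dict:
    """MFJ applies one threshold to combined wages; Single applies one per spouse."""
    mfj = additional_medicare_tax(wages_a + wages_b, mfj_threshold)
    both_single = (additional_medicare_tax(wages_a, single_threshold)
                   + additional_medicare_tax(wages_b, single_threshold))
    return {"mfj": mfj, "both_single": both_single, "difference": mfj - both_single}
```

With Medicare wages of $220,000 and $180,000, MFJ owes $1,350 of Additional Medicare Tax versus $180 if both were Single, so this piece alone makes MFJ $1,170 more expensive. A positive difference means MFJ pays more.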
Key Gotchas
Context
- NEVER use Read tool on PDFs — use pdfplumber
- NEVER read the same document twice: save figures to work/tax_data.txt
- Field discovery once per form with — no (wastes context), no repeated
Field Discovery
- Field names change between years — always discover fresh
- XFA template is in → array, NOT from brute-force xref scanning
- Do NOT use for XFA — use regex (IRS XML has broken namespaces)
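A regex pass over the XFA template XML can pair each field name with its speak or tooltip text. A minimal sketch; it assumes you have already pulled the template packet's XML out of the AcroForm XFA array, and that every `<field>` element has an explicit `</field>` close (self-closing fields would be mismatched):

```python
import html
import re

FIELD_RE = re.compile(r'<field\b[^>]*\bname="([^"]+)"(.*?)</field>', re.S)
SPEAK_RE = re.compile(r"<speak\b[^>]*>(.*?)</speak>", re.S)
TOOLTIP_RE = re.compile(r"<toolTip\b[^>]*>(.*?)</toolTip>", re.S)


def xfa_field_descriptions(template_xml: str) -> dict[str, str]:
    """Map each XFA field name to its <speak> text, falling back to <toolTip>."""
    out = {}
    for name, body in FIELD_RE.findall(template_xml):
        match = SPEAK_RE.search(body) or TOOLTIP_RE.search(body)
        out[name] = html.unescape(match.group(1).strip()) if match else ""
    return out
```

Regex avoids XML parser failures on the broken namespaces, at the cost of not understanding nesting.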
PDF Filling
- Remove XFA from AcroForm, set NeedAppearances=True, use auto_regenerate=False
- Checkboxes: set both and to or
- IRS fields need suffix — use
- IRS checkboxes match by directly; radio groups match by key via
Rounding
- Form 8949 & Schedule D: Report exact cents (e.g. "11.36", "-240.00") to match 1099-B / 1099-DA source documents. Never round these.
- Form 1040: Round all amounts to the nearest whole dollar per IRS instructions. Line 7 (capital gain or loss) = Schedule D Line 16 rounded to the nearest dollar if a gain, or Line 21 rounded if a loss.
- CA 540: Round to nearest whole dollar.
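Both rounding rules are easy to get wrong with floats, so a sketch with `decimal`: whole-dollar rounding drops amounts under 50 cents and raises 50 to 99 cents to the next dollar (ties go away from zero, so losses round the same way), while 8949 / Schedule D amounts are only normalized to two decimals:

```python
from decimal import Decimal, ROUND_HALF_UP


def whole_dollars(amount) -> int:
    """Form 1040 / CA 540: round to the nearest dollar, 50 cents rounds away from zero."""
    return int(Decimal(str(amount)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def exact_cents(amount) -> str:
    """Form 8949 / Schedule D: keep the cents from the 1099-B / 1099-DA, e.g. '-240.00'."""
    return str(Decimal(str(amount)).quantize(Decimal("0.01")))
```

Pass amounts as strings (as read from the source documents) to avoid binary float artifacts.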
Form-Specific
- 1040: First few fields (-) are fiscal year headers, not name fields. SSN = 9 digits, no dashes. Digital assets = crypto only, not stocks.
- 8949: Box A/B/C checkboxes are 3-way radio buttons. Totals at high field numbers (e.g. -), not after last data row. Schedule D lines 1b/8b (from 8949), not 1a/8a.
- Schedule D: Some fields have suffix (read-only) — skip those.
- CA 540: Field names are (page+sequence, NOT line numbers). Checkboxes end with , radio buttons use named AP keys.
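- Downloads: Prior-year IRS = https://www.irs.gov/pub/irs-prior/ (e.g. f1040--YEAR.pdf), current = https://www.irs.gov/pub/irs-pdf/ (e.g. f1040.pdf)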
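The SSN rule (9 digits, no dashes) is worth normalizing once, before any field is filled. A minimal sketch; `ssn_digits` is a hypothetical helper:

```python
import re


def ssn_digits(ssn: str) -> str:
    """Strip dashes/spaces and require exactly 9 digits, as the 1040 SSN fields expect."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError("SSN must contain exactly 9 digits")
    return digits
```

Failing loudly on a short or long SSN is safer than silently truncating it into the form.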