tax-filing

Tax Filing Skill

Prepare federal and state income tax returns: read source documents, compute taxes, fill official PDF forms.
Year-agnostic — always look up current-year brackets, deductions, and credits. Never reuse prior-year values.

Folder Structure

Organize all work into subfolders of the working directory:
working_dir/
  source/              ← user's source documents (W-2, 1099s, prior return, CSVs)
  work/                ← ALL intermediate files (extracted data, field maps, computations)
    tax_data.txt       ← extracted figures from source docs
    computations.txt   ← all tax math (federal, state, capital gains)
    f1040_fields.json  ← field discovery dumps
    f8949_fields.json
    f1040sd_fields.json
    ca540_fields.json
    expected_*.json    ← verification expected values
  forms/               ← blank downloaded PDF forms
    f1040_blank.pdf
    f8949_blank.pdf
    f1040sd_blank.pdf
    ca540_blank.pdf
  output/              ← final filled PDFs + fill script
    fill_YEAR.py       ← the fill script
    f1040_filled.pdf
    f8949_filled.pdf
    f1040sd_filled.pdf
    ca540_filled.pdf
Create these folders at the start. Keep the working directory clean — no loose files.
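The layout above can be set up in one step. This is a minimal sketch that assumes the four top-level folder names shown in the tree; `create_layout` is a hypothetical helper name, not part of the skill's scripts.

```python
from pathlib import Path

# Folder names taken from the tree above.
SUBFOLDERS = ("source", "work", "forms", "output")


def create_layout(working_dir):
    """Create the four subfolders; folders that already exist are left untouched."""
    root = Path(working_dir)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_layout("working_dir")` a second time is harmless, because `exist_ok=True` makes the call idempotent.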

Context Budget Rules

These rules prevent context blowouts that cause compaction:
  1. NEVER read PDFs with the Read tool. Each page becomes ~250KB of base64 images (a 9-page return is roughly 2 MB). Extract text instead:
    bash
    python3 -c "
    import pdfplumber
    with pdfplumber.open('source/document.pdf') as pdf:
        for p in pdf.pages: print(p.extract_text())
    "
  2. NEVER read the same document twice. Save extracted figures to work/tax_data.txt on first read.
  3. Run field discovery ONCE per form as a bulk JSON dump to work/. Do NOT use --search repeatedly.
  4. Save all computed values to work/computations.txt so they survive compaction.

Workflow

Step 1: Gather Source Documents

Ask the user what documents they have. Read files from source/ (move them there if needed). Use pdfplumber for PDFs and the Read tool for CSVs.
Save all extracted figures to work/tax_data.txt immediately, one section per document with every relevant number.
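One way to keep "one section per document" consistent is a small append helper. The sketch below is an assumption about the file layout (a title line followed by `label: value` lines); `append_section` is a hypothetical name and the skill does not prescribe this exact format.

```python
from pathlib import Path


def append_section(data_file, document_name, figures):
    """Append one titled section of label/value pairs to the data file."""
    path = Path(data_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"== {document_name} =="]
    lines += [f"{label}: {value}" for label, value in figures.items()]
    # Append mode keeps the sections written for earlier documents.
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")
```

Passing values as strings, exactly as they appear on the source document, avoids float rounding before any tax math is done.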

Step 2: Confirm Filing Details — MANDATORY

You MUST ask the user every one of these questions and WAIT for answers before proceeding. Do NOT skip this step even if you think you know the answers from memory or source documents. Tax returns are legal documents.
  • Filing status (Single, MFJ, MFS, HOH, QSS)
  • Dependents (number, names)
  • State of residence
  • Standard vs. itemized deduction preference
  • Digital asset / cryptocurrency transactions (Yes/No) — stock trades are NOT digital assets
  • Health coverage status (for CA)
  • Any estimated tax payments made
  • Any other credits or adjustments
Do NOT proceed to Step 3 until the user has answered. "Same as last year" counts as confirmation.

Step 3: Look Up Year-Specific Values

Research from IRS.gov and FTB.ca.gov:
  • Federal tax brackets, standard deduction, QDCG 0%/15%/20% thresholds
  • State tax brackets, standard deduction, personal exemption credit
Save to work/computations.txt.

Step 4: Compute Federal Return

  1. Gross Income: W-2 wages (1a) + interest (2b) + dividends (3b) + capital gain/loss (7)
  2. Adjustments → AGI (Line 11)
  3. Deductions → Taxable Income (Line 15)
  4. Tax: use QDCG worksheet if qualified dividends/capital gains exist
  5. Credits, other taxes → Total Tax (Line 24)
  6. Payments (withholding, estimated) → Refund/Owed
  7. If refund: collect direct deposit info (routing, account, type)
Save all line values to work/computations.txt.
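For the ordinary-income part of item 4, the bracket arithmetic can be sketched as below. This is an illustration only: `tax_from_brackets` is a hypothetical helper, the brackets must come from the current-year lookup in Step 3 (none are hard-coded here), and the sketch does not implement the QDCG worksheet. The IRS Tax Table may also give a slightly different figure than the straight bracket formula for incomes it covers.

```python
from decimal import Decimal


def tax_from_brackets(taxable_income, brackets):
    """Progressive tax: each slice of income is taxed at its own rate.

    brackets: list of (lower_bound, rate) pairs sorted by lower_bound,
    with the first lower_bound equal to 0.
    """
    income = Decimal(str(taxable_income))
    tax = Decimal("0")
    for i, (lower, rate) in enumerate(brackets):
        lower = Decimal(str(lower))
        if income <= lower:
            break
        # The top bracket has no upper bound, so cap it at the income itself.
        if i + 1 < len(brackets):
            upper = Decimal(str(brackets[i + 1][0]))
        else:
            upper = income
        tax += (min(income, upper) - lower) * Decimal(str(rate))
    return tax
```

Using `Decimal` throughout keeps the cents exact until the final whole-dollar rounding on Form 1040.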

Step 5: Compute Capital Gains (if applicable)

  1. Form 8949: individual transactions (Part I short-term, Part II long-term)
  2. Schedule D: totals, $3,000 loss limitation, carryover calculation
  3. Net gain/loss → 1040 Line 7
Rounding rule: Form 8949 and Schedule D must use exact cents matching the 1099-B / 1099-DA source documents. Only Form 1040 rounds to the nearest whole dollar (Line 7 = Schedule D Line 16 rounded to the nearest dollar, or Schedule D Line 21 when a net loss is capped by the $3,000 limitation). Do NOT round amounts on 8949 or Schedule D.
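The $3,000 loss limitation and the resulting carryover can be sketched as follows. This is a simplified illustration: `apply_loss_limit` is a hypothetical helper, and it ignores the parts of the official carryover worksheet that depend on taxable income and on the short-term/long-term split, so treat its carryover figure as a first approximation to check against the worksheet.

```python
from decimal import Decimal


def apply_loss_limit(net_gain_or_loss, limit=Decimal("3000")):
    """Return (amount_allowed_this_year, loss_carried_forward).

    A gain passes through unchanged. A loss is capped at -limit and the
    unused remainder is returned as a positive carryover amount.
    """
    net = Decimal(str(net_gain_or_loss))
    if net >= 0:
        return net, Decimal("0")
    allowed = max(net, -limit)   # e.g. max(-5240.50, -3000) is -3000
    carryover = allowed - net    # unused loss, expressed as a positive number
    return allowed, carryover
```

The first element is still in exact cents; round it to whole dollars only when it is copied to Form 1040.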

Step 6: Compute State Return (CA Form 540)

  1. Federal AGI → CA adjustments → CA taxable income
  2. Tax from brackets − exemption credits → total tax
  3. Withholding → Refund/Owed

Step 7: Download Blank PDF Forms

Save to the forms/ directory.
IRS: Use /irs-prior/ for prior-year forms (/irs-pdf/ is always the current year):
https://www.irs.gov/pub/irs-prior/f1040--YEAR.pdf
https://www.irs.gov/pub/irs-prior/f8949--YEAR.pdf
https://www.irs.gov/pub/irs-prior/f1040sd--YEAR.pdf
CA: Use ftb.ca.gov/forms/YEAR/ for state forms.
Verify each download has a %PDF- header (not an HTML error page).
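The header check can be done by reading the first five bytes of each downloaded file. A minimal sketch, with `looks_like_pdf` as a hypothetical helper name:

```python
from pathlib import Path


def looks_like_pdf(path):
    """True if the file starts with the %PDF- magic bytes."""
    with Path(path).open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

An HTML error page saved under a .pdf name typically starts with `<!DOCTYPE` or `<html`, so this check fails for it and the download can be retried.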

Step 8: Discover Field Names & Fill Forms

Discovery (ONCE per form, use --compact)

bash
python scripts/discover_fields.py forms/f1040_blank.pdf --compact > work/f1040_fields.json
python scripts/discover_fields.py forms/f8949_blank.pdf --compact > work/f8949_fields.json
python scripts/discover_fields.py forms/f1040sd_blank.pdf --compact > work/f1040sd_fields.json
python scripts/discover_fields.py forms/ca540_blank.pdf --compact > work/ca540_fields.json
--compact outputs a minimal {field_name: description} mapping. Each field name is paired with its tooltip/speak description, so you can map line numbers to field names directly without manual inspection. Radio buttons include their option values (e.g. {"/2": "Single", "/1": "MFJ"}).
Do NOT use --search repeatedly or --json (which dumps raw metadata and wastes context).
HARD FAIL: If discovery returns 0 human-readable descriptions, STOP. Do not guess field names.
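The hard-fail rule can be enforced mechanically before any fill script is written. The sketch below assumes the dump is a flat JSON object in the {field_name: description} shape described above, where a description is either a string or, for radio buttons, an object of option values. The helper names are hypothetical and the real output of discover_fields.py may differ in detail.

```python
import json


def count_descriptions(fields_json_path):
    """Count fields whose description is a non-empty string or option mapping."""
    with open(fields_json_path, encoding="utf-8") as fh:
        mapping = json.load(fh)
    count = 0
    for description in mapping.values():
        if isinstance(description, str) and description.strip():
            count += 1
        elif isinstance(description, dict) and description:
            count += 1
    return count


def require_descriptions(fields_json_path):
    """Raise instead of letting the workflow continue with guessed field names."""
    found = count_descriptions(fields_json_path)
    if found == 0:
        raise RuntimeError(f"{fields_json_path}: 0 human-readable descriptions, stop")
    return found
```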

Fill Script

Write output/fill_YEAR.py using scripts/fill_forms.py:
  • add_suffix(d): appends [0] to text field keys. Required for IRS forms.
  • fill_irs_pdf(in, out, fields, checkboxes, radio_values): IRS forms. Use radio_values for filing status, yes/no, and checking/savings.
  • fill_pdf(in, out, fields, checkboxes): CA forms. Matches by /Parent chain + /AP/N keys.
Output filled PDFs to output/.
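To make the key transformation concrete, here is a stand-in for what add_suffix(d) is described as doing. It is not the code in scripts/fill_forms.py: the name `add_suffix_sketch` is hypothetical, and leaving already-suffixed keys alone is an assumption made for the illustration.

```python
def add_suffix_sketch(fields, suffix="[0]"):
    """Return a copy of the mapping with the suffix appended to every key.

    Keys that already end with the suffix are left unchanged (an assumption;
    the real helper may behave differently).
    """
    return {
        (key if key.endswith(suffix) else key + suffix): value
        for key, value in fields.items()
    }
```

With this shape, a fill script can keep its own dictionaries keyed by the short names found during discovery and add the suffix only at the moment the IRS form is written.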

Step 9: Verify

bash
python scripts/verify_filled.py output/f1040_filled.pdf work/expected_f1040.json
Fix any failures, re-run fill script.

Step 10: Present Results

Show a summary table, verification checklist, capital loss carryover (if any), then:
  • Sign your returns — unsigned returns are rejected
  • Payment instructions (if owed) — IRS Direct Pay, FTB Web Pay, deadline April 15
  • Direct deposit — recommend it for refunds; ask for bank info if not provided
  • Filing options — e-file (Free File, CalFile) or mailing addresses

Step 11: MFJ vs Single Comparison (if Married Filing Jointly)

After completing the MFJ return, compute what each spouse would owe if they filed Single instead. This is a hypothetical comparison, since married taxpayers cannot actually file as Single; it helps the couple understand how their filing status affects their tax.
For each spouse, compute a hypothetical Single return:
  1. Income: Use only that spouse's W-2, 1099-INT, 1099-DIV, and 1099-B/1099-DA
  2. Standard deduction: Single amount (typically half of MFJ)
  3. QBI deduction: Based on that spouse's 199A dividends only
  4. Tax: Use Single brackets and QDCG worksheet with Single 0%/15%/20% thresholds
  5. Credits: Foreign tax credit only if that spouse paid foreign tax
  6. Additional Medicare Tax: Use the $200K Single threshold (not $250K MFJ)
  7. Withholding: That spouse's W-2 Box 2 only
Present a side-by-side comparison table:
                     MFJ (actual)    Both Single    Difference
  Combined tax
  Combined withheld
  Combined owed
Include key takeaways — especially the Additional Medicare Tax threshold difference ($250K MFJ vs $200K Single per spouse), which is often the largest driver of the MFJ vs Single gap.
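The Additional Medicare Tax part of the gap is easy to compute on its own. The sketch below uses the $250K and $200K thresholds quoted above and the 0.9% Additional Medicare Tax rate. It considers wages only, so it is an approximation when self-employment income is involved, and the function names are hypothetical.

```python
from decimal import Decimal

RATE = Decimal("0.009")               # Additional Medicare Tax rate (0.9%)
THRESHOLD_MFJ = Decimal("250000")     # threshold quoted above for MFJ
THRESHOLD_SINGLE = Decimal("200000")  # threshold quoted above for Single


def additional_medicare_tax(wages, threshold):
    """Tax on the portion of wages above the threshold."""
    excess = max(Decimal(str(wages)) - threshold, Decimal("0"))
    return excess * RATE


def compare_additional_medicare_tax(spouse_a_wages, spouse_b_wages):
    """Return the MFJ figure, the two-Single figure, and MFJ minus Single."""
    a = Decimal(str(spouse_a_wages))
    b = Decimal(str(spouse_b_wages))
    mfj = additional_medicare_tax(a + b, THRESHOLD_MFJ)
    both_single = (additional_medicare_tax(a, THRESHOLD_SINGLE)
                   + additional_medicare_tax(b, THRESHOLD_SINGLE))
    return {"mfj": mfj, "both_single": both_single, "difference": mfj - both_single}
```

A positive difference means the joint return carries more of this tax than two hypothetical Single returns would, which happens when both spouses earn below $200K individually but above $250K combined.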

Key Gotchas

Context

  • NEVER use the Read tool on PDFs; use pdfplumber
  • NEVER read the same document twice; save to work/tax_data.txt
  • Field discovery once per form with --compact; no --json (wastes context), no repeated --search

Field Discovery

  • Field names change between years; always discover fresh
  • The XFA template is in the /AcroForm /XFA array; do NOT get it from brute-force xref scanning
  • Do NOT use xml.etree for XFA; use regex (IRS XML has broken namespaces)

PDF Filling

  • Remove XFA from AcroForm, set NeedAppearances=True, use auto_regenerate=False
  • Checkboxes: set both /V and /AS to /1 or /Off
  • IRS fields need the [0] suffix; use add_suffix()
  • IRS checkboxes match by /T directly; radio groups match by /AP/N key via radio_values

Rounding

  • Form 8949 & Schedule D: Report exact cents (e.g. "11.36", "-240.00") to match 1099-B / 1099-DA source documents. Never round these.
  • Form 1040: Round all amounts to the nearest whole dollar per IRS instructions. Line 7 (capital gain) = nearest-dollar rounding of Schedule D Line 16, or of Schedule D Line 21 when a net loss is capped by the $3,000 limitation.
  • CA 540: Round to nearest whole dollar.
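The two formatting rules can be kept apart with two small helpers. A minimal sketch using Python's decimal module; the helper names are hypothetical, and rounding a negative half-dollar away from zero is an assumption of this sketch, not something the rules above spell out.

```python
from decimal import Decimal, ROUND_HALF_UP


def exact_cents(amount):
    """Format for Form 8949 / Schedule D: always two decimal places."""
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return format(value, "f")


def whole_dollars(amount):
    """Format for Form 1040 / CA 540: nearest dollar, 50 cents rounds up."""
    return int(Decimal(str(amount)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

Passing amounts as strings (for example "11.36") avoids binary floating-point artifacts before the value reaches `Decimal`.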

Form-Specific

  • 1040: The first few fields (f1_01 - f1_03) are fiscal year headers, not name fields. SSN = 9 digits, no dashes. Digital assets = crypto only, not stocks.
  • 8949: Box A/B/C checkboxes are 3-way radio buttons. Totals are at high field numbers (e.g. f1_115 - f1_119), not after the last data row. 8949 totals go on Schedule D lines 1b/8b, not 1a/8a.
  • Schedule D: Some fields have an _RO suffix (read-only); skip those.
  • CA 540: Field names are 540-PPNN (page+sequence, NOT line numbers). Checkboxes end with " CB"; radio buttons use named AP keys.
  • Downloads: Prior-year IRS = irs.gov/pub/irs-prior/, current = irs.gov/pub/irs-pdf/
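The "9 digits, no dashes" rule for the SSN field can be applied with a one-line normalization before filling. A small sketch; `normalize_ssn` is a hypothetical helper name.

```python
import re


def normalize_ssn(raw):
    """Strip every non-digit and insist on exactly nine digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError("SSN must contain exactly 9 digits")
    return digits
```

Raising on the wrong length keeps a truncated or mistyped number from silently reaching the filled PDF.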