invoice-organizer

Invoice Organizer

Bulk-categorize a CSV of invoices or receipts, detect duplicates, and produce a tax-ready monthly summary.

Keywords

invoice, invoices, receipt, receipts, expense, expenses, bookkeeping, accounting, tax, tax prep, categorization, vendor, reimbursement, monthly summary

Quick Start

Categorize 200 Receipts in 1 Minute

  1. Export receipts from your bank or expense tool as a CSV with these columns (currency is optional):
    date,vendor,description,amount,currency
  2. Run:
    python scripts/invoice_categorizer.py receipts.csv
  3. Review the categorized output and override anything wrong via the rules file
  4. Export the monthly summary for handoff to your accountant
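
Before step 2, it can help to confirm that the export has the columns the script expects. The sketch below is not part of the skill; `missing_columns` is a hypothetical helper, and it assumes header names must match exactly (the real script may be more lenient).

```python
import csv
import io

# Columns the README lists as required; currency is optional.
REQUIRED_COLUMNS = {"date", "vendor", "description", "amount"}


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    header = {name.strip() for name in (reader.fieldnames or [])}
    return sorted(REQUIRED_COLUMNS - header)


# In-memory sample; for a real export pass open("receipts.csv", newline="").
sample = io.StringIO(
    "date,vendor,description,amount,currency\n"
    "2024-01-05,Acme,Hosting,20.00,USD\n"
)
print(missing_columns(sample))  # an empty list means the header is complete
```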

Core Workflows

Workflow 1: Monthly Bookkeeping

Goal: Convert a month of unstructured receipts into a categorized, tax-ready summary in under 10 minutes.
Steps:
  1. Export receipts as CSV from your bank, card, or expense tool
  2. Run:
    python scripts/invoice_categorizer.py receipts.csv
  3. Review the uncategorized bucket — these need rules added or manual override
  4. Add rules to assets/category_rules.json for any recurring vendors
  5. Re-run; uncategorized count should drop each month as the rules file grows
  6. Drop the monthly summary into assets/monthly_summary_template.md
Expected Output: Categorized expense list + monthly totals by category + duplicate-suspect list.
Time Estimate: 10 minutes/month after initial rules are seeded.
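
To illustrate why the uncategorized bucket shrinks as rules are added, here is a minimal sketch of keyword-based matching. The rules shape (category name mapped to a list of keywords) and the `categorize` function are assumptions made for illustration; the actual schema of assets/category_rules.json and the script's matching logic may differ.

```python
# Hypothetical rules shape: category name -> keywords to look for.
# The real assets/category_rules.json may be structured differently.
RULES = {
    "Software": ["github", "aws"],
    "Travel": ["uber", "airline"],
}


def categorize(vendor, description, rules=RULES):
    """Return the first category with a keyword found in vendor or description."""
    text = f"{vendor} {description}".lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"


rows = [
    {"vendor": "GitHub", "description": "Team plan"},
    {"vendor": "Corner Cafe", "description": "Client lunch"},
]
# Rows that fall through every rule form the bucket to review in step 3.
uncategorized = [
    row for row in rows
    if categorize(row["vendor"], row["description"]) == "Uncategorized"
]
print(len(uncategorized))
```

Adding a rule such as `{"Meals": ["cafe"]}` would move the second row out of the bucket on the next run. Plain substring matching is deliberately simple here and can over-match short keywords.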

Workflow 2: Duplicate Detection

Goal: Catch double-entered receipts before they reach the books.
Steps:
  1. Run:
    python scripts/invoice_categorizer.py receipts.csv --json
  2. Inspect the duplicates_suspected list
  3. Confirm whether each is a true duplicate (same charge entered twice) or a coincidence (two genuine charges from the same vendor for the same amount a few days apart)
  4. Remove confirmed duplicates from the source CSV; re-run
Expected Output: Cleaned CSV with no duplicate rows.
Time Estimate: 2-3 minutes per month.
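
The duplicate check can be pictured roughly as follows, using the rule stated under Tools (same vendor + amount within 3 days). This is a sketch, not the script's code: `suspected_duplicates` is a hypothetical name, and treating a gap of exactly 3 days as a match and comparing vendors case-insensitively are both assumptions.

```python
from datetime import date
from itertools import combinations


def suspected_duplicates(rows, window_days=3):
    """Return index pairs sharing vendor and amount with dates within window_days."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        same_charge = (
            a["vendor"].lower() == b["vendor"].lower()
            and a["amount"] == b["amount"]
        )
        if same_charge and abs((a["date"] - b["date"]).days) <= window_days:
            pairs.append((i, j))
    return pairs


rows = [
    {"date": date(2024, 3, 1), "vendor": "Acme", "amount": "49.00"},
    {"date": date(2024, 3, 2), "vendor": "Acme", "amount": "49.00"},
    {"date": date(2024, 3, 20), "vendor": "Acme", "amount": "49.00"},
]
print(suspected_duplicates(rows))  # [(0, 1)]
```

The third row is not flagged because it falls outside the window, which is why a monthly subscription does not show up as a duplicate of the previous month's charge.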

Workflow 3: Vendor Spend Review

Goal: Find spend creep — vendors whose monthly total grew significantly without you noticing.
Steps:
  1. Run the categorizer separately on each of the last 3-6 months
  2. Compare per-vendor totals month-over-month
  3. Flag any vendor whose total grew by more than 25% with no obvious business reason
  4. Either renegotiate, switch, or accept; revisit quarterly
Expected Output: Vendor-spend trend list with flagged growth.
Time Estimate: 15 minutes per quarter.
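
The comparison in steps 2 and 3 can be sketched as below, assuming you have already collected per-vendor totals for two consecutive months into dictionaries. `flag_spend_creep` and the sample figures are illustrative only; the script itself does not provide this comparison as far as this README states.

```python
def flag_spend_creep(previous, current, threshold=0.25):
    """Return {vendor: growth} for vendors whose total grew by more than threshold."""
    flagged = {}
    for vendor, now in current.items():
        before = previous.get(vendor, 0)
        # Vendors with no prior spend have no baseline, so they are skipped.
        if before > 0 and (now - before) / before > threshold:
            flagged[vendor] = round((now - before) / before, 2)
    return flagged


january = {"CloudHost": 100.0, "DesignTool": 40.0}
february = {"CloudHost": 140.0, "DesignTool": 42.0, "NewVendor": 15.0}
print(flag_spend_creep(january, february))  # {'CloudHost': 0.4}
```

New vendors are left out on purpose in this sketch; you may prefer to list them separately, since a first-time charge is also worth a glance.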

Tools

invoice_categorizer.py

Reads a CSV of receipts/invoices and:
  • Categorizes each row by vendor + description against the extensible rules in assets/category_rules.json
  • Aggregates totals per category and per vendor
  • Detects likely duplicates (same vendor + amount within 3 days)
  • Flags uncategorized items for manual review
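
As a rough picture of the aggregation step, the sketch below sums amounts per category or per vendor. It assumes each row already carries a `category` value from the categorization step; `totals_by` is a hypothetical helper, and `Decimal` is used here so that cents do not drift the way they can with floats.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by(rows, key):
    """Sum the amount column grouped by the given key ("category" or "vendor")."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for row in rows:
        totals[row[key]] += Decimal(row["amount"])
    return dict(totals)


rows = [
    {"vendor": "Acme", "category": "Software", "amount": "20.10"},
    {"vendor": "Acme", "category": "Software", "amount": "0.20"},
    {"vendor": "Metro", "category": "Travel", "amount": "12.50"},
]
print(totals_by(rows, "category"))
print(totals_by(rows, "vendor"))
```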

Human-readable summary:
    python scripts/invoice_categorizer.py receipts.csv

JSON for programmatic use:
    python scripts/invoice_categorizer.py receipts.csv --json

Use a custom rules file:
    python scripts/invoice_categorizer.py receipts.csv --rules my-rules.json

Expected CSV columns: date, vendor, description, amount (currency optional)
Date formats accepted: YYYY-MM-DD, MM/DD/YYYY, DD/MM/YYYY
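
Because MM/DD/YYYY and DD/MM/YYYY overlap, a value such as 03/04/2024 can be read two ways. The sketch below shows one way to parse the three formats and to spot ambiguous values before they reach the books. The order in which the real script tries the formats is not documented here; trying the US style first is an assumption of this example, and both function names are hypothetical.

```python
from datetime import datetime

# Order matters: the first format that fits wins. US-style before
# day-first is an assumption of this sketch, not documented behavior.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y")


def parse_date(text):
    """Try each accepted format in order and return a date, or raise ValueError."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def is_ambiguous(text):
    """True when a slash date parses to two different days under the two slash formats."""
    results = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            results.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            pass
    return len(results) == 2


print(parse_date("25/12/2024"))    # only valid as day-first
print(is_ambiguous("03/04/2024"))  # True: March 4 or April 3
```

If your export mixes both slash styles, converting dates to YYYY-MM-DD before running the script avoids the question entirely.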

Reference Guides

  • references/expense_categorization_guide.md: standard expense categories, common tax buckets (US Schedule C, UK self-employment, generic), and how to map vendors to categories

Templates

  • assets/category_rules.json: default rules; extend with your recurring vendors
  • assets/monthly_summary_template.md: format for handing the monthly summary to an accountant

Best Practices

  • Categorize monthly, not annually. Annual catch-up bookkeeping tends to miss receipts and leaves you guessing at categories.
  • Grow the rules file over time. Expect roughly 30% uncategorized in the first month and under 5% by the sixth; each rule you write keeps paying off in later months.
  • Keep evidence. Categorization is bookkeeping; receipts (PDFs, photos) are tax evidence. Store them separately from this script's output.
  • Don't trust auto-categorization for tax filing. Use it for prep; have a human (you or your accountant) sign off before filing.
  • Currency consistency. If you have multi-currency receipts, convert them at the month-end FX rate before running this script; it does not handle FX.
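
Since the script does not handle FX, the conversion has to happen before the CSV is fed in. A minimal sketch of that pre-processing step follows; the rate table is a placeholder with made-up figures, and `to_base` is a hypothetical helper, so substitute the month-end rates your accountant or bank actually uses.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder month-end rates into USD; these figures are made up for illustration.
MONTH_END_RATES = {"USD": Decimal("1"), "EUR": Decimal("1.10")}


def to_base(amount, currency, rates=MONTH_END_RATES):
    """Convert an amount to the base currency, rounded to cents."""
    converted = Decimal(amount) * rates[currency]  # KeyError if the rate is missing
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(to_base("100.00", "EUR"))  # 110.00
```

Letting a missing currency raise an error, rather than silently passing the amount through, makes it harder to mix unconverted amounts into the monthly totals.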

Integration Points

  • Pairs with finance/ skills for budgeting and forecasting
  • Feeds into c-level-advisor/cs-cfo-advisor cash-flow workflows
  • Used by the solo-founder persona for monthly close