financial-data-collector
Financial Data Collector
Collect and validate real financial data for US public companies using free data sources.
Output is a standardized JSON file ready for consumption by other financial skills.
Critical Constraints
- NO FALLBACK values. If a field cannot be retrieved, set it to null with _source: "missing". Never substitute defaults (e.g., beta or 1.0). The downstream skill decides how to handle missing data.
- Data source attribution is mandatory. Every data section must have a _source field.
- CapEx sign convention: yfinance returns CapEx as negative (cash outflow). Preserve the original sign. Document the convention in output metadata. Do NOT flip signs.
- yfinance FCF ≠ Investment bank FCF. yfinance FCF = Operating CF + CapEx (no SBC deduction). Flag this in output metadata so downstream DCF skills don't overstate FCF.
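A minimal sketch of the no-fallback rule, assuming the raw values arrive as a plain dict. The helper name `collect_field` and the choice to attach `_source` to a one-field record are illustrative assumptions, not part of the collection script.

```python
import json
from typing import Any, Mapping


def collect_field(raw: Mapping[str, Any], key: str, source: str = "yfinance") -> dict:
    """Build a one-field record without ever substituting a default."""
    value = raw.get(key)  # None when the key is absent or explicitly null
    if value is None:
        return {key: None, "_source": "missing"}
    return {key: value, "_source": source}


found = collect_field({"beta": 1.284}, "beta")
absent = collect_field({}, "beta")
print(json.dumps(absent))  # Python None is serialized as JSON null
```

The missing case carries no guess at all, so the consuming skill can tell "not retrieved" apart from a real value.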
Workflow
Step 1: Collect Data
Run the collection script:
bash
python scripts/collect_data.py TICKER [--years 5] [--output path/to/output.json]
The script collects in this priority:
- yfinance — market data, historical financials, beta, analyst estimates
- yfinance ^TNX — 10Y Treasury yield as risk-free rate proxy
- User supplement — for years where yfinance returns NaN (report to user, do not guess)
Step 2: Validate Data
bash
python scripts/validate_data.py path/to/output.json
Checks: field completeness, cross-field consistency (Market Cap = Price × Shares), range sanity (WACC 5-20%, beta 0.3-3.0), sign conventions.
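Some of these checks could look roughly like the following. This is a sketch, not the contents of scripts/validate_data.py: the function name `validate`, the 5% market-cap tolerance, and the sample numbers are assumptions, and the WACC range check is omitted. Field names follow the schema summary in this document.

```python
from typing import List


def validate(data: dict, tolerance: float = 0.05) -> List[str]:
    """Return a list of issues; an empty list means every check passed."""
    issues: List[str] = []
    market = data.get("market_data", {})
    price = market.get("current_price")
    shares = market.get("shares_outstanding_millions")
    cap = market.get("market_cap_millions")

    # Cross-field consistency: Market Cap = Price x Shares, within tolerance.
    if price is not None and shares is not None and cap is not None:
        implied = price * shares
        if abs(implied - cap) > tolerance * abs(cap):
            issues.append(f"market cap {cap} != price x shares {implied:.0f}")

    # Range sanity: beta is expected to fall within 0.3-3.0 when present.
    beta = market.get("beta_5y_monthly")
    if beta is not None and not (0.3 <= beta <= 3.0):
        issues.append(f"beta {beta} outside 0.3-3.0")

    for year, row in data.get("cash_flow", {}).items():
        # Sign convention: CapEx stays negative (cash outflow).
        capex = row.get("capex")
        if capex is not None and capex > 0:
            issues.append(f"cash_flow {year}: capex {capex} is positive")
        # Completeness: every data section carries a _source field.
        if "_source" not in row:
            issues.append(f"cash_flow {year}: missing _source")
    return issues


# Hypothetical inputs: one with three problems, one clean.
bad = {
    "market_data": {"current_price": 100.0, "shares_outstanding_millions": 50,
                    "market_cap_millions": 5000, "beta_5y_monthly": 3.5},
    "cash_flow": {"2024": {"capex": 120.0}},
}
good = {
    "market_data": {"current_price": 100.0, "shares_outstanding_millions": 50,
                    "market_cap_millions": 5000, "beta_5y_monthly": 1.2},
    "cash_flow": {"2024": {"capex": -120.0, "_source": "yfinance"}},
}
bad_issues = validate(bad)
good_issues = validate(good)
```

Null fields are skipped rather than flagged by the range checks, which keeps "missing" and "out of range" as separate findings.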
Step 3: Deliver JSON
Single file: {TICKER}_financial_data.json. Schema in references/output-schema.md.
Do NOT create: README, CSV, summary reports, or any auxiliary files.
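Writing the single deliverable might be sketched as below. The helper name `write_output` and the use of a temporary directory are assumptions for illustration; the real script may take its path from the --output flag instead.

```python
import json
import tempfile
from pathlib import Path


def write_output(data: dict, out_dir: Path) -> Path:
    """Write exactly one file named {TICKER}_financial_data.json."""
    path = out_dir / f"{data['ticker']}_financial_data.json"
    # allow_nan=False makes json raise ValueError on NaN rather than emit invalid JSON.
    path.write_text(json.dumps(data, indent=2, allow_nan=False), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_output({"ticker": "META", "currency": "USD"}, Path(tmp))
    file_name = written.name
    files_created = sorted(p.name for p in Path(tmp).iterdir())
```

Nothing else is written next to the JSON file, which matches the no-auxiliary-files rule.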
Output Schema (Summary)
json
{
"ticker": "META",
"company_name": "Meta Platforms, Inc.",
"data_date": "2026-03-02",
"currency": "USD",
"unit": "millions_usd",
"data_sources": { "market_data": "...", "2022_to_2024": "..." },
"market_data": { "current_price": 648.18, "shares_outstanding_millions": 2187, "market_cap_millions": 1639607, "beta_5y_monthly": 1.284 },
"income_statement": { "2024": { "revenue": 164501, "ebit": 69380, "tax_expense": ..., "net_income": ..., "_source": "yfinance" } },
"cash_flow": { "2024": { "operating_cash_flow": ..., "capex": -37256, "depreciation_amortization": 15498, "free_cash_flow": ..., "change_in_nwc": ..., "_source": "yfinance" } },
"balance_sheet": { "2024": { "total_debt": 30768, "cash_and_equivalents": 77815, "net_debt": -47047, "current_assets": ..., "current_liabilities": ..., "_source": "yfinance" } },
"wacc_inputs": { "risk_free_rate": 0.0396, "beta": 1.284, "credit_rating": null, "_source": "yfinance + ^TNX" },
"analyst_estimates": { "revenue_next_fy": 251113, "revenue_fy_after": 295558, "eps_next_fy": 29.59, "_source": "yfinance" },
"metadata": { "_capex_convention": "negative = cash outflow", "_fcf_note": "yfinance FCF = OperatingCF + CapEx. Does NOT deduct SBC." }
}
Full schema with all field definitions: references/output-schema.md
<correct_patterns>
Handling Missing Years
python
import pandas as pd

# revenue, result, and year come from the surrounding collection loop
if pd.isna(revenue):
    result[year] = {"revenue": None, "_source": "yfinance returned NaN — supplement from 10-K"}
Report missing years to the user. Do NOT skip or fill with estimates.
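One way to build that report, sketched with hypothetical data. The helper name `missing_years` and the message wording are assumptions; the revenue figure is a placeholder, not a real value.

```python
from typing import List, Mapping


def missing_years(section: Mapping[str, Mapping[str, object]], field: str) -> List[str]:
    """List the years whose value for `field` is None, in sorted order."""
    return sorted(year for year, row in section.items() if row.get(field) is None)


income_statement = {
    "2022": {"revenue": 100.0, "_source": "yfinance"},  # placeholder value
    "2021": {"revenue": None, "_source": "yfinance returned NaN"},
    "2020": {"revenue": None, "_source": "yfinance returned NaN"},
}

gaps = missing_years(income_statement, "revenue")
message = f"Missing revenue for: {', '.join(gaps)}. Please supply values from the 10-K."
print(message)
```

The years stay in the output with null values; the message only tells the user which ones need a supplement.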
CapEx Sign Preservation
python
capex = cash_flow.loc["Capital Expenditure", year_col]  # -37256.0
result["capex"] = float(capex)  # Preserve negative
Datetime Column Indexing
python
year_col = [c for c in financials.columns if c.year == target_year][0]
revenue = financials.loc["Total Revenue", year_col]
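The list-index lookup raises IndexError when no column matches the target year. A guarded variant is sketched here with standard-library `datetime` objects standing in for the DataFrame's column labels, an assumption made so the example runs without pandas or network access (pandas Timestamp labels expose `.year` the same way). `find_year_column` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Iterable, Optional


def find_year_column(columns: Iterable[datetime], target_year: int) -> Optional[datetime]:
    """Return the first column label in target_year, or None if absent."""
    return next((c for c in columns if c.year == target_year), None)


columns = [datetime(2024, 12, 31), datetime(2023, 12, 31), datetime(2022, 12, 31)]
hit = find_year_column(columns, 2023)
miss = find_year_column(columns, 2019)  # None: report the year as missing
```

A None result can then feed the missing-year handling instead of crashing the run.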
Field Name Guards
python
if "Total Revenue" in financials.index:
    revenue = financials.loc["Total Revenue", year_col]
elif "Revenue" in financials.index:
    revenue = financials.loc["Revenue", year_col]
else:
    revenue = None
</correct_patterns>
<common_mistakes>
Mistake 1: Default Values for Missing Data
python
# ❌ WRONG
beta = info.get("beta", 1.0)
growth = data.get("growth") or 0.02

# ✅ RIGHT
beta = info.get("beta")  # May be None; that's OK
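The `or` form is wrong for a second reason: it replaces any falsy value, so a legitimately reported 0.0 is silently overwritten too. A short demonstration with hypothetical values:

```python
data = {"growth": 0.0}  # hypothetical: the source really reported zero growth

with_or = data.get("growth") or 0.02     # 0.0 is falsy, so this becomes 0.02
with_default = data.get("growth", 0.02)  # key exists, so this stays 0.0
plain = data.get("growth")               # 0.0, no default involved

missing = {}.get("growth")               # None: left for the downstream skill
```

Only the plain lookup keeps both cases honest: a real zero stays zero, and an absent field stays None.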
Mistake 2: Assuming All Years Have Data
python
# ❌ WRONG: 2020-2021 may be NaN
revenue = float(financials.loc["Total Revenue", year_col])

# ✅ RIGHT
value = financials.loc["Total Revenue", year_col]
revenue = float(value) if pd.notna(value) else None
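Note that `float()` on NaN does not raise, so the wrong version fails silently: the NaN travels into the output, where Python's `json` module writes it as the bare token NaN, which is not valid JSON. The sketch below uses `math.isnan` in place of `pd.notna` so it runs without pandas; the helper name `to_number_or_none` is hypothetical.

```python
import json
import math
from typing import Optional


def to_number_or_none(value: float) -> Optional[float]:
    """Convert NaN to None so it is serialized as JSON null."""
    number = float(value)
    return None if math.isnan(number) else number


raw = float("nan")  # stand-in for a NaN cell from yfinance
bad = json.dumps({"revenue": raw})                      # '{"revenue": NaN}'
good = json.dumps({"revenue": to_number_or_none(raw)})  # '{"revenue": null}'
```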
Mistake 3: Using yfinance FCF in DCF Models Directly
yfinance FCF does NOT deduct SBC. For mega-caps like META, SBC can be $20-30B/yr, making yfinance FCF ~30% higher than investment-bank FCF. Always flag this in output.
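The arithmetic behind this gap, as a sketch with round hypothetical figures (in millions of USD, not META's actual numbers). Subtracting SBC is one common adjustment; exact investment-bank definitions vary, so treat `sbc_adjusted_fcf` as an assumption.

```python
operating_cash_flow = 100_000.0  # hypothetical
capex = -40_000.0                # negative = cash outflow, sign preserved
stock_based_comp = 15_000.0      # hypothetical SBC expense

# yfinance convention: FCF = Operating CF + CapEx (CapEx is already negative).
yfinance_fcf = operating_cash_flow + capex

# One common adjustment: treat SBC as a real cost and subtract it.
sbc_adjusted_fcf = yfinance_fcf - stock_based_comp

overstatement = yfinance_fcf / sbc_adjusted_fcf - 1
print(f"{overstatement:.1%}")  # 33.3%
```

With these inputs the unadjusted figure is a third higher, which is why the metadata note matters to any DCF consumer.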
Mistake 4: Flipping CapEx Sign
python
# ❌ WRONG: double-negation risk downstream
capex = abs(cash_flow.loc["Capital Expenditure", year_col])

# ✅ RIGHT: preserve original, document convention
capex = float(cash_flow.loc["Capital Expenditure", year_col]) # -37256.0
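Why flipping the sign is risky: a consumer written for the negative convention adds CapEx, so a flipped value inflates FCF instead of reducing it. A sketch with hypothetical numbers; `downstream_fcf` stands in for whatever a consuming skill computes.

```python
def downstream_fcf(operating_cash_flow: float, capex: float) -> float:
    """Consumer that expects capex to be negative (cash outflow)."""
    return operating_cash_flow + capex


ocf = 1_000.0       # hypothetical
capex_raw = -400.0  # as returned: negative

correct = downstream_fcf(ocf, capex_raw)        # 600.0
inflated = downstream_fcf(ocf, abs(capex_raw))  # 1400.0: CapEx added, not subtracted
```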
</common_mistakes>
Known yfinance Pitfalls
See references/yfinance-pitfalls.md for detailed field mapping and workarounds.