python-panel-data
Python Panel Data
Purpose
This skill helps economists run panel data models in Python using pandas, statsmodels, and linearmodels, with correct fixed effects, clustering, and diagnostics.
When to Use
- Estimating fixed effects or random effects models
- Running difference-in-differences on panel data
- Creating regression tables and plots in Python
Instructions
Follow these steps to complete the task:
Step 1: Understand the Context
Before generating any code, ask the user:
- What are the unit of observation and the panel identifiers (entity and time)?
- Which outcomes and regressors are required?
- What fixed effects or time effects are needed?
- How should standard errors be clustered?
Step 2: Generate the Output
Based on the context, generate Python code that:
- Loads and cleans the data with pandas
- Sets a MultiIndex for panel structure
- Fits the model using linearmodels.PanelOLS or RandomEffects
- Outputs results in a readable table and optional LaTeX
Step 3: Verify and Explain
After generating output:
- Interpret key coefficients
- Note assumptions (strict exogeneity, parallel trends, etc.)
- Suggest robustness checks (alternative clustering, placebo tests)
Example Prompts
- "Run a two-way fixed effects model with firm and year effects"
- "Estimate a DiD using state and year fixed effects"
- "Export panel regression results to LaTeX"
Example Output
python
# ============================================
# Panel Data Analysis in Python
# ============================================
import pandas as pd
from linearmodels.panel import PanelOLS

# Load data
df = pd.read_csv("panel_data.csv")

# Set panel index (entity first, then time)
df = df.set_index(["firm_id", "year"])

# Create treatment indicator (treated x post interaction)
df["treat_post"] = df["treated"] * df["post"]

# Two-way fixed effects model
model = PanelOLS.from_formula(
    "outcome ~ 1 + treat_post + EntityEffects + TimeEffects",
    data=df
)
# Cluster standard errors by entity (firm)
results = model.fit(cov_type="clustered", cluster_entity=True)
print(results.summary)
Requirements
Software
- Python 3.10+
Packages
- pandas
- linearmodels
- statsmodels
Install with:
bash
pip install pandas linearmodels statsmodels
Best Practices
- Always verify the panel identifiers and check whether the panel is balanced or unbalanced
- Cluster standard errors at the appropriate level
- Check for missing data before estimation
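The identifier, balance, and missing-data checks can be run with pandas alone before any model is fitted. The helper `describe_panel` and the toy data below are hypothetical, offered as one possible sketch.

```python
import pandas as pd


def describe_panel(df: pd.DataFrame, entity: str, time: str) -> dict:
    """Summarize panel structure before estimation (hypothetical helper)."""
    # Rows that repeat an (entity, time) pair would break the panel index
    n_duplicates = int(df.duplicated(subset=[entity, time]).sum())
    # Distinct periods per entity; a balanced panel has every period for every entity
    counts = df.groupby(entity)[time].nunique()
    n_periods = int(df[time].nunique())
    return {
        "n_entities": int(counts.size),
        "n_periods": n_periods,
        "n_duplicates": n_duplicates,
        "balanced": bool((counts == n_periods).all()),
        "missing_by_column": df.isna().sum().to_dict(),
    }


# Toy data: firm 2 lacks 2020, and one outcome value is missing
raw = pd.DataFrame({
    "firm_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "year": [2019, 2020, 2021, 2019, 2021, 2019, 2020, 2021],
    "outcome": [1.2, 1.5, None, 0.7, 0.9, 2.1, 2.0, 2.4],
})
report = describe_panel(raw, entity="firm_id", time="year")
print(report)
```

Here the report flags an unbalanced panel and one missing outcome, both worth resolving or documenting before estimation.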
Common Pitfalls
- Failing to set a proper panel index
- Using pooled OLS when fixed effects are required
- Interpreting coefficients without accounting for the fixed effects (with entity effects, estimates reflect within-entity variation)
Changelog
v1.0.0
- Initial release