Stata Skill

You have access to comprehensive Stata reference files. Do not load all files. Read only the 1-3 files relevant to the user's current task using the routing table below.

Critical Gotchas

These are Stata-specific pitfalls that lead to silent bugs. Internalize these before writing any code.

Missing Values Sort to +Infinity

Stata's `.` (and `.a`-`.z`) are greater than all numbers, so a comparison such as `x > c` is true whenever `x` is missing.
```stata
* WRONG: includes observations where income is missing!
gen high_income = (income > 50000)

* RIGHT
gen high_income = (income > 50000) if !missing(income)

* WRONG: missing ages appear in this list
list if age > 60

* RIGHT
list if age > 60 & !missing(age)
```

`=` vs `==`

`=` is assignment; `==` is comparison. Mixing them up is a syntax error or silent bug.
```stata
* WRONG: syntax error
gen employed = 1 if status = 1

* RIGHT
gen employed = 1 if status == 1
```

Local Macro Syntax

Locals use `` `name' `` (backtick + single quote). Globals use `$name` or `${name}`. Forgetting the closing quote is the #1 macro bug.
```stata
local controls "age education income"
regress wage `controls'        // correct
regress wage `controls         // WRONG: missing closing quote
regress wage 'controls'        // WRONG: wrong quote characters
```

`by` Requires Prior Sort (Use `bysort`)

```stata
* WRONG: error if data not sorted by id
by id: gen first = (_n == 1)

* RIGHT: bysort sorts automatically
bysort id: gen first = (_n == 1)

* Also RIGHT: explicit sort
sort id
by id: gen first = (_n == 1)
```
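
One detail the examples above leave implicit: `bysort id:` sorts by `id` only, so which observation comes first within each group is not controlled. A minimal sketch using the `auto` dataset that ships with Stata (the variable names `cheapest` and `n_flagged` are made up for illustration): a second variable in parentheses sorts within groups without becoming part of the grouping.

```stata
sysuse auto, clear

* Sort by foreign, then by price within foreign; group only by foreign
bysort foreign (price): generate byte cheapest = (_n == 1)

* Exactly one observation per group should be flagged
bysort foreign: egen n_flagged = total(cheapest)
assert n_flagged == 1

list make price foreign if cheapest
```

If ties in the parenthesized variable matter, add further sort keys inside the parentheses so the choice of "first" is reproducible.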

Factor Variable Notation (`i.` and `c.`)

Use `i.` for categorical, `c.` for continuous. Omitting `i.` treats categories as continuous.
```stata
* WRONG: treats race as continuous (one linear slope across the codes 1, 2, 3, ...)
regress wage race education

* RIGHT: creates dummies automatically
regress wage i.race education

* Interactions
regress wage i.race##c.education    // full interaction (main effects included)
regress wage i.race#c.education     // interaction only (no main effects)
```

`generate` vs `replace`

`generate` creates new variables; `replace` modifies existing ones. Using `generate` on an existing variable name is an error.
```stata
gen x = 1
gen x = 2          // ERROR: x already defined
replace x = 2      // correct
```

String Comparison Is Case-Sensitive

```stata
* May miss "Male", "MALE", etc.
keep if gender == "male"

* Safer
keep if lower(gender) == "male"
```

`merge` Always Check `_merge`

Never skip `tab _merge`. It costs nothing, and it is the only place you see which kind of mismatch occurred: a failed `assert` reports just the number of contradictions.
```stata
merge 1:1 id using other.dta
tab _merge                      // ALWAYS tab before assert
assert _merge == 3              // on failure, stops with r(9) and only a count
drop _merge
```
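
A self-contained sketch of why the tabulation matters, using two tiny made-up datasets (all names and values here are illustrative, not taken from the reference files). The `list` line shows which keys failed to match, which `assert` on its own would not.

```stata
* Build a small master dataset
clear
input id x
1 10
2 20
3 30
end
tempfile master
save `master'

* Build a small using dataset with one key missing and one extra
clear
input id z
2 200
3 300
4 400
end
tempfile other
save `other'

use `master', clear
merge 1:1 id using `other'
tab _merge                       // 1 = master only, 2 = using only, 3 = matched
list id _merge if _merge != 3    // the unmatched keys

* capture noisily shows the assert message without stopping the do-file
capture noisily assert _merge == 3
display "assert return code: " _rc
```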

`preserve`/`restore` + `tempfile` for Collapse-Merge-Back

The standard pattern for computing group stats and merging them onto the original data:
```stata
tempfile stats
preserve
collapse (mean) avg_x=x, by(group)
save `stats'
restore
merge m:1 group using `stats'
tab _merge
assert _merge == 3
drop _merge
```
For simple group means, `bysort group: egen avg_x = mean(x)` avoids the round-trip entirely.

Weights Are Not Interchangeable

- `fweight`: frequency weights (replication)
- `aweight`: analytic/regression weights (inverse variance)
- `pweight`: probability/sampling weights (survey data, implies robust SE)
- `iweight`: importance weights (rarely used)
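
A rough illustration of how the weight type changes both the syntax that is accepted and the results, using the `auto` dataset that ships with Stata. Treating `rep78` as a frequency count and `weight` as a sampling weight is purely an assumption for demonstration; neither variable is a real weight.

```stata
sysuse auto, clear

* fweight: each row counts as rep78 identical observations (must be integers)
summarize price [fweight = rep78]

* aweight: weights are rescaled to sum to the number of observations
summarize price [aweight = weight]

* Same point estimates, different standard errors: pweight implies robust SEs
regress price mpg [aweight = weight]
regress price mpg [pweight = weight]

* summarize does not accept pweights; capture noisily shows the error and continues
capture noisily summarize price [pweight = weight]
display "return code: " _rc
```

Comparing the two `regress` outputs side by side is a quick way to see that the choice of weight type affects inference even when the coefficients agree.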

`capture` Swallows Errors

`capture` suppresses both the output and the error of the command it prefixes and stores the return code in `_rc`, so check `_rc` whenever a failure matters.
```stata
capture some_command
if _rc != 0 {
    di as error "Failed with code: " _rc
    exit _rc
}
```
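
Two common uses, sketched on the `auto` dataset: testing whether something exists before acting on it, and `capture noisily`, which still prints the error message but lets the do-file continue. The variable names `domestic` and `not_a_variable` are hypothetical.

```stata
sysuse auto, clear

* Create the variable only if it does not already exist
capture confirm variable domestic
if _rc != 0 {
    generate byte domestic = (foreign == 0)
}

* capture noisily: show the error, keep going, then inspect _rc
capture noisily regress price mpg not_a_variable
if _rc != 0 {
    display as error "regress failed with return code " _rc
}
```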

Line Continuation Uses `///`

```stata
regress y x1 x2 x3 ///
    x4 x5 x6, ///
    vce(robust)
```

Stored Results: `r()` vs `e()` vs `s()`

- `r()`: r-class commands (summarize, tabulate, etc.)
- `e()`: e-class commands (estimation: regress, logit, etc.)
- `s()`: s-class commands (parsing)

A new estimation command overwrites previous `e()` results. Store them first:
```stata
regress y x1 x2
estimates store model1
```
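
To make the distinction concrete, here is a short sketch on the `auto` dataset; the macro name `mean_price` and the stored-estimate names `m1` and `m2` are arbitrary choices for illustration.

```stata
sysuse auto, clear

* r-class: copy what you need before the next r-class command replaces it
summarize price
local mean_price = r(mean)
return list

* e-class: results of the most recent estimation command
regress price mpg weight
display "N = " e(N) ", R-squared = " %5.3f e(r2)
ereturn list
estimates store m1

* A second model overwrites e(), but m1 is still available by name
regress price mpg
estimates store m2
estimates table m1 m2, se stats(N r2)

display "Mean price saved earlier: " `mean_price'
```

`return list` and `ereturn list` are the quickest way to discover which results a command leaves behind.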

Running Stata from the Command Line

Claude can execute Stata code by running `.do` files in batch mode from the terminal. This is how to run Stata non-interactively.

Finding the Stata Binary

Stata on macOS is a `.app` bundle. The actual binary is inside it. Common locations:
```bash
# Stata 18 / StataNow (most common)
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp
/Applications/StataNow/StataMP.app/Contents/MacOS/stata-mp

# Other editions (SE, BE)
/Applications/Stata/StataSE.app/Contents/MacOS/stata-se
/Applications/Stata/StataBE.app/Contents/MacOS/stata-be
```

If Stata isn't on `$PATH`, find it with: `mdfind -name "stata-mp" | grep MacOS`

Batch Mode (`-b`)

```bash
# Run a .do file in batch mode; output goes to <filename>.log
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp -b do analysis.do

# If stata-mp is on PATH (e.g., via symlink or alias):
stata-mp -b do analysis.do
```

- `-b` = batch mode (non-interactive, no GUI)
- Output (everything Stata would display) is written to `analysis.log` in the working directory
- Do not rely on the exit code alone: Stata commonly returns 0 even when the do-file stops on an error
- The log file contains all output, including error messages, so check it after every run

Running Inline Stata Code

To run a quick Stata snippet without keeping a permanent `.do` file:
```bash
# Write a temp .do file and run it
cat > /tmp/stata_run.do << 'EOF'
sysuse auto, clear
summarize price mpg
EOF
stata-mp -b do /tmp/stata_run.do
cat stata_run.log   # the log is written to the current working directory, not /tmp
```

Checking Results

```bash
# Check if it succeeded (only as reliable as Stata's exit code; confirm with the log)
stata-mp -b do tests/run_tests.do && echo "SUCCESS" || echo "FAILED"

# Search the log for pass/fail markers and Stata return codes such as r(198)
grep -E "PASS|FAIL|error|r\([0-9]+\)" run_tests.log
```

Tips

- `clear all` at the top of batch scripts: batch mode already starts with a fresh Stata session, so this mainly protects the script when it is re-run inside an existing session with leftover state.
- `set more off`: prevents Stata from pausing for `--more--` prompts (fatal in batch mode).
- Log files overwrite silently: `analysis.do` always writes to `analysis.log` in the current directory. If you run multiple `.do` files, check the right log.
- Working directory: Stata's working directory is wherever you run the command from, not where the `.do` file lives. Use `cd` in the `.do` file or absolute paths if needed.
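
Put together, the tips above suggest a header along these lines for any do-file meant for batch runs. This is only a sketch: the `logs` folder, the file name `analysis_run.log`, and the log name `run` are hypothetical, and the explicit `log using` is an optional addition to the automatic batch log.

```stata
clear all
set more off
capture log close _all

* Show where Stata is actually running
display "Working directory: `c(pwd)'"

* Optional: a named log in a known place, in addition to the automatic batch log
capture mkdir "logs"
log using "logs/analysis_run.log", replace text name(run)

sysuse auto, clear
summarize price mpg

log close run
```

Writing a separate named log sidesteps the silent-overwrite problem when several do-files are run from the same directory.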

Routing Table

Read only the files relevant to the user's task. Paths are relative to this SKILL.md file.

Data Operations

| File | Topics & Key Commands |
|---|---|
| `references/basics-getting-started.md` | `use`, `save`, `describe`, `browse`, `sysuse`, basic workflow |
| `references/data-import-export.md` | `import delimited`, `import excel`, ODBC, `export`, web data |
| `references/data-management.md` | `generate`, `replace`, `merge`, `append`, `reshape`, `collapse`, `recode`, `egen`, `encode`/`decode` |
| `references/variables-operators.md` | Variable types, `byte`/`int`/`long`/`float`/`double`, operators, missing values (`.<.a`), `if`/`in` qualifiers |
| `references/string-functions.md` | `substr()`, `regexm()`, `strtrim()`, `split`, `ustrlen()`, regex, Unicode |
| `references/date-time-functions.md` | `date()`, `clock()`, `%td`/`%tc` formats, `mdy()`, `dofm()`, business calendars |
| `references/mathematical-functions.md` | `round()`, `log()`, `exp()`, `abs()`, `mod()`, `cond()`, distributions, random numbers |

Statistics & Econometrics

| File | Topics & Key Commands |
|---|---|
| `references/descriptive-statistics.md` | `summarize`, `tabulate`, `correlate`, `tabstat`, `codebook`, weighted stats |
| `references/linear-regression.md` | `regress`, `vce(robust)`, `vce(cluster)`, `test`, `lincom`, `margins`, `predict`, `ivregress` |
| `references/panel-data.md` | `xtset`, `xtreg fe`/`re`, Hausman test, `xtabond`, dynamic panels |
| `references/time-series.md` | `tsset`, ARIMA, VAR, `dfuller`, `pperron`, `irf`, forecasting |
| `references/limited-dependent-variables.md` | `logit`, `probit`, `tobit`, `poisson`, `nbreg`, `mlogit`, `ologit`, `margins` for nonlinear models |
| `references/bootstrap-simulation.md` | `bootstrap`, `simulate`, `permute`, Monte Carlo |
| `references/survey-data-analysis.md` | `svyset`, `svy:`, `subpop()`, complex survey design, replicate weights |
| `references/missing-data-handling.md` | `mi impute`, `mi estimate`, FIML, `misstable`, diagnostics |
| `references/maximum-likelihood.md` | `ml model`, custom likelihood functions, `ml init`, gradient-based optimization |
| `references/gmm-estimation.md` | `gmm`, moment conditions, `estat overid`, J-test |

Causal Inference

| File | Topics & Key Commands |
|---|---|
| `references/treatment-effects.md` | `teffects ra/ipw/ipwra/aipw`, `stteffects`, ATE/ATT/ATET |
| `references/difference-in-differences.md` | DiD, parallel trends, event studies, staggered adoption |
| `references/regression-discontinuity.md` | Sharp/fuzzy RD, bandwidth selection, `rdplot` |
| `references/matching-methods.md` | PSM, nearest neighbor, kernel matching, `teffects nnmatch` |
| `references/sample-selection.md` | `heckman`, `heckprobit`, treatment models, exclusion restrictions |

Advanced Methods

| File | Topics & Key Commands |
|---|---|
| `references/survival-analysis.md` | `stset`, `stcox`, `streg`, Kaplan-Meier, parametric models |
| `references/sem-factor-analysis.md` | `sem`, `gsem`, CFA, path analysis, `alpha`, reliability |
| `references/nonparametric-methods.md` | `kdensity`, rank tests, `qreg`, `npregress` |
| `references/spatial-analysis.md` | `spmatrix`, `spregress`, spatial weights, Moran's I |
| `references/machine-learning.md` | `lasso`, `elasticnet`, `cvlasso`, cross-validation |

Graphics

| File | Topics & Key Commands |
|---|---|
| `references/graphics.md` | `twoway`, `scatter`, `line`, `bar`, `histogram`, `graph combine`, `graph export`, schemes |

Programming

| File | Topics & Key Commands |
|---|---|
| `references/programming-basics.md` | `local`, `global`, `foreach`, `forvalues`, `program define`, `syntax`, `return` |
| `references/advanced-programming.md` | `syntax`, `mata`, classes, `_prefix`, dialog boxes, `tempfile`/`tempvar` |
| `references/mata-introduction.md` | Mata basics, when to use Mata vs ado, data types |
| `references/mata-programming.md` | Mata functions, flow control, structures, pointers |
| `references/mata-matrix-operations.md` | Matrix creation, decompositions, solvers, `st_matrix()` |
| `references/mata-data-access.md` | `st_data()`, `st_view()`, `st_store()`, performance tips |

Output & Workflow

| File | Topics & Key Commands |
|---|---|
| `references/tables-reporting.md` | `putexcel`, `putdocx`, `putpdf`, LaTeX integration, `collect` |
| `references/workflow-best-practices.md` | Project structure, master do-files, version control, debugging, common mistakes |
| `references/external-tools-integration.md` | Python via `python:`, R via `rsource`, shell commands, Git |
| `references/filing-issues.md` | User wants to report a Stata skill documentation gap or error to the repository |

Community Packages

| File | What It Does |
|---|---|
| `packages/reghdfe.md` | High-dimensional fixed effects OLS (absorbs multiple FE sets efficiently) |
| `packages/estout.md` | `esttab`/`estout`: publication-quality regression tables |
| `packages/outreg2.md` | Alternative regression table exporter (Word, Excel, TeX) |
| `packages/asdoc.md` | One-command Word document creation for any Stata output |
| `packages/tabout.md` | Cross-tabulations and summary tables to file |
| `packages/coefplot.md` | Coefficient plots from stored estimates |
| `packages/graph-schemes.md` | `grstyle`, `schemepack`, `plotplain`: better graph themes |
| `packages/did.md` | Modern DiD: `csdid`, `did_multiplegt`, `did_imputation` (Callaway-Sant'Anna, de Chaisemartin-D'Haultfoeuille, Borusyak-Jaravel-Spiess) |
| `packages/event-study.md` | `eventstudyinteract`, `eventdd`: event study estimators |
| `packages/rdrobust.md` | Robust RD estimation with optimal bandwidth (`rdrobust`, `rdplot`, `rdbwselect`) |
| `packages/psmatch2.md` | Propensity score matching (nearest neighbor, kernel, radius) |
| `packages/synth.md` | Synthetic control method (`synth`, `synth_runner`) |
| `packages/ivreg2.md` | Enhanced IV/2SLS: `ivreg2`, `xtivreg2` with additional diagnostics |
| `packages/xtabond2.md` | Dynamic panel GMM (Arellano-Bond/Blundell-Bond) |
| `packages/binsreg.md` | Binned scatter plots with CI (`binsreg`, `binstest`) |
| `packages/nprobust.md` | Nonparametric kernel estimation and inference |
| `packages/diagnostics.md` | `bacondecomp`, `xttest3`, collinearity, heteroskedasticity tests |
| `packages/winsor.md` | Winsorizing and trimming: `winsor2`, `winsor` |
| `packages/data-manipulation.md` | `gtools` (fast collapse/egen), `rangestat`, `egenmore` |
| `packages/package-management.md` | `ssc install`, `net install`, `ado update`, finding packages |

Common Patterns

Regression Table Workflow

```stata
* Estimate models
eststo clear
eststo: regress y x1 x2, vce(robust)
eststo: regress y x1 x2 x3, vce(robust)
eststo: regress y x1 x2 x3 x4, vce(cluster id)

* Export table
esttab using "results.tex", replace ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label booktabs ///
    title("Main Results") ///
    mtitles("(1)" "(2)" "(3)")
```

Panel Data Setup

```stata
xtset panelid timevar          // declare panel structure
xtdescribe                      // check balance
xtsum outcome                   // within/between variation

* Fixed effects
xtreg y x1 x2, fe vce(cluster panelid)
* Or with reghdfe (preferred for multiple FE)
reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)
```

Difference-in-Differences

```stata
* Classic 2x2 DiD
gen post = (year >= treatment_year)
gen treat_post = treated * post
regress y treated post treat_post, vce(cluster id)

* Event study (uniform timing: must interact with treatment group)
* Factor variables cannot take negative values, so shift rel_time first.
* The offset 10 is an assumption: use any value >= the number of pre-periods.
gen rel_shift = rel_time + 10
reghdfe y ib9.rel_shift#1.treated, absorb(id year) vce(cluster id)   // base: rel_time == -1
testparm *.rel_shift#1.treated   // joint test; restrict to pre-period levels for a pre-trend test

* Modern staggered DiD (Callaway & Sant'Anna)
csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
csdid_plot
```

Graph Export

```stata
* Publication-quality scatter with fit line
twoway (scatter y x, mcolor(navy%50) msize(small)) ///
       (lfit y x, lcolor(cranberry) lwidth(medthick)), ///
    title("Title Here") ///
    xtitle("X Label") ytitle("Y Label") ///
    legend(off) scheme(s2color)
graph export "figure1.pdf", replace as(pdf)
graph export "figure1.png", replace as(png) width(2400)
```

Data Cleaning Pipeline

```stata
* Load and inspect
import delimited "raw_data.csv", clear varnames(1)
describe
codebook, compact

* Clean
rename *, lower                 // lowercase all varnames
destring income, replace force  // convert string to numeric; force sets non-numeric values to missing
replace income = . if income < 0

* Label
label variable income "Annual household income (USD)"
label define yesno 0 "No" 1 "Yes"
label values employed yesno

* Save
compress
save "clean_data.dta", replace
```

Multiple Imputation

```stata
mi set mlong
mi register imputed income education
mi impute chained (regress) income (ologit) education = age i.gender, add(20) rseed(12345)
mi estimate: regress wage income education age i.gender
```

Help Improve This Skill

If you produce Stata code with a significant error — wrong syntax, incorrect command usage, or a gotcha you failed to catch — and the issue seems to stem from a gap in these reference files rather than a one-off mistake, consider suggesting to the user that they file an issue on the skill repository. This helps future users.
When to raise this: Only after you've already corrected the error and the user has working code. Frame it as optional: "I made an error with [X] that I think comes from a gap in the Stata skill documentation. If you'd like, I can help you file an issue or a PR so it gets fixed for everyone."
When NOT to raise this: If the user is on Claude Haiku, the error is more likely a model capability issue than a documentation gap. In that case, suggest they try Sonnet or Opus for complex Stata work instead of filing an issue.
If the user agrees, read `references/filing-issues.md` for instructions on writing a good issue report.