stata
Stata Skill
You have access to comprehensive Stata reference files. Do not load all files.
Read only the 1-3 files relevant to the user's current task using the routing table below.
Critical Gotchas
These are Stata-specific pitfalls that lead to silent bugs. Internalize these before writing any code.
Missing Values Sort to +Infinity
Stata's `.` (and `.a`-`.z`) are greater than all numbers.
```stata
* WRONG: includes observations where income is missing!
gen high_income = (income > 50000)
* RIGHT
gen high_income = (income > 50000) if !missing(income)
* WRONG: missing ages appear in this list
list if age > 60
* RIGHT
list if age > 60 & !missing(age)
```
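As a quick check of the comparison rule, here is a minimal sketch on Stata's bundled `auto` dataset. The dataset is an assumption for illustration (the text above names none); its `rep78` variable has some missing values. The gap between the naive and the guarded count should equal the number of missing observations.

```stata
sysuse auto, clear

count if missing(rep78)
local n_missing = r(N)

count if rep78 > 3                      // also counts rows where rep78 is missing
local n_naive = r(N)

count if rep78 > 3 & !missing(rep78)    // only observed values above 3
local n_safe = r(N)

display "naive = `n_naive', safe = `n_safe', missing = `n_missing'"
assert `n_naive' - `n_safe' == `n_missing'
```

The same guard applies to `>=`, `!=`, and to `if` qualifiers on estimation commands.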
`=` vs `==`
```stata
* WRONG: syntax error
gen employed = 1 if status = 1
* RIGHT
gen employed = 1 if status == 1
```
Local Macro Syntax
Locals use `` `name' `` (backtick + single-quote). Globals use `$name` or `${name}`.
Forgetting the closing quote is the #1 macro bug.
```stata
local controls "age education income"
regress wage `controls'   // correct
regress wage `controls    // WRONG: missing closing quote
regress wage 'controls'   // WRONG: wrong quote characters
```
`by` Requires Prior Sort (Use `bysort`)
```stata
* WRONG: error if data not sorted by id
by id: gen first = (_n == 1)
* RIGHT: bysort sorts automatically
bysort id: gen first = (_n == 1)
* Also RIGHT: explicit sort
sort id
by id: gen first = (_n == 1)
```
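One related point: `bysort id:` fixes the order of the groups but not the order of rows inside each group, so "first" can be an arbitrary row. The sketch below, which assumes the bundled `auto` dataset, puts a second key in parentheses so that rows are sorted by it within each group without it defining the groups.

```stata
sysuse auto, clear

* Group by foreign, order rows within each group by price
bysort foreign (price): gen cheapest = (_n == 1)

list make price foreign if cheapest
count if cheapest        // one flagged row per group
```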
Factor Variable Notation (`i.` and `c.`)
Use `i.` for categorical, `c.` for continuous. Omitting `i.` treats categories as continuous.
```stata
* WRONG: treats race as continuous (e.g., race=3 has 3x effect of race=1)
regress wage race education
* RIGHT: creates dummies automatically
regress wage i.race education
* Interactions
regress wage i.race##c.education   // full interaction (main effects included)
regress wage i.race#c.education    // interaction only (no main effects)
```
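To see what the prefix changes, the sketch below compares model degrees of freedom with and without `i.` on the bundled `auto` dataset (an assumption for illustration), where `rep78` is a five-level repair rating. The `ib3.` line shows how to pick a different base category.

```stata
sysuse auto, clear

regress price rep78 mpg          // rep78 enters as a single slope
local k_linear = e(df_m)

regress price i.rep78 mpg        // rep78 enters as four dummies (first level is the base)
local k_factor = e(df_m)

regress price ib3.rep78 mpg      // same model, level 3 as the base category

display "df with rep78: `k_linear'; df with i.rep78: `k_factor'"
```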
`generate` vs `replace`
`generate` creates a new variable and fails if the name already exists; `replace` changes an existing variable.
```stata
gen x = 1
gen x = 2       // ERROR: x already defined
replace x = 2   // correct
```
String Comparison Is Case-Sensitive
```stata
* May miss "Male", "MALE", etc.
keep if gender == "male"
* Safer
keep if lower(gender) == "male"
```
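A small self-contained sketch with made-up values shows how many rows each comparison catches. Trimming whitespace with `strtrim()` is an addition beyond the text above and assumes Stata 14 or later.

```stata
clear
input str8 gender
"male"
"Male"
" MALE "
"female"
end

count if gender == "male"                     // exact match only
local n_exact = r(N)

count if lower(strtrim(gender)) == "male"     // ignores case and surrounding spaces
local n_clean = r(N)

display "exact = `n_exact', normalized = `n_clean'"
```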
`merge`: Always Check `_merge`
Never skip `tab _merge`: it costs nothing and is the only diagnostic you get when `assert` fails.
```stata
merge 1:1 id using other.dta
tab _merge           // ALWAYS tab before assert
assert _merge == 3   // on failure this reports only a count of contradictions; the tab shows which side did not match
drop _merge
```
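The sketch below builds a deliberately incomplete using file from the bundled `auto` dataset to show what the `tab` adds. The dataset, the `id` variable, and the four dropped rows are all assumptions for illustration.

```stata
sysuse auto, clear
gen id = _n

tempfile prices
preserve
keep id price
drop if id > 70                  // hypothetical gap: the using file lacks ids 71-74
save `prices'
restore

drop price
merge 1:1 id using `prices'
tab _merge                       // shows the unmatched master rows explicitly

capture assert _merge == 3       // capture so the do-file continues past the failure
local rc = _rc
display "assert return code: `rc'"
```

Here the tabulation shows the unmatched rows come from the master side, which the assertion failure alone does not say.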
`preserve` / `restore` + `tempfile` for Collapse-Merge-Back
The standard pattern for computing group stats and merging them onto the original data:
```stata
tempfile stats
preserve
collapse (mean) avg_x=x, by(group)
save `stats'
restore
merge m:1 group using `stats'
tab _merge
assert _merge == 3
drop _merge
```
For simple group means, `bysort group: egen avg_x = mean(x)` avoids the round-trip entirely.
Weights Are Not Interchangeable
- `fweight`: frequency weights (replication)
- `aweight`: analytic/regression weights (inverse variance)
- `pweight`: probability/sampling weights (survey data, implies robust SE)
- `iweight`: importance weights (rarely used)
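To make the differences concrete, the sketch below runs one regression under three weight types on the bundled `auto` dataset. The weight variable `w` is a hypothetical construction, not real survey weights. The point estimates should match across the three runs, while the observation count and the variance estimator differ.

```stata
sysuse auto, clear
gen int w = 1 + foreign              // hypothetical weight: 1 for domestic, 2 for foreign cars

regress price mpg [fweight = w]      // each foreign car counts as two observations
local n_f = e(N)

regress price mpg [aweight = w]      // weights are rescaled; N stays at the row count
local n_a = e(N)

regress price mpg [pweight = w]      // sampling weights; robust standard errors
local vce_p "`e(vce)'"

display "fweight N = `n_f', aweight N = `n_a', pweight VCE = `vce_p'"
```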
`capture` Swallows Errors
`capture` suppresses both the error message and the abort, so check `_rc` immediately afterwards.
```stata
capture some_command
if _rc != 0 {
    di as error "Failed with code: " _rc
    exit _rc
}
```
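A runnable version of the same check, assuming the bundled `auto` dataset and a deliberately nonexistent variable name. Copying `_rc` into a local right away is a defensive habit added here, since the next `capture` overwrites it.

```stata
sysuse auto, clear

capture confirm variable no_such_var     // hypothetical name, not in auto.dta
local rc = _rc                           // save it before another capture resets _rc
if `rc' != 0 {
    display as error "confirm failed with code `rc'"
}

* capture noisily still shows the command's output while trapping errors
capture noisily summarize price
display "summarize return code: " _rc
```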
Line Continuation Uses `///`
```stata
regress y x1 x2 x3 ///
    x4 x5 x6, ///
    vce(robust)
```
Stored Results: `r()` vs `e()` vs `s()`
- `r()`: r-class commands (summarize, tabulate, etc.)
- `e()`: e-class commands (estimation: regress, logit, etc.)
- `s()`: s-class commands (parsing)

A new estimation command overwrites previous `e()` results. Store them first:
```stata
regress y x1 x2
estimates store model1
```
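The sketch below walks through both result classes on the bundled `auto` dataset (an assumption for illustration). The model name `m1` is arbitrary.

```stata
sysuse auto, clear

summarize price
local mean_price = r(mean)       // copy now: the next r-class command overwrites r()

regress price mpg
estimates store m1               // keep the first model's e() results

regress price mpg weight         // e() now describes the second model
local k_second = e(df_m)

estimates restore m1             // bring the first model back into e()
local k_first = e(df_m)

display "mean price = `mean_price'; regressors: first = `k_first', second = `k_second'"
```

`return list` and `ereturn list` show everything currently held in `r()` and `e()`.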
Running Stata from the Command Line
Claude can execute Stata code by running `.do` files in batch mode from the terminal. This is how to run Stata non-interactively.
Finding the Stata Binary
Stata on macOS is a `.app` bundle. The actual binary is inside it. Common locations:
```
# Stata 18 / StataNow (most common)
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp
/Applications/StataNow/StataMP.app/Contents/MacOS/stata-mp
# Other editions (SE, BE)
/Applications/Stata/StataSE.app/Contents/MacOS/stata-se
/Applications/Stata/StataBE.app/Contents/MacOS/stata-be
```
If Stata isn't on `$PATH`, find it with: `mdfind -name "stata-mp" | grep MacOS`
Batch Mode (`-b`)
```bash
# Run a .do file in batch mode; output goes to <filename>.log
/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp -b do analysis.do
# If stata-mp is on PATH (e.g., via symlink or alias):
stata-mp -b do analysis.do
```
- `-b` = batch mode (non-interactive, no GUI)
- Output (everything Stata would display) is written to `analysis.log` in the working directory
- Exit code is 0 on success, non-zero on error
- The log file contains all output, including error messages; check it after execution
Running Inline Stata Code
To run a quick Stata snippet without creating a permanent `.do` file:
```bash
# Write a temp .do file and run it
cat > /tmp/stata_run.do << 'EOF'
sysuse auto, clear
summarize price mpg
EOF
# Run from /tmp so the log is written there (the log goes to the working directory)
(cd /tmp && stata-mp -b do stata_run.do)
cat /tmp/stata_run.log
```
Checking Results
```bash
# Check if it succeeded
stata-mp -b do tests/run_tests.do && echo "SUCCESS" || echo "FAILED"
# Search the log for pass/fail (parentheses escaped so r(198)-style codes match literally)
grep -E 'PASS|FAIL|error|r\([0-9]+\)' run_tests.log
```
Tips
- `clear all` at the top of batch scripts: batch mode starts with a fresh Stata session, but `clear all` ensures no stale state from prior runs in the same session.
- `set more off`: prevents Stata from pausing for `--more--` prompts (fatal in batch mode).
- Log files overwrite silently: `analysis.do` always writes to `analysis.log` in the current directory. If you run multiple `.do` files, check the right log.
- Working directory: Stata's working directory is wherever you run the command from, not where the `.do` file lives. Use `cd` in the `.do` file or absolute paths if needed.
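The tips above can be folded into a do-file header along these lines. This is a sketch, not a required template: the bundled `auto` dataset, the regression, and the PASS/FAIL marker strings are assumptions chosen so the log can later be searched for them.

```stata
* Hypothetical header for a do-file run in batch mode
clear all
set more off
display "Working directory: `c(pwd)'"     // recorded in the log for later checking

sysuse auto, clear

capture noisily regress price mpg weight
if _rc != 0 {
    display as error "FAIL: regression returned code " _rc
    exit _rc
}
display "PASS: regression ran on " e(N) " observations"
```

Writing explicit markers to the log gives a later search of the log something unambiguous to match.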
Routing Table
Read only the files relevant to the user's task. Paths are relative to this SKILL.md file.
Data Operations
| File | Topics & Key Commands |
|---|---|
| |
| |
| |
| Variable types, |
| |
| |
| |
Statistics & Econometrics
| File | Topics & Key Commands |
|---|---|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Causal Inference
| File | Topics & Key Commands |
|---|---|
| |
| DiD, parallel trends, event studies, staggered adoption |
| Sharp/fuzzy RD, bandwidth selection, |
| PSM, nearest neighbor, kernel matching, |
| |
Advanced Methods
| File | Topics & Key Commands |
|---|---|
| |
| |
| |
| |
| |
Graphics
| File | Topics & Key Commands |
|---|---|
| |
Programming
| File | Topics & Key Commands |
|---|---|
| |
| |
| Mata basics, when to use Mata vs ado, data types |
| Mata functions, flow control, structures, pointers |
| Matrix creation, decompositions, solvers, |
| |
Output & Workflow
| File | Topics & Key Commands |
|---|---|
| |
| Project structure, master do-files, version control, debugging, common mistakes |
| Python via |
| User wants to report a Stata skill documentation gap or error to the repository |
Community Packages
| File | What It Does |
|---|---|
| High-dimensional fixed effects OLS (absorbs multiple FE sets efficiently) |
| |
| Alternative regression table exporter (Word, Excel, TeX) |
| One-command Word document creation for any Stata output |
| Cross-tabulations and summary tables to file |
| Coefficient plots from stored estimates |
| |
| Modern DiD: |
| |
| Robust RD estimation with optimal bandwidth ( |
| Propensity score matching (nearest neighbor, kernel, radius) |
| Synthetic control method ( |
| Enhanced IV/2SLS: |
| Dynamic panel GMM (Arellano-Bond/Blundell-Bond) |
| Binned scatter plots with CI ( |
| Nonparametric kernel estimation and inference |
| |
| Winsorizing and trimming: |
| |
| |
Common Patterns
Regression Table Workflow
```stata
* Estimate models
eststo clear
eststo: regress y x1 x2, vce(robust)
eststo: regress y x1 x2 x3, vce(robust)
eststo: regress y x1 x2 x3 x4, vce(cluster id)
* Export table
esttab using "results.tex", replace ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label booktabs ///
    title("Main Results") ///
    mtitles("(1)" "(2)" "(3)")
```
Panel Data Setup
```stata
xtset panelid timevar   // declare panel structure
xtdescribe              // check balance
xtsum outcome           // within/between variation
* Fixed effects
xtreg y x1 x2, fe vce(cluster panelid)
* Or with reghdfe (preferred for multiple FE)
reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)
```
Difference-in-Differences
```stata
* Classic 2x2 DiD
gen post = (year >= treatment_year)
gen treat_post = treated * post
regress y treated post treat_post, vce(cluster id)
* Event study (uniform timing; must interact with treatment group)
* Factor variables cannot take negative values, so shift rel_time to start at 0
summarize rel_time, meanonly
local shift = -r(min)
gen rel_s = rel_time + `shift'
local base = `shift' - 1        // shifted value of rel_time == -1, the omitted period
reghdfe y ib`base'.rel_s#1.treated, absorb(id year) vce(cluster id)
testparm *.rel_s#1.treated      // joint test of the event-time coefficients (pre-trend check)
* Modern staggered DiD (Callaway & Sant'Anna)
csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
csdid_plot
```
Graph Export
```stata
* Publication-quality scatter with fit line
twoway (scatter y x, mcolor(navy%50) msize(small)) ///
    (lfit y x, lcolor(cranberry) lwidth(medthick)), ///
    title("Title Here") ///
    xtitle("X Label") ytitle("Y Label") ///
    legend(off) scheme(s2color)
graph export "figure1.pdf", replace as(pdf)
graph export "figure1.png", replace as(png) width(2400)
```
Data Cleaning Pipeline
```stata
* Load and inspect
import delimited "raw_data.csv", clear varnames(1)
describe
codebook, compact
* Clean
rename *, lower                  // lowercase all varnames
destring income, replace force   // convert string to numeric; force turns non-numeric entries into missing
replace income = . if income < 0
* Label
label variable income "Annual household income (USD)"
label define yesno 0 "No" 1 "Yes"
label values employed yesno
* Save
compress
save "clean_data.dta", replace
```
Multiple Imputation
```stata
mi set mlong
mi register imputed income education
mi impute chained (regress) income (ologit) education = age i.gender, add(20) rseed(12345)
mi estimate: regress wage income education age i.gender
```
Help Improve This Skill
If you produce Stata code with a significant error — wrong syntax, incorrect command usage, or a gotcha you failed to catch — and the issue seems to stem from a gap in these reference files rather than a one-off mistake, consider suggesting to the user that they file an issue on the skill repository. This helps future users.
When to raise this: Only after you've already corrected the error and the user has working code. Frame it as optional: "I made an error with [X] that I think comes from a gap in the Stata skill documentation. If you'd like, I can help you file an issue or a PR so it gets fixed for everyone."
When NOT to raise this: If the user is on Claude Haiku, the error is more likely a model capability issue than a documentation gap. In that case, suggest they try Sonnet or Opus for complex Stata work instead of filing an issue.
If the user agrees, read `references/filing-issues.md` for instructions on writing a good issue report.