Autoresearch Skill
Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to reach a simulated score of 85+. Then deploy. Then validate with real traffic.
What You'll Produce
Every run outputs 3 files:
| File | Purpose |
|---|---|
| Optimized content | The winning optimized content |
| Experiments JSON | Full experiment log: all variants + all scores |
| Optimization report | Human-readable summary with winner rationale |
Expert Panel (5 Personas)
Score every variant against all 5. Batch all variants into a single API call per round.
| # | Persona | Scoring Lens |
|---|---|---|
| 1 | CMO at a mid-market B2B company ($50M+ annual revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |
Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a `references/founder-voice.md` file.
Each judge scores 0–100. Final score = average across all 5 judges.
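Here is a minimal Python sketch of this scoring rule. It assumes each variant's scores arrive as a dict keyed by the judge names used in the experiments JSON (`cmo`, `skeptical_founder`, `cro`, `copywriter`, `founder`). The range check is an added safeguard and is not part of the skill itself.

```python
from statistics import mean

# Judge keys as they appear in the experiments JSON "scores" object.
JUDGES = ("cmo", "skeptical_founder", "cro", "copywriter", "founder")


def final_score(scores: dict) -> float:
    """Average the five judges' 0-100 scores into the variant's final score."""
    missing = [judge for judge in JUDGES if judge not in scores]
    if missing:
        raise ValueError(f"missing judge scores: {missing}")
    for judge in JUDGES:
        if not 0 <= scores[judge] <= 100:
            raise ValueError(f"{judge} score out of range: {scores[judge]}")
    return mean(scores[judge] for judge in JUDGES)


print(final_score({"cmo": 72, "skeptical_founder": 68, "cro": 75,
                   "copywriter": 70, "founder": 65}))
```

With the sample scores from the experiments JSON, this prints 70, which matches that variant's `avg_score`.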
Round Structure (Per Content Element)
Round 1:
→ Generate 10 variants of the element
→ Batch-score all 10 with the 5-expert panel (1 API call)
→ Rank by average score
→ Keep top 3
Round 2 (Evolution):
→ Analyze what the top 3 did right
→ Generate 10 new variants that push those winning patterns further
→ Batch-score all 10 (1 API call)
→ Keep top 3
Round 3 (If score < threshold):
→ Identify weakest scoring dimension
→ Generate 10 variants optimized for that dimension
→ Batch-score → keep top 1
Multi-element cross-breeding:
→ Take top 1 winner from each element
→ Generate 5 combinations that mix winning elements
→ Score holistically as complete units
→ Output the single best combination

Stop condition: The top variant hits the minimum score threshold (default: 80) OR 3 rounds complete.
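The per-element loop can be sketched in Python as follows. This is a simplified illustration, not the skill's implementation.

- `generate` and `score_batch` are hypothetical callables standing in for the LLM calls.
- `generate` returns `n` new variants seeded by the previous round's top 3.
- `score_batch` returns one average panel score per variant from a single batched call.
- Round 3's weakest-dimension targeting is left to whatever `generate` does with its inputs.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the LLM calls:
#   generate(n, parents) -> n new variant strings (parents = last round's top 3, empty in round 1)
#   score_batch(variants) -> one average panel score per variant, from ONE batched API call
Generate = Callable[[int, List[str]], List[str]]
ScoreBatch = Callable[[List[str]], List[float]]


def optimize_element(generate: Generate, score_batch: ScoreBatch,
                     n_variants: int = 10, threshold: float = 80,
                     max_rounds: int = 3) -> Tuple[str, float, list]:
    """Run generate/score/select rounds for one element until the stop condition."""
    parents: List[str] = []
    history = []
    best_text, best_score = "", float("-inf")
    for round_no in range(1, max_rounds + 1):
        variants = generate(n_variants, parents)
        scores = score_batch(variants)  # the whole round is scored in one call
        ranked = sorted(zip(variants, scores), key=lambda pair: pair[1], reverse=True)
        history.append({"round": round_no, "ranked": ranked})
        if ranked[0][1] > best_score:
            best_text, best_score = ranked[0]
        if best_score >= threshold:
            break  # stop: the top variant hit the minimum score
        parents = [text for text, _ in ranked[:3]]  # keep top 3 to evolve from
    return best_text, best_score, history
```

The sketch keeps the best-so-far variant across rounds, so a weaker evolution round can never discard an earlier winner.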
Content Types & Score Dimensions
Landing Pages
Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
- `first_impression`: Does it grab immediately?
- `clarity`: Is the offer instantly understood?
- `trust`: Does it feel credible?
- `urgency`: Is there a reason to act now?
- `would_convert`: Would the judge actually click?
Email Sequences
Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
- `would_open`: Would the subject line get the email opened?
- `would_read`: Does the opening hook?
- `would_click`: Is the CTA compelling?
- `would_reply`: Does it feel personal enough to respond to?
- `spam_risk`: Does it feel spammy? (lower = better; invert for final score)
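Below is a sketch of the `spam_risk` inversion. It assumes every dimension is rated 0-100 and that the final email score is a plain average of the five dimensions. The skill only says to invert `spam_risk`, so the averaging is an assumption.

```python
EMAIL_DIMENSIONS = ("would_open", "would_read", "would_click", "would_reply", "spam_risk")


def email_score(dims: dict) -> float:
    """Combine email dimension ratings (0-100), flipping spam_risk so higher is better."""
    adjusted = [dims[name] for name in EMAIL_DIMENSIONS if name != "spam_risk"]
    adjusted.append(100 - dims["spam_risk"])  # low spam risk should raise the score
    return sum(adjusted) / len(adjusted)


print(email_score({"would_open": 80, "would_read": 70, "would_click": 60,
                   "would_reply": 50, "spam_risk": 20}))
```

Here a `spam_risk` of 20 contributes 80 to the average, so the example prints 68.0.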
Ad Copy
Elements to optimize: Headline, description, CTA
Score dimensions:
- `scroll_stopping`: Does it interrupt the scroll?
- `clarity`: Is the value prop clear in 3 seconds?
- `click_worthiness`: Does the judge want to click?
- `relevance`: Does it match likely audience intent?
- `differentiation`: Does it stand out from competitors?
Form Pages
Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
- `first_impression`: Does it feel worth filling out?
- `trust`: Do they believe their info is safe and the offer is real?
- `completion_likelihood`: Would the judge start filling it out?
- `lead_quality`: Would this attract serious prospects (not tire-kickers)?
- `would_fill_out`: Final gut check: would they submit?
Step-by-Step Execution Protocol
Step 1: Intake & Parse
Read the source content. Identify the content type automatically or confirm it with the user:
- HTML file → landing page or form page
- Markdown / plain text → email or ad copy
- If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"
Extract all optimizable elements. List them back to the user:
Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]
Optimizing: all | Variants per round: 10 | Min score: 80
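The extension rule from Step 1 can be sketched as a function that returns candidate content types. Only `landing_page` comes from the experiments JSON; the other three identifiers, the extension sets, and the fallback are assumptions. An extension only narrows the choice, so when more than one candidate remains, read the content or ask the user the confirmation question.

```python
from pathlib import Path

ASK_USER = "Is this a landing page, email sequence, ad copy, or form page?"
ALL_TYPES = ["landing_page", "email_sequence", "ad_copy", "form_page"]


def candidate_content_types(path: str) -> list:
    """Narrow the content type from the file extension, per the Step 1 heuristic."""
    suffix = Path(path).suffix.lower()
    if suffix in {".html", ".htm"}:
        return ["landing_page", "form_page"]
    if suffix in {".md", ".markdown", ".txt"}:
        return ["email_sequence", "ad_copy"]
    return list(ALL_TYPES)  # unknown extension: nothing ruled out


candidates = candidate_content_types("offer.html")
if len(candidates) > 1:
    print(ASK_USER)  # or decide by reading the content itself
```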
Step 2: Get API Key
Check for the Anthropic API key in the `$ANTHROPIC_API_KEY` environment variable. If it is not set, export it:

```bash
export ANTHROPIC_API_KEY="your-api-key-here"
```

Step 3: Run Optimization Rounds
For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
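This sketch shows what "one call per round" means in practice: every variant and all five judges go into a single prompt, and one JSON reply carries all the scores. The prompt wording and the reply schema are assumptions for illustration. Sending the prompt (for example with the Anthropic Python SDK's `client.messages.create`) is omitted so the sketch runs offline.

```python
import json

# Judge keys and scoring lenses from the expert panel table.
PERSONAS = {
    "cmo": "CMO at a mid-market B2B company: Would this make me stop and engage?",
    "skeptical_founder": "Skeptical founder: Do I believe this? Would I trust this company?",
    "cro": "Conversion rate optimizer: Is this clear, specific, and action-driving?",
    "copywriter": "Senior copywriter: Is this compelling, differentiated, and well-crafted?",
    "founder": "Your CEO/founder: Direct, ROI-obsessed, no BS. Would I put this on my site?",
}


def build_batch_prompt(element: str, variants: list) -> str:
    """Put every variant and every judge into ONE prompt for a single API call."""
    judges = "\n".join(f"- {key}: {lens}" for key, lens in PERSONAS.items())
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(variants, start=1))
    return (
        f"Act as each judge and score every {element} variant from 0 to 100.\n\n"
        f"Judges:\n{judges}\n\nVariants:\n{numbered}\n\n"
        'Reply with JSON only: [{"id": 1, "scores": {"cmo": 0, ...}}, ...]'
    )


def parse_batch_reply(reply: str, n_variants: int) -> dict:
    """Map variant id -> per-judge scores, checking every variant was scored once."""
    rows = json.loads(reply)
    by_id = {row["id"]: row["scores"] for row in rows}
    if len(rows) != n_variants or sorted(by_id) != list(range(1, n_variants + 1)):
        raise ValueError("reply must score each variant exactly once")
    return by_id
```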
Model preference (in order):
- `claude-sonnet-4-5` (preferred: fast + smart)
- `claude-opus-4` (if highest quality needed)
- Any claude-3.5+ model if the above aren't available
Step 4: Cross-Breed (Multi-Element)
After all elements have winners:
- Assemble the top winner from each element into a complete unit
- Generate 5 holistic variants that naturally combine the winning elements
- Score the complete units (not just individual parts)
- Pick the winner with the highest holistic score
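Step 4 can be sketched in Python as follows.

- `generate_units` and `score_units` are hypothetical callables for the LLM calls: the first produces the holistic variants, the second scores all units in one batched call.
- Including the plain assembly of winners as an extra candidate is a choice of this sketch, not part of the skill.
- The guard enforces the rule against cross-breeding before every element has a winner.

```python
def cross_breed(element_winners: dict, generate_units, score_units, n_units: int = 5):
    """Combine per-element winners into complete units and return the best (unit, score)."""
    missing = [name for name, text in element_winners.items() if not text]
    if missing:
        raise ValueError(f"no winner yet for: {missing}")  # premature cross-breeding
    assembled = dict(element_winners)  # top winner from each element, as one unit
    candidates = [assembled] + generate_units(assembled, n_units)
    scores = score_units(candidates)  # one batched call; units scored as a whole
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]
```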
Step 5: Write Output Files
```bash
# Create output directory
mkdir -p data

# Write optimized content
# Write experiments JSON
# Write optimization report
```
**Experiments JSON structure:**
```json
{
"run_id": "autoresearch-{name}-{timestamp}",
"content_type": "landing_page",
"source_file": "path/to/original",
"min_score_threshold": 80,
"rounds": [
{
"round": 1,
"element": "hero_headline",
"variants": [
{
"id": 1,
"text": "...",
"scores": {
"cmo": 72,
"skeptical_founder": 68,
"cro": 75,
"copywriter": 70,
"founder": 65
},
"avg_score": 70
}
],
"top_3": [1, 4, 7],
"winner_score": 82
}
],
"final_winner": {
"hero_headline": "...",
"subheadline": "...",
"cta": "...",
"holistic_score": 87
}
}
```
Step 6: Report Back
Summarize results for the user:
- Final winning score
- Biggest score jump (which element improved most)
- Top 2 runner-up alternatives (in case winner doesn't feel right)
- Path to all 3 output files
- Clear next step
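The "biggest score jump" can be read from the experiments JSON. This sketch assumes a jump is the change in an element's `winner_score` from its first logged round to its last. The skill does not define it more precisely.

```python
from collections import defaultdict


def biggest_jump(experiments: dict):
    """Return (element, gain) for the element whose winner_score rose the most."""
    per_element = defaultdict(list)
    for entry in experiments["rounds"]:
        per_element[entry["element"]].append((entry["round"], entry["winner_score"]))
    gains = {}
    for element, history in per_element.items():
        history.sort()  # order by round number
        gains[element] = history[-1][1] - history[0][1]
    element = max(gains, key=gains.get)
    return element, gains[element]
```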
User Options
| Setting | Default |
|---|---|
| Which elements to optimize | all |
| How many variants to generate per round | 10 |
| Stop when this score is hit | 80 |
| Max rounds before stopping | 3 |
| Whether to overwrite the source file with winners | false |
| Force a content type if auto-detect is wrong | auto-detect |
Quality Gates
- < 70: Don't ship. Something fundamental is broken.
- 70-79: Marginal. One more round targeting the lowest-scoring dimension.
- 80-84: Good. Shippable. Validate with real traffic.
- 85-89: Strong. Ship with confidence.
- 90+: Rare. Ship immediately.
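The gates can be encoded as a lookup. The integer ranges above are treated here as half-open intervals, so a fractional average like 79.5 counts as Marginal. That treatment is an assumption.

```python
# Lower bound of each gate, checked from the highest band down.
GATES = [
    (90, "Rare. Ship immediately."),
    (85, "Strong. Ship with confidence."),
    (80, "Good. Shippable. Validate with real traffic."),
    (70, "Marginal. One more round targeting the lowest-scoring dimension."),
]


def quality_gate(score: float) -> str:
    """Map a final score to its quality-gate verdict."""
    for lower_bound, verdict in GATES:
        if score >= lower_bound:
            return verdict
    return "Don't ship. Something fundamental is broken."
```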
Anti-Patterns to Avoid
- Never call the API once per variant. Always batch. A 10-variant round = 1 call.
- Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
- Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
- Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.