autoresearch

Autoresearch Skill

Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to hit 85+ simulated score. Then deploy. Then validate with real traffic.

What You'll Produce

Every run outputs 3 files:

| File | Purpose |
| --- | --- |
| {name}-optimized.{ext} | The winning optimized content |
| data/{name}-experiments.json | Full experiment log: all variants and all scores |
| data/{name}-optimization-report.md | Human-readable summary with winner rationale |

Expert Panel (5 Personas)

Score every variant against all 5. Batch all variants into a single API call per round.

| # | Persona | Scoring Lens |
| --- | --- | --- |
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |

Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a references/founder-voice.md file.

Each judge scores 0–100. Final score = average across all 5 judges.
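
The averaging rule can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: the judge keys are borrowed from the experiments JSON shown later in this document, and the function name `final_score` is hypothetical.

```python
# Judge keys as they appear in the experiments JSON; the function name is illustrative.
JUDGES = ("cmo", "skeptical_founder", "cro", "copywriter", "founder")

def final_score(scores: dict[str, float]) -> float:
    """Average the five 0-100 judge scores into one final score."""
    missing = [judge for judge in JUDGES if judge not in scores]
    if missing:
        raise ValueError(f"missing judge scores: {missing}")
    for judge in JUDGES:
        if not 0 <= scores[judge] <= 100:
            raise ValueError(f"{judge} score out of range: {scores[judge]}")
    return sum(scores[judge] for judge in JUDGES) / len(JUDGES)

example = {"cmo": 72, "skeptical_founder": 68, "cro": 75, "copywriter": 70, "founder": 65}
print(final_score(example))  # 70.0
```

Rejecting incomplete or out-of-range score sets up front keeps a malformed judge reply from silently skewing the average.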

Round Structure (Per Content Element)

Round 1:
  → Generate 10 variants of the element
  → Batch-score all 10 with the 5-expert panel (1 API call)
  → Rank by average score
  → Keep top 3

Round 2 (Evolution):
  → Analyze what the top 3 did right
  → Generate 10 new variants that push those winning patterns further
  → Batch-score all 10 (1 API call)
  → Keep top 3

Round 3 (If score < threshold):
  → Identify weakest scoring dimension
  → Generate 10 variants optimized for that dimension
  → Batch-score → keep top 1

Multi-element cross-breeding:
  → Take top 1 winner from each element
  → Generate 5 combinations that mix winning elements
  → Score holistically as complete units
  → Output the single best combination

Stop condition: The top variant hits the minimum score threshold (default: 80) OR 3 rounds complete.
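
The per-element loop and its stop condition might look roughly like the sketch below. Everything named here is an assumption: `generate` and `score_batch` are hypothetical callables standing in for the model calls, earlier survivors are carried forward so a round cannot lose the best variant, and the weakest-dimension targeting of Round 3 is not modeled.

```python
from typing import Callable

# Hypothetical stand-ins for the model calls.
Generator = Callable[[list[str], int], list[str]]  # (seed variants, n) -> n new variants
Scorer = Callable[[list[str]], list[float]]        # one batched call: variants -> avg scores

def optimize_element(generate: Generator, score_batch: Scorer,
                     variants_per_round: int = 10, min_score: float = 80,
                     max_rounds: int = 3) -> tuple[str, float, int]:
    """Generate, batch-score, keep the top 3; stop at min_score or max_rounds."""
    survivors: list[tuple[float, str]] = []
    rounds_run = 0
    for rounds_run in range(1, max_rounds + 1):
        seeds = [text for _, text in survivors]
        candidates = generate(seeds, variants_per_round)
        scores = score_batch(candidates)  # exactly one scoring call per round
        # Assumption: merge with earlier survivors, then keep the best 3 overall.
        survivors = sorted(survivors + list(zip(scores, candidates)), reverse=True)[:3]
        if survivors[0][0] >= min_score:
            break
    best_score, best_text = survivors[0]
    return best_text, best_score, rounds_run

# Deterministic fakes so the loop can be exercised without an API.
def fake_generate(seeds: list[str], n: int) -> list[str]:
    base = seeds[0] if seeds else "headline"
    return [f"{base}+v{i}" for i in range(n)]

def fake_score(variants: list[str]) -> list[float]:
    return [50.0 + 10 * v.count("+") + i for i, v in enumerate(variants)]

print(optimize_element(fake_generate, fake_score))
# ('headline+v9+v9+v9', 89.0, 3)
```

With the fake scorer, rounds 1 and 2 top out at 69 and 79, so the loop only clears the default threshold of 80 in round 3.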

Content Types & Score Dimensions

Landing Pages

Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
  • first_impression: Does it grab immediately?
  • clarity: Is the offer instantly understood?
  • trust: Does it feel credible?
  • urgency: Is there a reason to act now?
  • would_convert: Would the judge actually click?

Email Sequences

Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
  • would_open: Would the subject line earn an open?
  • would_read: Does the opening hook?
  • would_click: Is the CTA compelling?
  • would_reply: Does it feel personal enough to respond to?
  • spam_risk: Does it feel spammy? (lower = better; invert for final score)
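
One way to handle the inverted dimension, sketched under two assumptions the source does not spell out: inversion is taken as `100 - spam_risk`, and the five dimensions are weighted equally. The function name is hypothetical.

```python
EMAIL_DIMENSIONS = ("would_open", "would_read", "would_click", "would_reply", "spam_risk")
INVERTED = {"spam_risk"}  # lower raw score is better for these dimensions

def email_score(dimension_scores: dict[str, float]) -> float:
    """Average the email dimensions after flipping the lower-is-better ones."""
    adjusted = [
        100 - dimension_scores[d] if d in INVERTED else dimension_scores[d]
        for d in EMAIL_DIMENSIONS
    ]
    return sum(adjusted) / len(adjusted)

raw = {"would_open": 80, "would_read": 70, "would_click": 75, "would_reply": 60, "spam_risk": 20}
print(email_score(raw))  # 73.0
```

A spam_risk of 20 contributes 80 to the average, so a less spammy email raises the final score instead of lowering it.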

Ad Copy

Elements to optimize: Headline, description, CTA
Score dimensions:
  • scroll_stopping: Does it interrupt the scroll?
  • clarity: Is the value prop clear in 3 seconds?
  • click_worthiness: Does the judge want to click?
  • relevance: Does it match likely audience intent?
  • differentiation: Does it stand out from competitors?

Form Pages

Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
  • first_impression: Does it feel worth filling out?
  • trust: Do they believe their info is safe and the offer is real?
  • completion_likelihood: Would the judge start filling it out?
  • lead_quality: Would this attract serious prospects (not tire-kickers)?
  • would_fill_out: Final gut check: would they submit?

Step-by-Step Execution Protocol

Step 1: Intake & Parse

Read the source content. Identify content type automatically or confirm with user:
  • HTML file → landing page or form page
  • Markdown / plain text → email or ad copy
  • If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"
Extract all optimizable elements. List them back to user:
Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]

Optimizing: all | Variants per round: 10 | Min score: 80
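
The file-type mapping above could be approximated by extension, as in this sketch. The extension sets and the function name are assumptions; note that the extension alone never settles the type, because each bucket holds two candidates, so the result is a shortlist to confirm with the user.

```python
from pathlib import Path
from typing import Optional

ALL_TYPES = ["landing_page", "email_sequence", "ad_copy", "form_page"]

def candidate_content_types(path: str, forced: Optional[str] = None) -> list[str]:
    """Return the content types a source file could be, narrowest guess first."""
    if forced is not None:  # mirrors the content_type override option
        return [forced]
    suffix = Path(path).suffix.lower()
    if suffix in {".html", ".htm"}:
        return ["landing_page", "form_page"]
    if suffix in {".md", ".markdown", ".txt"}:
        return ["email_sequence", "ad_copy"]
    return list(ALL_TYPES)  # ambiguous: ask the user

print(candidate_content_types("site/index.html"))  # ['landing_page', 'form_page']
```

More than one candidate in the result is the cue to ask the clarifying question instead of guessing.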

Step 2: Get API Key

Check for an Anthropic API key in the $ANTHROPIC_API_KEY environment variable:

```bash
export ANTHROPIC_API_KEY="your-api-key-here"
```

Step 3: Run Optimization Rounds

For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
Model preference (in order):
  1. claude-sonnet-4-5 (preferred: fast + smart)
  2. claude-opus-4 (if highest quality needed)
  3. Any claude-3.5+ model if the above aren't available
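
The batching rule amounts to building one prompt that carries every variant and parsing one reply. The sketch below shows only that shaping and parsing step and makes no network call; the prompt wording, the reply format, and both function names are assumptions, not the skill's actual prompt.

```python
import json

# Short persona labels keyed as in the experiments JSON; wording is illustrative.
PERSONAS = {
    "cmo": "CMO at a mid-market B2B company",
    "skeptical_founder": "Skeptical founder",
    "cro": "Conversion rate optimizer",
    "copywriter": "Senior copywriter",
    "founder": "Your CEO/founder",
}

def build_batch_prompt(element: str, variants: list[str]) -> str:
    """Put every variant of a round into a single scoring prompt."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(variants, start=1))
    judges = "\n".join(f"- {key}: {desc}" for key, desc in PERSONAS.items())
    return (
        f"Score each {element} variant from 0 to 100 as each judge.\n"
        f"Judges:\n{judges}\n\nVariants:\n{numbered}\n\n"
        'Reply with JSON only: [{"id": 1, "scores": {"cmo": 0}}]'
    )

def parse_batch_scores(reply: str, n_variants: int) -> dict[int, float]:
    """Map variant id -> average judge score from the single batched reply."""
    rows = json.loads(reply)
    if len(rows) != n_variants:
        raise ValueError(f"expected {n_variants} scored variants, got {len(rows)}")
    return {row["id"]: sum(row["scores"].values()) / len(row["scores"]) for row in rows}

prompt = build_batch_prompt("hero headline", ["Grow faster", "Cut churn in half"])
fake_reply = json.dumps([
    {"id": 1, "scores": {"cmo": 70, "cro": 80}},
    {"id": 2, "scores": {"cmo": 90, "cro": 70}},
])
print(parse_batch_scores(fake_reply, 2))  # {1: 75.0, 2: 80.0}
```

Checking the row count against the number of variants sent catches a truncated reply before any ranking happens.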

Step 4: Cross-Breed (Multi-Element)

After all elements have winners:
  1. Assemble the top winner from each element into a complete unit
  2. Generate 5 holistic variants that naturally combine the winning elements
  3. Score the complete units (not just individual parts)
  4. Pick the winner with the highest holistic score
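
The assemble-and-pick steps (1 and 4) can be sketched as below. Generating and scoring the 5 holistic variants are model calls and are left out; the data shapes and function names are hypothetical.

```python
def assemble_unit(ranked_by_element: dict[str, list[tuple[float, str]]]) -> dict[str, str]:
    """Take the top-scoring (score, text) pair for each element."""
    return {element: max(ranked)[1] for element, ranked in ranked_by_element.items()}

def pick_holistic_winner(units: list[dict[str, str]],
                         holistic_scores: list[float]) -> tuple[dict[str, str], float]:
    """Return the complete unit with the highest holistic score."""
    if not units or len(units) != len(holistic_scores):
        raise ValueError("need one holistic score per unit")
    best = max(range(len(units)), key=lambda i: holistic_scores[i])
    return units[best], holistic_scores[best]

ranked = {
    "hero_headline": [(82.0, "Headline A"), (78.0, "Headline B")],
    "cta": [(75.0, "Start now"), (88.0, "See pricing")],
}
base_unit = assemble_unit(ranked)
print(base_unit)  # {'hero_headline': 'Headline A', 'cta': 'See pricing'}
```

Scoring whole units matters because two individually strong elements can clash in tone once they sit on the same page.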

Step 5: Write Output Files

```bash
# Create output directory
mkdir -p data

# Write optimized content
# Write experiments JSON
# Write optimization report
```

**Experiments JSON structure:**
```json
{
  "run_id": "autoresearch-{name}-{timestamp}",
  "content_type": "landing_page",
  "source_file": "path/to/original",
  "min_score_threshold": 80,
  "rounds": [
    {
      "round": 1,
      "element": "hero_headline",
      "variants": [
        {
          "id": 1,
          "text": "...",
          "scores": {
            "cmo": 72,
            "skeptical_founder": 68,
            "cro": 75,
            "copywriter": 70,
            "founder": 65
          },
          "avg_score": 70
        }
      ],
      "top_3": [1, 4, 7],
      "winner_score": 82
    }
  ],
  "final_winner": {
    "hero_headline": "...",
    "subheadline": "...",
    "cta": "...",
    "holistic_score": 87
  }
}
```

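Writing the three output files might be done along these lines. This is a sketch under assumptions: the function names and the `root` parameter are hypothetical, and only the file naming scheme and the `data` directory come from this document.

```python
import json
import tempfile
from pathlib import Path

def output_paths(name: str, ext: str, root: Path) -> dict[str, Path]:
    """Build the three output paths from the naming scheme in this document."""
    return {
        "optimized": root / f"{name}-optimized.{ext}",
        "experiments": root / "data" / f"{name}-experiments.json",
        "report": root / "data" / f"{name}-optimization-report.md",
    }

def write_outputs(name: str, ext: str, root: Path, optimized_text: str,
                  experiments: dict, report_md: str) -> dict[str, Path]:
    paths = output_paths(name, ext, root)
    (root / "data").mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p data
    paths["optimized"].write_text(optimized_text, encoding="utf-8")
    paths["experiments"].write_text(
        json.dumps(experiments, indent=2, ensure_ascii=False), encoding="utf-8")
    paths["report"].write_text(report_md, encoding="utf-8")
    return paths

# Demo in a throwaway directory so nothing in the working tree is touched.
with tempfile.TemporaryDirectory() as tmp:
    written = write_outputs("landing", "html", Path(tmp), "<h1>Winner</h1>",
                            {"run_id": "autoresearch-landing-demo", "rounds": []},
                            "# Optimization report\n")
    print(sorted(p.name for p in written.values()))
```

Keeping the source file untouched and writing the winner to a separate `-optimized` file matches the `auto_apply: false` default.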

Step 6: Report Back

Summarize results to user:
  • Final winning score
  • Biggest score jump (which element improved most)
  • Top 2 runner-up alternatives (in case winner doesn't feel right)
  • Path to all 3 output files
  • Clear next step

User Options

| Option | Default | Description |
| --- | --- | --- |
| elements | all | Which elements to optimize |
| variants_per_round | 10 | How many variants to generate per round |
| min_score | 80 | Stop when this score is hit |
| rounds | 3 | Max rounds before stopping |
| auto_apply | false | Whether to overwrite the source file with winners |
| content_type | auto-detect | Force a content type if auto-detect is wrong |
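
These options and defaults map naturally onto a small config object. The sketch below is one possible encoding: the class name `RunOptions` and the validation rules are assumptions, and `None` is used here to stand for auto-detect.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RunOptions:
    elements: str = "all"
    variants_per_round: int = 10
    min_score: int = 80
    rounds: int = 3
    auto_apply: bool = False
    content_type: Optional[str] = None  # None stands for auto-detect

    def __post_init__(self) -> None:
        # Assumed sanity checks; the source does not define valid ranges.
        if self.variants_per_round < 1:
            raise ValueError("variants_per_round must be at least 1")
        if not 0 <= self.min_score <= 100:
            raise ValueError("min_score must be between 0 and 100")
        if self.rounds < 1:
            raise ValueError("rounds must be at least 1")

defaults = RunOptions()
strict = RunOptions(min_score=85, content_type="landing_page")
print(defaults.min_score, strict.min_score)  # 80 85
```

Making the object frozen means a run cannot change its own thresholds midway, which keeps the experiment log consistent with the settings it records.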

Quality Gates

  • < 70: Don't ship. Something fundamental is broken.
  • 70-79: Marginal. One more round targeting the lowest-scoring dimension.
  • 80-84: Good. Shippable. Validate with real traffic.
  • 85-89: Strong. Ship with confidence.
  • 90+: Rare. Ship immediately.
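
The gates translate directly into a lookup. In this sketch the function name is hypothetical, and fractional scores are assumed to fall into half-open bands (for example, 79.5 still counts as Marginal), which the source does not specify.

```python
def quality_gate(score: float) -> str:
    """Map a final score to the gate labels listed above."""
    if score < 70:
        return "Don't ship"
    if score < 80:
        return "Marginal"
    if score < 85:
        return "Good"
    if score < 90:
        return "Strong"
    return "Rare"

for value in (65, 79.5, 80, 87, 93):
    print(value, quality_gate(value))
```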

Anti-Patterns to Avoid

  • Never call the API once per variant. Always batch. A 10-variant round = 1 call.
  • Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
  • Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
  • Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.
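
The over-optimization warning can be turned into a simple check that flags dimensions lagging far behind the best one. The 30-point gap and the function name are assumptions chosen for illustration; the source gives only the 95-versus-45 example.

```python
def lagging_dimensions(dimension_scores: dict[str, float], max_gap: float = 30) -> list[str]:
    """Return dimensions scoring more than max_gap points below the top dimension."""
    top = max(dimension_scores.values())
    return sorted(d for d, s in dimension_scores.items() if top - s > max_gap)

scores = {"clarity": 95, "trust": 45, "urgency": 80, "first_impression": 88}
print(lagging_dimensions(scores))  # ['trust']
```

A non-empty result suggests reading the per-dimension scores before trusting the average.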