autoresearch

Autoresearch Skill

Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.

When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.

When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.

The sequence: Run autoresearch FIRST until the simulated score reaches 85+ (the default stop threshold is 80, so raise `min_score` to 85 to enforce this). Then deploy. Then validate with real traffic.

What You'll Produce

Every run outputs 3 files:

| File | Purpose |
| --- | --- |
| `{name}-optimized.{ext}` | The winning optimized content |
| `data/{name}-experiments.json` | Full experiment log: all variants and all scores |
| `data/{name}-optimization-report.md` | Human-readable summary with winner rationale |

Expert Panel (5 Personas)

Score every variant against all 5. Batch all variants into a single API call per round.

| # | Persona | Scoring Lens |
| --- | --- | --- |
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |

Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a `references/founder-voice.md` file.

Each judge scores 0–100. Final score = average across all 5 judges.
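
The scoring rule above can be sketched in a few lines. This is a minimal illustration, assuming each variant's scores arrive as a mapping from judge key to a 0–100 number; the keys mirror the experiment log shown later in this document, and the function name `panel_average` is hypothetical.

```python
# Judge keys as they appear in the experiments JSON later in this document.
PANEL = ("cmo", "skeptical_founder", "cro", "copywriter", "founder")

def panel_average(scores: dict) -> float:
    """Average one variant's scores across all 5 judges (each 0-100)."""
    missing = [judge for judge in PANEL if judge not in scores]
    if missing:
        raise ValueError(f"missing judge scores: {missing}")
    for judge in PANEL:
        if not 0 <= scores[judge] <= 100:
            raise ValueError(f"{judge} score out of range: {scores[judge]}")
    return sum(scores[judge] for judge in PANEL) / len(PANEL)

example = {"cmo": 72, "skeptical_founder": 68, "cro": 75, "copywriter": 70, "founder": 65}
print(panel_average(example))  # 70.0
```

Rejecting incomplete or out-of-range scores up front keeps a malformed judge reply from silently skewing the ranking.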

Round Structure (Per Content Element)

Round 1:
  → Generate 10 variants of the element
  → Batch-score all 10 with the 5-expert panel (1 API call)
  → Rank by average score
  → Keep top 3

Round 2 (Evolution):
  → Analyze what the top 3 did right
  → Generate 10 new variants that push those winning patterns further
  → Batch-score all 10 (1 API call)
  → Keep top 3

Round 3 (If score < threshold):
  → Identify weakest scoring dimension
  → Generate 10 variants optimized for that dimension
  → Batch-score → keep top 1

Multi-element cross-breeding:
  → Take top 1 winner from each element
  → Generate 5 combinations that mix winning elements
  → Score holistically as complete units
  → Output the single best combination

Stop condition: Top variant hits minimum score threshold (default: 80) OR 3 rounds complete.
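
The per-element loop and its stop condition might be sketched as follows. This is a simplified illustration, not the skill's actual implementation: `generate` and `score` are hypothetical callables standing in for the LLM-backed steps, every round keeps the top 3 (the Round 3 "weakest dimension, keep top 1" refinement is omitted), and cross-breeding is left out.

```python
from typing import Callable, List, Tuple

def optimize_element(
    generate: Callable[[List[str], int], List[str]],  # hypothetical: (parents, n) -> n variants
    score: Callable[[List[str]], List[float]],        # hypothetical: one batched call per round
    variants_per_round: int = 10,
    min_score: float = 80,
    max_rounds: int = 3,
    keep: int = 3,
) -> Tuple[str, float, int]:
    """Generate, batch-score, rank, keep the top few; stop at min_score or max_rounds."""
    parents: List[str] = []
    best_text, best_score, rounds_run = "", float("-inf"), 0
    for rounds_run in range(1, max_rounds + 1):
        variants = generate(parents, variants_per_round)
        scores = score(variants)  # all variants of the round scored together
        ranked = sorted(zip(scores, variants), key=lambda pair: pair[0], reverse=True)
        if ranked[0][0] > best_score:
            best_score, best_text = ranked[0]
        if best_score >= min_score:  # stop condition: threshold reached
            break
        parents = [text for _, text in ranked[:keep]]  # survivors seed the next round
    return best_text, best_score, rounds_run

# Stub callables so the loop can be exercised without any API access.
def fake_generate(parents, n):
    return [f"gen{len(parents)}-variant{i}" for i in range(n)]

def fake_score(variants):
    return [70 + i for i in range(len(variants))]

print(optimize_element(fake_generate, fake_score))  # ('gen0-variant9', 79, 3)
```

With the stub scorer the best score stays at 79, so the loop runs all 3 rounds and stops on the round limit rather than the threshold.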

Content Types & Score Dimensions

Landing Pages

Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof

Score dimensions:

`first_impression`: Does it grab immediately?
`clarity`: Is the offer instantly understood?
`trust`: Does it feel credible?
`urgency`: Is there a reason to act now?
`would_convert`: Would the judge actually click?

Email Sequences

Elements to optimize: Subject line, opening line, body copy, CTA, PS line

Score dimensions:

`would_open`: Subject line pass rate
`would_read`: Does the opening hook?
`would_click`: Is the CTA compelling?
`would_reply`: Does it feel personal enough to respond to?
`spam_risk`: Does it feel spammy? (lower = better; invert for final score)
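
The source says only that `spam_risk` is inverted for the final score. One plausible reading, sketched below, treats inversion as `100 - spam_risk` on the 0–100 scale and weights all five dimensions equally; both choices are assumptions, as is the function name `email_final_score`.

```python
def email_final_score(dimensions: dict) -> float:
    """Average the email dimensions, counting spam_risk as (100 - spam_risk)."""
    adjusted = dict(dimensions)
    adjusted["spam_risk"] = 100 - dimensions["spam_risk"]  # low risk -> high contribution
    return sum(adjusted.values()) / len(adjusted)

scores = {"would_open": 80, "would_read": 70, "would_click": 60, "would_reply": 50, "spam_risk": 10}
print(email_final_score(scores))  # 70.0
```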

Ad Copy

Elements to optimize: Headline, description, CTA

Score dimensions:

`scroll_stopping`: Does it interrupt the scroll?
`clarity`: Is the value prop clear in 3 seconds?
`click_worthiness`: Does the judge want to click?
`relevance`: Does it match likely audience intent?
`differentiation`: Does it stand out from competitors?

Form Pages

Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy

Score dimensions:

`first_impression`: Does it feel worth filling out?
`trust`: Do they believe their info is safe and the offer is real?
`completion_likelihood`: Would the judge start filling it out?
`lead_quality`: Would this attract serious prospects (not tire-kickers)?
`would_fill_out`: Final gut check: would they submit?

Step-by-Step Execution Protocol

Step 1: Intake & Parse

Read the source content. Identify the content type automatically, or confirm it with the user:

HTML file → landing page or form page
Markdown / plain text → email or ad copy
If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"

Extract all optimizable elements and list them back to the user:

Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]

Optimizing: all | Variants per round: 10 | Min score: 80
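
The file-type mapping in this step can be sketched as a lookup on the file extension. Note that the extension only narrows the candidates to two, so a follow-up check or a question to the user is still needed. The function name and the labels other than `landing_page` (which appears in the experiment log) are hypothetical.

```python
from pathlib import Path

ALL_TYPES = ["landing_page", "email_sequence", "ad_copy", "form_page"]

def guess_content_types(path: str) -> list:
    """Return the candidate content types for a source file, per the mapping above."""
    suffix = Path(path).suffix.lower()
    if suffix in {".html", ".htm"}:
        return ["landing_page", "form_page"]
    if suffix in {".md", ".markdown", ".txt"}:
        return ["email_sequence", "ad_copy"]
    return list(ALL_TYPES)  # ambiguous: ask the user which type this is

print(guess_content_types("site/index.html"))  # ['landing_page', 'form_page']
```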

Step 2: Get API Key

Check for an Anthropic API key in the `$ANTHROPIC_API_KEY` environment variable. If it is not set, export it:

    export ANTHROPIC_API_KEY="your-api-key-here"
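
A script driving the run could perform the same check before any scoring starts. A minimal sketch, where the helper name `get_api_key` is hypothetical and the `env` parameter exists only to make the check testable:

```python
import os

def get_api_key(env=os.environ) -> str:
    """Return the Anthropic API key, or fail early with a hint on how to set it."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'ANTHROPIC_API_KEY is not set; run: export ANTHROPIC_API_KEY="your-api-key-here"'
        )
    return key
```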

Step 3: Run Optimization Rounds

For each element, run the round structure above.

Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
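
One way to honor the batching rule is to build a single prompt per round and parse a single structured reply. The sketch below covers only prompt construction and reply parsing; sending the prompt is left out. The prompt wording, the JSON reply shape, and both function names are assumptions for illustration, while the judge keys follow the experiment log and the lens text is quoted from the panel table.

```python
import json

PERSONAS = {
    "cmo": "Would this make me stop and engage?",
    "skeptical_founder": "Do I believe this? Would I trust this company?",
    "cro": "Is this clear, specific, and action-driving?",
    "copywriter": "Is this compelling, differentiated, and well-crafted?",
    "founder": "Direct, ROI-obsessed, no BS. Would I put this on my site?",
}

def build_batch_prompt(element: str, variants: list) -> str:
    """Put every variant of one round into a single scoring prompt."""
    lines = [f"Score each {element} variant from 0 to 100 as each judge below."]
    lines += [f"- {key}: {lens}" for key, lens in PERSONAS.items()]
    lines.append("Variants:")
    lines += [f"{i}. {text}" for i, text in enumerate(variants, start=1)]
    lines.append('Reply with JSON only, e.g. [{"id": 1, "scores": {"cmo": 0}}]')
    return "\n".join(lines)

def parse_batch_reply(reply: str, n_variants: int) -> dict:
    """Map variant id -> judge scores, insisting that every variant was scored once."""
    rows = json.loads(reply)
    scores = {row["id"]: row["scores"] for row in rows}
    if len(rows) != n_variants or set(scores) != set(range(1, n_variants + 1)):
        raise ValueError("reply must score every variant exactly once")
    return scores
```

Checking that the reply covers every variant id guards against the model silently dropping or duplicating entries in a long batch.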

Model preference (in order):

`claude-sonnet-4-5` (preferred: fast and smart)
`claude-opus-4` (if highest quality needed)
Any claude-3.5+ model if the above aren't available

Step 4: Cross-Breed (Multi-Element)

After all elements have winners:

Assemble the top winner from each element into a complete unit
Generate 5 holistic variants that naturally combine the winning elements
Score the complete units (not just individual parts)
Pick the winner with the highest holistic score

Step 5: Write Output Files

    # Create output directory
    mkdir -p data

    # Write optimized content to {name}-optimized.{ext}
    # Write experiments JSON to data/{name}-experiments.json
    # Write optimization report to data/{name}-optimization-report.md


**Experiments JSON structure:**
```json
{
  "run_id": "autoresearch-{name}-{timestamp}",
  "content_type": "landing_page",
  "source_file": "path/to/original",
  "min_score_threshold": 80,
  "rounds": [
    {
      "round": 1,
      "element": "hero_headline",
      "variants": [
        {
          "id": 1,
          "text": "...",
          "scores": {
            "cmo": 72,
            "skeptical_founder": 68,
            "cro": 75,
            "copywriter": 70,
            "founder": 65
          },
          "avg_score": 70
        }
      ],
      "top_3": [1, 4, 7],
      "winner_score": 82
    }
  ],
  "final_winner": {
    "hero_headline": "...",
    "subheadline": "...",
    "cta": "...",
    "holistic_score": 87
  }
}
```
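
A log in this shape could be assembled and written along the following lines. This is a sketch under stated assumptions: `round_record` and `write_experiments` are hypothetical helpers, `avg_score` is rounded to an integer to match the example, and `winner_score` is taken to be the highest `avg_score` in the round.

```python
import json
from pathlib import Path

def round_record(round_no: int, element: str, variants: list, keep: int = 3) -> dict:
    """Build one entry of "rounds" from variant dicts with "id", "text", "scores"."""
    rows = []
    for variant in variants:
        avg = sum(variant["scores"].values()) / len(variant["scores"])
        rows.append({**variant, "avg_score": round(avg)})
    ranked = sorted(rows, key=lambda row: row["avg_score"], reverse=True)
    return {
        "round": round_no,
        "element": element,
        "variants": rows,
        "top_3": [row["id"] for row in ranked[:keep]],
        "winner_score": ranked[0]["avg_score"],
    }

def write_experiments(log: dict, name: str, root: Path) -> Path:
    """Write the log to data/{name}-experiments.json under root and return the path."""
    out = root / "data" / f"{name}-experiments.json"
    out.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p data`
    out.write_text(json.dumps(log, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Keeping the per-judge scores alongside the average makes it possible to audit later why a variant won.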

Step 6: Report Back

Summarize the results for the user:

Final winning score
Biggest score jump (which element improved most)
Top 2 runner-up alternatives (in case winner doesn't feel right)
Path to all 3 output files
Clear next step

User Options

| Option | Default | Description |
| --- | --- | --- |
| `elements` | all | Which elements to optimize |
| `variants_per_round` | 10 | How many variants to generate per round |
| `min_score` | 80 | Stop when this score is hit |
| `rounds` | 3 | Max rounds before stopping |
| `auto_apply` | false | Whether to overwrite the source file with winners |
| `content_type` | auto-detect | Force a content type if auto-detect is wrong |
Quality Gates

质量门限

< 70: Don't ship. Something fundamental is broken.
70-79: Marginal. One more round targeting the lowest-scoring dimension.
80-84: Good. Shippable. Validate with real traffic.
85-89: Strong. Ship with confidence.
90+: Rare. Ship immediately.
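
The gates above translate directly into a lookup. In this sketch the function name `quality_gate` is hypothetical, and fractional scores are assumed to fall into the band below the next integer boundary (79.5 counts as Marginal), which the source does not specify.

```python
def quality_gate(score: float) -> str:
    """Map a final score to the gate label and action listed above."""
    if score < 70:
        return "Don't ship"
    if score < 80:
        return "Marginal: run one more round on the lowest-scoring dimension"
    if score < 85:
        return "Good: shippable, validate with real traffic"
    if score < 90:
        return "Strong: ship with confidence"
    return "Rare: ship immediately"

print(quality_gate(87))  # Strong: ship with confidence
```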

Anti-Patterns to Avoid

Never call the API once per variant. Always batch. A 10-variant round = 1 call.
Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.
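
The second anti-pattern can be checked mechanically by flagging dimensions that trail the strongest one by a wide margin. A sketch, where the function name and the 30-point `max_spread` default are hypothetical choices, not values from the source:

```python
def imbalanced_dimensions(dimensions: dict, max_spread: float = 30) -> list:
    """Return the dimensions that trail the best one by more than max_spread points."""
    top = max(dimensions.values())
    return sorted(name for name, value in dimensions.items() if top - value > max_spread)

print(imbalanced_dimensions({"clarity": 95, "trust": 45, "urgency": 80}))  # ['trust']
```

A non-empty result is a signal to treat the average with suspicion and target the flagged dimension in the next round.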
