Scaling and the Road to Human-Level AI
Strategic framework for understanding AI scaling laws and building products that leverage predictable AI capability improvements.
Core Concepts
Two Phases of AI Training
Pretraining: Models learn to predict the next token by imitating human-written text, understanding underlying correlations in data.
Reinforcement Learning (RL): Models are optimized based on human feedback, reinforcing helpful/honest/harmless behaviors and discouraging harmful ones.
Scaling laws exist for both phases—performance improves predictably with increased compute, data, and parameters.
Key Metrics
- Task Horizon: Length/complexity of tasks AI can complete, measured in equivalent human time
- Elo Scores: Rating system derived from pairwise human preference comparisons between models
- Context Window: Maximum amount of text (tokens) a model can process in a single conversation
Scaling Law Reliability
Scaling laws have held across 5+ orders of magnitude with physics-level precision. When scaling appears broken, assume training implementation issues first, not fundamental limits.
Strategic Decision Framework
Assess Current AI Capabilities
Use the two-axis capability framework:
- Y-axis (Flexibility): What modalities can the model handle?
- X-axis (Task Horizon): What equivalent human-time tasks can it complete?
Current trajectory: Task horizons double approximately every 7 months.
Product Timing Strategy
Current capability assessment:
├── Works reliably now → Build and ship immediately
├── Works 70-80% of time → Viable for error-tolerant use cases
├── Works marginally → Build now, ship when next model releases
└── Doesn't work at all → Wait 1-2 model generations
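The decision tree above can be sketched as a small function. The numeric thresholds (0.95, 0.70, 0.40) are illustrative assumptions standing in for "reliably," "70-80%," and "marginally"; they are not calibrated benchmarks from the source.

```python
def ship_decision(success_rate: float) -> str:
    """Map a measured task success rate to a product timing decision.

    Thresholds mirror the decision tree above; the exact cutoffs are
    illustrative assumptions, not calibrated benchmarks.
    """
    if success_rate >= 0.95:
        return "build and ship immediately"
    if success_rate >= 0.70:
        return "viable for error-tolerant use cases"
    if success_rate >= 0.40:
        return "build now, ship when next model releases"
    return "wait 1-2 model generations"
```

For example, `ship_decision(0.75)` falls into the error-tolerant band, which is why use-case selection (below) focuses on applications where 70-80% accuracy is acceptable.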
Key insight: Build products that don't quite work with current models. Target capabilities slightly beyond today's frontier—future models will make today's marginal products work.
Use Case Selection Criteria
Prioritize applications where:
- 70-80% accuracy is acceptable
- Breadth of knowledge matters more than deep focus on one hard problem
- Cross-domain synthesis creates value (biology + psychology + history)
- Human review can catch and correct errors
Deprioritize applications requiring:
- Near-perfect accuracy on first attempt
- Deep specialized reasoning without verification
- Tasks where errors compound catastrophically
Human-AI Collaboration Model
Role Division
Position humans as managers and sanity-checkers:
- AI generates options and drafts
- Humans verify, select, and course-correct
- Humans can judge quality well beyond what they can generate; this judgment-generation gap is what makes human review an efficient control point
Leverage AI's Strengths
Breadth over depth: AI excels at synthesizing information across many domains simultaneously. Target applications requiring:
- Literature synthesis across fields
- Pattern recognition across diverse data sources
- Rapid exploration of solution spaces
Practical Workflow
- Define the task scope and success criteria
- Have AI generate initial approach/draft
- Review for sanity and strategic alignment
- Iterate with targeted corrections
- Use AI to refine based on feedback
Forecasting AI Capabilities
Timeline Estimation Method
To estimate when a capability becomes viable:
1. Identify current task horizon (what length tasks work reliably)
2. Apply 7-month doubling rule
3. Calculate generations needed:
- Hour-long tasks → Day-long tasks: ~3 doublings (~21 months)
- Day-long tasks → Week-long tasks: ~3 doublings (~21 months)
- Week-long tasks → Month-long tasks: ~2 doublings (~14 months, treating a month as ~4 working weeks)
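Under the 7-month doubling rule, the months needed follow directly from the ratio of target to current horizon. A minimal sketch, assuming working-time conversions (8-hour day, 40-hour week) and noting that the list above rounds doubling counts up to whole numbers:

```python
import math

DOUBLING_MONTHS = 7  # observed task-horizon doubling time


def months_to_reach(current_hours: float, target_hours: float) -> float:
    """Months until the task horizon grows from current to target,
    assuming one doubling every DOUBLING_MONTHS months."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * DOUBLING_MONTHS


# Hour-long -> day-long (8 working hours): 3 doublings
print(round(months_to_reach(1, 8)))   # 21
# Day-long -> week-long (40 working hours): ~2.3 doublings
print(round(months_to_reach(8, 40)))  # 16
```

The choice of working time vs. calendar time changes the doubling counts, so treat any single estimate as a rough planning band rather than a date.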
Self-Correction Multiplier
Each improvement in a model's ability to notice and correct its own mistakes roughly doubles task horizon length. Factor this into capability forecasts.
Integration Strategy
Avoid the Steam Engine Mistake
Don't just replace existing processes with AI equivalents; redesign entire systems around AI capabilities. (Electricity analogy: factories saw real productivity gains only after they were redesigned around electric motors, not when steam engines were simply swapped for electric ones.)
Accelerate Adoption
Use AI to integrate AI into products and businesses. The bottleneck is adoption speed, not capability. When facing integration challenges:
- Have AI analyze your current workflow
- Identify substitution points and redesign opportunities
- Prototype with AI assistance
- Iterate rapidly
Jevons Paradox Awareness
Expect cheaper, more efficient AI to drive more total usage, not lower total spend. Plan for:
- More AI usage as capabilities improve
- New use cases emerging from better performance
- Expanding scope rather than shrinking budgets
Diagnostic Framework
When Scaling Appears Broken
Before concluding a capability limit exists:
- Verify training/prompting methodology
- Check for data quality issues
- Test with alternative approaches
- Compare against scaling law predictions
Default assumption: Implementation issues, not fundamental limits.
Evaluating Model Improvements
Compare new models against:
- Expected scaling law trajectory
- Task horizon benchmarks
- Cross-domain performance consistency
Deviations from smooth improvement suggest training issues worth investigating.
Example Applications
Product Development Decision
Scenario: Building an AI code review tool
Assessment:
- Current models: Reliable for single-file reviews (~minutes)
- Target capability: Full PR reviews with context (~hours)
- Gap: ~2-3 doublings needed
Decision: Build now with single-file scope, architected for expansion.
Ship current capability, expand automatically as models improve.
Capability Targeting
Scenario: Choosing between deep analysis vs broad synthesis features
AI strength analysis:
- Deep focus on one hard problem: Human-competitive, not superior
- Synthesizing across 10 domains: Clear AI advantage
Decision: Prioritize cross-domain synthesis features.
Example: Research assistant that connects findings across biology,
psychology, and economics papers simultaneously.
Timeline Planning
Scenario: When will AI handle week-long research projects reliably?
Current state (2024): Hour-long tasks reliable
Doubling rate: ~7 months
Calculation:
- Hour → Day: 3 doublings = 21 months
- Day → Week: 3 doublings = 21 months
- Total: ~42 months (rough estimate)
Planning implication: Build infrastructure now, expect capability 2027-2028.
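The arrival date above can be reproduced with the same doubling arithmetic. The `add_months` helper is an illustrative utility, not from the source, and the start point of early 2024 is an assumption:

```python
DOUBLING_MONTHS = 7

# Rounded doubling counts from the estimate above
hour_to_day = 3   # 1h -> ~8h working day
day_to_week = 3   # 8h -> ~40h week, rounded up from ~2.3


def add_months(year: int, month: int, months: int) -> tuple[int, int]:
    """Advance a (year, month) pair by a number of months."""
    total = year * 12 + (month - 1) + months
    return total // 12, total % 12 + 1


total_months = (hour_to_day + day_to_week) * DOUBLING_MONTHS
print(total_months)                     # 42
print(add_months(2024, 1, total_months))  # (2027, 7) -> mid-2027
```

Rounding the doubling counts down instead of up pushes the same estimate toward 2028, which is why the planning band is stated as 2027-2028.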