algo-social-virality

Viral Spread Models

Overview

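Compartmental models (SIR, SIS, SEIR) describe how content or information spreads through a population. The Susceptible → Infected → Recovered progression maps to unaware → sharing → stopped sharing. Key metric: R0 (basic reproduction number), the average number of new sharers each sharer generates in a fully susceptible population. The models are solved numerically as ODEs in O(T × C) time for T timesteps and C compartments (C = 3 for SIR).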

When to Use

Trigger conditions:
  • Modeling how content spreads through a social network
  • Estimating whether a campaign will achieve viral threshold
  • Analyzing post-hoc spread dynamics of viral events
When NOT to use:
  • When predicting individual user behavior (use influence scoring)
  • When measuring engagement metrics (use engagement rate calculator)

Algorithm

IRON LAW: Viral Spread Occurs ONLY When R0 > 1
R0 = transmission rate (β) / recovery rate (γ).
Below R0 = 1, content dies out regardless of initial seed size.
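Above R0 = 1, sharers grow roughly exponentially (at rate β − γ while nearly everyone is still susceptible) until S/N falls below 1/R0; spread then peaks and declines toward saturation.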
Design interventions (seeding, incentives) to push R0 above threshold.
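The threshold rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation; the function names and the `s0_fraction` parameter (the share of the population that is still susceptible at the start) are assumptions introduced here.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 = beta / gamma; gamma must be positive for the ratio to exist."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def is_viral(beta: float, gamma: float, s0_fraction: float = 1.0) -> bool:
    """True when sharers initially grow: dI/dt > 0 exactly when R0 * S/N > 1."""
    return basic_reproduction_number(beta, gamma) * s0_fraction > 1.0


print(is_viral(0.3, 0.1))   # R0 = 3.0, above threshold
print(is_viral(0.08, 0.1))  # R0 = 0.8, below threshold
```

An intervention that raises β (better seeding, sharing incentives) or lowers γ (content that stays interesting longer) moves the ratio in the right direction.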

Phase 1: Input Validation

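Define: population size (N), initial seed size (I₀), transmission rate (β: effective sharing rate per sharer per unit time, i.e. exposures per unit time × probability of sharing upon exposure), recovery rate (γ: rate of losing interest, per unit time). Gate: N > 0, 0 ≤ I₀ ≤ N, β ≥ 0, γ > 0 (so that R0 = β/γ is defined); β and γ estimated from historical data or explicitly assumed.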
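One way to express the Phase 1 gate in code is sketched below. The `SpreadParams` class and `validate` function are hypothetical names, and the checks go slightly beyond plain non-negativity: the sketch assumes N > 0, I₀ ≤ N, and γ > 0 so that R0 = β/γ can be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpreadParams:
    population: int      # N
    initial_seeds: int   # I0
    beta: float          # transmission rate per unit time
    gamma: float         # recovery rate per unit time


def validate(params: SpreadParams) -> SpreadParams:
    """Raise ValueError if a gate fails; otherwise return params unchanged."""
    if params.population <= 0:
        raise ValueError("population N must be positive")
    if not 0 <= params.initial_seeds <= params.population:
        raise ValueError("initial seeds I0 must satisfy 0 <= I0 <= N")
    if params.beta < 0:
        raise ValueError("beta must be non-negative")
    if params.gamma <= 0:
        raise ValueError("gamma must be positive so that R0 = beta / gamma is defined")
    return params


params = validate(SpreadParams(population=10_000, initial_seeds=10, beta=0.3, gamma=0.1))
print(params)
```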

Phase 2: Core Algorithm

SIR Model: dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI
  1. Initialize: S=N-I₀, I=I₀, R=0
  2. Iterate using Euler method or RK4 at discrete timesteps
  3. Track peak infected (maximum simultaneous sharers) and total ever-infected
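The three steps above can be sketched with a classical RK4 integrator in plain Python. The function names, the default step `dt=0.1`, and the 120-unit horizon are assumptions chosen for illustration; a production version might use an adaptive ODE solver instead.

```python
def _advance(state, deriv, h):
    """Return state + h * deriv, component by component."""
    return tuple(x + h * d for x, d in zip(state, deriv))


def simulate_sir(n, i0, beta, gamma, t_max, dt=0.1):
    """Integrate the SIR equations with RK4; returns a list of (t, S, I, R)."""

    def f(state):
        s, i, _ = state
        new_shares = beta * s * i / n   # flow S -> I
        stops = gamma * i               # flow I -> R
        return (-new_shares, new_shares - stops, stops)

    state = (float(n - i0), float(i0), 0.0)  # step 1: S = N - I0, I = I0, R = 0
    series = [(0.0, *state)]
    for k in range(1, round(t_max / dt) + 1):  # step 2: iterate at fixed timesteps
        k1 = f(state)
        k2 = f(_advance(state, k1, dt / 2))
        k3 = f(_advance(state, k2, dt / 2))
        k4 = f(_advance(state, k3, dt))
        slope = tuple((a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(k1, k2, k3, k4))
        state = _advance(state, slope, dt)
        series.append((k * dt, *state))
    return series


series = simulate_sir(n=10_000, i0=10, beta=0.3, gamma=0.1, t_max=120)
# Step 3: peak simultaneous sharers and total ever-infected (I + R at the end).
peak_t, _, peak_i, _ = max(series, key=lambda row: row[2])
total_ever = series[-1][2] + series[-1][3]
print(f"peak {peak_i:.0f} at t={peak_t:.1f}, total ever-infected {total_ever:.0f}")
```

Forward Euler is simpler and works for small `dt`, but RK4 tolerates larger steps for the same accuracy, which matters when β and γ are large relative to the timestep.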
SIS variant: there is no immune state; users who stop sharing become susceptible again (suited to recurring content). Equations: dS/dt = -βSI/N + γI, dI/dt = βSI/N - γI.
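A minimal forward-Euler sketch of the SIS variant follows. It relies on S = N − I, so only I needs to be tracked. The function name and parameters are illustrative, and the comparison value N·(1 − 1/R0) is the standard SIS steady state for R0 > 1, stated here as background rather than taken from this document.

```python
def simulate_sis(n, i0, beta, gamma, t_max, dt=0.1):
    """Forward-Euler SIS: dI/dt = beta*S*I/N - gamma*I, with S = N - I."""
    i = float(i0)
    series = [(0.0, n - i, i)]
    for k in range(1, round(t_max / dt) + 1):
        s = n - i
        i += dt * (beta * s * i / n - gamma * i)
        series.append((k * dt, n - i, i))
    return series


series = simulate_sis(n=10_000, i0=10, beta=0.3, gamma=0.1, t_max=200)
steady_state = 10_000 * (1 - 0.1 / 0.3)  # N * (1 - 1/R0)
print(f"final sharers {series[-1][2]:.0f}, steady state {steady_state:.0f}")
```

Unlike SIR, the sharer count does not fall back to zero when R0 > 1; it settles at a persistent level, which is why SIS suits content that resurfaces.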

Phase 3: Verification

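Check: S + I + R = N at every timestep, within floating-point tolerance (conservation). Peak and final sizes are plausible for the given R0: I never rises above I₀ when R0 ≤ 1, and the total ever-infected approaches N only when R0 is well above 1. Gate: Population conserved, dynamics consistent with R0.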
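The plausibility check can be made concrete by comparing simulated values against closed-form SIR results. The sketch below uses two standard relations that are not stated in this document and that assume R(0) = 0: the peak follows from dI/dS = −1 + N/(R0·S), and the final size solves S∞ = S₀·exp(−R0·(1 − S∞/N)). All function names are hypothetical.

```python
import math


def expected_peak(n, i0, beta, gamma):
    """Analytic SIR peak of I; equals i0 when sharers never grow."""
    r0, s0 = beta / gamma, n - i0
    if r0 * s0 / n <= 1:
        return float(i0)
    return i0 + s0 - (n / r0) * (1 + math.log(r0 * s0 / n))


def expected_final_size(n, i0, beta, gamma, iters=200):
    """Total ever-infected N - S_inf, found by fixed-point iteration from 0."""
    r0, s0 = beta / gamma, n - i0
    s_inf = 0.0
    for _ in range(iters):
        s_inf = s0 * math.exp(-r0 * (1 - s_inf / n))
    return n - s_inf


def conserved(series, n, tol=1e-6):
    """True when S + I + R stays within tol * N of N for every (t, S, I, R) row."""
    return all(abs(s + i + r - n) <= tol * n for _, s, i, r in series)


print(round(expected_peak(10_000, 10, 0.3, 0.1)))
print(round(expected_final_size(10_000, 10, 0.3, 0.1)))
```

A simulated peak or final size that differs from these values by more than a few percent usually points to a timestep that is too coarse or a bug in the update rule.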

Phase 4: Output

Return time series of compartments and summary metrics.

Output Format

```json
{
  "time_series": [{"t": 0, "S": 9900, "I": 100, "R": 0}],
  "summary": {"R0": 2.5, "peak_infected": 2375, "peak_day": 17, "total_infected": 8941},
  "metadata": {"model": "SIR", "beta": 0.5, "gamma": 0.2, "population": 10000}
}
```
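A possible way to assemble this structure from a simulated series is sketched below. `build_output` is a hypothetical helper, and the three demo rows are just the first two forward-Euler steps (dt = 1) for the parameters shown, rounded to one decimal, so the summary values it prints describe only that truncated series.

```python
import json


def build_output(series, beta, gamma, population, model="SIR"):
    """series: list of (t, S, I, R) tuples from any integrator."""
    peak_t, _, peak_i, _ = max(series, key=lambda row: row[2])
    _, _, last_i, last_r = series[-1]
    return {
        "time_series": [
            {"t": t, "S": round(s), "I": round(i), "R": round(r)}
            for t, s, i, r in series
        ],
        "summary": {
            "R0": round(beta / gamma, 3),
            "peak_infected": round(peak_i),
            "peak_day": peak_t,
            "total_infected": round(last_i + last_r),  # I + R at the last step
        },
        "metadata": {"model": model, "beta": beta, "gamma": gamma, "population": population},
    }


demo = [(0, 9900.0, 100.0, 0.0), (1, 9850.5, 129.5, 20.0), (2, 9786.7, 167.4, 45.9)]
output = build_output(demo, beta=0.5, gamma=0.2, population=10_000)
print(json.dumps(output, indent=2))
```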

Examples

Sample I/O

Input: N=10000, I₀=10, β=0.3, γ=0.1 per day (R0=3.0)
Expected: Early exponential growth, peak of ~3000 simultaneous sharers around day ~39, total ever-infected ~9400

Edge Cases

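| Input | Expected | Why |
|---|---|---|
| R0 = 0.8 | Rapid decay | Below threshold, dies out |
| I₀ = 1 | Slower start but same eventual dynamics (for R0 > 1) | Single seed takes longer to ignite |
| β = γ (R0 = 1) | No growth; sharers decline slowly from I₀ | Critical threshold: each sharer is replaced by at most one new sharer, so no outbreak takes off |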
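The threshold behaviour in these edge cases can be checked with a small forward-Euler sketch. The function name, the seed size of 100, and the horizon are assumptions for illustration only.

```python
def peak_sharers(n, i0, beta, gamma, t_max=200, dt=0.1):
    """Forward-Euler SIR; returns the largest I seen over the run."""
    s, i = float(n - i0), float(i0)
    peak = i
    for _ in range(round(t_max / dt)):
        new_shares = dt * beta * s * i / n
        stops = dt * gamma * i
        s, i = s - new_shares, i + new_shares - stops
        peak = max(peak, i)
    return peak


for label, beta in [("R0 = 0.8", 0.08), ("R0 = 1.0", 0.10), ("R0 = 3.0", 0.30)]:
    print(label, round(peak_sharers(10_000, 100, beta, 0.1)))
```

For R0 ≤ 1 the peak never exceeds the seed size, because new shares per step are always fewer than the number of users who stop sharing.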

Gotchas

  • Homogeneous mixing assumption: SIR assumes everyone interacts equally. Real networks have hubs, clusters, and weak ties. Use network-based models for realistic spread.
  • Parameter estimation: β and γ are hard to estimate for social content. Use early spread data to fit parameters, then project.
  • Content ≠ disease: Unlike diseases, content sharing is voluntary and influenced by content quality, platform algorithms, and trends. Models give rough dynamics, not precise predictions.
  • Platform algorithms: Social media algorithms amplify or suppress content. The "transmission rate" is partly determined by the platform, not just user behavior.
  • Temporal dynamics: Content virality often has a much shorter lifecycle than disease (hours-days vs weeks-months). Adjust timescales accordingly.
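For the parameter-estimation point, one simple approach is to fit the early growth rate and back out β. While almost everyone is still susceptible, I(t) grows roughly as I₀·exp((β − γ)t), so the slope of ln I against t estimates β − γ. The sketch below assumes γ is known or assumed separately; the function names are hypothetical and the counts are synthetic, generated from the formula rather than measured.

```python
import math


def fit_growth_rate(times, sharers):
    """Ordinary least-squares slope of ln(I) against t."""
    logs = [math.log(x) for x in sharers]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


def estimate_beta_r0(times, sharers, gamma):
    """Return (beta, R0) from early counts, given an assumed gamma."""
    beta = fit_growth_rate(times, sharers) + gamma
    return beta, beta / gamma


# Synthetic early counts from I(t) = 10 * exp(0.2 t); not real data.
times = [0, 1, 2, 3, 4, 5]
sharers = [10 * math.exp(0.2 * t) for t in times]
beta, r0 = estimate_beta_r0(times, sharers, gamma=0.1)
print(f"beta ~ {beta:.3f}, R0 ~ {r0:.2f}")
```

Only the earliest points should be used for the fit: once a noticeable share of the population has been exposed, growth slows and the slope underestimates β − γ.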

References

  • For network-based epidemic models, see references/network-sir.md
  • For parameter estimation from early data, see references/parameter-fitting.md