kotlin-coroutines-structured-concurrency

Kotlin coroutines: structured concurrency

Core principle

A well-structured coroutine is a self-contained unit of asynchronous work: single entry, single exit, scoped to a lifecycle known at the call site.
Scopes should usually be tied to the caller's lifecycle, not stored as a property on the callee. A stored CoroutineScope is a strong review signal: the class must prove it owns cancellation, error reporting, restart behavior, and lifecycle. Most repositories, managers, use cases, and data sources cannot prove that, so they should expose suspend APIs instead.
The fix is almost always the same: make the API suspend and let the caller own the scope.

When to use this skill

You're writing or reviewing Kotlin code and you see any of these:
  • A class with private val scope: CoroutineScope (constructor param stored as a property)
  • An init { scope.launch { ... } } block
  • A non-suspending public function whose body is scope.launch { ... }
  • runBlocking { ... } in suspend-capable application code, or in tests where runTest should apply
  • runCatching { suspendCall() } or a catch on Exception / Throwable around a suspend call without rethrowing CancellationException
  • A catch (e: CancellationException) (or equivalent) around suspension that does not rethrow

The silent-cancellation bug

An unowned CoroutineScope property is dangerous because once a scope is cancelled, every future launch on it silently completes as cancelled: no exception, no log, nothing. The work just doesn't happen. This is one of the hardest coroutine bugs to diagnose, and it appears when a class holds a long-lived reference to a lifecycle it does not own.
If APIs are suspend, this can't happen: the caller's scope is either alive (work runs) or the call site cancels (the caller knows).
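The effect is easy to reproduce. The sketch below is a minimal illustration, not code from any real project: it assumes kotlinx.coroutines is on the classpath, and launchOnCancelledScope is a name made up for the example. It cancels a scope, launches on it, and reports that the body never ran even though launch itself threw nothing.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.cancel
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical demo: returns (bodyRan, jobWasCancelled).
fun launchOnCancelledScope(): Pair<Boolean, Boolean> = runBlocking {
    val scope = CoroutineScope(Dispatchers.Default)
    scope.cancel()                                  // the owner's lifecycle has ended
    val bodyRan = AtomicBoolean(false)
    val job = scope.launch { bodyRan.set(true) }    // no exception is thrown here
    job.join()                                      // returns normally
    bodyRan.get() to job.isCancelled
}

fun main() {
    val (bodyRan, cancelled) = launchOnCancelledScope()
    println("body ran: $bodyRan, job cancelled: $cancelled")
}
```

The only trace of the failure is the state of the returned Job, which a fire-and-forget caller never inspects.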

Anti-patterns and fixes

1. CoroutineScope stored as a property

kotlin
// ❌ BAD
@Inject
class UserRepository(
    private val scope: CoroutineScope,
    private val api: UserApi,
) {
    fun refresh() {
        scope.launch { _state.value = api.fetchUser() }
    }
}

// ✅ GOOD
@Inject
class UserRepository(
    private val api: UserApi,
) {
    suspend fun refresh(): User = api.fetchUser()
}
The repository no longer needs to know about coroutines at all. The caller (a ViewModel, a use case) decides on what scope, with what error handling, with what cancellation semantics.
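A sketch of what the caller's side might look like. FakeUserApi and main are stand-ins invented for the example, and the runBlocking scope of main plays the role that viewModelScope would play in an app. The caller picks the scope, sees the result, and holds a Job it can join or cancel.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

data class User(val name: String)

interface UserApi {
    suspend fun fetchUser(): User
}

// Hypothetical stand-in for a real network client.
class FakeUserApi : UserApi {
    override suspend fun fetchUser(): User {
        delay(10)
        return User("Alice")
    }
}

class UserRepository(private val api: UserApi) {
    suspend fun refresh(): User = api.fetchUser()
}

fun main() = runBlocking {
    val repository = UserRepository(FakeUserApi())
    // The caller owns the scope, so it also owns the Job.
    val job = launch {
        val user = repository.refresh()
        println("refreshed ${user.name}")
    }
    job.join()
}
```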

2. init-block launches

kotlin
// ❌ BAD: construction-time side effect, unbounded work
class UserSession(private val scope: CoroutineScope, private val api: Api) {
    init { scope.launch { _user.value = api.load() } }
}
The constructor returns immediately. The caller can't await the load, can't see errors, can't cancel. The class is "alive" but its state is undefined.
kotlin
// ✅ GOOD: explicit bootstrap, caller owns the suspension
class UserSession(private val api: Api) {
    private var _user: User? = null
    val user: User get() = checkNotNull(_user) { "Call init() first" }

    suspend fun init() { _user = api.load() }
}
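A sketch of how a caller might use the explicit bootstrap. The Api interface, FailingApi, and startSession are assumptions made for the example; the point is only that a failed load now reaches the caller instead of vanishing inside a launch.

```kotlin
import kotlinx.coroutines.runBlocking
import java.io.IOException

data class User(val name: String)

interface Api {
    suspend fun load(): User
}

class UserSession(private val api: Api) {
    private var _user: User? = null
    val user: User get() = checkNotNull(_user) { "Call init() first" }

    suspend fun init() { _user = api.load() }
}

// Hypothetical helper: hands back a session only after the load has finished.
suspend fun startSession(api: Api): UserSession =
    UserSession(api).also { it.init() }

// Hypothetical API that always fails, to show the error surfacing at the caller.
class FailingApi : Api {
    override suspend fun load(): User = throw IOException("network down")
}

fun main() = runBlocking {
    val outcome = try {
        startSession(FailingApi()).user.name
    } catch (e: IOException) {      // narrow catch: cannot match CancellationException
        "load failed: ${e.message}"
    }
    println(outcome)
}
```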

3. Fire-and-forget from non-UI classes

The pattern: a non-suspending public function on a non-UI class (repository, manager, use case, data source) launches into a class-owned scope. The caller gets no result, no error, no cancellation, and no guarantee the work ever ran.
kotlin
// ❌ BAD — repository with stored scope and fire-and-forget public API
class AnalyticsClient(private val scope: CoroutineScope, private val api: Api) {
    fun track(event: Event) {
        scope.launch { api.send(event) }      // caller has no idea what happens
    }
    fun signOut() {
        scope.launch { api.signOut() }        // silent failure if scope cancelled
    }
}
kotlin
// ✅ GOOD
class AnalyticsClient(private val api: Api) {
    suspend fun track(event: Event) = api.send(event)
    suspend fun signOut() = api.signOut()
}

Carve-out: the UI ↔ state-holder boundary

UI frameworks are non-suspending. A Composable's onClick, a Fragment's onKeyEvent, an Activity's onNewIntent: none can suspend. The state holder (ViewModel, Decompose Component, feature model, etc.; anything whose role is to absorb UI events and hold UI state) is the boundary that translates one-shot UI events into asynchronous work bound to the UI lifecycle. That's its job.
kotlin
// ✅ GOOD — state holder absorbs a non-suspending UI event onto its scope
class FavouritesViewModel(private val repo: FavouritesRepository) : ViewModel() {
    fun onToggleFavourite(item: Item) {
        viewModelScope.launch { repo.toggleFavourite(item) }
    }
}

// in Compose:
ListItem(onClick = { viewModel.onToggleFavourite(item) })
This is not the fire-and-forget anti-pattern. All three conditions must hold:
  1. State holder for a UI surface: a ViewModel, Decompose Component, feature model, or equivalent UI state holder. Not a repository, manager, use case, or data source.
  2. Lifecycle-bound scope: viewModelScope, a Component's coroutineScope that's cancelled on destroy, a Composable's rememberCoroutineScope(). Not AppScope, not an injected long-lived scope, not an ad-hoc CoroutineScope(...).
  3. Caller really is a UI event: Composable callback, key handler, lifecycle hook. Not another business-logic class calling through the state holder.
The repository / use case / data source layers underneath still expose suspend APIs. The state holder is the only layer where the non-suspending → suspending translation belongs.
"It feels like a state holder" isn't enough. The question is "does the UI directly bind to this?" If no, the carve-out doesn't apply.

4. Stored scopes that aren't injected

The same anti-pattern, without an injected scope:
kotlin
// ❌ BAD — same problem, scope is constructed in-class instead of injected
class FooManager {
    // Either form is the same anti-pattern; distinct names keep the snippet compilable.
    private val uiScope = MainScope()
    private val workScope = CoroutineScope(Dispatchers.Default + SupervisorJob())
}
The scope's lifecycle is now owned by nothing, so it lives forever. Replace with suspend APIs.
The same is true if the instantiation is nested inside a function body: fun foo() { CoroutineScope(...).launch { … } } is just a stored scope with extra steps. Each call leaks a new scope that nothing can cancel; bundling it into a by lazy property doesn't fix the underlying issue (the scope shouldn't exist at all).
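When a function genuinely needs concurrency inside it, the structured alternative to an ad-hoc CoroutineScope(...) is the coroutineScope { } builder, which borrows the caller's lifecycle. A minimal sketch; loadProfile, loadSettings, and Dashboard are hypothetical names chosen for the example.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

data class Dashboard(val profile: String, val settings: String)

// Hypothetical suspending loaders standing in for real I/O.
suspend fun loadProfile(): String { delay(20); return "profile" }
suspend fun loadSettings(): String { delay(20); return "settings" }

// No stored scope: the children live inside the caller's coroutine and are
// cancelled with it. coroutineScope returns only after both have finished.
suspend fun loadDashboard(): Dashboard = coroutineScope {
    val profile = async { loadProfile() }
    val settings = async { loadSettings() }
    Dashboard(profile.await(), settings.await())
}

fun main() = runBlocking {
    println(loadDashboard())
}
```

Because the function stays suspend, the caller still decides which scope runs it and when it is cancelled.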

5. DI-bound singletons / initializers that launch

A specific pattern that is hard to spot: a DI-bound class (@SingleIn(AppScope), @Singleton, an Initializer.initialize()) launches a coroutine from its constructor / init block / initialize(). The launched work then has:
  • A non-deterministic start time: whenever the graph realizes the binding. Cold-start ordering is invisible.
  • No observable lifecycle. Nothing else in the codebase can see whether it's running or has crashed.
  • No stop() / restart path. If upstream enters a bad state, the loop is uncancellable.
  • No calling code to grep for. Readers can't find "who starts this and when".
§1 says scopes should be tied to the caller's lifecycle. The DI-bound variant violates this indirectly: the scope may be injected, but the launch is hidden inside construction. Same effect, harder to see.
kotlin
// ❌ BAD — singleton boots work as a side effect of being constructed
@SingleIn(AppScope::class)
@Inject
class TokenRefresher(
    @ForScope(AppScope::class) private val scope: CoroutineScope,
    private val auth: AuthService,
) {
    init {
        scope.launch {
            while (isActive) {
                delay(5.minutes)
                auth.refreshIfNeeded()
            }
        }
    }
}

// ❌ ALSO BAD — Initializer.initialize() that *launches*, not just registers
class TokenInvalidatorInitializer @Inject constructor(
    @ForScope(AppScope::class) private val scope: CoroutineScope,
    private val store: AuthStore,
    private val invalidator: TokenInvalidator,
) : Initializer {
    override fun initialize() {
        scope.launch { store.tokenChanges.collect { invalidator.invalidate() } }
    }
}
Both look like "application-scoped singletons", but the "When NOT to apply" carve-out is not permission to launch from init / initialize(). It's permission for a singleton to own a scope when its API is suspending.

First ask: does this background-loop class need to exist at all?

Most background-loop classes exist only because no one inverted the observation. Three answers, in order of preference:
Pattern 1 — invert into the consumer. The class observes state forever to react when it changes. But someone mutates the state — sign-out flow, profile switch, flag-update handler. That mutation site is already in a coroutine context and is the natural place to do the work directly.
kotlin
// ✅ GOOD — no background loop, no scope, no class. The mutation site does the work.
class Authenticator(
    private val authStore: AuthStore,
    private val tokenInvalidator: TokenInvalidator,
) {
    suspend fun signOut() {
        authStore.clearTokens()
        tokenInvalidator.invalidate()   // direct call at the mutation site
    }
}
The background-loop class is deleted. The work happens where the state changes.
When this applies: the consumer of the state has a clear lifecycle (a use case, an Authenticator, a service handler) and can perform the reaction inline.
Pattern 2 — scheduled work. Genuinely periodic or deferred. Use WorkManager / BGTaskScheduler. The enqueue is one-shot; make it suspending and call it once from an orchestrator that already runs at startup.
Pattern 3 — explicit named launch site. Sometimes the consumer is a synchronous API with no observable lifecycle (e.g., OpenTelemetry's Sampler.shouldSample(...), an AIDL stub fanout, a broadcast receiver bridge). The observation has to live somewhere coroutine-aware, but it must live at an explicit named call site, not in the class's own init.
kotlin
// ✅ GOOD — work is named; an explicit call site owns the launch
@SingleIn(AppScope::class)
class OtelConfigurableSampler(...) : Sampler {
    @Volatile private var delegate: Sampler = ...

    suspend fun observeRate(featureFlags: FeatureFlags) {
        featureFlags.observe(OTEL_SAMPLING_RATE).collect { rate ->
            delegate = Sampler.traceIdRatioBased(rate.coerceIn(0.0, 1.0))
        }
    }

    override fun shouldSample(...) = delegate.shouldSample(...)
}

// wired explicitly at the OTel SDK init module:
applicationScope.launch { otelSampler.observeRate(featureFlags) }
When this applies: the consumer is a synchronous API that calls into you with no observable lifecycle. The observation can't be inverted into the consumer, but the launch must still be visible at a named call site.

Test for which pattern fits

"Is the consumer's lifecycle observable to me?"
  • Yes, and they're already in a coroutine context → Pattern 1. Push the subscription into them; delete the background-loop class.
  • The work is periodic / deferred → Pattern 2. Suspend enqueue called once.
  • No, they're a synchronous API with no observable lifecycle → Pattern 3. Explicit launch site, not init.
If a fourth answer seems to fit (e.g., "I want a Bootable interface that launches everything for me"), that's the same anti-pattern with an extra layer of abstraction. The whole point is that launches be visible; auto-discovery by interface defeats it.

Initializers are still fine — if they only register

The Initializer pattern is correct when initialize() registers a listener or hook. The bug is when initialize() launches a coroutine.
kotlin
// ✅ GOOD Initializer — registers a contributor, doesn't launch
class FavouritesContributorInitializer @Inject constructor(
    private val registry: ContributorRegistry,
    private val favouritesContributor: FavouritesContributor,
) : Initializer {
    override fun initialize() {
        registry.register(favouritesContributor)
    }
}
Initializer.initialize() must not launch a coroutine. If yours does, it's a Pattern 1/2/3 candidate.

Diagnostic for review

  • Where is the start moment defined? If "wherever DI realizes me", bad.
  • Who can observe whether the work is running? If "no one", bad.
  • Who can stop or restart it? If "no one", bad.
  • Can a reader grep for the launch site? If no, bad.
If the answers are "the consumer / the orchestrator / the named call site" — you're good.

6. Swallowing CancellationException

A catch clause around a suspend call that matches CancellationException (directly, or through Exception / Throwable) and doesn't rethrow usually turns cancellation into silent success. The parent coroutine thinks the child finished; the child keeps running (or its side effects do); the cancellation contract is broken.
Same failure shape as §1's stored-scope bug, viewed from the other end: §1 hides the work from the caller's lifecycle; this hides cancellation from the work.
kotlin
// ❌ BAD — catches CancellationException, never rethrows
suspend fun fetch() {
    try {
        api.load()
    } catch (e: Exception) {           // matches CancellationException too
        logger.warn("load failed", e)
    }
}

// ❌ ALSO BAD — runCatching has the same problem
suspend fun fetch() {
    runCatching { api.load() }
        .onFailure { logger.warn("load failed", it) }
}
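To see why this matters, the following sketch cancels a loop while it is suspended. pollSwallowing, pollRethrowing, and cancelWhileSuspended are hypothetical functions written for the demonstration; each poller records what happened in a list so the difference is observable.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.yield

// Swallows cancellation: every remaining iteration still executes after cancel().
suspend fun pollSwallowing(log: MutableList<String>) {
    for (attempt in 1..3) {
        try {
            delay(1_000)
            log += "ok $attempt"
        } catch (e: Exception) {          // also matches CancellationException
            log += "failed $attempt"
        }
    }
}

// Rethrows cancellation: the loop stops at the first cancelled suspension.
suspend fun pollRethrowing(log: MutableList<String>) {
    for (attempt in 1..3) {
        try {
            delay(1_000)
            log += "ok $attempt"
        } catch (e: CancellationException) {
            throw e
        } catch (e: Exception) {
            log += "failed $attempt"
        }
    }
}

// Starts the poller, cancels it while it is suspended, and returns its log.
fun cancelWhileSuspended(poll: suspend (MutableList<String>) -> Unit): List<String> = runBlocking {
    val log = mutableListOf<String>()
    val job = launch { poll(log) }
    yield()                 // let the child run up to its first delay
    job.cancelAndJoin()
    log
}

fun main() {
    println(cancelWhileSuspended(::pollSwallowing))   // three "failed" entries
    println(cancelWhileSuspended(::pollRethrowing))   // empty list
}
```

In the swallowing version, each later delay throws again immediately because the coroutine is already cancelled, so the loop logs a "failure" for work that was never attempted.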
The acceptable shapes:
kotlin
// ✅ Separate catch first
try { api.load() }
catch (e: CancellationException) { throw e }
catch (e: Exception) { logger.warn("load failed", e) }

// ✅ Conditional rethrow inside the broad catch
try { api.load() }
catch (e: Exception) {
    if (e is CancellationException) throw e
    logger.warn("load failed", e)
}

// ✅ ensureActive() — good when the catch handles ordinary failures and you only need
// to rethrow if the current coroutine is cancelled
try { api.load() }
catch (e: Exception) {
    currentCoroutineContext().ensureActive()
    logger.warn("load failed", e)
}

// ✅ runCatching with explicit guard
runCatching { api.load() }
    .onFailure {
        if (it is CancellationException) throw it
        logger.warn("load failed", it)
    }

// ✅ runCatching terminated with getOrThrow (cancellation flows back out)
runCatching { api.load() }.getOrThrow()
The trigger is "a suspend call inside the try", not "the enclosing function is declared suspend". This applies inside any suspending body: suspend fun, a launch { … } lambda, a Flow collect { … }, etc.
The common carve-out is an intentionally local timeout: catching TimeoutCancellationException from your own withTimeout and converting it to a domain result can be correct. Keep that catch narrow and close to the timeout. Do not use it as permission to swallow arbitrary cancellation.
Catching a non-cancellation subtype (IOException, your own exception types) is fine: they don't extend CancellationException.
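The local-timeout carve-out might look like the sketch below. LoadResult and loadWithDeadline are hypothetical names for the example; the catch sits directly around the function's own withTimeout and matches only TimeoutCancellationException.

```kotlin
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout

sealed interface LoadResult {
    data class Loaded(val value: String) : LoadResult
    object TimedOut : LoadResult
}

suspend fun loadWithDeadline(timeoutMillis: Long, load: suspend () -> String): LoadResult =
    try {
        LoadResult.Loaded(withTimeout(timeoutMillis) { load() })
    } catch (e: TimeoutCancellationException) {
        // Narrow catch: only the timeout becomes a domain result.
        // Cancellation of the caller is a different CancellationException and still propagates.
        LoadResult.TimedOut
    }

fun main() = runBlocking {
    println(loadWithDeadline(50) { delay(5_000); "late" } == LoadResult.TimedOut)
    println(loadWithDeadline(50) { "fast" } == LoadResult.Loaded("fast"))
}
```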

7. runBlocking

runBlocking parks the current thread until the lambda finishes. Inside suspend-capable or lifecycle-scoped application paths it is wrong: a thread that was meant to stay free for async work is now blocked, structured concurrency is broken, and any cancellation upstream has no effect on the blocked work. It is the "callee makes a structural decision for the caller" anti-pattern at its most direct.
kotlin
// ❌ BAD — bridging to suspend by blocking the calling thread
fun saveUser(user: User) {
    runBlocking { repository.save(user) }
}
Three fixes, by context:
Suspend-capable application code: make the function suspend.
kotlin
// ✅ GOOD
suspend fun saveUser(user: User) = repository.save(user)
If the immediate caller can't suspend either (a non-suspending UI callback, a BroadcastReceiver hook), use the existing lifecycle-bound scope at the boundary; see §3's UI ↔ state-holder carve-out. The fix is at the boundary, not inside saveUser.
Legitimate blocking boundaries exist: main in a CLI tool, Java interop APIs that must return synchronously, framework callbacks with no suspending alternative, and migration shims. Keep runBlocking at that outer boundary, keep the body small, and call suspending code immediately.
Tests: use runTest.
kotlin
// ❌ BAD — real time, slow tests, no virtual delay
@Test fun loadsUser() = runBlocking {
    assertThat(repository.load().name).isEqualTo("Alice")
}

// ✅ GOOD
@Test fun loadsUser() = runTest {
    assertThat(repository.load().name).isEqualTo("Alice")
}
runTest gives you virtual time (delay() is skipped instead of waited out), TestDispatcher integration, and proper coroutine cleanup. Real-time runBlocking in tests makes them slow and flaky.
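A sketch of the virtual-time behaviour, assuming kotlinx-coroutines-test (1.6 or later) and kotlin-test are on the test classpath. SlowRepository is a hypothetical class with an artificial ten-second delay; currentTime is the experimental virtual-clock property on TestScope.

```kotlin
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.delay
import kotlinx.coroutines.test.currentTime
import kotlinx.coroutines.test.runTest
import kotlin.test.Test
import kotlin.test.assertEquals

// Hypothetical repository with a long artificial delay.
class SlowRepository {
    suspend fun load(): String {
        delay(10_000)
        return "Alice"
    }
}

class SlowRepositoryTest {
    @OptIn(ExperimentalCoroutinesApi::class)
    @Test
    fun loadsUserOnVirtualTime() = runTest {
        assertEquals("Alice", SlowRepository().load())
        // The scheduler's virtual clock advanced; the test did not wait 10 real seconds.
        assertEquals(10_000L, currentTime)
    }
}
```

The same test written with runBlocking would pass too, but only after waiting out the delay in real time.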
ContentProvider carve-out: Android's ContentProvider methods (query, insert, update, delete, onCreate, call) are synchronous from outside the process. There is no way to suspend them. Inside member functions of a ContentProvider subclass (direct or indirect, but not companion objects), runBlocking is the unavoidable bridge. Keep the body as short as possible and call into suspending code immediately:
kotlin
// ✅ Acceptable in ContentProvider members only
class MyProvider : ContentProvider() {
    override fun query(...): Cursor? = runBlocking { dao.query(...) }
}
This carve-out is for android.content.ContentProvider subclasses only. "It's like a ContentProvider" doesn't apply, and a runBlocking in a ContentProvider's companion object is still a regular violation: the helper isn't part of the framework's synchronous surface.

Quick reference

| Symptom | Anti-pattern | Fix |
| --- | --- | --- |
| Class has private val scope: CoroutineScope | Stored scope on the callee | Remove. Make public APIs suspend. |
| init { scope.launch { ... } } | Construction-time launch | Move to suspend fun init() / login() |
| fun foo() { scope.launch { ... } } on a repository/manager/use case | Fire-and-forget from non-UI class | suspend fun foo(), let UI state holder pick the scope |
| fun onClick() { viewModelScope.launch { ... } } on a state holder, called from UI | UI ↔ state-holder boundary (fine) | Keep as-is (see §3 carve-out) |
| private val scope = MainScope() | Internally-constructed stored scope | Same: remove, make APIs suspend |
| @SingleIn(AppScope) class X(scope) { init { scope.launch { … } } } | DI-bound opaque launch (§5) | Expose suspend fun run(), launch from startup orchestrator |
| class Y : Initializer { override fun initialize() { scope.launch { … } } } | Initializer that launches, not registers (§5) | Same: suspend fun run(), orchestrator owns lifecycle |
| try { suspendCall() } catch (e: Exception / Throwable / CancellationException) { … } with no rethrow | Swallowed cancellation (§6) | Prefer catch (e: CancellationException) { throw e }; use ensureActive() only when that matches the intent |
| runCatching { suspendCall() }.onFailure { … } with no cancellation guard | Same shape as above (§6) | Add if (it is CancellationException) throw it, or terminate with .getOrThrow() |
| runBlocking { … } inside suspend-capable app code | Thread-blocking bridge (§7) | Make caller suspend; or use a lifecycle scope at the boundary |
| runBlocking { … } in a test | Same: real-time bridging (§7) | Use runTest { … } |
| runBlocking { … } inside a ContentProvider.query / insert / … member | Carve-out (§7) | Acceptable; keep the body minimal |

Refactoring guidance

Removing an existing offender:
  1. Start at the leaf. Pick the class farthest from any UI, usually a repository or data source. Its public surface should be the easiest to convert.
  2. Convert public functions to suspend one at a time. The compiler will surface every caller.
  3. At each caller, choose the scope deliberately: viewModelScope, lifecycleScope, coroutineScope { }, or an explicit job. This is the choice that was missing before.
  4. Delete the CoroutineScope constructor parameter once nothing uses it. Remove the injection binding.
Don't try to fix every class in one MR. Removing an anti-pattern is incremental work.

When NOT to apply

  • UI state holders absorbing UI events. A ViewModel/Component/feature model with fun onClick(...) { viewModelScope.launch { ... } } is correct: that's the boundary the framework needs. See §3 carve-out.
  • Lifecycle owners with explicit cancellation and error policy. Actors/services, app infrastructure, or application-scoped singletons may own a scope when they expose clear close / cancel / restart behavior or otherwise map directly to an application lifecycle. Inject Application.applicationScope explicitly rather than creating one ad-hoc. This is not permission to launch from init / initialize(); see §5.
  • Already-suspending APIs don't need any of this work.
  • Tests sometimes use TestScope as a deliberate ambient scope. That's a different pattern with explicit virtual-time control.

Red flags during review

These thoughts mean the anti-pattern is back:
| Thought | Reality |
| --- | --- |
| "I'll just add a CoroutineExceptionHandler to the scope" | The problem isn't error handling. The problem is the scope shouldn't exist. |
| "I need to launch from init so the data's ready when consumers arrive" | Consumers reading state that isn't ready is the bug. Use phasing (an explicit suspending init step, as in §2). |
| "The caller doesn't want to deal with suspend" | Then the caller chooses fire-and-forget at their scope. Don't decide for them. |
| "It's just a small fire-and-forget call" | Silent cancellation makes every fire-and-forget a potential silent failure. |
| "We caught and logged the exception, so we're fine" | Did the catch rethrow CancellationException? If no, the coroutine is silently un-cancelled. (§6) |
| "It's just one runBlocking, in a non-critical path" | Every runBlocking asserts the caller has no async option. If they do, it's the wrong primitive. (§7) |
| "Tests are simpler with runBlocking" | They run in real time, can't fast-forward delay, and lose TestDispatcher semantics. Use runTest. (§7) |

Related

  • kotlin-flow-state-event-modeling: StateFlow, SharedFlow, Channel, stateIn, one-shot events, and related modeling.