Distillation guide

This guide covers extracting Allium specifications from existing codebases. The core challenge is the same as forward elicitation: finding the right level of abstraction. In elicitation you filter out implementation ideas as they arise. In distillation you filter out implementation details that already exist. Both require the same judgement about what matters at the domain level.
Code tells you how something works. A specification captures what it does and why it matters. The skill is asking "why does the stakeholder care about this?" and "could this be different while still being the same system?"

Scoping the distillation effort

Before diving into code, establish what you are trying to specify. Not every line of code deserves a place in the spec.

Questions to ask first

  1. "What subset of this codebase are we specifying?" Monorepos often contain multiple distinct systems. You may only need a spec for one service or domain. Clarify boundaries explicitly before starting.
  2. "Is there code we should deliberately exclude?"
    • Legacy code: features kept for backwards compatibility but not part of the core system
    • Incidental code: supporting infrastructure that is not domain-level (logging, metrics, deployment)
    • Deprecated paths: code scheduled for removal
    • Experimental features: behind feature flags, not yet design decisions
  3. "Who owns this spec?" Different teams may own different parts of a monorepo. Each team's spec should focus on their domain.

The "Would we rebuild this?" test

For any code path you encounter, ask: "If we rebuilt this system from scratch, would this be in the requirements?"
  • Yes: include in spec
  • No, it is legacy: exclude
  • No, it is infrastructure: exclude
  • No, it is a workaround: exclude (but note the underlying need it addresses)
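When triaging many code paths, the decision list above can be written down as a small lookup. This is a minimal sketch: the `PathKind` labels and the `triage` helper are illustrative assumptions, not part of Allium.

```python
from enum import Enum
from typing import Tuple


class PathKind(Enum):
    """Hypothetical labels for a code path found during distillation."""
    REQUIREMENT = "requirement"        # would be in the requirements of a rebuild
    LEGACY = "legacy"
    INFRASTRUCTURE = "infrastructure"
    WORKAROUND = "workaround"


def triage(kind: PathKind) -> Tuple[bool, str]:
    """Return (include_in_spec, note) following the 'Would we rebuild this?' test."""
    if kind is PathKind.REQUIREMENT:
        return True, "include in spec"
    if kind is PathKind.WORKAROUND:
        # Excluded, but the need behind the workaround may itself be a requirement.
        return False, "exclude, but note the underlying need it addresses"
    return False, "exclude"
```

Deciding which label applies is still a human judgement; the helper only records the outcome consistently.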

Documenting scope decisions

At the top of a distilled spec, document what is included and excluded:
-- allium: 3
-- interview-scheduling.allium

-- Scope: Interview scheduling flow only
-- Includes: Candidacy, Interview, InterviewSlot, Invitation, Feedback
-- Excludes:
--   - User authentication (use auth library spec)
--   - Analytics/reporting (separate spec)
--   - Legacy V1 API (deprecated, not specified)
--   - Greenhouse sync (use greenhouse library spec)
The version marker (`-- allium: N`) must be the first line of every `.allium` file. Use the current language version number.
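A quick way to check this rule across a directory of specs is to inspect the first line of each file. The sketch below assumes the marker has exactly the shape shown in the example (`-- allium: ` followed by an integer); the function name and the regular expression are illustrative, not an official Allium tool.

```python
import re
from typing import Optional

# Matches a marker such as "-- allium: 3" (pattern inferred from the example above).
VERSION_MARKER = re.compile(r"^-- allium: (\d+)\s*$")


def allium_version(spec_text: str) -> Optional[int]:
    """Return the version from the first line, or None if the marker is missing."""
    first_line = spec_text.split("\n", 1)[0]
    match = VERSION_MARKER.match(first_line)
    return int(match.group(1)) if match else None
```

A spec whose marker appears on any line other than the first returns `None`, which is the case the rule is meant to catch.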

Finding the right level of abstraction

Distillation and elicitation share the same fundamental challenge: choosing what to include. The tests below work in both directions, whether you are hearing a stakeholder describe a feature or reading code that implements it.

The "Why" test

For every detail in the code, ask: "Why does the stakeholder care about this?"
| Code detail | Why? | Include? |
| --- | --- | --- |
| Invitation expires in 7 days | Affects candidate experience | Yes |
| Token is 32 bytes URL-safe | Security implementation | No |
| Sessions stored in Redis | Performance choice | No |
| Uses PostgreSQL JSONB | Database implementation | No |
| Slot status changes to 'proposed' | Affects what candidate sees | Yes |
| Email sent when invitation accepted | Communication requirement | Yes |
If you cannot articulate why a stakeholder would care, it is probably implementation.

The "Could it be different?" test

Ask: "Could this be implemented differently while still being the same system?"
  • If yes: probably implementation detail, abstract it away
  • If no: probably domain-level, include it
| Detail | Could be different? | Include? |
| --- | --- | --- |
| `secrets.token_urlsafe(32)` | Yes, any secure token generation | No |
| 7-day invitation expiry | No, this is the design decision | Yes |
| PostgreSQL database | Yes, any database | No |
| "Pending, Confirmed, Completed" states | No, this is the workflow | Yes |

The "Template vs Instance" test

Is this a category of thing, or a specific instance?
| Instance (often implementation) | Template (often domain-level) |
| --- | --- |
| Google OAuth | Authentication provider |
| Slack webhook | Notification channel |
| SendGrid API | Email delivery |
| `timedelta(hours=3)` | Confirmation deadline |
Sometimes the instance IS the domain concern. See "The concrete detail problem" below.

The distillation mindset

Code is over-specified

Every line of code makes decisions that might not matter at the domain level:
python
# Code tells you:
def send_invitation(candidate_id: int, slot_ids: List[int]) -> Invitation:
    candidate = db.session.query(Candidate).get(candidate_id)
    slots = db.session.query(InterviewSlot).filter(
        InterviewSlot.id.in_(slot_ids),
        InterviewSlot.status == 'confirmed'
    ).all()

    invitation = Invitation(
        candidate_id=candidate_id,
        token=secrets.token_urlsafe(32),
        expires_at=datetime.utcnow() + timedelta(days=7),
        status='pending'
    )
    db.session.add(invitation)

    for slot in slots:
        slot.status = 'proposed'
        invitation.slots.append(slot)

    db.session.commit()

    send_email(
        to=candidate.email,
        template='interview_invitation',
        context={'invitation': invitation, 'slots': slots}
    )

    return invitation
-- Specification should say:
rule SendInvitation {
    when: SendInvitation(candidacy, slots)

    requires: slots.all(s => s.status = confirmed)

    ensures:
        for s in slots:
            s.status = proposed
    ensures: Invitation.created(
        candidacy: candidacy,
        slots: slots,
        expires_at: now + 7.days,
        status: pending
    )
    ensures: Email.created(
        to: candidacy.candidate.email,
        template: interview_invitation
    )
}
What we dropped:
  • `candidate_id: int` became just `candidacy`
  • `db.session.query(...)` became relationship traversal
  • `secrets.token_urlsafe(32)` removed entirely (token is implementation)
  • `datetime.utcnow() + timedelta(...)` became `now + 7.days`
  • `db.session.add/commit` implied by `created`
  • `invitation.slots.append(slot)` implied by relationship

Ask "Would a product owner care?"

For every detail in the code, ask:
| Code detail | Product owner cares? | Include? |
| --- | --- | --- |
| Invitation expires in 7 days | Yes, affects candidate experience | Yes |
| Token is 32 bytes URL-safe | No, security implementation | No |
| Uses SQLAlchemy ORM | No, persistence mechanism | No |
| Email template name | Maybe, if templates are design decisions | Maybe |
| Slot status changes to 'proposed' | Yes, affects what candidate sees | Yes |
| Database transaction commits | No, implementation detail | No |

Distinguish means from ends

Means: how the code achieves something. Ends: what outcome the system needs.
| Means (code) | Ends (spec) |
| --- | --- |
| `requests.post('https://slack.com/api/...')` | `Notification.created(channel: slack)` |
| `candidate.oauth_token = google.exchange(code)` | Candidate authenticated |
| `redis.setex(f'session:{id}', 86400, data)` | `Session.created(expires: 24.hours)` |
| `for slot in slots: slot.status = 'cancelled'` | `for s in slots: s.status = cancelled` |

The concrete detail problem

The hardest judgement call: when is a concrete detail part of the domain vs just implementation?

Google OAuth example

You find this code:
python
OAUTH_PROVIDERS = {
    'google': GoogleOAuthProvider(client_id=..., client_secret=...),
}

def authenticate(provider: str, code: str) -> User:
    return OAUTH_PROVIDERS[provider].authenticate(code)
Question: Is "Google OAuth" domain-level or implementation?
It is implementation if:
  • Google is just the auth mechanism chosen
  • It could be replaced with any OAuth provider
  • Users do not see or care which provider
  • The code is written generically (provider is a parameter)
It is domain-level if:
  • Users explicitly choose Google (vs Microsoft, etc.)
  • "Sign in with Google" is a feature
  • Google-specific scopes or permissions are used
  • Multiple providers are supported as a feature
How to tell: Look at the UI and user flows. If users see "Sign in with Google" as a choice, it is domain-level. If they just see "Sign in" and Google happens to be behind it, it is implementation.

Database choice example

You find PostgreSQL-specific code:
python
from sqlalchemy.dialects.postgresql import JSONB, ARRAY

class Candidate(Base):
    skills = Column(ARRAY(String))
    metadata_ = Column("metadata", JSONB)  # attribute renamed: "metadata" is reserved on SQLAlchemy declarative classes
Almost always implementation. The spec should say:
entity Candidate {
    skills: Set<String>
    metadata: String?              -- or model specific fields
}
The specific database is rarely domain-level. Exception: if the system explicitly promises PostgreSQL compatibility or specific PostgreSQL features to users.

Third-party integration example

You find Greenhouse ATS integration:
python
class GreenhouseSync:
    def import_candidate(self, greenhouse_id: str) -> Candidate:
        data = self.client.get_candidate(greenhouse_id)
        return Candidate(
            name=data['name'],
            email=data['email'],
            greenhouse_id=greenhouse_id,
            source='greenhouse'
        )
Could be either:
Implementation if:
  • Greenhouse is just where candidates happen to come from
  • Could be swapped for Lever, Workable, etc.
  • The integration is an implementation detail of "candidates are imported"
Spec:
external entity Candidate {
    name: String
    email: String
    source: CandidateSource
}
Product-level if:
  • "Greenhouse integration" is a selling point
  • Users configure their Greenhouse connection
  • Greenhouse-specific features are exposed (like syncing feedback back)
Spec:
external entity Candidate {
    name: String
    email: String
    greenhouse_id: String?  -- explicitly modeled
}

rule SyncFromGreenhouse {
    when: GreenhouseWebhookReceived(candidate_data)
    ensures: Candidate.created(
        ...
        greenhouse_id: candidate_data.id
    )
}

The "Multiple implementations" heuristic

Look for variation in the codebase:
  • If there is only one OAuth provider, probably implementation
  • If there are multiple OAuth providers, probably domain-level
  • If there is only one notification channel, probably implementation
  • If there are Slack AND email AND SMS, probably domain-level
The presence of multiple implementations suggests the variation itself is a domain concern.
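The heuristic amounts to counting implementations per category. Here is a rough sketch, assuming you have already listed the integration classes you found and labelled each with a category by hand. The class names and category labels below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict


def classify_variation(integrations: Dict[str, str]) -> Dict[str, str]:
    """Map each category to a guess: a single implementation suggests an
    implementation detail, several suggest the variation is a domain concern."""
    by_category = defaultdict(set)
    for implementation, category in integrations.items():
        by_category[category].add(implementation)
    return {
        category: "probably domain-level" if len(impls) > 1 else "probably implementation"
        for category, impls in by_category.items()
    }


# Hypothetical findings from reading a codebase.
found = {
    "GoogleOAuthProvider": "oauth provider",
    "SlackNotifier": "notification channel",
    "EmailNotifier": "notification channel",
    "SmsNotifier": "notification channel",
}
result = classify_variation(found)
```

The output is only a starting guess ("probably"); the UI and user flows remain the deciding evidence.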

Distillation process

Step 1: Map the territory

Before extracting any specification, understand the codebase structure:
  1. Identify entry points. API routes, CLI commands, message handlers, scheduled jobs.
  2. Find the domain models. Usually in `models/`, `entities/`, `domain/`.
  3. Locate business logic. Services, use cases, handlers.
  4. Note external integrations. What third parties does it talk to?
Create a rough map:
Entry points:
  - API: /api/candidates/*, /api/interviews/*, /api/invitations/*
  - Webhooks: /webhooks/greenhouse, /webhooks/calendar
  - Jobs: send_reminders, expire_invitations, sync_calendars

Models:
  - Candidate, Interview, InterviewSlot, Invitation, Feedback

Services:
  - SchedulingService, NotificationService, CalendarService

Integrations:
  - Google Calendar, Slack, Greenhouse, SendGrid
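Part of this map can be drafted mechanically. As a small sketch, the helper below filters a list of file paths down to those under the directory names mentioned in item 2. The sample paths are invented, and a real codebase may keep its models elsewhere.

```python
from pathlib import PurePosixPath
from typing import List

# Directory names the guide lists as usual homes for domain models.
MODEL_DIRS = {"models", "entities", "domain"}


def likely_model_files(paths: List[str]) -> List[str]:
    """Return the paths that sit under a directory named models/, entities/ or domain/."""
    return [
        p for p in paths
        # parts[:-1] drops the file name so "deploy/models.yaml" does not match.
        if MODEL_DIRS & set(PurePosixPath(p).parts[:-1])
    ]


files = [
    "app/models/invitation.py",
    "app/services/scheduling.py",
    "app/domain/interview.py",
    "deploy/models.yaml",
]
model_files = likely_model_files(files)
```

In practice the input list could come from something like `pathlib.Path(root).rglob("*.py")`. Entry points and integrations usually need reading, not just path matching.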

Step 2: Extract entity states

Look at enum fields and status columns:
python
class Invitation(Base):
    status = Column(Enum('pending', 'accepted', 'declined', 'expired'))
Becomes:
entity Invitation {
    status: pending | accepted | declined | expired
}
Look for enum definitions, status or state columns, constants like `STATUS_PENDING = 'pending'`, and state machine libraries (e.g. `transitions`, `django-fsm`).
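When the states live in a Python `Enum`, the first draft of the field line can be generated directly. A minimal sketch, assuming a hypothetical `InvitationStatus` enum that mirrors the status column above:

```python
from enum import Enum
from typing import Type


class InvitationStatus(Enum):
    # Hypothetical enum mirroring the status column shown above.
    PENDING = "pending"
    ACCEPTED = "accepted"
    DECLINED = "declined"
    EXPIRED = "expired"


def status_field(enum_cls: Type[Enum], field: str = "status") -> str:
    """Render the enum's values, in definition order, as an Allium-style field line."""
    return f"{field}: " + " | ".join(str(member.value) for member in enum_cls)


line = status_field(InvitationStatus)
```

The generated line still needs review: enums often carry states that are dead or purely technical, and those should not reach the spec.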

Step 3: Extract transitions

Find where status changes happen:
python
def accept_invitation(invitation_id: int, slot_id: int):
    invitation = get_invitation(invitation_id)

    if invitation.status != 'pending':
        raise InvalidStateError()
    if invitation.expires_at < datetime.utcnow():
        raise ExpiredError()

    slot = get_slot(slot_id)
    if slot not in invitation.slots:
        raise InvalidSlotError()

    invitation.status = 'accepted'
    slot.status = 'booked'

    # Release other slots
    for other_slot in invitation.slots:
        if other_slot.id != slot_id:
            other_slot.status = 'available'

    # Create the interview
    interview = Interview(
        candidate_id=invitation.candidate_id,
        slot_id=slot_id,
        status='scheduled'
    )

    notify_interviewers(interview)
    send_confirmation_email(invitation.candidate, interview)
Extract:
rule CandidateAcceptsInvitation {
    when: CandidateAccepts(invitation, slot)

    requires: invitation.status = pending
    requires: invitation.expires_at > now
    requires: slot in invitation.slots

    ensures: invitation.status = accepted
    ensures: slot.status = booked
    ensures:
        for s in invitation.slots:
            if s != slot: s.status = available
    ensures: Interview.created(
        candidacy: invitation.candidacy,
        slot: slot,
        status: scheduled
    )
    ensures: Notification.created(to: slot.interviewers, ...)
    ensures: Email.created(to: invitation.candidate.email, ...)
}
Key extraction patterns:
| Code pattern | Spec pattern |
| --- | --- |
| `if x.status != 'pending': raise` | `requires: x.status = pending` |
| `if x.expires_at < now: raise` | `requires: x.expires_at > now` |
| `if item not in collection: raise` | `requires: item in collection` |
| `x.status = 'accepted'` | `ensures: x.status = accepted` |
| `Model.create(...)` | `ensures: Model.created(...)` |
| `send_email(...)` | `ensures: Email.created(...)` |
| `notify(...)` | `ensures: Notification.created(...)` |

Assertions, checks and validations found in code (e.g. `assert balance >= 0`, class-level validators) may map to expression-bearing invariants rather than rule preconditions. Consider whether they describe a system-wide property or a rule-specific guard.
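The guard patterns above are mechanical enough to prototype with Python's `ast` module. The sketch below is an illustration, not an Allium tool: it finds `if <comparison>: raise ...` guards and prints the negated condition as a candidate `requires:` line. It applies exact logical negation, so a `<` guard yields `>=`; whether the boundary should be `>` or `>=` is a question to settle with stakeholders. String literals keep their Python quotes and need tidying by hand.

```python
import ast
from typing import List

# Negation of each comparison operator: `if a < b: raise` means the rule needs a >= b.
NEGATED = {
    ast.Eq: "!=", ast.NotEq: "=", ast.Lt: ">=", ast.LtE: ">",
    ast.Gt: "<=", ast.GtE: "<", ast.In: "not in", ast.NotIn: "in",
}


def guard_preconditions(source: str) -> List[str]:
    """Collect `if <comparison>: raise ...` guards and return the negated conditions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If) or not isinstance(node.test, ast.Compare):
            continue
        # Only handle single comparisons whose body raises directly.
        if len(node.test.ops) != 1 or not any(isinstance(s, ast.Raise) for s in node.body):
            continue
        op = NEGATED.get(type(node.test.ops[0]))
        if op is None:  # e.g. `is` / `is not`, which this sketch does not map
            continue
        left = ast.unparse(node.test.left)
        right = ast.unparse(node.test.comparators[0])
        found.append(f"requires: {left} {op} {right}")
    return found


sample = '''
def accept(invitation, slot, now):
    if invitation.status != 'pending':
        raise ValueError()
    if invitation.expires_at < now:
        raise ValueError()
    if slot not in invitation.slots:
        raise ValueError()
'''
candidates = guard_preconditions(sample)
```

`ast.unparse` requires Python 3.9 or later. Treat the result as a list of candidates to review, since guards that protect against implementation failures (null checks, type checks) do not belong in the spec.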

Step 4: Find temporal triggers

Look for scheduled jobs and time-based logic:
python
# In celery tasks or cron jobs
@app.task
def expire_invitations():
    expired = Invitation.query.filter(
        Invitation.status == 'pending',
        Invitation.expires_at < datetime.utcnow()
    ).all()

    for invitation in expired:
        invitation.status = 'expired'
        for slot in invitation.slots:
            slot.status = 'available'
        notify_candidate_expired(invitation)

@app.task
def send_reminders():
    upcoming = Interview.query.filter(
        Interview.status == 'scheduled',
        Interview.slot.time.between(
            datetime.utcnow() + timedelta(hours=1),
            datetime.utcnow() + timedelta(hours=2)
        )
    ).all()

    for interview in upcoming:
        send_reminder_notification(interview)
Extract:
rule InvitationExpires {
    when: invitation: Invitation.expires_at <= now
    requires: invitation.status = pending

    ensures: invitation.status = expired
    ensures:
        for s in invitation.slots:
            s.status = available
    ensures: CandidateInformed(candidate: invitation.candidate, about: invitation_expired)
}

rule InterviewReminder {
    when: interview: Interview.slot.time - 1.hour <= now
    requires: interview.status = scheduled

    ensures: Notification.created(to: interview.interviewers, template: reminder)
}
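Note what the extracted rules keep and what they drop. They state a condition over time, while the polling schedule and query window of the jobs are left behind as implementation. One way to see this is to write the two conditions as pure predicates. This is a sketch with illustrative function names, and the times are passed in explicitly so the logic can be checked without a scheduler.

```python
from datetime import datetime, timedelta, timezone


def invitation_expires(status: str, expires_at: datetime, now: datetime) -> bool:
    """Condition behind InvitationExpires: still pending and at or past expiry."""
    return status == "pending" and expires_at <= now


def reminder_due(status: str, slot_time: datetime, now: datetime) -> bool:
    """Condition behind InterviewReminder: scheduled and within one hour of the slot."""
    return status == "scheduled" and slot_time - timedelta(hours=1) <= now


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
```

Whether a job evaluates these every minute or every hour changes how promptly the system reacts, not what the rule means.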

Step 5: Identify external boundaries

Look for third-party API calls, webhook handlers, import/export functions, and data that is read but never written (or vice versa).
These often indicate external entities:
python
# Candidate data comes from Greenhouse, we don't create it
def import_from_greenhouse(webhook_data):
    candidate = Candidate.query.filter_by(
        greenhouse_id=webhook_data['id']
    ).first()

    if not candidate:
        candidate = Candidate(greenhouse_id=webhook_data['id'])

    candidate.name = webhook_data['name']
    candidate.email = webhook_data['email']
Suggests:
external entity Candidate {
    name: String
    email: String
}
When repeated interface patterns appear across service boundaries (e.g. the same serialisation contract expected by multiple consumers), these suggest `contract` declarations for reuse rather than duplicated inline obligation blocks.

Step 6: Abstract away implementation

Now make a pass through your extracted spec and remove implementation details.
Before (too concrete):
entity Invitation {
    candidate_id: Integer
    token: String(32)
    created_at: DateTime
    expires_at: DateTime
    status: pending | accepted | declined | expired
}
After (domain-level):
entity Invitation {
    candidacy: Candidacy
    created_at: Timestamp
    expires_at: Timestamp
    status: pending | accepted | declined | expired

    is_expired: expires_at <= now
}
Changes:
  • `candidate_id: Integer` became `candidacy: Candidacy` (relationship, not FK)
  • `token: String(32)` removed (implementation)
  • `DateTime` became `Timestamp` (domain type)
  • Added derived `is_expired` for clarity
Config values that derive from other config values (e.g. `extended_timeout = base_timeout * 2`) should use qualified references or expression-form defaults in the config block rather than independent literal values.
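The reason for this is the same as in ordinary code: two independent literals drift apart when one is changed. As a Python analogy only (the Allium config syntax itself is not shown here, and `TimeoutConfig` is a hypothetical name), a derived default looks like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutConfig:
    # Hypothetical config; only base_timeout is an independent value.
    base_timeout: float = 30.0
    extended_timeout: Optional[float] = None  # None means "derive from base_timeout"

    def __post_init__(self) -> None:
        # Expression-form default: follows base_timeout unless explicitly overridden.
        if self.extended_timeout is None:
            self.extended_timeout = self.base_timeout * 2
```

Changing `base_timeout` now moves `extended_timeout` with it, which is the relationship the spec should record.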

Step 7: Validate with stakeholders

The extracted spec is a hypothesis. Validate it:
  1. Show the spec to the original developers. "Is this what the system does?"
  2. Show to stakeholders. "Is this what the system should do?"
  3. Look for gaps. Code often has bugs or missing features; the spec might reveal them.
Common findings:
  • "Oh, that retry logic was a hack, we should remove it"
  • "Actually we wanted X but never built it"
  • "These two code paths should be the same but aren't"

Recognising library spec candidates

During distillation, stay alert for code that implements generic integration patterns rather than application-specific logic. These belong in library specs, not your main specification.
The same principle applies in elicitation. When a stakeholder describes "we use Google for login" or "payments go through Stripe", pause and consider whether this is a library spec.

Signals in the code

Third-party integration modules:
python
# Finding code like this suggests a library spec
class StripeWebhookHandler:
    def handle_invoice_paid(self, event): ...
    def handle_subscription_cancelled(self, event): ...

class GoogleOAuthProvider:
    def exchange_code(self, code): ...
    def refresh_token(self, refresh_token): ...
Generic patterns with specific providers:
  • OAuth flows (Google, Microsoft, GitHub)
  • Payment processing (Stripe, PayPal)
  • Email delivery (SendGrid, Postmark, SES)
  • Calendar sync (Google Calendar, Outlook)
  • ATS integrations (Greenhouse, Lever)
  • File storage (S3, GCS)
Configuration-driven integrations:
python
# Heavy configuration suggests the integration itself is separable
OAUTH_CONFIG = {
    'google': {'client_id': ..., 'scopes': ...},
    'microsoft': {'client_id': ..., 'scopes': ...},
}

Questions to ask

  1. "Is this integration logic, or application logic?" Integration: how to talk to Stripe. Application: what to do when payment succeeds.
  2. "Would another application integrate the same way?" If yes, library spec candidate. If no, probably application-specific.
  3. "Does the code separate integration from application concerns?" If cleanly separated, easy to extract to library spec. If tangled, might need refactoring first (but the spec should still separate them).

How to handle

Option 1: Reference an existing library spec
If a standard library spec exists for this integration:
use "github.com/allium-specs/stripe-billing/abc123" as stripe

-- Application responds to Stripe events
rule ActivateSubscription {
    when: stripe/PaymentSucceeded(invoice)
    ...
}
Option 2: Create a separate library spec
If no standard spec exists but the integration is generic:
-- greenhouse-ats.allium (library spec)
-- Specifies: Greenhouse webhook events, candidate sync, etc.

-- interview-scheduling.allium (application spec)
use "./greenhouse-ats.allium" as greenhouse

rule ImportCandidate {
    when: greenhouse/CandidateCreated(data)
    ensures: Candidacy.created(...)
}
Option 3: Abstract and move on
If the integration is minor, just abstract it:
-- Don't specify Slack details, just:
ensures: Notification.created(
    to: interviewers,
    channel: slack
)

Red flags: integration logic in your spec

If you find yourself writing spec like this, stop and reconsider:
-- TOO DETAILED - this is Stripe's domain, not yours
rule ProcessStripeWebhook {
    when: WebhookReceived(payload, signature)

    requires: verify_stripe_signature(payload, signature)

    let event = parse_stripe_event(payload)

    if event.type = "invoice.paid":
        ...
}
Instead:
-- Application responds to payment events (integration handled elsewhere)
rule PaymentReceived {
    when: stripe/InvoicePaid(invoice)
    ...
}

Common library spec extractions

| Code pattern found | Library spec candidate |
|---|---|
| OAuth token exchange, refresh, session management | `oauth2.allium` |
| Stripe webhook handling, subscription lifecycle | `stripe-billing.allium` |
| Email sending with templates, bounce handling | `email-delivery.allium` |
| Calendar event sync, availability checking | `calendar-integration.allium` |
| ATS candidate import, status sync | `greenhouse-ats.allium`, `lever-ats.allium` |
| File upload, virus scanning, thumbnail generation | `file-storage.allium` |

See patterns.md Pattern 8 for detailed examples of integrating library specs.

Common distillation challenges

Challenge: Duplicate terminology

When you find two terms for the same concept (across specs, within a spec, or between spec and code), treat it as a blocking problem.
-- BAD: Acknowledges duplication without resolving it
-- Order vs Purchase
-- checkout.allium uses "Purchase" - these are equivalent concepts.
This is not a resolution. When different parts of a codebase are built against different specs, both terms end up in the implementation: duplicate models, redundant join tables, foreign keys pointing both ways.
What to do:
  • Choose one term. Cross-reference related specs before deciding.
  • Update all references. Do not leave the old term in comments or "see also" notes.
  • Note the rename in a changelog, not in the spec itself.
Warning signs in code:
  • Two models representing the same concept (`Order` and `Purchase`)
  • Join tables for both (`order_items`, `purchase_items`)
  • Comments like "equivalent to X" or "same as Y"
The spec you extract must pick one term. Flag the other as technical debt to remove.
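Once a term has been chosen, a small script can confirm that the retired one is gone from the spec. The sketch below is only a starting point: the function name `find_retired_terms` and the `retired` mapping are assumptions made for illustration, and the check is a plain whole-word text search, so it knows nothing about Allium syntax.

```python
import re


def find_retired_terms(spec_text: str, retired: dict) -> list:
    """Return (line_number, retired_term, preferred_term) for each leftover use."""
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        for old, new in retired.items():
            # Whole-word match, so "Purchase" does not flag "PurchaseOrder".
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.append((number, old, new))
    return hits


spec = """-- Order vs Purchase
-- checkout.allium uses "Purchase" - these are equivalent concepts.
"""
leftovers = find_retired_terms(spec, {"Purchase": "Order"})
```

Run against the BAD example above, both comment lines are reported, which is the desired outcome: the old term should not survive even in comments.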

Challenge: Implicit state machines

Code often has implicit states that are not modelled:
python
# No explicit status field, but there's a state machine hiding here
class FeedbackRequest:
    interview_id = Column(Integer)
    interviewer_id = Column(Integer)
    requested_at = Column(DateTime)
    reminded_at = Column(DateTime, nullable=True)
    feedback_id = Column(Integer, nullable=True)  # FK to Feedback if submitted

The implicit states are:
- `pending`: requested_at set, feedback_id null, reminded_at null
- `reminded`: reminded_at set, feedback_id null
- `submitted`: feedback_id set

Extract to explicit:
entity FeedbackRequest {
    interview: Interview
    interviewer: Interviewer
    requested_at: Timestamp
    reminded_at: Timestamp?
    status: pending | reminded | submitted
}
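To check that the three states really cover every combination of the nullable columns, it can help to write the mapping down as a function. This is a minimal sketch: `derive_status` is a hypothetical helper, not part of the codebase above, and it assumes that a submitted request stays `submitted` even if a reminder was sent earlier.

```python
from datetime import datetime
from typing import Optional


def derive_status(reminded_at: Optional[datetime], feedback_id: Optional[int]) -> str:
    """Map the nullable columns onto the explicit status they imply."""
    if feedback_id is not None:
        # Submission wins regardless of whether a reminder was sent.
        return "submitted"
    if reminded_at is not None:
        return "reminded"
    return "pending"
```

Writing the function forces the precedence question (reminded and then submitted) into the open, which is exactly the kind of decision to confirm with a stakeholder before it goes into the spec.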

Challenge: Scattered logic

The same conceptual rule might be spread across multiple places:
python
# In API handler
def accept_invitation(request):
    if invitation.status != 'pending':
        return error(400, "Already responded")
    ...

# In model
class Invitation:
    def can_accept(self):
        return self.expires_at > datetime.utcnow()

# In service
def process_acceptance(invitation, slot):
    if slot not in invitation.slots:
        raise InvalidSlot()
    ...

Consolidate into one rule:
rule CandidateAccepts {
    when: CandidateAccepts(invitation, slot)

    requires: invitation.status = pending
    requires: invitation.expires_at > now
    requires: slot in invitation.slots
    ...
}
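One way to sanity-check a consolidated rule is to gather the scattered guards into a single function and see whether it reports one failure per `requires` clause. The sketch below assumes a simplified `Invitation` dataclass and a hypothetical `unmet_preconditions` helper; neither exists in the code being distilled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Invitation:
    status: str
    expires_at: datetime
    slots: list


def unmet_preconditions(invitation: Invitation, slot: str, now: datetime) -> list:
    """Collect every violated precondition, mirroring the three `requires` clauses."""
    failures = []
    if invitation.status != "pending":        # from the API handler
        failures.append("invitation.status = pending")
    if not invitation.expires_at > now:       # from the model
        failures.append("invitation.expires_at > now")
    if slot not in invitation.slots:          # from the service
        failures.append("slot in invitation.slots")
    return failures


invitation = Invitation(
    status="pending",
    expires_at=datetime(2024, 1, 10),
    slots=["mon-10am", "tue-2pm"],
)
failures = unmet_preconditions(invitation, "wed-9am", now=datetime(2024, 1, 11))
```

Here the invitation has expired and the slot was never offered, so two of the three preconditions are reported. If a guard in the code has no matching `requires` clause, or the reverse, the consolidation is incomplete.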

Challenge: Dead code and historical accidents

Codebases accumulate features that were built but never used, workarounds for bugs that are now fixed, and code paths that are never executed.
Do not include these in the spec. If you are unsure:
  1. Check if the code is actually reachable
  2. Ask developers if it is intentional
  3. Check git history for context
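For Python code, the first step can be roughly approximated with the standard library's `ast` module. The sketch below is a name-based heuristic over a single module: `unreferenced_functions` is a hypothetical helper, and it cannot see dynamic dispatch, framework routing or calls from other files, so treat its output as candidates to investigate rather than proof that code is dead.

```python
import ast


def unreferenced_functions(source: str) -> set:
    """Return names of functions defined in `source` that are never referenced in it."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)      # plain calls and references
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)    # method-style references
    return defined - referenced


sample = '''
def send_reminder(request):
    return notify(request)

def notify(request):
    return True

def legacy_export(request):
    return None

send_reminder({})
'''
candidates = unreferenced_functions(sample)
```

In this sample only `legacy_export` is never mentioned, so it is the one to raise with developers or look up in git history.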

Challenge: Missing error handling

Code might silently fail or have incomplete error handling:
python
def send_notification(user, message):
    try:
        slack.send(user.slack_id, message)
    except SlackError:
        pass  # Silently ignore failures
The spec should capture the intended behaviour, not the bug:
ensures: Notification.created(to: user, channel: slack)
Whether the current implementation properly handles failures is separate from what the system should do.
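Silently swallowed exceptions are easy to miss when reading code, so it may be worth listing them up front and asking what the intended behaviour is for each. The following sketch uses Python's `ast` module and a hypothetical `silent_handlers` helper. It only finds handlers whose body is exactly `pass`, so handlers that log and carry on will not be reported.

```python
import ast


def silent_handlers(source: str) -> list:
    """Return line numbers of `except` blocks whose body is only `pass`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(isinstance(stmt, ast.Pass) for stmt in node.body)
    ]


sample = '''def send_notification(user, message):
    try:
        slack.send(user.slack_id, message)
    except SlackError:
        pass  # Silently ignore failures
'''
lines = silent_handlers(sample)
```

Each reported line is a question for the stakeholder ("what should happen when the notification cannot be delivered?"), not something to copy into the spec.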

Challenge: Over-engineered abstractions

Enterprise codebases often have abstraction layers that obscure intent:
java
public interface NotificationStrategy {
    void notify(NotificationContext context);
}

public class SlackNotificationStrategy implements NotificationStrategy {
    @Override
    public void notify(NotificationContext context) {
        // Actual Slack call buried 5 levels deep
    }
}
Cut through to the actual behaviour. The spec does not need strategy patterns, dependency injection or abstract factories. Just:
ensures: Notification.created(channel: slack, ...)

Checklist: Have you abstracted enough?

Before finalising a distilled spec:
  • No database column types (Integer, VARCHAR, etc.)
  • No ORM or query syntax
  • No HTTP status codes or API paths
  • No framework-specific concepts (middleware, decorators, etc.)
  • No programming language types (int, str, List, etc.)
  • No variable names from the code (use domain terms)
  • No infrastructure (Redis, Kafka, S3, etc.)
  • Foreign keys replaced with relationships
  • Tokens/secrets removed (implementation of identity)
  • Time spans use the domain Duration type, not timedelta or raw seconds
If any remain, ask: "Would a stakeholder include this in a requirements doc?"
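Parts of this checklist can be automated as a rough first pass. The sketch below scans a draft spec for a handful of tell-tale terms. The pattern table `LEAK_PATTERNS`, the helper `find_leaks` and the sample draft are all illustrative assumptions: the patterns cover only a few checklist items and would need extending for a real stack, and a text search cannot judge whether a variable name is a domain term.

```python
import re

# Illustrative patterns only; extend these to match your own stack.
LEAK_PATTERNS = {
    "database column type": r"\b(?:Integer|VARCHAR|Column)\b",
    "infrastructure": r"\b(?:Redis|Kafka|S3)\b",
    "language duration type": r"\btimedelta\b",
}


def find_leaks(spec_text: str) -> list:
    """Return (line_number, label) for each line that matches a leak pattern."""
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        for label, pattern in LEAK_PATTERNS.items():
            if re.search(pattern, line):
                hits.append((number, label))
    return hits


# A deliberately leaky draft, invented for this example.
draft = """requires: invitation.status = pending
reminded_at: Column(DateTime)
ensures: Cache.stored(in: Redis)
"""
leaks = find_leaks(draft)
```

A clean run does not mean the spec is abstract enough; it only means none of the listed words appear. The stakeholder question above remains the real test.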

Checklist: Terminology consistency

  • Each concept has exactly one name throughout the spec
  • No "also known as" or "equivalent to" comments
  • Cross-referenced related specs for conflicting terms
  • Duplicate models in code flagged as technical debt to remove

After distillation

The extracted spec is a starting point. For targeted changes as requirements evolve, use the `tend` skill. For checking ongoing alignment between the spec and implementation, use the `weed` skill.

References

  • Language reference, full Allium syntax
  • Worked examples, complete code-to-spec examples in Python, TypeScript and Java