Extract an Allium specification from an existing codebase. Use when the user has existing code and wants to distil behaviour into a spec, reverse engineer a specification from implementation, generate a spec from code, turn implementation into a behavioural specification, or document what a codebase does in Allium terms.
This guide covers extracting Allium specifications from existing codebases. The core challenge is the same as forward elicitation: finding the right level of abstraction. In elicitation you filter out implementation ideas as they arise. In distillation you filter out implementation details that already exist. Both require the same judgement about what matters at the domain level.
Code tells you how something works. A specification captures what it does and why it matters. The skill is asking "why does the stakeholder care about this?" and "could this be different while still being the same system?"
Scoping the distillation effort
Before diving into code, establish what you are trying to specify. Not every line of code deserves a place in the spec.
Questions to ask first
"What subset of this codebase are we specifying?"
Monorepos often contain multiple distinct systems. You may need a spec for only one service or domain. Clarify boundaries explicitly before starting.
"Is there code we should deliberately exclude?"
Legacy code: features kept for backwards compatibility but not part of the core system
Incidental code: supporting infrastructure that is not domain-level (logging, metrics, deployment)
Deprecated paths: code scheduled for removal
Experimental features: behind feature flags, not yet design decisions
"Who owns this spec?"
Different teams may own different parts of a monorepo. Each team's spec should focus on their domain.
The "Would we rebuild this?" test
For any code path you encounter, ask: "If we rebuilt this system from scratch, would this be in the requirements?"
Yes: include in spec
No, it is legacy: exclude
No, it is infrastructure: exclude
No, it is a workaround: exclude (but note the underlying need it addresses)
Documenting scope decisions
At the top of a distilled spec, document what is included and excluded:
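A minimal sketch of such a header, written as spec comments (the scope and exclusions here are hypothetical examples):

```
-- Scope: interview-scheduling service only
-- Excluded: legacy CSV import (kept for backwards compatibility),
--           logging/metrics infrastructure,
--           auto-matching feature (behind a feature flag, not yet a design decision)
```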
Distillation and elicitation share the same fundamental challenge: choosing what to include. The tests below work in both directions, whether you are hearing a stakeholder describe a feature or reading code that implements it.
The "Why" test
For every detail in the code, ask: "Why does the stakeholder care about this?"
| Code detail | Why? | Include? |
|---|---|---|
| Invitation expires in 7 days | Affects candidate experience | Yes |
| Token is 32 bytes URL-safe | Security implementation | No |
| Sessions stored in Redis | Performance choice | No |
| Uses PostgreSQL JSONB | Database implementation | No |
| Slot status changes to 'proposed' | Affects what candidate sees | Yes |
| Email sent when invitation accepted | Communication requirement | Yes |
If you cannot articulate why a stakeholder would care, it is probably implementation.
The "Could it be different?" test
Ask: "Could this be implemented differently while still being the same system?"
If yes: probably implementation detail, abstract it away
If no: probably domain-level, include it
| Detail | Could be different? | Include? |
|---|---|---|
| `secrets.token_urlsafe(32)` | Yes, any secure token generation | No |
| 7-day invitation expiry | No, this is the design decision | Yes |
| PostgreSQL database | Yes, any database | No |
| "Pending, Confirmed, Completed" states | No, this is the workflow | Yes |
The "Template vs Instance" test
Is this a category of thing, or a specific instance?
| Instance (often implementation) | Template (often domain-level) |
|---|---|
| Google OAuth | Authentication provider |
| Slack webhook | Notification channel |
| SendGrid API | Email delivery |
| `timedelta(hours=3)` | Confirmation deadline |
Sometimes the instance IS the domain concern. See "The concrete detail problem" below.
The distillation mindset
Code is over-specified
Every line of code makes decisions that might not matter at the domain level:
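As an illustration, consider a sketch like the following (hypothetical code, not from any particular codebase). Almost every line fixes a decision, but only two of them are domain-level:

```python
import secrets
from datetime import datetime, timedelta, timezone

def create_invitation(candidate_id: int) -> dict:
    return {
        "candidate_id": candidate_id,
        "token": secrets.token_urlsafe(32),  # implementation: token scheme
        # domain: the 7-day expiry is a design decision stakeholders care about
        "expires_at": datetime.now(timezone.utc) + timedelta(days=7),
        "status": "pending",  # domain: workflow state visible to users
    }
```

The token scheme and clock source could change without anyone noticing; the expiry window and the `pending` state could not.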
Question: Is "Google OAuth" domain-level or implementation?
It is implementation if:
Google is just the auth mechanism chosen
It could be replaced with any OAuth provider
Users do not see or care which provider
The code is written generically (provider is a parameter)
It is domain-level if:
Users explicitly choose Google (vs Microsoft, etc.)
"Sign in with Google" is a feature
Google-specific scopes or permissions are used
Multiple providers are supported as a feature
How to tell: Look at the UI and user flows. If users see "Sign in with Google" as a choice, it is domain-level. If they just see "Sign in" and Google happens to be behind it, it is implementation.
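In code, the implementation-detail case often shows up as the provider being a parameter. A sketch of that signal (the helper and its table are hypothetical; the URLs are the providers' published authorization endpoints):

```python
# When any provider in the table can be swapped in, the specific provider
# is likely an implementation detail rather than a domain concern.
AUTH_ENDPOINTS = {
    "google": "https://accounts.google.com/o/oauth2/v2/auth",
    "microsoft": "https://login.microsoftonline.com/common/oauth2/v2.0/authorize",
}

def build_login_url(provider: str, client_id: str) -> str:
    return f"{AUTH_ENDPOINTS[provider]}?client_id={client_id}&response_type=code"
```

If instead the UI offered "Sign in with Google" and "Sign in with Microsoft" as distinct features, the providers would belong in the spec.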
Almost always implementation. The spec should say:
```
entity Candidate {
  skills: Set<String>
  metadata: String?  -- or model specific fields
}
```
The specific database is rarely domain-level. Exception: if the system explicitly promises PostgreSQL compatibility or specific PostgreSQL features to users.
```
rule CandidateAcceptsInvitation {
  when: CandidateAccepts(invitation, slot)
  requires: invitation.status = pending
  requires: invitation.expires_at > now
  requires: slot in invitation.slots
  ensures: invitation.status = accepted
  ensures: slot.status = booked
  ensures:
    for s in invitation.slots:
      if s != slot: s.status = available
  ensures: Interview.created(
    candidacy: invitation.candidacy,
    slot: slot,
    status: scheduled
  )
  ensures: Notification.created(to: slot.interviewers, ...)
  ensures: Email.created(to: invitation.candidate.email, ...)
}
```
Key extraction patterns:
| Code pattern | Spec pattern |
|---|---|
| `if x.status != 'pending': raise` | `requires: x.status = pending` |
| `if x.expires_at < now: raise` | `requires: x.expires_at > now` |
| `if item not in collection: raise` | `requires: item in collection` |
| `x.status = 'accepted'` | `ensures: x.status = accepted` |
| `Model.create(...)` | `ensures: Model.created(...)` |
| `send_email(...)` | `ensures: Email.created(...)` |
| `notify(...)` | `ensures: Notification.created(...)` |
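As a worked example, a handler like this hypothetical one maps line by line onto the patterns above:

```python
def accept_invitation(invitation, slot, now):
    # Guard clauses at the top of a handler map to requires: clauses.
    if invitation.status != "pending":
        raise ValueError("already responded")
    if invitation.expires_at < now:
        raise ValueError("invitation expired")
    if slot not in invitation.slots:
        raise ValueError("invalid slot")
    # State changes after the guards map to ensures: clauses.
    invitation.status = "accepted"
    slot.status = "booked"
```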
Assertions, checks and validations found in code (e.g. `assert balance >= 0`, class-level validators) may map to expression-bearing invariants rather than rule preconditions. Consider whether they describe a system-wide property or a rule-specific guard.
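For instance, a check like this (hypothetical class) recurs on every mutation, which suggests a system-wide invariant (balance never negative) rather than a precondition of any single rule:

```python
class Account:
    # The same balance check appears on construction and on every mutation:
    # a system-wide property, not a guard specific to one operation.
    def __init__(self, balance=0):
        assert balance >= 0
        self.balance = balance

    def withdraw(self, amount):
        assert self.balance - amount >= 0
        self.balance -= amount
```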
Step 4: Find temporal triggers
Look for scheduled jobs and time-based logic:
```python
# In celery tasks or cron jobs
@app.task
def expire_invitations():
    expired = Invitation.query.filter(
        Invitation.status == 'pending',
        Invitation.expires_at < datetime.utcnow()
    ).all()
    for invitation in expired:
        invitation.status = 'expired'
        for slot in invitation.slots:
            slot.status = 'available'
        notify_candidate_expired(invitation)

@app.task
def send_reminders():
    upcoming = Interview.query.filter(
        Interview.status == 'scheduled',
        Interview.slot.time.between(
            datetime.utcnow() + timedelta(hours=1),
            datetime.utcnow() + timedelta(hours=2)
        )
    ).all()
    for interview in upcoming:
        send_reminder_notification(interview)
```
Extract:
```
rule InvitationExpires {
  when: invitation: Invitation.expires_at <= now
  requires: invitation.status = pending
  ensures: invitation.status = expired
  ensures:
    for s in invitation.slots:
      s.status = available
  ensures: CandidateInformed(candidate: invitation.candidate, about: invitation_expired)
}

rule InterviewReminder {
  when: interview: Interview.slot.time - 1.hour <= now
  requires: interview.status = scheduled
  ensures: Notification.created(to: interview.interviewers, template: reminder)
}
```
Step 5: Identify external boundaries
Look for third-party API calls, webhook handlers, import/export functions, and data that is read but never written (or vice versa).
These often indicate external entities:
```python
# Candidate data comes from Greenhouse, we don't create it
def import_from_greenhouse(webhook_data):
    candidate = Candidate.query.filter_by(
        greenhouse_id=webhook_data['id']
    ).first()
    if not candidate:
        candidate = Candidate(greenhouse_id=webhook_data['id'])
    candidate.name = webhook_data['name']
    candidate.email = webhook_data['email']
```
Config values that derive from other config values (e.g. `extended_timeout = base_timeout * 2`) should use qualified references or expression-form defaults in the config block rather than independent literal values.
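Sketched in spec form, assuming the config block supports expression-form defaults (the names and exact syntax here are hypothetical):

```
config {
  base_timeout: Duration = 30.seconds
  extended_timeout: Duration = base_timeout * 2  -- derived, not an independent literal
}
```

Keeping the derivation explicit means a change to `base_timeout` cannot silently drift out of sync with its dependents.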
Step 7: Validate with stakeholders
The extracted spec is a hypothesis. Validate it:
Show the spec to the original developers. "Is this what the system does?"
Show to stakeholders. "Is this what the system should do?"
Look for gaps. Code often has bugs or missing features; the spec might reveal them.
Common findings:
"Oh, that retry logic was a hack, we should remove it"
"Actually we wanted X but never built it"
"These two code paths should be the same but aren't"
Recognising library spec candidates
During distillation, stay alert for code that implements generic integration patterns rather than application-specific logic. These belong in library specs, not your main specification.
The same principle applies in elicitation. When a stakeholder describes "we use Google for login" or "payments go through Stripe", pause and consider whether this is a library spec.
Signals in the code
Third-party integration modules:
```python
# Finding code like this suggests a library spec
class StripeWebhookHandler:
    def handle_invoice_paid(self, event): ...
    def handle_subscription_cancelled(self, event): ...

class GoogleOAuthProvider:
    def exchange_code(self, code): ...
    def refresh_token(self, refresh_token): ...
```
Generic patterns with specific providers:
OAuth flows (Google, Microsoft, GitHub)
Payment processing (Stripe, PayPal)
Email delivery (SendGrid, Postmark, SES)
Calendar sync (Google Calendar, Outlook)
ATS integrations (Greenhouse, Lever)
File storage (S3, GCS)
Configuration-driven integrations:
```python
# Heavy configuration suggests the integration itself is separable
OAUTH_CONFIG = {
    'google': {'client_id': ..., 'scopes': ...},
    'microsoft': {'client_id': ..., 'scopes': ...},
}
```
Questions to ask
"Is this integration logic, or application logic?"
Integration: how to talk to Stripe.
Application: what to do when payment succeeds.
"Would another application integrate the same way?"
If yes, library spec candidate. If no, probably application-specific.
"Does the code separate integration from application concerns?"
If cleanly separated, easy to extract to library spec. If tangled, might need refactoring first (but the spec should still separate them).
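A hypothetical sketch of clean separation, with the integration layer translating provider payloads and the application layer owning the business reaction (both functions are illustrative, not Stripe's actual API shapes):

```python
# Integration layer: knows how to talk to the provider, nothing about subscriptions.
def parse_stripe_event(payload: dict) -> tuple:
    return payload["type"], payload["data"]

# Application layer: knows the business reaction, nothing about the provider.
def on_payment_succeeded(subscriptions: dict, data: dict) -> None:
    subscriptions[data["subscription_id"]] = "active"
```

The first function would live behind a library spec; only the second belongs in your application spec.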
How to handle
Option 1: Reference an existing library spec
If a standard library spec exists for this integration:
```
use "github.com/allium-specs/stripe-billing/abc123" as stripe

-- Application responds to Stripe events
rule ActivateSubscription {
  when: stripe/PaymentSucceeded(invoice)
  ...
}
```
Option 2: Create a separate library spec
If no standard spec exists but the integration is generic:
```
-- greenhouse-ats.allium (library spec)
-- Specifies: Greenhouse webhook events, candidate sync, etc.
```

```
-- interview-scheduling.allium (application spec)
use "./greenhouse-ats.allium" as greenhouse

rule ImportCandidate {
  when: greenhouse/CandidateCreated(data)
  ensures: Candidacy.created(...)
}
```
If you find yourself writing a spec like this, stop and reconsider:
```
-- TOO DETAILED - this is Stripe's domain, not yours
rule ProcessStripeWebhook {
  when: WebhookReceived(payload, signature)
  requires: verify_stripe_signature(payload, signature)
  let event = parse_stripe_event(payload)
  if event.type = "invoice.paid":
    ...
}
```
See patterns.md Pattern 8 for detailed examples of integrating library specs.
Common distillation challenges
Challenge: Duplicate terminology
When you find two terms for the same concept (across specs, within a spec, or between spec and code) treat it as a blocking problem.
```
-- BAD: Acknowledges duplication without resolving it
-- Order vs Purchase
-- checkout.allium uses "Purchase" - these are equivalent concepts.
```
This is not a resolution. When different parts of a codebase are built against different specs, both terms end up in the implementation: duplicate models, redundant join tables, foreign keys pointing both ways.
What to do:
Choose one term. Cross-reference related specs before deciding.
Update all references. Do not leave the old term in comments or "see also" notes.
Note the rename in a changelog, not in the spec itself.
Warning signs in code:
Two models representing the same concept (`Order` and `Purchase`)
Join tables for both (`order_items`, `purchase_items`)
Comments like "equivalent to X" or "same as Y"
The spec you extract must pick one term. Flag the other as technical debt to remove.
Challenge: Implicit state machines
Code often has implicit states that are not modelled:
```python
# No explicit status field, but there's a state machine hiding here
class FeedbackRequest:
    interview_id = Column(Integer)
    interviewer_id = Column(Integer)
    requested_at = Column(DateTime)
    reminded_at = Column(DateTime, nullable=True)
    feedback_id = Column(Integer, nullable=True)  # FK to Feedback if submitted
```
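One way to make the hidden machine explicit is to observe that the state is a function of which nullable columns are populated. A hypothetical helper along those lines:

```python
# Derive the implicit state from which nullable fields are set.
# State names are illustrative; the extracted spec would model them
# as an explicit status enum on the entity.
def feedback_state(requested_at, reminded_at, feedback_id):
    if feedback_id is not None:
        return "submitted"
    if reminded_at is not None:
        return "reminded"
    if requested_at is not None:
        return "requested"
    return "not_requested"
```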
Challenge: Logic scattered across layers
The same conceptual rule might be spread across multiple places:
```python
# In API handler
def accept_invitation(request):
    if invitation.status != 'pending':
        return error(400, "Already responded")
    ...

# In model
class Invitation:
    def can_accept(self):
        return self.expires_at > datetime.utcnow()

# In service
def process_acceptance(invitation, slot):
    if slot not in invitation.slots:
        raise InvalidSlot()
    ...
```
Consolidate into one rule:
```
rule CandidateAccepts {
  when: CandidateAccepts(invitation, slot)
  requires: invitation.status = pending
  requires: invitation.expires_at > now
  requires: slot in invitation.slots
  ...
}
```
Challenge: Dead code and historical accidents
Codebases accumulate features that were built but never used, workarounds for bugs that are now fixed, and code paths that are never executed.
Do not include these in the spec. If you are unsure:
Check if the code is actually reachable
Ask developers if it is intentional
Check git history for context
Challenge: Missing error handling
Code might silently fail or have incomplete error handling: