archive.today — Printing Press CLI
Prerequisites: Install the CLI
This skill drives the
binary.
You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:
- Install via the Printing Press installer:
bash
npx -y @mvanhorn/printing-press install archive-is --cli-only
- Verify:
archive-is-pp-cli --version
- Ensure (or ) is on .
If the
install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.23+):
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
If
reports "command not found" after install, the install step did not put the binary on
. Do not proceed with skill commands until verification succeeds.
When to Use This CLI
Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:
- A user sends a paywalled link and asks "can you read this" → fetches text via archive
- They want to preserve a URL that might change → forces a fresh capture
- They want historical versions → lists all known snapshots
- They have 20+ URLs to archive → runs rate-limited batch archival
Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.
Unique Capabilities
The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.
The hero commands
-
— Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback); submits a fresh capture only if nothing exists. The "always do the right thing" command.
This is how 90% of agent calls should start. It's idempotent — calling it twice on the same URL doesn't double-submit.
-
get <url> [--format text|html]
/ — Fetch article text, optionally LLM-summarized. Automatic Wayback fallback when archive.today serves a CAPTCHA (which happens daily to cloud IPs).
pipes the fetched text through a summarization step — useful for agent chains where you want a short take without shipping 20KB of HTML back.
Durability operations
-
— Force a fresh capture via
/submit/?url=<x>&anyway=1
. Use when
returns an existing snapshot that's too old or missing a paywall update.
-
— List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback.
-
— Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, returns a report of successes / failures / pre-existing.
grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk -
archives every URL in a markdown file.
-
— Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.
Observability
-
— Just the newest snapshot URL for a target, useful in scripts.
-
— List your local capture index (post-sync).
-
— archive.today's global recent-archives feed.
-
--backend archive-is,wayback
— Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.
Command Reference
Archive + retrieve:
archive-is-pp-cli read <url>
— Find or create (hero command)
archive-is-pp-cli get <url>
— Fetch article text (with Wayback fallback)
archive-is-pp-cli tldr <url>
— Fetch + summarize
archive-is-pp-cli save <url>
— Force fresh capture
archive-is-pp-cli request <url>
— Fire-and-forget submit
archive-is-pp-cli check <url>
— Does an archive exist?
Listing + history:
archive-is-pp-cli history <url>
— All known snapshots
archive-is-pp-cli newest <url>
— Newest snapshot URL
archive-is-pp-cli captures
— Local capture index
- — Global recent feed
Batch:
archive-is-pp-cli bulk [file]
— Batch from file or stdin
Local store:
Auth + health:
- — Config (no API key needed; auth is a no-op)
- — Verify backend reachability
Recipes
Read a paywalled article
bash
archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent
# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agent
returns the archive URL (finding existing or creating new).
returns the article body, falling back to Wayback if archive.today CAPTCHAs.
Preserve a URL before it changes
bash
archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent # verify
Force capture, then check history to confirm the new snapshot registered.
Bulk archive a research batch
bash
grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent
# or from a file:
archive-is-pp-cli bulk urls.txt --agent
Reads URLs one per line, submits each with exponential backoff, returns per-URL status (archived, pre-existing, failed) as JSON.
Wayback-preferred for a reliable-read
bash
archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agent
Use when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.
Auth Setup
No API key required. Archive.today and Wayback Machine are both public. The
subcommand exists for consistency but is a no-op —
reports "Auth: not required" which is the expected state.
Optional env:
- — override archive.today host (for mirrors)
- — override Wayback Machine host
Agent Mode
Add
to any command. Expands to
--json --compact --no-input --no-color --yes --no-prompt
. Every action command also prints structured
hints on stderr when called non-interactively — the calling agent sees "tried X, got Y, consider Z" automatically.
Notable flags:
--submit-timeout <duration>
— max wait for a fresh submit (default ; = unbounded)
--backend archive-is,wayback
— backend preference and fallback order
- — / output format
Filtering output
accepts dotted paths to descend into nested responses; arrays traverse element-wise:
bash
archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.name
Use this to narrow huge payloads to the fields you actually need — critical for deeply nested API responses.
Response envelope
Data-layer commands wrap output in
{"meta": {...}, "results": <data>}
. Parse
for data and
to know whether it's
or local. The
summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.
Exit Codes
| Code | Meaning |
|---|
| 0 | Success |
| 2 | Usage error |
| 3 | Not found (no snapshot exists) |
| 5 | API error (archive.today or Wayback down) |
| 7 | Rate limited (too many submits) |
Installation
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctor
MCP Server
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcp
Argument Parsing
- Empty, , or → run
- → CLI; → MCP
- Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" → is the default — it's idempotent and covers the 90% case.
- "bulk archive" / "archive these" → from stdin if URLs are pasted, else ask for the file path.
<!-- pr-218-features -->
Agent Workflow Features
This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.
Named profiles
Persist a set of flags under a name and reuse them across invocations.
bash
# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>
# Use a profile — overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>
# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yes
Flag precedence: explicit flag > env var > profile > default.
--deliver
Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.
bash
archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/in
File sinks write atomically (tmp + rename). Webhook sinks POST
(or
when
is set). Unknown schemes produce a structured refusal listing the supported set.
feedback
Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.
bash
archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list # show local entries
archive-is-pp-cli feedback clear --yes # wipe
Entries append to
~/.archive-is-pp-cli/feedback.jsonl
as JSON lines. When
ARCHIVE_IS_FEEDBACK_ENDPOINT
is set and either
is passed or
ARCHIVE_IS_FEEDBACK_AUTO_SEND=true
, the entry is also POSTed upstream (non-blocking — local write always succeeds).