pp-archive-is

archive.today — Printing Press CLI

Prerequisites: Install the CLI

This skill drives the `archive-is-pp-cli` binary. You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:
  1. Install via the Printing Press installer:
    bash
    npx -y @mvanhorn/printing-press install archive-is --cli-only
  2. Verify:
    archive-is-pp-cli --version
  3. Ensure `$GOPATH/bin` (or `$HOME/go/bin`) is on `$PATH`.
If the `npx` install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.23+):
bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
If `--version` reports "command not found" after install, the install step did not put the binary on `$PATH`. Do not proceed with skill commands until verification succeeds.
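The verification step can be wrapped in a small guard that an agent runs before any other command. This is a minimal sketch, not part of the CLI: the function names `require_cli` and `check_install` and the message wording are hypothetical.

```shell
# Return 0 if the named binary is on PATH, non-zero otherwise.
require_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Print a one-line status for the named binary.
# On failure, remind the caller of the PATH entries named in the install steps.
check_install() {
  if require_cli "$1"; then
    echo "ok: $1 found"
  else
    echo "missing: $1 (check that \$GOPATH/bin or \$HOME/go/bin is on \$PATH)"
    return 1
  fi
}
```

A typical call would be `check_install archive-is-pp-cli && archive-is-pp-cli --version`.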

When to Use This CLI

Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:
  • A user sends a paywalled link and asks "can you read this" → `read` fetches text via archive
  • They want to preserve a URL that might change → `save` forces a fresh capture
  • They want historical versions → `history` lists all known snapshots
  • They have 20+ URLs to archive → `bulk` runs rate-limited batch archival
Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.

Unique Capabilities

The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.

The hero commands

  • `read <url>`: Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback); submits a fresh capture only if nothing exists. The "always do the right thing" command.
    This is how 90% of agent calls should start. It's idempotent: calling it twice on the same URL doesn't double-submit.
  • `get <url> [--format text|html]` / `tldr <url>`: Fetch article text, optionally LLM-summarized. Automatic Wayback fallback when archive.today serves a CAPTCHA (which happens daily to cloud IPs).
    `tldr` pipes the fetched text through a summarization step, which is useful for agent chains where you want a short take without shipping 20KB of HTML back.

Durability operations

  • `save <url>`: Force a fresh capture via `/submit/?url=<x>&anyway=1`. Use when `read` returns an existing snapshot that's too old or missing a paywall update.
  • `history <url>`: List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback.
  • `bulk [file]`: Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, returns a report of successes / failures / pre-existing.
    `grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk -` archives every URL in a markdown file.
  • `request <url>`: Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.
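The `grep` one-liner used with `bulk` keeps trailing sentence punctuation (a URL followed by a period matches with the period attached) and passes duplicates through. A possible pre-filter is sketched below; `extract_urls` is a hypothetical helper name, and the punctuation stripping and de-duplication are additions for illustration, not something the CLI is documented to require.

```shell
# Read text on stdin, print each distinct URL once, one per line.
extract_urls() {
  grep -oE 'https?://[^ )]+' |   # same pattern as the bulk example
    sed -E 's/[.,;:]+$//' |      # drop trailing sentence punctuation
    sort -u                      # de-duplicate (output is sorted)
}
```

It would slot into the same pipeline: `extract_urls < notes.md | archive-is-pp-cli bulk -`.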

Observability

  • `snapshots newest <url>`: Just the newest snapshot URL for a target, useful in scripts.
  • `captures`: List your local capture index (post-sync).
  • `feeds`: archive.today's global recent-archives feed.
  • `--backend archive-is,wayback`: Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.

Command Reference

Archive + retrieve:
  • `archive-is-pp-cli read <url>`: Find or create (hero command)
  • `archive-is-pp-cli get <url>`: Fetch article text (with Wayback fallback)
  • `archive-is-pp-cli tldr <url>`: Fetch + summarize
  • `archive-is-pp-cli save <url>`: Force fresh capture
  • `archive-is-pp-cli request <url>`: Fire-and-forget submit
  • `archive-is-pp-cli check <url>`: Does an archive exist?
Listing + history:
  • `archive-is-pp-cli history <url>`: All known snapshots
  • `archive-is-pp-cli newest <url>`: Newest snapshot URL
  • `archive-is-pp-cli captures`: Local capture index
  • `archive-is-pp-cli feeds`: Global recent feed
Batch:
  • `archive-is-pp-cli bulk [file]`: Batch from file or stdin
Local store:
  • `archive-is-pp-cli sync` / `archive` / `export` / `import`: Local SQLite ops
Auth + health:
  • `archive-is-pp-cli auth`: Config (no API key needed; auth is a no-op)
  • `archive-is-pp-cli doctor`: Verify backend reachability

Recipes

Read a paywalled article

bash
archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent
# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agent
`read` returns the archive URL (finding an existing snapshot or creating a new one). `get --format text` returns the article body, falling back to Wayback if archive.today serves a CAPTCHA.

Preserve a URL before it changes

bash
archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent  # verify
Force capture, then check history to confirm the new snapshot registered.

Bulk archive a research batch

bash
grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent
# or from a file:
archive-is-pp-cli bulk urls.txt --agent
Reads URLs one per line, submits each with exponential backoff, returns per-URL status (archived, pre-existing, failed) as JSON.
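`bulk` handles backoff internally, but an agent calling `save` one URL at a time may want the same pattern. The sketch below illustrates exponential backoff in general; it does not show how the CLI implements it. It retries only on exit code 7, which the Exit Codes table in this document assigns to rate limiting. `retry_with_backoff` is a hypothetical helper, and it uses global shell variables because POSIX `sh` has no `local`.

```shell
# retry_with_backoff <max_attempts> <base_delay_seconds> <command...>
# Re-run the command while it exits with 7, doubling the delay each time.
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    "$@" && return 0
    status=$?
    # Any failure other than 7 is returned immediately.
    if [ "$status" -ne 7 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example: `retry_with_backoff 4 5 archive-is-pp-cli save "https://example.com/important-page" --agent`.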

Prefer Wayback for a reliable read

bash
archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agent
Use when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.

Auth Setup

No API key required. Archive.today and Wayback Machine are both public. The `auth` subcommand exists for consistency but is a no-op; `doctor` reports "Auth: not required", which is the expected state.
Optional env:
  • `ARCHIVE_IS_BASE_URL`: override archive.today host (for mirrors)
  • `WAYBACK_BASE_URL`: override Wayback Machine host

Agent Mode

Add `--agent` to any command. It expands to `--json --compact --no-input --no-color --yes --no-prompt`. Every action command also prints structured `next_actions` hints on stderr when called non-interactively, so the calling agent sees "tried X, got Y, consider Z" automatically.
Notable flags:
  • `--submit-timeout <duration>`: max wait for a fresh submit (default `10m`; `0` = unbounded)
  • `--backend archive-is,wayback`: backend preference and fallback order
  • `--format text|html`: `get` / `tldr` output format
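Because the JSON result goes to stdout and the `next_actions` hints go to stderr, a caller can keep the two streams apart and inspect each separately. A minimal sketch follows; `run_split` and the file names are hypothetical, and the hint format is treated as opaque text since it is not specified here.

```shell
# run_split <stdout_file> <stderr_file> <command...>
# Run the command, saving stdout and stderr to separate files.
# The command's exit status is passed through unchanged.
run_split() {
  out_file=$1
  err_file=$2
  shift 2
  "$@" >"$out_file" 2>"$err_file"
}
```

Usage might look like `run_split result.json hints.txt archive-is-pp-cli read "https://example.com/important-page" --agent`, after which `result.json` holds only the JSON payload.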

Filtering output

`--select` accepts dotted paths to descend into nested responses; arrays traverse element-wise:
bash
archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.name
Use this to narrow huge payloads to the fields you actually need — critical for deeply nested API responses.

Response envelope

Data-layer commands wrap output in `{"meta": {...}, "results": <data>}`. Parse `.results` for data and `.meta.source` to know whether it's `live` or local. The `N results (live)` summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.
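One way to unpack the envelope in a pipeline is with `jq`. The sketch assumes `jq` is installed; the helper names are hypothetical, and nothing is assumed about the fields inside `.results`.

```shell
# Read an envelope on stdin and print its data payload as compact JSON.
envelope_results() {
  jq -c '.results'
}

# Read an envelope on stdin and print the value of .meta.source.
envelope_source() {
  jq -r '.meta.source'
}
```

For example, `archive-is-pp-cli captures --agent | envelope_source` would print where the data came from.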

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 2 | Usage error |
| 3 | Not found (no snapshot exists) |
| 5 | API error (archive.today or Wayback down) |
| 7 | Rate limited (too many submits) |
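A calling script can branch on these codes. In the sketch below the labels mirror the table, while the helper names and the choice to treat codes 5 and 7 as retryable are assumptions (outages and rate limits are usually transient), not documented CLI behavior.

```shell
# Map an exit code from the table to a short label.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "usage-error" ;;
    3) echo "not-found" ;;
    5) echo "api-error" ;;
    7) echo "rate-limited" ;;
    *) echo "unknown" ;;
  esac
}

# Assumption: only backend outages (5) and rate limits (7) are worth retrying.
is_retryable() {
  [ "$1" -eq 5 ] || [ "$1" -eq 7 ]
}
```

After a command, `describe_exit "$?"` gives the label for its result.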

Installation

bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctor

MCP Server

bash
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcp

Argument Parsing

Given `$ARGUMENTS`:
  1. Empty, `help`, or `--help` → run `archive-is-pp-cli --help`
  2. `install` → install the CLI; `install mcp` → install the MCP server
  3. Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" → `read <url> --agent` is the default: it's idempotent and covers the 90% case.
  4. "bulk archive" / "archive these" → `bulk` from stdin if URLs are pasted, else ask for the file path.
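The routing rules above can be sketched as a pure function that prints the planned action without running it. The name `plan_command` and the placeholder labels `install-cli`, `install-mcp`, and `ask-for-file-path` are hypothetical, and falling back to `--help` for unrecognized input is an assumption the rules do not state.

```shell
# plan_command "<arguments>": print the command line (or action label) to use.
plan_command() {
  args=$1
  # First URL-looking token in the request, empty if there is none.
  url=$(printf '%s\n' "$args" | grep -oE 'https?://[^ )]+' | head -n 1 || true)
  case "$args" in
    ""|help|--help) echo "archive-is-pp-cli --help" ;;
    "install mcp")  echo "install-mcp" ;;
    install)        echo "install-cli" ;;
    "bulk archive"*|"archive these"*)
      if [ -n "$url" ]; then
        echo "archive-is-pp-cli bulk - --agent"   # URLs were pasted: feed stdin
      else
        echo "ask-for-file-path"
      fi ;;
    *)
      if [ -n "$url" ]; then
        echo "archive-is-pp-cli read \"$url\" --agent"
      else
        echo "archive-is-pp-cli --help"           # assumed fallback
      fi ;;
  esac
}
```

Printing the plan rather than executing it keeps the routing logic easy to check before any request reaches archive.today.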
<!-- pr-218-features -->

Agent Workflow Features

This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.

Named profiles

Persist a set of flags under a name and reuse them across invocations.
bash
# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>
# Use a profile: overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>
# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yes
Flag precedence: explicit flag > env var > profile > default.
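The precedence chain amounts to "first source that provides a value wins". The sketch below illustrates that rule only; it is not the CLI's code. `resolve_flag` is a hypothetical helper, and it treats an empty string as "not set".

```shell
# resolve_flag <explicit> <env> <profile> <default>
# Print the first non-empty value, checked in precedence order.
resolve_flag() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

With no explicit flag and no env var, a profile value of `wayback,archive-is` would override the default `archive-is,wayback`: `resolve_flag "" "" "wayback,archive-is" "archive-is,wayback"`.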

--deliver

Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.
bash
archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/in
File sinks write atomically (tmp + rename). Webhook sinks POST `application/json` (or `application/x-ndjson` when `--compact` is set). Unknown schemes produce a structured refusal listing the supported set.

feedback

Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.
bash
archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list         # show local entries
archive-is-pp-cli feedback clear --yes  # wipe
Entries append to `~/.archive-is-pp-cli/feedback.jsonl` as JSON lines. When `ARCHIVE_IS_FEEDBACK_ENDPOINT` is set and either `--send` is passed or `ARCHIVE_IS_FEEDBACK_AUTO_SEND=true`, the entry is also POSTed upstream (non-blocking; the local write always succeeds).
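The upstream-send condition reads as "endpoint set AND (`--send` OR auto-send enabled)". The sketch below restates that rule for clarity and is not taken from the CLI's source. `should_send` is a hypothetical helper whose argument is `1` when `--send` was passed.

```shell
# should_send <send_flag>: return 0 when an entry would also be POSTed upstream.
should_send() {
  # Without an endpoint nothing is sent, whatever the other settings are.
  [ -n "${ARCHIVE_IS_FEEDBACK_ENDPOINT:-}" ] || return 1
  [ "${1:-}" = "1" ] || [ "${ARCHIVE_IS_FEEDBACK_AUTO_SEND:-}" = "true" ]
}
```

With neither variable set, `should_send 1` fails, which matches the local-only default.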