content-parser

When to Use

User provides a URL and wants to extract/read its content
Another skill needs to parse source material from a URL before generation
User says "parse this URL", "extract content from this link"
User says "解析链接" ("parse link") or "提取内容" ("extract content")

When NOT to Use

User already has text content and doesn't need URL parsing
User wants to generate audio/video content (not content extraction)
User wants to read a local file (use standard file reading tools)

Purpose

Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.

Hard Constraints

No shell scripts. Construct curl commands from the API reference files listed in Resources
Always read `shared/authentication.md` for API key and headers
Follow `shared/common-patterns.md` for polling, errors, and interaction patterns
URL must be a valid HTTP(S) URL
Always read config following `shared/config-pattern.md` before any interaction
Never save files to `~/Downloads/` or `.listenhub/`; save to the current working directory

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API. </HARD-GATE>

Step -1: API Key Check

Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
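
As a rough illustration of this gate, the check could look like the sketch below. The variable name `LISTENHUB_API_KEY` is taken from the curl commands later in this document; the function name `require_api_key` and the error message are hypothetical, and the authoritative procedure remains the one in `shared/config-pattern.md`.

```bash
# Hypothetical helper: succeeds only when LISTENHUB_API_KEY is set and non-empty.
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; stopping." >&2
    return 1
  fi
}

# Possible usage (commented out so the sketch has no side effects):
# require_api_key || exit 1
```

Returning a non-zero status rather than calling `exit` inside the function leaves the decision to stop with the caller.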

Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0.

**If file doesn't exist**: ask location, then create immediately:

```bash
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
```

(or $HOME/.listenhub/content-parser/config.json for global)

Then run **Setup Flow** below.

**If file exists**: read config, display summary, and confirm:

Current config (content-parser): Auto-download: {yes / no}

Ask: "Use saved config?" → **Confirm, continue** / **Reconfigure**

Setup Flow (first run or reconfigure)

autoDownload: "Automatically save extracted content to the current directory?"
- "Yes (recommended)" → `autoDownload: true`
- "No" → `autoDownload: false`

Save immediately:

```bash
# Replace {true/false} with the literal JSON boolean the user chose.
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```

Interaction Flow

Step 1: URL Input

Free text input. Ask the user:

What URL would you like to extract content from?

Step 2: Options (optional)

Ask if the user wants to configure extraction options:

Question: "Do you want to configure extraction options?"
Options:
  - "No, use defaults" — Extract with default settings
  - "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count

If "Yes", ask follow-up questions:

Summarize: "Generate a summary of the content?" (Yes/No)
Max Length: "Set maximum content length?" (Free text, e.g., "5000")
Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
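
The tweet-count answer is free-form, so it may help to normalize it before building the request. The sketch below applies the range and default stated above (1-100, default 20); the function name `normalize_tweet_count` and the error message are hypothetical.

```bash
# Hypothetical helper: prints a usable tweet count, or fails on invalid input.
# Empty input falls back to the documented default of 20.
normalize_tweet_count() {
  local input="${1:-}"
  if [ -z "$input" ]; then
    echo 20
    return 0
  fi
  # Accept only short decimal integers in the documented 1-100 range.
  # 10# forces base 10 so a leading zero is not read as octal.
  if [[ "$input" =~ ^[0-9]{1,3}$ ]] && (( 10#$input >= 1 && 10#$input <= 100 )); then
    echo "$((10#$input))"
    return 0
  fi
  echo "tweet count must be an integer from 1 to 100" >&2
  return 1
}
```

On a non-zero return the agent would ask the question again rather than send an out-of-range value.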

Step 3: Confirm & Extract

Summarize:

Ready to extract content:

  URL: {url}
  Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default

  Proceed?

Wait for explicit confirmation before calling the API.

Workflow

Validate URL: Must be HTTP(S). Normalize if needed (see `references/supported-platforms.md`)
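
A minimal sketch of the HTTP(S) check, assuming a shallow syntactic test is enough before handing the URL to the API. The function name `is_http_url` and the regular expression are illustrative assumptions; the platform-specific normalization described in `references/supported-platforms.md` is not covered here.

```bash
# Hypothetical helper: succeeds when the argument looks like an http:// or https:// URL.
# This is only a syntactic check; it does not resolve or fetch anything.
is_http_url() {
  local url="${1:-}"
  # Scheme, then a non-empty host part, then an optional path/query/fragment without whitespace.
  local re='^[Hh][Tt][Tt][Pp][Ss]?://[^[:space:]/?#]+([/?#][^[:space:]]*)?$'
  [[ "$url" =~ $re ]]
}
```

Keeping the pattern in a variable avoids quoting surprises inside `[[ ... =~ ... ]]`.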

Build request body:

```json
{
  "source": {
    "type": "url",
    "uri": "{url}"
  },
  "options": {
    "summarize": true/false,
    "maxLength": 5000,
    "twitter": {
      "count": 50
    }
  }
}
```

Omit `options` if user chose defaults.
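
One way to assemble this body with `jq`, so that the URL is escaped properly and `options` is left out when nothing was configured. The field names come from the request body above; the function name `build_extract_body` and its argument order are hypothetical.

```bash
# Hypothetical helper: prints the JSON request body for a URL.
# Arguments: url [summarize] [maxLength] [twitterCount]; pass "" to skip an option.
build_extract_body() {
  local url="$1" summarize="${2:-}" max_length="${3:-}" tweet_count="${4:-}"
  jq -n \
    --arg url "$url" \
    --arg summarize "$summarize" \
    --arg maxLength "$max_length" \
    --arg count "$tweet_count" '
    # Start from the required source object.
    {source: {type: "url", uri: $url}}
    # Collect only the options that were actually provided.
    + ( ( {}
          + (if $summarize != "" then {summarize: ($summarize == "true")} else {} end)
          + (if $maxLength != "" then {maxLength: ($maxLength | tonumber)} else {} end)
          + (if $count != "" then {twitter: {count: ($count | tonumber)}} else {} end)
        ) as $opts
        # Leave out the options key entirely when the user chose defaults.
        | if ($opts | length) > 0 then {options: $opts} else {} end )
  '
}
```

For example, `build_extract_body "https://x.com/elonmusk" "" "" 50` would produce a body whose only option is `twitter.count`.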

Submit (foreground): `POST /v1/content/extract` → extract `taskId`. Tell the user extraction is in progress
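
Pulling the `taskId` out of the submit response might look like the sketch below. It assumes the id sits at `.data.taskId`, by analogy with the `.data.status` field used when polling; that path is an assumption, not something this document confirms, so check `shared/api-content-extract.md` for the actual response shape. The function name `parse_task_id` is hypothetical.

```bash
# Hypothetical helper: reads a submit response on stdin and prints the task id.
# ASSUMPTION: the id lives at .data.taskId; verify against shared/api-content-extract.md.
parse_task_id() {
  # -e makes jq exit non-zero when no non-empty string id is found.
  jq -er '.data.taskId | select(type == "string" and length > 0)'
}

# Possible usage (commented out so the sketch makes no network calls);
# BODY is assumed to hold the JSON request body:
# RESPONSE=$(curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
#   -H "Authorization: Bearer $LISTENHUB_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$BODY")
# TASK_ID=$(printf '%s' "$RESPONSE" | parse_task_id) || echo "no taskId in response" >&2
```

Failing loudly when the id is missing keeps the polling step from running against an empty task id.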

Poll (background): Run the following exact bash command with `run_in_background: true` and `timeout: 300000`. Note: the status field is `.data.status` (not `processStatus`), the interval is 5s, and the values are `processing`, `completed`, and `failed`.

```bash
TASK_ID="<id-from-step-3>"
# Up to 60 attempts, 5 seconds apart.
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  # Strip control characters before parsing; a missing status counts as still processing.
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
```

When notified, download and present result:

If `autoDownload` is `true`:

Write `{taskId}-extracted.md` to the current directory: full extracted content in markdown
Write `{taskId}-extracted.json` to the current directory: full raw API response data

```bash
# CONTENT_MD is expected to hold the extracted markdown; RESULT is the raw response from polling.
printf '%s\n' "$CONTENT_MD" > "${TASK_ID}-extracted.md"
printf '%s\n' "$RESULT" > "${TASK_ID}-extracted.json"
```

Present:

Content extraction complete!

Source: {url}
Title: {metadata.title}
Length: ~{character count} characters
Credits used: {credits}

Saved to the current directory:
  {taskId}-extracted.md
  {taskId}-extracted.json

Show a preview of the extracted content (first ~500 chars)
Offer to use content in another skill (e.g. `/podcast`, `/tts`)
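
The preview and the character count can both come from plain bash parameter expansion. This sketch assumes the extracted text is already in a shell variable; the function name `preview_text` and the trailing `...` marker are hypothetical. Note that `${#text}` counts characters according to the current locale, so the count for non-ASCII content depends on running under a UTF-8 locale.

```bash
# Hypothetical helper: prints the first N characters (default 500) of a string.
preview_text() {
  local text="$1" limit="${2:-500}"
  printf '%s' "${text:0:limit}"
  # Mark truncation so the user knows more content exists.
  if [ "${#text}" -gt "$limit" ]; then
    printf '...\n'
  else
    printf '\n'
  fi
}

# Possible usage (commented out; CONTENT_MD is assumed to hold the extracted markdown):
# preview_text "$CONTENT_MD"
# echo "Length: ~${#CONTENT_MD} characters"
```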

Estimated time: 10-30 seconds depending on content size and platform.

API Reference

Content extract: `shared/api-content-extract.md`
Supported platforms: `references/supported-platforms.md`
Polling: `shared/common-patterns.md` § Async Polling
Error handling: `shared/common-patterns.md` § Error Handling
Config pattern: `shared/config-pattern.md`

Example

User: "Parse this article: https://en.wikipedia.org/wiki/Topology"

Agent workflow:

URL: `https://en.wikipedia.org/wiki/Topology`
Options: defaults (omit options)
Submit extraction

```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'
```

Poll until complete:

```bash
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
```

Present extracted content preview and offer next actions.

User: "Extract recent tweets from @elonmusk, get 50 tweets"

Agent workflow:

URL: `https://x.com/elonmusk`
Options: `{"twitter": {"count": 50}}`
Submit extraction

```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'
```

Poll until complete, present results.
