content-parser

When to Use

  • User provides a URL and wants to extract/read its content
  • Another skill needs to parse source material from a URL before generation
  • User says "parse this URL", "extract content from this link"
  • User says "解析链接", "提取内容"

When NOT to Use

  • User already has text content and doesn't need URL parsing
  • User wants to generate audio/video content (not content extraction)
  • User wants to read a local file (use standard file reading tools)

Purpose

Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.

Hard Constraints

  • No shell scripts. Construct curl commands from the API reference files listed in Resources
  • Always read shared/authentication.md for API key and headers
  • Follow shared/common-patterns.md for polling, errors, and interaction patterns
  • URL must be a valid HTTP(S) URL
  • Always read config following shared/config-pattern.md before any interaction
  • Never save files to ~/Downloads/ or .listenhub/; save to the current working directory
<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API. </HARD-GATE>
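The "valid HTTP(S) URL" constraint can be checked before any API call is made. The following is a minimal sketch, not part of the original skill: the function name `is_http_url` is hypothetical, and the check is purely syntactic (lowercase scheme, no whitespace). Platform-specific normalization is described in references/supported-platforms.md and is not attempted here.

```shell
# Hypothetical helper: succeeds (exit status 0) when the argument looks like an
# HTTP(S) URL. A loose syntactic check only; nothing is resolved or fetched.
is_http_url() {
  case "$1" in
    *[[:space:]]*) return 1 ;;          # reject embedded whitespace
    http://?*|https://?*) return 0 ;;   # scheme plus at least one more character
    *) return 1 ;;
  esac
}

for candidate in "https://en.wikipedia.org/wiki/Topology" "ftp://example.com/file.txt"; do
  if is_http_url "$candidate"; then
    echo "ok: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

A rejected value would normally lead back to asking the user for the URL again rather than to an API call.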

Step -1: API Key Check

Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.
**If file doesn't exist**: ask the user where to store it, then create it immediately:
bash
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"

(or $HOME/.listenhub/content-parser/config.json for global)

Then run **Setup Flow** below.

**If file exists**: read config, display summary, and confirm:
Current config (content-parser): Auto-download: {yes / no}
Ask: "Use the saved config?" → **Confirm and continue** / **Reconfigure**

Setup Flow (first run or reconfigure)

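  1. autoDownload: "Automatically save extracted content to the current directory?"
    • "Yes (recommended)" → autoDownload: true
    • "No" → autoDownload: false
Save immediately:
bash
# Set to true or false according to the user's answer
AUTO_DOWNLOAD=true
# Merge the answer into the existing config, then write it back and reload it
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl "$AUTO_DOWNLOAD" '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")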

Interaction Flow

Step 1: URL Input

Free text input. Ask the user:
What URL would you like to extract content from?

Step 2: Options (optional)

Ask if the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
  - "No, use defaults" — Extract with default settings
  - "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count
If "Yes", ask follow-up questions:
  • Summarize: "Generate a summary of the content?" (Yes/No)
  • Max Length: "Set maximum content length?" (Free text, e.g., "5000")
  • Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
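The tweet-count answer is free-form until it is checked against the documented range. As a rough sketch under stated assumptions, the hypothetical helper `normalize_tweet_count` below applies the 1-100 range and the default of 20 from the list above; how an out-of-range answer is handled (here: fail so the question can be asked again) is an assumption, not something the skill specifies.

```shell
# Hypothetical helper: prints a tweet count inside the documented 1-100 range.
# An empty argument falls back to the documented default of 20; anything else
# that is not a whole number in range fails with a non-zero status.
normalize_tweet_count() {
  local count="${1:-20}"
  case "$count" in
    *[!0-9]*) return 1 ;;   # contains a non-digit character
  esac
  if [ "$count" -ge 1 ] && [ "$count" -le 100 ]; then
    printf '%s\n' "$count"
  else
    return 1
  fi
}

normalize_tweet_count ""      # prints 20
normalize_tweet_count "50"    # prints 50
normalize_tweet_count "250" || echo "out of range, ask again"
```

The same digit-only check could be reused for the Max Length answer, which the skill describes only as free text such as "5000".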

Step 3: Confirm & Extract

Summarize:
Ready to extract content:

  URL: {url}
  Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default

  Proceed?
Wait for explicit confirmation before calling the API.

Workflow

  1. Validate URL: Must be HTTP(S). Normalize if needed (see references/supported-platforms.md)
  2. Build request body:
    json
    {
      "source": {
        "type": "url",
        "uri": "{url}"
      },
      "options": {
        "summarize": true/false,
        "maxLength": 5000,
        "twitter": {
          "count": 50
        }
      }
    }
    Omit options if the user chose defaults.
  3. Submit (foreground): POST /v1/content/extract → extract taskId
  4. Tell the user extraction is in progress
  5. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: the status field is .data.status (not processStatus), the interval is 5s, and the values are processing / completed / failed:
    bash
    TASK_ID="<id-from-step-3>"
    for i in $(seq 1 60); do
      RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
        -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
      STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
      case "$STATUS" in
        completed) echo "$RESULT"; exit 0 ;;
        failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
        *) sleep 5 ;;
      esac
    done
    echo "TIMEOUT" >&2; exit 2
  6. When notified, download and present result:
    If autoDownload is true:
    • Write {taskId}-extracted.md to the current directory (full extracted content in markdown)
    • Write {taskId}-extracted.json to the current directory (full raw API response data)
    bash
    echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
    echo "$RESULT" > "${TASK_ID}-extracted.json"
    Present:
    Content extraction complete!
    
    Source: {url}
    Title: {metadata.title}
    Length: ~{character count} characters
    Credits used: {credits}
    
    Saved to the current directory:
      {taskId}-extracted.md
      {taskId}-extracted.json
  7. Show a preview of the extracted content (first ~500 chars)
  8. Offer to use content in another skill (e.g. /podcast, /tts)
Estimated time: 10-30 seconds depending on content size and platform.
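Step 2 of the workflow leaves open how the request body is assembled when only some options are set. One possible sketch follows; the helper names `json_escape` and `build_request_body` are hypothetical, and the code assumes the URL has already been validated and that the option values are plain booleans or integers. It uses only printf and sed; jq, which this document already relies on for polling, would handle escaping more robustly.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build the extract request body.
#   $1 url (required), $2 summarize (true/false or empty),
#   $3 maxLength (integer or empty), $4 twitter count (integer or empty)
# Empty arguments are left out; "options" is omitted when all three are empty.
build_request_body() {
  local url="$1" summarize="${2:-}" max_length="${3:-}" tweet_count="${4:-}"
  local options="" body=""
  if [ -n "$summarize" ]; then
    options="\"summarize\":$summarize"
  fi
  if [ -n "$max_length" ]; then
    options="${options:+$options,}\"maxLength\":$max_length"
  fi
  if [ -n "$tweet_count" ]; then
    options="${options:+$options,}\"twitter\":{\"count\":$tweet_count}"
  fi
  body="\"source\":{\"type\":\"url\",\"uri\":\"$(json_escape "$url")\"}"
  if [ -n "$options" ]; then
    body="$body,\"options\":{$options}"
  fi
  printf '{%s}\n' "$body"
}

# Defaults: no "options" key at all
build_request_body "https://en.wikipedia.org/wiki/Topology"
# Only the Twitter count is set
build_request_body "https://x.com/elonmusk" "" "" 50
```

The printed string can be passed to curl with -d, as in the examples later in this document.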

API Reference

  • Content extract: shared/api-content-extract.md
  • Supported platforms: references/supported-platforms.md
  • Polling: shared/common-patterns.md § Async Polling
  • Error handling: shared/common-patterns.md § Error Handling
  • Config pattern: shared/config-pattern.md

Example

User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
  1. URL: https://en.wikipedia.org/wiki/Topology
  2. Options: defaults (omit options)
  3. Submit extraction
bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'
  4. Poll until complete:
bash
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
  5. Present extracted content preview and offer next actions.

User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
  1. URL: https://x.com/elonmusk
  2. Options: {"twitter": {"count": 50}}
  3. Submit extraction
bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'
  4. Poll until complete, present results.
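In the second example the user supplies a handle rather than a URL, and the workflow turns it into https://x.com/elonmusk. A small sketch of that mapping is shown below. The helper name `handle_to_profile_url` is hypothetical, and the allowed character set (letters, digits, underscore) is an assumption; the authoritative normalization rules are in references/supported-platforms.md.

```shell
# Hypothetical helper: turn "@handle" or "handle" into an x.com profile URL.
handle_to_profile_url() {
  local handle="${1#@}"                 # drop one leading "@", if present
  case "$handle" in
    ''|*[!A-Za-z0-9_]*) return 1 ;;     # assumed: letters, digits, underscore only
  esac
  printf 'https://x.com/%s\n' "$handle"
}

handle_to_profile_url "@elonmusk"   # prints https://x.com/elonmusk
```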