content-parser
When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read shared/authentication.md for API key and headers
- Follow shared/common-patterns.md for polling, errors, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following shared/config-pattern.md before any interaction
- Never save files to ~/Downloads/ or .listenhub/; save to the current working directory
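As a minimal sketch of the "valid HTTP(S) URL" constraint, the check below accepts only strings that start with an http:// or https:// scheme followed by a non-empty host. The function name is_http_url is hypothetical, and the check deliberately does not attempt the platform-specific normalization described in references/supported-platforms.md.

```shell
# Hypothetical helper: returns 0 when $1 looks like an absolute HTTP(S) URL.
# Only the scheme and the presence of a host are checked.
is_http_url() {
  case "$1" in
    *" "*) return 1 ;;                  # reject embedded spaces
    http:///*|https:///*) return 1 ;;   # reject an empty host
    http://?*|https://?*) return 0 ;;   # scheme followed by at least one character
    *) return 1 ;;                      # anything else (ftp://, bare domains, empty)
  esac
}
```

A caller would typically stop and ask for a different URL when the function returns non-zero, for example `is_http_url "$URL" || echo "Please provide an http(s) URL"`.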
Step -1: API Key Check
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0.
**If file doesn't exist**, ask for the location, then create it immediately:
bash
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
(or $HOME/.listenhub/content-parser/config.json for global)
Then run **Setup Flow** below.
**If file exists**, read config, display summary, and confirm:
Current config (content-parser):
Auto-download: {yes / no}
Ask: "Use saved config?" → **Confirm, continue** / **Reconfigure**
Setup Flow (first run or reconfigure)
- autoDownload: "Automatically save extracted content to the current directory?"
  - "Yes (recommended)" → autoDownload: true
  - "No" → autoDownload: false
Save immediately:
bash
# AUTO_DOWNLOAD is an illustrative variable: set it to true or false per the user's answer
AUTO_DOWNLOAD=true
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl "$AUTO_DOWNLOAD" '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: URL Input
Free text input. Ask the user:
What URL would you like to extract content from?
Step 2: Options (optional)
Ask if the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults": Extract with default settings
- "Yes, configure options": Set summarize, maxLength, or Twitter tweet count
If "Yes", ask follow-up questions:
- Summarize: "Generate a summary of the content?" (Yes/No)
- Max Length: "Set maximum content length?" (Free text, e.g., "5000")
- Twitter count (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
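The 1-100 range and default of 20 for the tweet count can be enforced before the value goes into the request. The sketch below is one way to do that; the function name normalize_tweet_count is hypothetical, and clamping out-of-range values (rather than asking the user again) is an assumption, not something the skill prescribes.

```shell
# Hypothetical helper: normalize a user-supplied tweet count.
# Empty or non-numeric input falls back to the documented default of 20;
# numeric input is clamped to the documented 1-100 range.
normalize_tweet_count() {
  count="$1"
  case "$count" in
    ''|*[!0-9]*) echo 20; return ;;   # missing or not a plain non-negative integer
  esac
  # Strip leading zeros so the result is a valid JSON number
  while [ "${count#0}" != "$count" ] && [ "${#count}" -gt 1 ]; do
    count="${count#0}"
  done
  if [ "$count" -lt 1 ]; then
    echo 1
  elif [ "$count" -gt 100 ]; then
    echo 100
  else
    echo "$count"
  fi
}
```

The result can then be used as the value of twitter.count when the options object is built.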
Step 3: Confirm & Extract
Summarize:
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
Wait for explicit confirmation before calling the API.
Workflow
1. Validate URL: Must be HTTP(S). Normalize if needed (see references/supported-platforms.md)
2. Build request body:
json
{
  "source": { "type": "url", "uri": "{url}" },
  "options": { "summarize": true/false, "maxLength": 5000, "twitter": { "count": 50 } }
}
Omit options if user chose defaults.
3. Submit (foreground): POST /v1/content/extract → extract taskId
4. Tell the user extraction is in progress
5. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: status field is .data.status (not processStatus), interval is 5s, values are processing / completed / failed:
bash
TASK_ID="<id-from-step-3>"
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
6. When notified, download and present result:
If autoDownload is true:
- Write {taskId}-extracted.md to the current directory: full extracted content in markdown
- Write {taskId}-extracted.json to the current directory: full raw API response data
bash
echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
echo "$RESULT" > "${TASK_ID}-extracted.json"
Present:
Content extraction complete!
Source: {url}
Title: {metadata.title}
Length: ~{character count} characters
Credits used: {credits}
Saved to the current directory:
{taskId}-extracted.md
{taskId}-extracted.json
7. Show a preview of the extracted content (first ~500 chars)
8. Offer to use content in another skill (e.g., /podcast, /tts)
Estimated time: 10-30 seconds depending on content size and platform.
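The build-request-body step of the workflow, including the rule that options is omitted for defaults, can be sketched with jq so that the URL is always escaped correctly. The function name build_extract_body is hypothetical; the field names come from the request body shown in the workflow.

```shell
# Hypothetical helper: build the extract request body.
# $1 is the URL; $2 is an optional JSON object of options.
# With no second argument, the "options" key is left out entirely.
build_extract_body() {
  if [ -n "${2:-}" ]; then
    jq -cn --arg uri "$1" --argjson options "$2" \
      '{source: {type: "url", uri: $uri}, options: $options}'
  else
    jq -cn --arg uri "$1" '{source: {type: "url", uri: $uri}}'
  fi
}
```

The output can be passed straight to curl, for example `-d "$(build_extract_body "$URL" '{"twitter":{"count":50}}')"`.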
API Reference
- Content extract: shared/api-content-extract.md
- Supported platforms: references/supported-platforms.md
- Polling: shared/common-patterns.md § Async Polling
- Error handling: shared/common-patterns.md § Error Handling
- Config pattern: shared/config-pattern.md
Example
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
- URL: https://en.wikipedia.org/wiki/Topology
- Options: defaults (omit options)
- Submit extraction:
bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://en.wikipedia.org/wiki/Topology"
    }
  }'
- Poll until complete:
bash
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
- Present extracted content preview and offer next actions.
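For the preview shown to the user, one possible approach is to truncate the extracted text to roughly the first 500 characters. The function name preview_text is hypothetical, and how the extracted text is obtained from the API response is not shown here because the response field is not specified in this document. Whether the limit counts characters or bytes depends on the awk implementation and locale.

```shell
# Hypothetical helper: print the first N characters of a text (default 500).
# $1 is the text, $2 is an optional limit.
preview_text() {
  printf '%s' "$1" | awk -v limit="${2:-500}" '
    { text = (NR == 1 ? "" : text "\n") $0 }   # rejoin lines with newlines
    END { printf "%s", substr(text, 1, limit) }'
}
```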
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
- URL: https://x.com/elonmusk
- Options: {"twitter": {"count": 50}}
- Submit extraction:
bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'
- Poll until complete, present results.
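When polling, each response is reduced to a single status string. The sketch below isolates that step from the polling loop so it can be tried on sample input without calling the API. The function name extract_status and the sample payloads are hypothetical; only the .data.status field and the "processing" fallback come from the workflow. Raw control characters are stripped first because an unescaped newline or tab inside a JSON string would make jq reject the response.

```shell
# Hypothetical helper: print the task status from a raw API response.
# Falls back to "processing" when .data.status is absent, as the polling loop does.
extract_status() {
  printf '%s' "$1" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"'
}

# Sample payloads (illustrative shapes, not real API output)
extract_status '{"data":{"status":"completed"}}'
extract_status '{"data":{}}'
```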