defuddle


Extract clean article content from web pages or local HTML files. Removes clutter (ads, sidebars, nav) and returns readable content with metadata.


Source: joeseesun/defuddle-skill

Added on: 2026-03-07

NPX Install

```bash
npx skill4agent add joeseesun/defuddle-skill defuddle
```

SKILL.md Content


Defuddle - Web Content Extraction

Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.

Prerequisites

Before first use, check if `defuddle` is installed:

```bash
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
```

Default Workflow

When the user provides a URL, follow this workflow:

Step 1: Extract content as Markdown + JSON metadata

Always use both `-m` and `-j` flags to get markdown content with full metadata:

```bash
defuddle parse "<url>" -m -j
```
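A minimal sketch of Step 1 as a reusable shell function, assuming a Bash-compatible shell. It runs the documented command once and keeps the JSON output, so later steps can reuse it without fetching the page again. The name `extract_article` is hypothetical and not part of defuddle.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of defuddle): run Step 1 once and print the JSON.
# Returns non-zero, with a message on stderr, if defuddle fails or prints nothing.
extract_article() {
  local url="$1" json
  if ! json=$(defuddle parse "$url" -m -j); then
    printf 'defuddle failed for %s\n' "$url" >&2
    return 1
  fi
  if [ -z "$json" ]; then
    printf 'defuddle returned no output for %s\n' "$url" >&2
    return 1
  fi
  printf '%s\n' "$json"
}
```

Usage: `json=$(extract_article "https://example.com/post") || exit 1`, then feed `"$json"` to the later steps.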

Step 2: Present a summary to the user

Show the user:

Title: from the JSON `title` field
Author: from the JSON `author` field
Source: the domain (JSON `domain` field)
Word count: from the JSON `wordCount` field
A brief preview (first 2-3 sentences)
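One way to produce the Step 2 summary, sketched as a shell function that reads the `defuddle parse "<url>" -m -j` output on stdin. It uses Node.js, which defuddle already requires, instead of a separate JSON tool. The function name, the `Unknown` fallbacks, and the naive sentence split used for the preview are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Step 2 summary from defuddle's JSON on stdin.
# Uses Node.js (a defuddle prerequisite) to parse the JSON.
summarize_article() {
  node -e '
    let raw = "";
    process.stdin.setEncoding("utf8");
    process.stdin.on("data", (chunk) => { raw += chunk; });
    process.stdin.on("end", () => {
      const a = JSON.parse(raw);
      // Naive preview: the first three sentence-like runs ending in . ! or ?
      const text = (a.content || "").replace(/\s+/g, " ");
      const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
      console.log("Title: " + (a.title || "Unknown"));
      console.log("Author: " + (a.author || "Unknown"));
      console.log("Source: " + (a.domain || "Unknown"));
      console.log("Word count: " + (a.wordCount ?? "Unknown"));
      console.log("Preview: " + sentences.slice(0, 3).join("").trim());
    });
  '
}
```

Usage: `defuddle parse "https://example.com/post" -m -j | summarize_article`. Headings or image syntax at the top of the Markdown body will leak into this naive preview, so treat it as a starting point.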

Step 3: Ask where to save

If this is the first time using defuddle in this conversation, ask the user:

"Save to which directory? (e.g. `~/Documents`, `~/Desktop`, or a custom path)"

Remember the user's chosen directory for subsequent uses in the same conversation.
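A path the user types as `~/Documents` is not tilde-expanded by the shell once it is stored and passed around in quotes. A minimal Bash sketch that expands a leading `~` against `$HOME` and creates the directory if it is missing; `resolve_save_dir` is a hypothetical name, and `~user` forms are deliberately not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the user's answer into an existing directory path.
resolve_save_dir() {
  local dir="$1"
  # A quoted "~/..." is never tilde-expanded, so expand a leading ~ by hand.
  case "$dir" in
    "~")   dir="$HOME" ;;
    "~/"*) dir="$HOME/${dir#"~/"}" ;;
  esac
  # Create the directory (and any parents) if it does not exist yet.
  mkdir -p -- "$dir" || return 1
  printf '%s\n' "$dir"
}
```

Resolving once and remembering the result means every later save in the conversation writes to the same absolute location.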

Step 4: Save as Markdown file

Write the file with frontmatter + full content:

```markdown
---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---

# {title}

{markdown content}
```
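A minimal Bash sketch of Step 4 that writes the frontmatter template above, then the title heading and the Markdown body read from stdin. The function names and argument order are hypothetical. One deliberate deviation from the template: string values are wrapped in YAML single quotes (with any embedded `'` doubled), so titles containing `:` or `#` still parse as valid frontmatter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: quote a value as a YAML single-quoted scalar ('' escapes ').
yaml_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper for Step 4.
# Usage: write_clipping FILE TITLE AUTHOR URL PUBLISHED WORDCOUNT < body.md
write_clipping() {
  local file="$1" title="$2" author="$3" url="$4" published="$5" word_count="$6"
  {
    printf -- '---\n'
    printf 'title: %s\n' "$(yaml_quote "$title")"
    printf 'author: %s\n' "$(yaml_quote "$author")"
    printf 'source: %s\n' "$(yaml_quote "$url")"
    printf 'date: %s\n' "$(yaml_quote "${published:-Unknown}")"  # published or "Unknown"
    printf 'clipped: %s\n' "$(date +%Y-%m-%d)"                    # today's date, YYYY-MM-DD
    printf 'wordCount: %s\n' "$word_count"
    printf -- '---\n\n'
    printf '# %s\n\n' "$title"
    cat                                                          # Markdown body from stdin
  } > "$file"
}
```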

File naming: Use the article title as the filename, sanitized for the filesystem:

Replace special characters with spaces
Trim whitespace
Example: `The Shape of the Essay Field.md`
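The naming rule can be sketched in Bash as below. The skill does not say which characters count as "special", so the set used here (control characters plus `/ \ : * ? " < > |`, which are reserved on common filesystems) is an assumption, as are the `Untitled` fallback and the function name. Bash pattern substitution is used instead of byte-oriented tools like `tr`, so in a UTF-8 locale non-ASCII characters in titles are left untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the "File naming" rule (the character set is an assumption).
sanitize_filename() {
  local name="$1"
  local unsafe='[[:cntrl:]/\\:*?"<>|]'   # control chars and / \ : * ? " < > |
  name=${name//$unsafe/ }                 # replace each special character with a space
  while [[ $name == *"  "* ]]; do         # collapse runs of spaces into one
    name=${name//"  "/" "}
  done
  name=${name#"${name%%[![:space:]]*}"}   # trim leading whitespace
  name=${name%"${name##*[![:space:]]}"}   # trim trailing whitespace
  printf '%s\n' "${name:-Untitled}"       # fall back if nothing is left
}
```

Usage: `file="$dir/$(sanitize_filename "$title").md"`, where `$dir` and `$title` are placeholder variables holding the chosen directory and the JSON `title`.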

Step 5: Confirm to user

Tell the user the file path where it was saved.

CLI Reference

```bash
defuddle parse <source> [options]
```

Arguments:

`<source>`: URL (`https://...`) or local HTML file path

Options:

| Flag | Description |
|---|---|
| `-m, --markdown` | Convert content to Markdown |
| `-j, --json` | Output as JSON with full metadata |
| `-o, --output <file>` | Write to file instead of stdout |
| `-p, --property <name>` | Extract a single property (`title`, `description`, `domain`, `author`, `published`, `wordCount`, `content`) |
| `--debug` | Verbose logging |

JSON Response Fields

When using `-j`, the response includes:

`title`: Article title
`author`: Author name
`published`: Publication date
`description`: Meta description
`content`: Extracted Markdown (when `-m` is used)
`domain`: Source domain
`favicon`: Favicon URL
`image`: Featured image URL
`site`: Site name
`wordCount`: Word count
`parseTime`: Processing time in ms

Notes

Requires Node.js and npm
`jsdom` is required as a peer dependency
Works best with article-style pages (blogs, news, documentation)
Not designed for SPAs or JavaScript-heavy pages (e.g. WeChat articles need browser rendering)
