blog-taxonomy
Blog Taxonomy
Manage tags, categories, and topic clusters across CMS platforms.
Commands
| Command | Purpose |
|---|---|
| | Extract candidate tags and categories from content |
| | Push taxonomy to CMS via authenticated API |
| | Check for thin tags, orphan tags, taxonomy bloat |
Tag Suggestion Workflow
Step 1: Parse Content Structure
Read the target file and extract:
- All H2 and H3 headings (primary topic signals)
- Bold and italic phrases (emphasis signals)
- Existing frontmatter tags/categories if present
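The extraction step above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual parser: it assumes Markdown input, frontmatter delimited by `---` lines, and tags written inline as `tags: [a, b]`. The function name `parse_structure` is hypothetical.

```python
import re

def parse_structure(markdown_text: str) -> dict:
    """Collect heading, emphasis, and frontmatter-tag signals from a Markdown post."""
    frontmatter_tags = []
    body = markdown_text
    # Assumption: frontmatter is a leading block delimited by '---' lines.
    match = re.match(r"^---\n(.*?)\n---\n", markdown_text, flags=re.DOTALL)
    if match:
        body = markdown_text[match.end():]
        # Only the inline form `tags: [a, b]` is handled in this sketch.
        tag_line = re.search(r"^tags:\s*\[(.*?)\]\s*$", match.group(1), flags=re.MULTILINE)
        if tag_line:
            frontmatter_tags = [
                t.strip().strip("'\"") for t in tag_line.group(1).split(",") if t.strip()
            ]
    # H2 and H3 only: exactly two or three '#' characters followed by whitespace.
    headings = re.findall(r"^#{2,3}[ \t]+(.+?)\s*$", body, flags=re.MULTILINE)
    # Collect bold first, then blank it out so the italic pattern does not re-match it.
    bold = re.findall(r"\*\*(.+?)\*\*", body)
    without_bold = re.sub(r"\*\*(.+?)\*\*", " ", body)
    italic = re.findall(r"\*(.+?)\*", without_bold)
    return {
        "headings": headings,
        "emphasis": bold + italic,
        "frontmatter_tags": frontmatter_tags,
    }
```

A real implementation would likely use a YAML parser for frontmatter and a Markdown parser for emphasis, since regular expressions misread cases such as `*` list bullets.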
Step 2: Frequency Analysis
Scan the body text for high-frequency phrases:
- 1-word terms: minimum 4 occurrences (excluding stop words)
- 2-word phrases: minimum 3 occurrences
- 3-word phrases: minimum 2 occurrences
Exclude common non-tag words: articles, prepositions, conjunctions, pronouns.
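One way to apply these thresholds is an n-gram count, sketched below. The stop-word list is a deliberately short placeholder, and the rule of skipping any phrase that starts or ends with a stop word is an assumption of this sketch. The name `frequent_phrases` is hypothetical.

```python
import re
from collections import Counter

# Placeholder stop-word list; a real run would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "with", "at", "by", "from", "is", "are", "it", "this", "that", "we", "you", "they",
}
# n-gram length -> minimum occurrences, as listed in Step 2.
MIN_COUNTS = {1: 4, 2: 3, 3: 2}

def frequent_phrases(body: str) -> dict[str, int]:
    """Return phrases of 1-3 words that meet the minimum occurrence for their length."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", body.lower())
    result = {}
    for n, minimum in MIN_COUNTS.items():
        grams = Counter(
            " ".join(words[i:i + n])
            for i in range(len(words) - n + 1)
            # Skip n-grams that start or end with a stop word.
            if words[i] not in STOP_WORDS and words[i + n - 1] not in STOP_WORDS
        )
        result.update({gram: count for gram, count in grams.items() if count >= minimum})
    return result
```

Note that this sketch counts phrases across sentence boundaries; splitting the text into sentences first would avoid that.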
Step 3: Semantic Grouping
Group related candidates into clusters:
- Merge singular/plural variants (keep the more common form)
- Merge hyphenated and non-hyphenated forms
- Group synonyms under the highest-frequency term
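The grouping rules can be sketched as a normalisation key plus a merge. The plural handling here is intentionally naive (it strips one trailing "s"), and the `synonyms` mapping is a hypothetical input that the caller supplies; the source does not say where synonyms come from.

```python
from collections import defaultdict
from typing import Optional

def variant_key(term: str) -> str:
    """Normalise a term so hyphen/space and simple plural variants collide."""
    words = term.lower().replace("-", " ").split()
    # Naive plural handling: strip one trailing "s" from the last word only.
    last = words[-1] if words else ""
    if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]
    return " ".join(words)

def group_candidates(counts: dict[str, int],
                     synonyms: Optional[dict[str, str]] = None) -> dict[str, int]:
    """Merge variants and synonyms; each cluster is named after its most frequent member."""
    synonyms = synonyms or {}
    clusters = defaultdict(dict)
    for term, count in counts.items():
        # A synonym is routed to the cluster of its canonical term.
        clusters[variant_key(synonyms.get(term, term))][term] = count
    merged = {}
    for members in clusters.values():
        label = max(members, key=members.get)
        merged[label] = sum(members.values())
    return merged
```

Summing the member counts is a design choice of this sketch; keeping only the top member's count would be an equally valid reading of the rules.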
Step 4: Deduplicate and Rank
- Fuzzy match on slugified names (Levenshtein distance <= 2)
- Score each candidate: (frequency * 2) + (heading_presence * 5) + (emphasis * 1)
- Return top 5-10 ranked suggestions
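A possible implementation of the deduplication and scoring is sketched below. It assumes `heading_presence` and `emphasis` are 0/1 flags, which the source does not state. The helper names are hypothetical.

```python
import re

def slugify(name: str) -> str:
    """Lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution
            ))
        previous = current
    return previous[-1]

def rank_candidates(candidates: list[dict], limit: int = 10) -> list[tuple[str, int]]:
    """Score candidates, drop near-duplicate slugs, and return the top entries."""
    scored = sorted(
        (
            (slugify(c["name"]),
             c["frequency"] * 2 + c["heading_presence"] * 5 + c["emphasis"] * 1)
            for c in candidates
        ),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept = []
    for slug, score in scored:
        # Keep a slug only if no higher-scoring slug is within distance 2.
        if all(levenshtein(slug, other) > 2 for other, _ in kept):
            kept.append((slug, score))
    return kept[:limit]
```

A distance threshold of 2 can merge unrelated short slugs (for example, two three-letter acronyms), so a length guard may be worth adding in practice.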
Output Format
Tag Suggestions: [Post Title]
| Rank | Tag | Score | Source |
|---|---|---|---|
| 1 | content-marketing | 18 | H2 + 6 mentions |
| 2 | seo-strategy | 14 | H3 + 4 mentions |
| 3 | keyword-research | 11 | 5 mentions + bold |
Suggested Categories
- Primary: [best-fit category]
- Secondary: [optional second category]
CMS Adapters
Adapter Overview
| CMS | API Type | Auth Method | Tags Model |
|---|---|---|---|
| WordPress | REST | Application Passwords (base64) | First-class entities with IDs |
| Shopify | GraphQL (Admin API) | Admin API access token | String array on Article |
| Ghost | REST (Admin API) | API key with JWT signing | First-class entities |
| Strapi | REST or GraphQL | API token (Bearer) | User-defined content type |
| Sanity | GROQ / Mutations | Project token (Bearer) | Document type |
WordPress Adapter
List tags:

    GET {CMS_URL}/wp-json/wp/v2/tags?per_page=100&search={keyword}
    Authorization: Basic {base64(username:app_password)}

Create tag:

    POST {CMS_URL}/wp-json/wp/v2/tags
    Body: {"name": "Tag Name", "slug": "tag-name", "description": "Optional"}

List categories (hierarchical, supports parent field):

    GET {CMS_URL}/wp-json/wp/v2/categories?per_page=100

Create category:

    POST {CMS_URL}/wp-json/wp/v2/categories
    Body: {"name": "Category", "slug": "category", "parent": 0}

Assign tags to post:

    POST {CMS_URL}/wp-json/wp/v2/posts/{id}
    Body: {"tags": [1, 2, 3], "categories": [4]}

Pagination: follow the X-WP-TotalPages header for full listing.
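The following sketch shows how the Basic auth header and the X-WP-TotalPages loop might fit together, using only the standard library. The function names are hypothetical, and the `fetch` parameter is an injection point added here so the paging logic can be exercised without a live site.

```python
import base64
import json
import urllib.request
from typing import Callable

def basic_auth_header(username: str, app_password: str) -> dict:
    """Build the Authorization header from a WordPress application password."""
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def http_get(url: str, headers: dict):
    """Default fetcher: returns (parsed JSON body, response headers)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response), response.headers

def list_all_tags(cms_url: str, headers: dict, fetch: Callable = http_get) -> list:
    """Request page after page until X-WP-TotalPages is reached."""
    tags, page, total_pages = [], 1, 1
    while page <= total_pages:
        url = f"{cms_url}/wp-json/wp/v2/tags?per_page=100&page={page}"
        body, response_headers = fetch(url, headers)
        tags.extend(body)
        # The header is re-read on every page in case the total changes mid-listing.
        total_pages = int(response_headers.get("X-WP-TotalPages", 1))
        page += 1
    return tags
```

A call such as `list_all_tags(cms_url, basic_auth_header(user, app_password))` would then return every tag object across all pages.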
Shopify Adapter
Tags on Shopify are string arrays on the Article object, not first-class entities.
Update article tags (GraphQL Admin API):

    mutation {
      articleUpdate(id: "gid://shopify/Article/123", article: {
        tags: ["tag-one", "tag-two", "tag-three"]
      }) {
        article { id tags }
        userErrors { field message }
      }
    }

List all tags in use (GraphQL):

    {
      articles(first: 250) {
        edges {
          node { id title tags }
        }
      }
    }

Auth header:

    X-Shopify-Access-Token: {token}

Note: REST API marked legacy Oct 2024. GraphQL required for new apps since Apr 2025.
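Because Shopify tags live on each article, a tag list has to be derived from the articles query. The sketch below flattens a response of the shape requested above into per-tag usage counts. The function name `tag_usage` is hypothetical, and the response is assumed to be the decoded JSON body with a top-level `data` key.

```python
from collections import Counter

def tag_usage(response: dict) -> Counter:
    """Count how many articles use each tag in an `articles` query response."""
    usage = Counter()
    for edge in response["data"]["articles"]["edges"]:
        # set() guards against a tag being repeated within one article.
        usage.update(set(edge["node"]["tags"]))
    return usage
```

A single `articles(first: 250)` request only covers the first 250 articles; larger blogs would need cursor-based pagination, which this sketch leaves out.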
Ghost Adapter
List tags:

    GET {CMS_URL}/ghost/api/admin/tags/?limit=all
    Authorization: Ghost {jwt_token}

Create tag:

    POST {CMS_URL}/ghost/api/admin/tags/
    Body: {"tags": [{"name": "Tag Name", "slug": "tag-name"}]}

JWT generation: sign with the admin API key (id:secret format), iat = now, exp = now + 5 min, audience = /admin/.
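The token described above can be produced with the standard library alone, as in this sketch. Three details are assumptions taken from Ghost's Admin API documentation rather than from this document, and should be verified against it: the HS256 algorithm, the key id carried in the `kid` header, and the secret being hex-decoded before signing. The function name is hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def ghost_admin_jwt(admin_api_key: str, now: Optional[int] = None) -> str:
    """Build a short-lived token from an admin API key in id:secret format."""
    key_id, secret = admin_api_key.split(":")
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT", "kid": key_id}   # assumed per Ghost docs
    payload = {"iat": issued_at, "exp": issued_at + 5 * 60, "aud": "/admin/"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    # Assumption: the secret half of the key is hex-encoded.
    signature = hmac.new(bytes.fromhex(secret), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The result goes into the `Authorization: Ghost {jwt_token}` header; because the token expires after five minutes, generating a fresh one per batch of requests is the simplest approach.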
Strapi Adapter
Endpoints are auto-generated from content types. Typical setup:

    GET {CMS_URL}/api/tags?pagination[pageSize]=100
    POST {CMS_URL}/api/tags
    Body: {"data": {"name": "Tag Name", "slug": "tag-name"}}
    Authorization: Bearer {api_token}

Strapi v4+ uses the data wrapper. Check your content type schema for field names.
Sanity Adapter
Query tags (GROQ):

    *[_type == "tag"] { _id, name, slug }

Create tag (Mutations API):

    POST https://{project_id}.api.sanity.io/v2024-01-01/data/mutate/{dataset}
    Body: {"mutations": [{"create": {"_type": "tag", "name": "Tag", "slug": {"current": "tag"}}}]}
    Authorization: Bearer {token}
Taxonomy Audit Workflow
Step 1: Inventory
Scan all posts in the target directory (or fetch from CMS). Build a map:
- tag_name -> [list of post files/IDs using this tag]
- category_name -> [list of post files/IDs]
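The two maps can be built in a single pass over the posts. In this sketch each post is assumed to be a dict with `id`, `tags`, and `categories` keys; that shape and the function name `build_inventory` are illustrative only.

```python
from collections import defaultdict

def build_inventory(posts: list[dict]) -> tuple[dict, dict]:
    """Return (tag -> post IDs, category -> post IDs) for the given posts."""
    tags, categories = defaultdict(list), defaultdict(list)
    for post in posts:
        for tag in post.get("tags", []):
            tags[tag].append(post["id"])
        for category in post.get("categories", []):
            categories[category].append(post["id"])
    return dict(tags), dict(categories)
```

Tags that exist in the CMS but are attached to no post will not appear in this map, so orphan detection needs the CMS tag list as a second input.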
Step 2: Health Checks
| Check | Threshold | Action |
|---|---|---|
| Thin tag archives | < 5 posts per tag | Recommend noindex or merge |
| Orphan tags | 0 posts | Recommend deletion |
| Tag bloat | > 50 total tags | Recommend consolidation |
| Category depth | > 3 levels | Recommend flattening |
| Uncategorized posts | No category assigned | Assign to appropriate category |
| Duplicate slugs | Same slug, different name | Merge into canonical version |
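Several of these checks reduce to simple comparisons against the inventory. The sketch below covers the orphan, thin, bloat, and depth rows. The input shapes are assumptions: `all_tags` is the full tag list from the CMS, and `category_depth` maps each category to its nesting level.

```python
THIN_BELOW = 5     # fewer than 5 posts per tag
BLOAT_ABOVE = 50   # more than 50 tags in total
MAX_DEPTH = 3      # categories nested deeper than 3 levels

def health_report(tag_posts: dict, all_tags: list, category_depth: dict) -> dict:
    """Apply the orphan, thin, bloat, and depth thresholds from the table."""
    counts = {tag: len(tag_posts.get(tag, [])) for tag in all_tags}
    return {
        "orphan": sorted(t for t, n in counts.items() if n == 0),
        "thin": sorted(t for t, n in counts.items() if 0 < n < THIN_BELOW),
        "bloat": len(counts) > BLOAT_ABOVE,
        "too_deep": sorted(c for c, d in category_depth.items() if d > MAX_DEPTH),
    }
```

Orphans are reported separately from thin tags here so that the deletion and merge recommendations do not overlap.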
Step 3: Recommendations
Group findings by priority:
- Critical: orphan tags creating empty archive pages (crawl waste)
- High: thin tags with < 3 posts (poor user experience, weak SEO signal)
- Medium: tag bloat over 50 (diluted taxonomy, harder to navigate)
- Low: naming inconsistencies (mixed case, hyphen vs space)
Output Format
Taxonomy Audit: [Site/Directory]
Total tags: [n] | Total categories: [n]
Healthy: [n] | Thin: [n] | Orphan: [n]
Critical Issues
- [orphan tags list]
Recommendations
- Merge [tag-a] and [tag-b] (same topic, [n] combined posts)
- Delete orphan tags: [list]
- Add noindex to tag archives with < 5 posts
Site-Wide Guidelines
- Aim for 5-10 main categories per site (broad topics)
- Tags should have at least 5 posts before creating an archive page
- Use consistent slug format: lowercase, hyphen-separated
- Every post needs exactly 1 primary category
- Tags per post: 3-8 recommended, never exceed 15
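The per-post rules above lend themselves to a small validator. This sketch assumes tags are passed as slugs and primary categories as a list; the function name and the message wording are illustrative.

```python
import re

# Lowercase alphanumeric words joined by single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def check_post(tags: list[str], primary_categories: list[str]) -> list[str]:
    """Return a list of guideline violations for one post (empty if none)."""
    problems = []
    if len(primary_categories) != 1:
        problems.append("post needs exactly 1 primary category")
    if len(tags) > 15:
        problems.append("more than 15 tags")
    elif not 3 <= len(tags) <= 8:
        problems.append("tag count outside recommended 3-8")
    problems += [f"bad slug: {tag}" for tag in tags if not SLUG_PATTERN.match(tag)]
    return problems
```

Treating the 3-8 range as a warning and the 15-tag ceiling as a hard failure would mirror the "recommended" versus "never exceed" wording; the sketch reports both the same way for brevity.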
Environment Variables
| Variable | Purpose | Example |
|---|---|---|
| CMS_TYPE | Platform identifier | wordpress, shopify, ghost, strapi, sanity |
| CMS_URL | Base URL of the CMS | https://example.com |
| CMS_API_KEY | Authentication credential | Application password, API token, or key |
These must be set in the shell environment. Never store credentials in files or commit them to version control. The skill reads them via $CMS_TYPE, $CMS_URL, and $CMS_API_KEY at runtime.
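Reading and validating the three variables might look like the sketch below. The `env` parameter defaults to the process environment and exists so the function can be tried with a plain dict; the function name, the returned keys, and the trailing-slash trimming are assumptions of this example.

```python
import os

REQUIRED = {
    "CMS_TYPE": "one of wordpress, shopify, ghost, strapi, sanity",
    "CMS_URL": "base URL such as https://example.com",
    "CMS_API_KEY": "application password, API token, or key",
}
SUPPORTED = {"wordpress", "shopify", "ghost", "strapi", "sanity"}

def load_config(env=os.environ) -> dict:
    """Read the CMS settings, naming every missing variable and its expected format."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "; ".join(f"{name} is not set (expected {REQUIRED[name]})" for name in missing)
        )
    if env["CMS_TYPE"] not in SUPPORTED:
        raise ValueError(f"Unsupported CMS_TYPE; valid options: {', '.join(sorted(SUPPORTED))}")
    # Trailing slash is trimmed so endpoint paths can be appended directly.
    return {
        "type": env["CMS_TYPE"],
        "url": env["CMS_URL"].rstrip("/"),
        "api_key": env["CMS_API_KEY"],
    }
```

Nothing is written to disk or logged, which keeps the credential confined to the process environment.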
Error Handling
- Missing environment variables: If CMS_TYPE, CMS_URL, or CMS_API_KEY is unset, report which variable is missing and provide the expected format
- Invalid credentials: If the CMS API returns 401/403, report "Authentication failed - check CMS_API_KEY" and do not retry
- Connection timeouts: If the CMS endpoint is unreachable after 10 seconds, report the timeout and suggest checking CMS_URL
- Duplicate tag slugs: If a tag already exists on the CMS, skip creation and note "Tag already exists: [name]"
- Rate limits: If the CMS API returns 429, wait and retry once. Report if the limit persists
- Unsupported CMS: If CMS_TYPE is not one of the 5 supported platforms, list the valid options and exit
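The authentication and rate-limit rules can be expressed as a small wrapper around a single request. In this sketch `send` is a hypothetical callable that performs one request and returns the HTTP status code. The 5-second wait is an assumed value, since the rule above only says to wait.

```python
import time
from typing import Callable

class CmsError(Exception):
    """Raised when a request fails in a way that should be reported, not retried."""

def call_with_policy(send: Callable[[], int],
                     wait_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run send() with no retry on 401/403 and exactly one retry on 429."""
    for attempt in (1, 2):
        status = send()
        if status in (401, 403):
            raise CmsError("Authentication failed - check CMS_API_KEY")
        if status != 429:
            return status
        if attempt == 1:
            # Rate limited: wait, then allow exactly one more attempt.
            sleep(wait_seconds)
    raise CmsError("Rate limit persists after one retry")
```

If the server sends a Retry-After header, using its value instead of a fixed wait would be the more polite choice.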