generate-import-html

Generate Import HTML

Create a plain HTML file with block structure from the authoring analysis.

When to Use This Skill

Use this skill when:
  • You have complete authoring analysis (all sequences have decisions)
  • You have section styling validation (from authoring-analysis)
  • You are ready to generate the HTML file for preview
Invoked by: page-import skill (Step 4)

Prerequisites

From previous skills, you need:
  • ✅ Authoring analysis with block selections (from authoring-analysis)
  • ✅ Section styling decisions (from authoring-analysis Step 3e)
  • ✅ metadata.json with paths and metadata (from scrape-webpage)
  • ✅ cleaned.html with content (from scrape-webpage)
  • ✅ Block structures fetched (from authoring-analysis Step 3d)

Related Skills

  • page-import - Orchestrator that invokes this skill
  • authoring-analysis - Provides authoring decisions and styling validation
  • scrape-webpage - Provides metadata, paths, cleaned HTML, images
  • preview-import - Uses this skill's HTML output

⚠️ CRITICAL REQUIREMENT: Complete Content Import

YOU MUST IMPORT ALL CONTENT FROM THE PAGE. PARTIAL IMPORT IS UNACCEPTABLE.
  • ❌ NEVER truncate or skip sections due to length concerns
  • ❌ NEVER summarize or abbreviate content
  • ❌ NEVER use placeholders like "<!-- rest of content -->"
  • ❌ NEVER omit content because the page is "too long"
  • ✅ ALWAYS import every section from authoring analysis
  • ✅ ALWAYS include all text, images, and structure from cleaned.html
  • ✅ If you encounter length issues, generate the FULL HTML anyway
Validation requirement: You MUST verify that the number of sections in your HTML matches the number of sections from identify-page-structure. If they don't match, you have made an error.
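The section-count check can be automated. The following is a minimal sketch, assuming the generated file is reasonably well-formed HTML and that each section is a top-level `<div>`; the class and function names are illustrative and not part of any skill. If the page metadata block has been appended as its own final `<div>`, account for that extra top-level element when comparing against the count from identify-page-structure.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TopLevelDivCounter(HTMLParser):
    """Counts <div> elements that open at nesting depth 0."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.top_level_divs = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0 and tag == "div":
            self.top_level_divs += 1
        self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(0, self.depth - 1)


def count_sections(html: str) -> int:
    """Return the number of top-level <div> elements in an HTML fragment."""
    parser = TopLevelDivCounter()
    parser.feed(html)
    parser.close()
    return parser.top_level_divs
```

A mismatch between `count_sections(generated_html)` and the expected number of sections signals that a section was dropped or merged.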


HTML Generation Workflow

Structure Requirements

IMPORTANT CHANGE: The AEM CLI now automatically wraps HTML content with headful structure (head, header, footer). You MUST generate ONLY the section content.
What to generate:
  • ✅ Section divs with content: `<div>...</div>` (one per section)
  • ✅ Blocks as `<div class="block-name">` with nested divs
  • ✅ Default content (headings, paragraphs, links, images)
  • ✅ Section metadata blocks where validated in authoring-analysis
What NOT to generate:
  • ❌ NO `<html>`, `<head>`, or `<body>` tags
  • ❌ NO `<header>` or `<footer>` elements
  • ❌ NO `<main>` wrapper element
  • ❌ NO head content (meta tags, title, etc.; this comes from the project's head.html)
Structure format:
html
<div>
  <!-- Section 1 content -->
</div>
<div>
  <!-- Section 2 content with section-metadata if needed -->
  <div class="section-metadata">
    <div>
      <div>Style</div>
      <div>grey</div>
    </div>
  </div>
  <!-- Section 2 blocks/content -->
</div>
<div>
  <!-- Section 3 content -->
</div>
For detailed block structure patterns: See `../page-import/resources/html-structure.md`
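The "What NOT to generate" list lends itself to a quick automated check before saving. This is a sketch only: the function name is hypothetical, and it looks just for the six wrapper elements named above, not for stray head content such as meta tags.

```python
from html.parser import HTMLParser

# Wrapper elements the generated fragment must not contain.
FORBIDDEN_TAGS = {"html", "head", "body", "header", "footer", "main"}


class ForbiddenTagFinder(HTMLParser):
    """Records each forbidden tag name the first time it is opened."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this handler.
        if tag in FORBIDDEN_TAGS and tag not in self.found:
            self.found.append(tag)


def find_forbidden_tags(html: str) -> list:
    """Return forbidden wrapper tags present in the fragment, in order of appearance."""
    finder = ForbiddenTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

An empty result means the fragment contains only section content; any returned tag name points at a wrapper that should be removed.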


Section Metadata Application

Apply validated decisions from authoring-analysis Step 3e:
WITH section-metadata (section provides container styling):
html
<div>
  <div class="section-metadata">
    <div>
      <div>Style</div>
      <div>dark</div>
    </div>
  </div>
  <div class="tabs">
    <!-- Tabs block content -->
  </div>
</div>
WITHOUT section-metadata (background is block-specific):
html
<div>
  <div class="hero">
    <!-- Hero block content with its own dark background -->
  </div>
</div>
Important:
  • Only migrate visible body content sections (skip header, navigation, footer - auto-generated)
  • Use consistent style names from identify-page-structure
  • Apply validated decisions from authoring-analysis Step 3e: skip section-metadata for single-block sections where the background is block-specific
  • Place the `section-metadata` div at the start of each section that needs styling
  • The metadata div will be processed and removed by the platform
  • Each section is a separate top-level `<div>` element
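The two cases above (with and without section-metadata) can be produced by one small helper. This is an illustrative sketch, not part of the skill: `render_section` and `section_metadata` are hypothetical names, and the style value is assumed to come from the validated Step 3e decision.

```python
from html import escape
from typing import Optional


def section_metadata(style: str) -> str:
    """Build the section-metadata block carrying a single Style row."""
    return (
        '<div class="section-metadata">\n'
        '  <div>\n'
        '    <div>Style</div>\n'
        f'    <div>{escape(style)}</div>\n'
        '  </div>\n'
        '</div>'
    )


def render_section(content_html: str, style: Optional[str] = None) -> str:
    """Wrap content in a top-level section <div>.

    When a style is given, the section-metadata block is placed first;
    when it is None, the block is omitted entirely.
    """
    parts = []
    if style:
        parts.append(section_metadata(style))
    parts.append(content_html)
    return "<div>\n" + "\n".join(parts) + "\n</div>"
```

For example, `render_section('<div class="tabs"></div>', style="dark")` yields a styled section, while `render_section('<div class="hero"></div>')` yields a section with no metadata block.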


Page Metadata Block

Unless the user explicitly requested to skip metadata, use the metadata extracted by scrape-webpage to generate a metadata block.
Process:
1. Review extracted metadata from metadata.json
2. Map each property to standard format:
Title:
  • Compare source `title` (or `og:title`) with the first H1 on the page
  • If it matches the first H1 → Omit (platform defaults to H1)
  • If it differs → Include as `title` property
Description:
  • Compare source `description` (or `og:description`) with the first paragraph
  • If it matches the first paragraph → Consider omitting (platform defaults to first paragraph)
  • If it differs OR is more descriptive → Include as `description` property
  • Check: 150-160 characters ideal
Image:
  • Check source `og:image`
  • If it matches the first content image → Consider omitting (platform defaults to first image)
  • If it is a custom social image → Include as `image` property
  • Ensure absolute URL or correct relative path
  • Check: 1200x630 pixels recommended
Canonical:
  • If it points to the same page URL → Omit (platform auto-generates)
  • If it points to a different page → Include as `canonical` property
Tags:
  • Map `article:tag` or `keywords` → comma-separated `tags` property
Properties to SKIP (platform auto-populates):
  • `og:url`, `og:title`, `og:description`, `twitter:title`, `twitter:description`, `twitter:image`
  • `viewport`, `charset`, `X-UA-Compatible` (belong in head.html)
3. Generate metadata block HTML:
html
<div>
  <div class="metadata">
    <div>
      <div>title</div>
      <div>[Your mapped title]</div>
    </div>
    <div>
      <div>description</div>
      <div>[Your mapped description]</div>
    </div>
    <!-- Only include image if custom -->
    <!-- Only include canonical if differs from page URL -->
    <!-- Only include tags if present -->
  </div>
</div>
Append the metadata block as the last section div at the end of the HTML file.
Detailed guidance: See `resources/metadata-extraction.md` and `resources/metadata-mapping.md`
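The mapping rules above can be sketched in code. This is a simplified illustration under stated assumptions: the extracted metadata is taken to be a flat dict keyed by the source property names (the real layout of metadata.json may differ), "consider omitting" is treated as "omit", and `build_metadata_block` is a hypothetical helper name. The length and image-size checks are left to the author.

```python
from html import escape
from typing import Optional


def build_metadata_block(
    meta: dict,
    first_h1: Optional[str] = None,
    first_paragraph: Optional[str] = None,
    first_image: Optional[str] = None,
    page_url: Optional[str] = None,
) -> str:
    """Build the page metadata section, omitting values the platform defaults."""

    def differs(value, default):
        # True only when a value exists and is not the platform default.
        return bool(value) and value.strip() != (default or "").strip()

    rows = []
    title = meta.get("title") or meta.get("og:title")
    if differs(title, first_h1):
        rows.append(("title", title))
    description = meta.get("description") or meta.get("og:description")
    if differs(description, first_paragraph):
        rows.append(("description", description))
    image = meta.get("og:image")
    if differs(image, first_image):
        rows.append(("image", image))
    canonical = meta.get("canonical")
    if differs(canonical, page_url):
        rows.append(("canonical", canonical))
    tags = meta.get("article:tag") or meta.get("keywords")
    if tags:
        if isinstance(tags, (list, tuple)):
            tags = ", ".join(tags)
        rows.append(("tags", tags))

    if not rows:
        return ""  # Assumption: nothing to override, so no block is emitted.

    lines = ["<div>", '  <div class="metadata">']
    for name, value in rows:
        lines += ["    <div>", f"      <div>{name}</div>",
                  f"      <div>{escape(value, quote=False)}</div>", "    </div>"]
    lines += ["  </div>", "</div>"]
    return "\n".join(lines)
```

The returned string is a complete section `<div>` and can be appended after the last content section.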


Images Folder Management (CRITICAL)

The images are currently in `./import-work/images/` and the HTML references them as `./images/...`. You MUST handle the images folder correctly:
Step 1: Determine the correct images folder location
Based on `paths.htmlFilePath` from metadata.json:
  • HTML file: `us/en/about.plain.html` → Images should be at: `us/en/images/`
  • HTML file: `products/widget.plain.html` → Images should be at: `products/images/`
  • HTML file: `index.plain.html` → Images should be at: `images/`
Rule: Images folder goes in the same directory as the HTML file.
Step 2: Copy the images folder
bash
# Example: If HTML is at us/en/about.plain.html
mkdir -p us/en/images
cp -r ./import-work/images/* us/en/images/
Step 3: Verify image paths in HTML are correct
The HTML should already reference images as `./images/...`, which is correct for files in the same directory. No path changes are needed in the HTML.
Example:
HTML location: us/en/about.plain.html
Images location: us/en/images/
Image reference in HTML: <img src="./images/abc123.jpg">
Result: ✅ Correct - browser resolves to us/en/images/abc123.jpg
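For any HTML path, the folder rule and the copy step can be expressed as a short script. This is a sketch rather than the skill's prescribed method: the helper names are hypothetical, and `shutil.copytree` with `dirs_exist_ok=True` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def images_dir_for(html_file_path: str) -> Path:
    """Images folder goes in the same directory as the HTML file."""
    return Path(html_file_path).parent / "images"


def copy_images(
    html_file_path: str,
    source_dir: str = "./import-work/images",
    root: str = ".",
) -> Path:
    """Copy the scraped images next to the HTML file and return the target folder."""
    target = Path(root) / images_dir_for(html_file_path)
    # copytree creates missing parent directories and merges into an existing folder.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```

Calling `copy_images("us/en/about.plain.html")` from the project root places the images under `us/en/images/`, so the existing `./images/...` references resolve without changes.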

Save HTML File

Save to: Use `paths.htmlFilePath` from metadata.json (e.g., `us/en/about.plain.html`)
Read the metadata.json file from scrape-webpage to get the correct file path.
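A minimal sketch of the save step follows. It assumes metadata.json stores the path under a nested `paths` object with an `htmlFilePath` key, as the dotted name suggests; the default location `./import-work/metadata.json` and the function name `save_html` are assumptions for illustration only.

```python
import json
from pathlib import Path


def save_html(
    html: str,
    metadata_path: str = "./import-work/metadata.json",  # assumed location
    root: str = ".",
) -> Path:
    """Write the generated HTML to paths.htmlFilePath and return the written path."""
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    target = Path(root) / metadata["paths"]["htmlFilePath"]
    # Create intermediate folders such as us/en/ if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return target
```

The returned path is the value to hand to the next step.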


Validation Checklist (MANDATORY)

Before proceeding to the preview-import skill, verify:
  • ✅ Section count: HTML has the same number of top-level `<div>` sections as identified in identify-page-structure
  • ✅ All sequences: Every content sequence from authoring-analysis appears in the HTML
  • ✅ No truncation: No "..." or "<!-- more content -->" or similar placeholders
  • ✅ Complete text: All headings, paragraphs, and text from cleaned.html are present
  • ✅ All images: Every image reference from the scraped page is included
  • ✅ HTML file saved: HTML file written to disk at the correct path
  • ✅ Images folder copied: Images folder exists in the same directory as the HTML file
  • ✅ Images accessible: Verify that at least one image file exists in the copied images folder
If any validation check fails, STOP and fix before proceeding.
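Two of the checklist items, truncation placeholders and image availability, are easy to script. The sketch below uses hypothetical helper names. A literal "..." can also occur in legitimate page text, so matches are findings to review against cleaned.html rather than automatic failures.

```python
import re
from pathlib import Path

# Matches placeholder comments such as <!-- more content --> or
# <!-- rest of content -->, plus literal and typographic ellipses.
PLACEHOLDER_RE = re.compile(
    r"<!--\s*(?:rest of|more)\s+content\s*-->|\.\.\.|…",
    re.IGNORECASE,
)


def find_placeholders(html: str) -> list:
    """Return every suspected truncation marker found in the HTML."""
    return [match.group(0) for match in PLACEHOLDER_RE.finditer(html)]


def images_present(html_file_path: str) -> bool:
    """True when an images folder with at least one file sits next to the HTML file."""
    images = Path(html_file_path).parent / "images"
    return images.is_dir() and any(p.is_file() for p in images.iterdir())
```

An empty list from `find_placeholders` and `True` from `images_present` cover the "No truncation" and "Images accessible" items; the remaining items still need a comparison against the authoring analysis.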


Output

This skill provides:
  • ✅ HTML file at correct path (e.g., `us/en/about.plain.html`)
  • ✅ Images folder in same directory (e.g., `us/en/images/`)
  • ✅ Complete content import (all sections)
  • ✅ Proper block structure
  • ✅ Section metadata applied per validation
  • ✅ Page metadata block included
Next step: Pass the HTML file path to the preview-import skill