reverse-engineering-api

Reverse Engineering API Skill

This skill enables you to reverse engineer web APIs by:
  1. Controlling a browser with HAR recording enabled
  2. Analyzing captured network traffic
  3. Generating production-ready Python API clients

Prerequisites

  • Playwright MCP: You must have access to Playwright MCP tools for browser control
  • HAR Recording: The browser must be configured to record HAR files
  • Python: For running analysis scripts and generated clients

Workflow Overview

[User Task] -> [Browser Capture] -> [HAR Analysis] -> [API Client Generation] -> [Testing & Refinement]

Phase 0: Preparation (Using HAR Helper Scripts)

Available Helper Scripts

This skill provides Python utilities for HAR analysis located at:
Script Directory:
plugins/reverse-api-engineer/skills/reverse-engineering-api/scripts/
Available Scripts:
  • har_filter.py - Filter HAR files to API endpoints only
  • har_analyze.py - Extract structured endpoint information
  • har_validate.py - Validate generated code against HAR analysis
  • har_utils.py - Shared utility functions

Script Usage Pattern

Use these scripts in sequence for optimal code generation:
bash
# 1. Filter HAR to remove noise (static assets, analytics, CDN)
python {SKILL_DIR}/scripts/har_filter.py {har_path} --output filtered.har --stats

# 2. Analyze endpoints and extract patterns
python {SKILL_DIR}/scripts/har_analyze.py filtered.har --output analysis.json

# 3. Read analysis for code generation guidance
cat analysis.json

# 4. Generate API client code based on analysis

# 5. Validate generated code
python {SKILL_DIR}/scripts/har_validate.py api_client.py analysis.json

Why Use These Scripts?

har_filter.py benefits:
  • Reduces HAR file size by 80-90% (removes noise)
  • Focuses analysis on actual API calls
  • Significantly improves code generation quality
  • Outputs statistics showing what was filtered
har_analyze.py benefits:
  • Provides structured endpoint information
  • Detects authentication patterns automatically
  • Identifies pagination mechanisms
  • Extracts request/response schemas
  • Groups endpoints by pattern
har_validate.py benefits:
  • Ensures all endpoints are implemented
  • Validates authentication handling
  • Checks for proper error handling
  • Calculates coverage score (must be >= 90)
  • Identifies missing features
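One way to picture the coverage check is as the share of analyzed endpoints that appear in the generated client. The sketch below is only an assumption about what such a score could look like; it is not the actual logic of har_validate.py, and the function name and the path-substring matching are hypothetical.

```python
from typing import Iterable, List, Tuple


def coverage_score(endpoint_paths: Iterable[str], client_source: str) -> Tuple[float, List[str]]:
    """Return (percentage of paths found in the client source, missing paths).

    Hypothetical scoring rule: an endpoint counts as covered when its path
    string appears anywhere in the generated client's source code.
    """
    paths = list(endpoint_paths)
    if not paths:
        # Nothing to cover, so treat the client as fully covering the analysis.
        return 100.0, []
    missing = [path for path in paths if path not in client_source]
    covered = len(paths) - len(missing)
    return 100.0 * covered / len(paths), missing
```

Under this rule, a client that mentions one of two analyzed paths would score 50 and fall short of the 90 threshold stated above.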

Task Tracking

Use TodoWrite to track workflow progress:
  • Mark tasks as pending, in_progress, or completed
  • Only ONE task should be in_progress at a time
  • Complete ALL tasks - never stop early
Example TodoWrite usage:
python
TodoWrite([
  {"content": "Filter HAR using har_filter.py", "status": "in_progress", "activeForm": "Filtering HAR"},
  {"content": "Analyze HAR using har_analyze.py", "status": "pending", "activeForm": "Analyzing endpoints"},
  {"content": "Generate API client", "status": "pending", "activeForm": "Generating code"},
  {"content": "Validate using har_validate.py", "status": "pending", "activeForm": "Validating code"},
  {"content": "Test implementation", "status": "pending", "activeForm": "Testing API client"}
])
CRITICAL: Task tracking ensures complete workflow execution. Never skip tasks or stop early.
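The tracking rules above (three allowed statuses, at most one task in_progress) can be checked mechanically. The following is a minimal sketch that operates on the same list-of-dicts shape used in the TodoWrite example; the helper name check_todos is hypothetical and is not part of TodoWrite.

```python
from typing import Dict, List

VALID_STATUSES = {"pending", "in_progress", "completed"}


def check_todos(todos: List[Dict[str, str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the todos are consistent."""
    problems: List[str] = []
    for index, todo in enumerate(todos):
        status = todo.get("status")
        if status not in VALID_STATUSES:
            problems.append(f"task {index} has unknown status {status!r}")
    # Only one task may be in progress at any time.
    active = sum(1 for todo in todos if todo.get("status") == "in_progress")
    if active > 1:
        problems.append(f"{active} tasks are in_progress; only one is allowed")
    return problems
```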

Phase 1: Browser Capture with HAR Recording

Starting the Browser

When starting a browser session for API capture:
  1. Launch browser with HAR recording enabled via Playwright MCP
  2. Generate a unique run ID: {run_id}
  3. Configure HAR output path: ~/.reverse-api/runs/har/{run_id}/recording.har
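Steps 2 and 3 can be sketched with the standard library. The run ID format is not specified here, so the 12-character hex ID below is an assumption, as is the helper name new_har_path; only the directory layout comes from the path shown above.

```python
import uuid
from pathlib import Path
from typing import Optional, Tuple


def new_har_path(root: Optional[Path] = None) -> Tuple[str, Path]:
    """Return (run_id, har_path) following the runs/har/{run_id}/recording.har layout."""
    if root is None:
        root = Path.home() / ".reverse-api" / "runs" / "har"
    # Assumption: any unique string works as a run ID; a short UUID fragment is used here.
    run_id = uuid.uuid4().hex[:12]
    return run_id, root / run_id / "recording.har"


if __name__ == "__main__":
    run_id, har_path = new_har_path()
    har_path.parent.mkdir(parents=True, exist_ok=True)  # create the run directory
    print(f"HAR will be recorded to: {har_path}")
```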

During Capture

Navigate autonomously to trigger the API calls needed:
  • Login flows (capture authentication)
  • Data fetching (capture GET endpoints)
  • Form submissions (capture POST/PUT endpoints)
  • Pagination (capture query parameter patterns)

On Browser Close

When the browser closes, note the HAR file location:
HAR file saved to: ~/.reverse-api/runs/har/{run_id}/recording.har

Phase 2: HAR Analysis

Reading the HAR File

HAR files are JSON with this structure:
json
{
  "log": {
    "entries": [
      {
        "request": {
          "method": "GET|POST|PUT|DELETE",
          "url": "https://api.example.com/endpoint",
          "headers": [...],
          "postData": {...}
        },
        "response": {
          "status": 200,
          "headers": [...],
          "content": {...}
        }
      }
    ]
  }
}
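Given that structure, walking the entries takes only a few lines. This is a minimal sketch using the standard json module; the function names load_har and iter_requests are illustrative and are not taken from har_utils.py.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def load_har(path: str) -> Dict[str, Any]:
    """Parse a HAR file from disk into a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_requests(har: Dict[str, Any]) -> Iterator[Tuple[str, str, int]]:
    """Yield (method, url, status) for every entry in a parsed HAR document."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        yield request.get("method", ""), request.get("url", ""), response.get("status", 0)
```

Missing keys fall back to empty values rather than raising, which is a design choice for tolerating incomplete captures.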

Filtering Relevant Entries

Filter out noise by excluding:
  • Static assets: .js, .css, .png, .jpg, .svg, .woff, .ico
  • Analytics: google-analytics, segment, mixpanel, hotjar
  • Ads: doubleclick, adsense, facebook.com/tr
  • CDN resources: cloudflare, cdn., static.
Focus on:
  • API endpoints: /api/, /v1/, /v2/, /graphql
  • XHR/Fetch requests with JSON responses
  • Requests with authentication headers
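The include and exclude rules above could be expressed as a single predicate over a HAR entry. This is a rough sketch and not the implementation of har_filter.py; the name is_api_entry is hypothetical, and plain substring matching is deliberately crude (for example, "segment" may also match legitimate URLs).

```python
from typing import Any, Dict
from urllib.parse import urlparse

STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".svg", ".woff", ".ico")
NOISE_MARKERS = (
    "google-analytics", "segment", "mixpanel", "hotjar",  # analytics
    "doubleclick", "adsense", "facebook.com/tr",          # ads
    "cloudflare", "cdn.", "static.",                      # CDN resources
)
API_MARKERS = ("/api/", "/v1/", "/v2/", "/graphql")


def is_api_entry(entry: Dict[str, Any]) -> bool:
    """Return True when a HAR entry looks like an API call rather than noise."""
    url = entry.get("request", {}).get("url", "")
    path = urlparse(url).path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return False
    if any(marker in url.lower() for marker in NOISE_MARKERS):
        return False
    if any(marker in path for marker in API_MARKERS):
        return True
    # Fall back to the response MIME type recorded in the HAR content object.
    mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
    return "json" in mime.lower()
```

Exclusions are checked before inclusions so that an analytics endpoint under /v1/ is still dropped.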

Extracting Patterns

For each relevant endpoint, extract:
  1. URL Pattern: Base URL, path, query parameters
  2. Method: GET, POST, PUT, DELETE, PATCH
  3. Headers:
    • Required headers (Authorization, Content-Type, custom headers)
    • Optional headers (User-Agent, Accept)
  4. Request Body: JSON schema, form data structure
  5. Response Schema: JSON structure, status codes
  6. Authentication: See references/AUTH_PATTERNS.md
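Several of the items above can be derived directly from a HAR request object. The sketch below assumes HAR headers are a list of name/value pairs and treats numeric or UUID-shaped path segments as IDs; that heuristic and the name describe_request are assumptions, not the behavior of har_analyze.py.

```python
import re
from typing import Any, Dict
from urllib.parse import parse_qs, urlsplit

# Path segments that look like numeric IDs or UUIDs are replaced with {id}.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def describe_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the URL pattern, method, and key headers of one HAR request."""
    parts = urlsplit(request.get("url", ""))
    segments = ["{id}" if _ID_SEGMENT.match(s) else s for s in parts.path.split("/")]
    # HAR stores headers as [{"name": ..., "value": ...}]; names are case-insensitive.
    headers = {h["name"].lower(): h["value"] for h in request.get("headers", [])}
    return {
        "method": request.get("method", "GET"),
        "base_url": f"{parts.scheme}://{parts.netloc}",
        "path_template": "/".join(segments),
        "query_params": sorted(parse_qs(parts.query, keep_blank_values=True)),
        "has_authorization": "authorization" in headers,
        "content_type": headers.get("content-type"),
    }
```

Requests that share a method and path_template can then be grouped into one endpoint.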

Phase 3: API Client Generation

Code Structure

Generate a Python module with:
{output_dir}/
  api_client.py    # Main API client class
  README.md        # Usage documentation

api_client.py Template

python
"""
Auto-generated API client for {domain}
Generated from HAR capture on {date}
"""

import requests
from typing import Optional, Dict, Any, List
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class {ClassName}Client:
    """API client for {domain}."""
    
    def __init__(
        self,
        base_url: str = "{base_url}",
        session: Optional[requests.Session] = None,
    ):
        self.base_url = base_url.rstrip("/")
        self.session = session or requests.Session()
        self._setup_session()
    
    def _setup_session(self):
        """Configure session with default headers."""
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (compatible)",
            "Accept": "application/json",
            # Add other required headers
        })
    
    def _request(
        self,
        method: str,
        endpoint: str,
        **kwargs,
    ) -> requests.Response:
        """Make an HTTP request with error handling."""
        url = f"{self.base_url}{endpoint}"
        try:
            response = self.session.request(method, url, **kwargs)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            logger.error(f"Request failed: {e}")
            raise
    
    # Generated endpoint methods go here
    def get_example(self, param: str) -> Dict[str, Any]:
        """
        Fetch example data.
        
        Args:
            param: Description of parameter
            
        Returns:
            JSON response data
        """
        response = self._request("GET", f"/api/example/{param}")
        return response.json()

# Example usage
if __name__ == "__main__":
    client = {ClassName}Client()
    # Example calls

Code Quality Requirements

All generated code must include:
  1. Type hints for all parameters and return values
  2. Docstrings for all public methods
  3. Error handling with try-except blocks
  4. Logging for debugging
  5. Session management for connection reuse
  6. Authentication handling based on detected patterns
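For requirement 6, one possible shape is a small helper that turns a detected pattern into request headers. The pattern names ("bearer", "basic", "api_key", "none") and the default X-API-Key header are assumptions made for illustration; the patterns actually covered are described in references/AUTH_PATTERNS.md.

```python
import base64
from typing import Dict


def build_auth_headers(kind: str, **credentials: str) -> Dict[str, str]:
    """Build HTTP headers for a detected authentication pattern (hypothetical names)."""
    if kind == "bearer":
        return {"Authorization": f"Bearer {credentials['token']}"}
    if kind == "basic":
        raw = f"{credentials['username']}:{credentials['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if kind == "api_key":
        # Assumption: the key travels in a custom header; the name varies by API.
        return {credentials.get("header", "X-API-Key"): credentials["key"]}
    if kind == "none":
        return {}
    raise ValueError(f"unsupported auth kind: {kind}")
```

The returned dictionary could be passed to session.headers.update() inside _setup_session in the template above.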

Phase 4: Testing & Refinement

Testing the Generated Client

After generating the client:
  1. Run the example usage section
  2. Verify responses match expected structure
  3. Handle any errors encountered

Iteration Protocol

You have up to 5 attempts to fix issues:
Attempt 1: Initial implementation
  - What was tried
  - What failed (if anything)
  - What was changed

Attempt 2: Refinement
  ...
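The attempt log above could be kept in a small data structure that also enforces the five-attempt budget. This is a sketch under that reading of the protocol; the class names Attempt and AttemptLog are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_ATTEMPTS = 5  # limit stated in the iteration protocol


@dataclass
class Attempt:
    tried: str
    failed: Optional[str] = None
    changed: Optional[str] = None


@dataclass
class AttemptLog:
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, tried: str, failed: Optional[str] = None, changed: Optional[str] = None) -> int:
        """Append an attempt and return its 1-based number; refuse a sixth attempt."""
        if len(self.attempts) >= MAX_ATTEMPTS:
            raise RuntimeError(f"attempt budget of {MAX_ATTEMPTS} exhausted")
        self.attempts.append(Attempt(tried, failed, changed))
        return len(self.attempts)

    def render(self) -> str:
        """Format the log in the Attempt N / tried / failed / changed layout."""
        lines: List[str] = []
        for number, attempt in enumerate(self.attempts, start=1):
            lines.append(f"Attempt {number}:")
            lines.append(f"  - What was tried: {attempt.tried}")
            lines.append(f"  - What failed: {attempt.failed or 'nothing'}")
            lines.append(f"  - What was changed: {attempt.changed or 'nothing'}")
        return "\n".join(lines)
```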

Common Issues

Issue | Solution
403 Forbidden | Add missing headers, check authentication
Bot detection | Switch to Playwright with stealth mode
Rate limiting | Add delays, respect Retry-After headers
Session expiry | Implement token refresh logic
CORS errors | Use server-side requests (CORS is enforced by browsers, so it does not affect a Python client)
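For the rate-limiting case, a Retry-After header carries either a number of seconds or an HTTP date. The helper below is a minimal sketch of turning either form into a delay; the name retry_after_seconds and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None, default: float = 1.0) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "120".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A client would typically call time.sleep() with the returned value after a 429 response and then retry the request.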

Domain Discovery (Optional)

Before capture, you may want to map the domain to understand its structure.

Using the Mapper Script

Run scripts/mapper.py to quickly discover:
  • All pages on the domain and its subdomains
  • The subdomains themselves
This is useful for generalizing your scripts to multi-tenant websites.
For example, with Ashby ATS or Workday, finding other companies that use the same ATS helps you generalize a script beyond a single tenant.
bash
python scripts/mapper.py https://example.com

Using the Sitemap Parser

Run scripts/sitemap.py to extract URLs from sitemaps:
bash
python scripts/sitemap.py https://example.com
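As an illustration of what extracting URLs from a sitemap involves, the sketch below parses sitemap XML with the standard library and collects every loc element. It is not the code of scripts/sitemap.py, and the function name extract_sitemap_urls is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

Because a sitemap index also uses loc elements, the same function returns the child sitemap URLs, which can then be fetched and parsed in turn.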

Output Locations

  • HAR files: ~/.reverse-api/runs/har/{run_id}/
  • Generated scripts: ./{task_name}

Example Session

User: "Create an API client for the Apple Jobs website"


1. [Browser Capture]
   Launch browser with HAR recording
   Navigate to jobs.apple.com
   Perform search, browse listings
   Close browser
   HAR saved to: ~/.reverse-api/runs/har/{run_id}/recording.har

   Note: you can monitor browser requests with the Playwright MCP

2. [HAR Analysis]
   Found endpoints:
   - GET /api/role/search?query=...
   - GET /api/role/{id}
   Authentication: None required (public API)

3. [Generate Client]
   Create: {task_name}/api_client.py
   
4. [Test]
   Ran example usage - Success!
   
5. [Summary]
   Generated Apple Jobs API client with:
   - search_roles(query, location, page)
   - get_role(role_id)
   Files: ./{task_name}/