spa-reverse-engineer

SPA Reverse Engineering — React + Vite + Workbox + CDP

Reverse engineer modern SPAs to extract APIs, intercept service workers, debug runtime state, and build tooling.

When to use

Use this skill when:
  • Analyzing perplexity.ai SPA internals (React component tree, state, hooks)
  • Intercepting Workbox service worker caching and request strategies
  • Using Chrome DevTools Protocol (CDP) to automate browser interactions
  • Building Chrome extensions for traffic interception or state extraction
  • Debugging Vite-bundled source maps and module graph
  • Extracting GraphQL/REST schemas from SPA network layer
  • Writing Puppeteer/Playwright scripts for automated API discovery

Instructions

Step 1: Identify SPA Stack

Detect the technology stack of the target SPA:
javascript
// In DevTools Console:

// React detection
window.__REACT_DEVTOOLS_GLOBAL_HOOK__  // React DevTools presence
document.querySelector('#__next')  // Next.js
document.querySelector('#root')    // Vite/CRA
document.querySelector('#app')     // Vue (for comparison)

// Vite detection
document.querySelector('script[type="module"]')  // ESM modules
// Check source for /@vite/client or /.vite/ paths

// Workbox / Service Worker
navigator.serviceWorker.getRegistrations()  // List SWs
// Check Application → Service Workers in DevTools

// State management
window.__REDUX_DEVTOOLS_EXTENSION__  // Redux
// React DevTools → Components → hooks for Zustand/Jotai/Recoil
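
The console checks above need a live page. As a complement, the same fingerprints can be looked for in the raw HTML before a browser is opened. The sketch below is a rough heuristic, not a definitive detector: the `STACK_MARKERS` table and the `fingerprint_stack` name are assumptions made for illustration, and a `type="module"` script alone is only a weak hint of Vite.

```python
import re

# Hypothetical marker table: each stack hint maps to patterns searched in raw HTML.
STACK_MARKERS = {
    "next.js": [r'id="__next"', r'id="__NEXT_DATA__"', r"/_next/static/"],
    "vite": [r"/@vite/client", r'<script[^>]+type="module"'],
    "service-worker": [r"serviceWorker\.register", r"workbox"],
}


def fingerprint_stack(html: str) -> dict[str, list[str]]:
    """Return, per stack hint, the marker patterns that matched the HTML."""
    hits = {}
    for stack, patterns in STACK_MARKERS.items():
        matched = [p for p in patterns if re.search(p, html)]
        if matched:
            hits[stack] = matched
    return hits


# Made-up sample markup, not taken from any real site.
sample = '<div id="__next"></div><script src="/_next/static/chunks/main.js"></script>'
print(fingerprint_stack(sample))
```

A hit only narrows the search; confirm it with the console checks, since many frameworks share mount-node ids such as `#root` and `#app`.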

Step 2: React Internals Analysis

Component Tree Extraction

javascript
// Get the React fiber attached to a DOM element
function getFiber(element) {
    if (!element) return null;
    const key = Object.keys(element).find(k =>
        k.startsWith('__reactFiber$') ||
        k.startsWith('__reactContainer$') ||       // mount container (React 18)
        k.startsWith('__reactInternalInstance$')   // React 16
    );
    return key ? element[key] : null;
}

// Walk the fiber tree depth-first, printing component and host element names
function walkFiber(fiber, depth = 0) {
    if (!fiber) return;
    const name = fiber.type?.displayName || fiber.type?.name || fiber.type;
    if (typeof name === 'string') {
        console.log('  '.repeat(depth) + name);
    }
    walkFiber(fiber.child, depth + 1);
    walkFiber(fiber.sibling, depth);
}

// Start from the mount node ('#root' for Vite/CRA, '#__next' for Next.js)
const root = document.getElementById('root') || document.getElementById('__next');
walkFiber(getFiber(root));
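
The child/sibling walk can also be replayed offline once the tree has been exported from the page. The sketch below assumes the fiber tree was first projected in the browser into plain JSON with the hypothetical keys `name`, `child`, and `sibling` (real fibers hold cyclic `return` pointers and cannot be serialized directly). It uses an explicit stack rather than recursion so that long sibling chains do not hit Python's recursion limit.

```python
def walk_fiber(fiber, depth=0):
    """Yield (depth, name) pairs from a child/sibling-linked fiber dump."""
    stack = [(fiber, depth)]
    while stack:
        node, d = stack.pop()
        if node is None:
            continue
        name = node.get("name")
        if isinstance(name, str):
            yield d, name
        # Push the sibling first so the child subtree is visited before it.
        stack.append((node.get("sibling"), d))
        stack.append((node.get("child"), d + 1))


# Made-up dump in the assumed {name, child, sibling} shape.
dump = {
    "name": "App",
    "child": {
        "name": "Header",
        "sibling": {"name": "Main", "child": {"name": "div"}},
    },
}
for d, name in walk_fiber(dump):
    print("  " * d + name)
```

The visiting order matches the recursive console version: a node, then its child subtree, then its sibling.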

State & Props Extraction

javascript
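// Extract hook state via fiber. For function components, memoizedState is a
// linked list of hooks (one node per hook call, in call order); for class
// components it is the state object itself.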
function getComponentState(fiber) {
    const state = [];
    let hook = fiber.memoizedState;
    while (hook) {
        state.push(hook.memoizedState);
        hook = hook.next;
    }
    return state;
}

// Find specific component by name
function findComponent(fiber, name) {
    if (!fiber) return null;
    if (fiber.type?.name === name || fiber.type?.displayName === name) {
        return fiber;
    }
    return findComponent(fiber.child, name) || findComponent(fiber.sibling, name);
}

Step 3: Vite Bundle Analysis

Source Map Extraction

bash
# Find source maps from bundled assets (assumes root-relative script paths)
curl -s https://www.perplexity.ai/ | grep -oP 'src="\K[^"]*\.js(?=")' | while read -r url; do
    echo "Checking: $url"
    curl -sI "https://www.perplexity.ai${url}.map" | head -5
done
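
The shell pipeline above prefixes every script path with the site origin, which breaks for scripts served from a CDN. A minimal sketch of the same idea in Python, resolving each `src` against a base URL first; `source_map_candidates` is a hypothetical helper name. Appending `.map` is only a convention: the authoritative location is the `sourceMappingURL` comment in the bundle or the `SourceMap` response header.

```python
import re
from urllib.parse import urljoin

# Matches <script ... src="...js">; URLs with query strings are not handled.
SCRIPT_SRC = re.compile(r'<script[^>]+src="([^"]+\.js)"')


def source_map_candidates(html: str, base_url: str) -> list[str]:
    """Return the conventional '<script>.map' URL for each script tag found."""
    return [urljoin(base_url, src) + ".map" for src in SCRIPT_SRC.findall(html)]


# Made-up markup with one relative and one absolute script URL.
page = (
    '<script src="/_next/static/a.js"></script>'
    '<script src="https://cdn.example.com/b.js"></script>'
)
for url in source_map_candidates(page, "https://www.perplexity.ai/"):
    print(url)
```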

Module Graph

javascript
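// In Vite dev mode (if accessible):
// stock Vite does not serve a module-graph page; if the project installs
// vite-plugin-inspect, /__inspect/ shows module transforms and dependencies

// In production, analyze chunks:
// Network panel → filter JS → Initiator column (request chain)
// Sources → Page tree → module paths (original paths appear when source maps load)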

Step 4: Service Worker & Workbox Interception

Analyze Caching Strategy

javascript
// List all cached URLs
async function listCaches() {
    const names = await caches.keys();
    for (const name of names) {
        const cache = await caches.open(name);
        const keys = await cache.keys();
        console.log(`Cache: ${name} (${keys.length} entries)`);
        keys.forEach(k => console.log(`  ${k.url}`));
    }
}

// Intercept SW fetch events (from SW scope)
self.addEventListener('fetch', event => {
    console.log('[SW Intercept]', event.request.method, event.request.url);
});

Workbox Strategy Detection

javascript
// Common Workbox strategies to look for in SW source:
// - CacheFirst       → Static assets (fonts, images)
// - NetworkFirst     → API calls (dynamic data)
// - StaleWhileRevalidate → Frequently updated content
// - NetworkOnly      → Always fresh (auth endpoints)
// - CacheOnly        → Offline-only content

// Check SW source for workbox patterns:
// workbox.strategies.CacheFirst
// workbox.routing.registerRoute
// workbox.precaching.precacheAndRoute
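
Searching the service worker source for these names can be scripted once the file has been downloaded. The sketch below counts occurrences of the five strategy names listed above; `count_strategies` is a hypothetical helper, and it assumes the class names survive minification, which is not guaranteed for every build.

```python
import re
from collections import Counter

STRATEGIES = ("CacheFirst", "NetworkFirst", "StaleWhileRevalidate", "NetworkOnly", "CacheOnly")
STRATEGY_RE = re.compile(r"\b(" + "|".join(STRATEGIES) + r")\b")


def count_strategies(sw_source: str) -> Counter:
    """Count how often each Workbox strategy name appears in SW source text."""
    return Counter(STRATEGY_RE.findall(sw_source))


# Made-up service worker snippet for illustration.
sw = (
    "registerRoute(/\\.png$/, new workbox.strategies.CacheFirst());"
    "registerRoute('/api', new NetworkFirst());"
    "registerRoute(/\\.woff2$/, new CacheFirst());"
)
print(count_strategies(sw))
```

A zero count does not prove the strategy is unused; in that case, fall back to reading the route registrations by hand.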

Step 5: Chrome DevTools Protocol (CDP)

Automated Interception via CDP

python
import asyncio
from playwright.async_api import async_playwright

async def intercept_with_cdp():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()

        # Open a CDP session for this page
        cdp = await context.new_cdp_session(page)

        # Network.enable covers both HTTP and WebSocket events; one call is enough
        await cdp.send('Network.enable')
        cdp.on('Network.requestWillBeSent', lambda params:
            print(f"[CDP] {params['request']['method']} {params['request']['url']}")
        )
        cdp.on('Network.responseReceived', lambda params:
            print(f"[CDP] {params['response']['status']} {params['response']['url']}")
        )

        # WebSocket frames (payload truncated to 200 characters)
        cdp.on('Network.webSocketFrameSent', lambda params:
            print(f"[WS→] {params['response']['payloadData'][:200]}")
        )
        cdp.on('Network.webSocketFrameReceived', lambda params:
            print(f"[←WS] {params['response']['payloadData'][:200]}")
        )

        await page.goto('https://www.perplexity.ai/')
        await page.wait_for_timeout(60000)  # keep the session open for 60 s
        await browser.close()

asyncio.run(intercept_with_cdp())
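
Printing every request quickly becomes noisy. One option is to collect the `Network.requestWillBeSent` params in a list (for example by passing `events.append` as the handler) and reduce them to an endpoint inventory afterwards. A minimal sketch; `summarize_requests` is a hypothetical helper and the sample URLs are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def summarize_requests(events: list[dict]) -> dict[str, set[str]]:
    """Group Network.requestWillBeSent params by host + path -> HTTP methods."""
    endpoints = defaultdict(set)
    for params in events:
        request = params["request"]
        parts = urlsplit(request["url"])
        # Query strings are dropped so that one endpoint maps to one key.
        endpoints[parts.netloc + parts.path].add(request["method"])
    return dict(endpoints)


# Made-up events in the shape CDP delivers to the handler.
events = [
    {"request": {"method": "GET", "url": "https://example.com/rest/items?page=1"}},
    {"request": {"method": "POST", "url": "https://example.com/rest/items"}},
    {"request": {"method": "GET", "url": "https://example.com/static/app.js"}},
]
for endpoint, methods in sorted(summarize_requests(events).items()):
    print(endpoint, sorted(methods))
```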

Runtime JS Evaluation via CDP

python
import json

# Execute JS in page context (cdp is the CDP session opened in the previous example)
result = await cdp.send('Runtime.evaluate', {
    'expression': 'JSON.stringify(window.__NEXT_DATA__)',
    'returnByValue': True,
})
next_data = json.loads(result['result']['value'])
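
Indexing `result['result']['value']` directly fails when the global is missing, because `JSON.stringify(undefined)` evaluates to `undefined` and the reply then carries no `value`. It also ignores `exceptionDetails`, which CDP adds when the expression throws. A small defensive decoder, as a sketch; `parse_evaluate_result` is a hypothetical helper name.

```python
import json


def parse_evaluate_result(result: dict):
    """Decode a Runtime.evaluate reply whose expression was JSON.stringify(...)."""
    if "exceptionDetails" in result:
        raise RuntimeError(result["exceptionDetails"].get("text", "evaluation failed"))
    remote = result["result"]
    if remote.get("type") != "string":
        return None  # JSON.stringify(undefined) yields undefined, not a string
    return json.loads(remote["value"])


# Made-up replies in the shape returned by cdp.send('Runtime.evaluate', ...).
ok = {"result": {"type": "string", "value": '{"page": "/"}'}}
missing = {"result": {"type": "undefined"}}
print(parse_evaluate_result(ok))
print(parse_evaluate_result(missing))
```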

Step 6: Chrome Extension Development

Manifest v3 Extension for Traffic Capture

json
{
    "manifest_version": 3,
    "name": "pplx-sdk Traffic Capture",
    "version": "1.0",
    "permissions": [
        "webRequest", "activeTab", "storage", "debugger"
    ],
    "host_permissions": ["https://www.perplexity.ai/*"],
    "background": {
        "service_worker": "background.js"
    },
    "content_scripts": [{
        "matches": ["https://www.perplexity.ai/*"],
        "js": ["content.js"],
        "run_at": "document_start"
    }]
}

Background Script — Request Interception

javascript
// background.js
chrome.webRequest.onBeforeRequest.addListener(
    (details) => {
        // The URL filter below already limits events to /rest/ endpoints
        console.log('[pplx-capture]', details.method, details.url);
        const bytes = details.requestBody?.raw?.[0]?.bytes;
        if (!bytes) return;
        const text = new TextDecoder().decode(new Uint8Array(bytes));
        let body;
        try {
            body = JSON.parse(text);
        } catch {
            body = text;  // keep non-JSON bodies as plain text
        }
        chrome.storage.local.set({
            // requestId avoids key collisions between requests in the same millisecond
            [`req_${details.requestId}`]: {
                url: details.url,
                method: details.method,
                body,
                timestamp: Date.now()
            }
        });
    },
    { urls: ["https://www.perplexity.ai/rest/*"] },
    ["requestBody"]
);

Content Script — React State Extraction

javascript
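// content.js: inject a hook into the page context
// Caveats: this only works when the page exposes React as a global and uses
// class components, and a strict page CSP can block inline scripts.
const script = document.createElement('script');
script.textContent = `
    (() => {
        const R = window.React;
        if (!R || !R.Component) return;
        // Hook into React state updates
        const origSetState = R.Component.prototype.setState;
        R.Component.prototype.setState = function(state, cb) {
            try {
                window.postMessage({
                    type: 'PPLX_STATE_UPDATE',
                    component: this.constructor.name,
                    // Updater functions cannot be cloned; report them as a marker
                    state: typeof state === 'function'
                        ? '[updater function]'
                        : JSON.parse(JSON.stringify(state))
                }, '*');
            } catch (e) { /* ignore non-serializable state */ }
            return origSetState.call(this, state, cb);
        };
    })();
`;
document.documentElement.appendChild(script);

// Listen for state updates posted by the injected hook
window.addEventListener('message', (event) => {
    if (event.source === window && event.data?.type === 'PPLX_STATE_UPDATE') {
        chrome.runtime.sendMessage(event.data);
    }
});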

Step 7: Map Discoveries to SDK

| SPA Discovery | SDK Target | Action |
|---|---|---|
| React component state | domain/models.py | Model the state shape |
| API fetch calls | transport/http.py | Add endpoint methods |
| SSE event handlers | transport/sse.py | Map event types |
| Service worker cache | shared/ | Understand caching behavior |
| Auth token flow | shared/auth.py | Token refresh logic |
| WebSocket frames | transport/ | New WebSocket transport |
| GraphQL queries | domain/ | Query/mutation services |
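
As an illustration of the "SSE event handlers" row, the sketch below splits a captured SSE response body into events so that the observed event types can be listed and mapped. It is a simplified reading of the SSE wire format (it handles only `event:` and `data:` fields plus comment lines); `parse_sse` is a hypothetical name, not the SDK's actual API, and the event names in the sample are invented.

```python
def parse_sse(stream_text: str) -> list[dict]:
    """Split an SSE body into events with 'event' and 'data' fields."""
    events = []
    event_type, data_lines = "message", []
    # The extra "" flushes a final event that lacks a trailing blank line.
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


# Made-up stream: one typed event, a keep-alive comment, one multi-line event.
body = 'event: token\ndata: {"text": "hi"}\n\n: ping\ndata: a\ndata: b\n\n'
for event in parse_sse(body):
    print(event)
```

Collecting the distinct `event` values from a few captured sessions gives the list of types that the transport layer needs to handle.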

Step 8: SPA Source Code Graph

After runtime analysis, build a static code graph of the SPA source. Delegate to codegraph for structural analysis.

Source Map Recovery

bash
# Extract original source paths from source maps
curl -s https://www.perplexity.ai/ | grep -oP 'src="\K/[^"]*\.js(?=")' | while read -r url; do
    echo "Checking: $url" >&2   # progress goes to stderr so it stays out of the sorted list
    curl -s "https://www.perplexity.ai${url}.map" 2>/dev/null |
        python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d.get('sources',[])))" 2>/dev/null
done | sort -u
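
The raw `sources` array usually mixes first-party files with third-party packages and carries bundler-specific prefixes. A small post-processing sketch, assuming a version 3 source map already downloaded as text; `original_sources` is a hypothetical helper and the sample paths are invented.

```python
import json


def original_sources(source_map_text: str) -> list[str]:
    """List first-party source paths recorded in a source map document."""
    sources = json.loads(source_map_text).get("sources", [])
    cleaned = set()
    for src in sources:
        if "node_modules" in src:
            continue  # skip third-party code
        # Bundlers often add a scheme prefix, e.g. "webpack://_N_E/./src/App.tsx".
        cleaned.add(src.split("://", 1)[-1])
    return sorted(cleaned)


# Made-up source map for illustration.
sample_map = json.dumps({
    "version": 3,
    "sources": [
        "webpack://_N_E/./src/App.tsx",
        "webpack://_N_E/./node_modules/react/index.js",
        "../src/hooks/useThread.ts",
    ],
})
print(original_sources(sample_map))
```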

Static Analysis (from recovered source or public repo)

bash
# Component tree from source
grep -rnE "export (default )?function |export const .* = \(" src/ --include="*.tsx" --include="*.jsx"

# Import graph
grep -rn "import .* from " src/ --include="*.ts" --include="*.tsx" |
    awk -F: '{print $1 " → " $NF}' | sort -u

# Hook usage map
grep -rn "use[A-Z][a-zA-Z]*(" src/ --include="*.tsx" |
    grep -oP 'use[A-Z][a-zA-Z]*' | sort | uniq -c | sort -rn

# API call sites (fetch, axios, etc.)
grep -rnE "fetch\(|axios\.|api\.|apiClient\." src/ --include="*.ts" --include="*.tsx"
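
Where GNU grep's `-P` is unavailable, the hook usage map can be produced with a short script instead. The sketch below counts hook call sites per hook name; `count_hooks` and `count_hooks_in_tree` are hypothetical helpers, and calls with explicit type arguments such as `useState<string>(...)` are not matched by this simple pattern.

```python
import re
from collections import Counter
from pathlib import Path

# A hook call: an identifier starting with "use" + uppercase letter, then "(".
HOOK_CALL = re.compile(r"\b(use[A-Z][A-Za-z0-9]*)\s*\(")


def count_hooks(text: str) -> Counter:
    """Count hook call sites in one source string."""
    return Counter(HOOK_CALL.findall(text))


def count_hooks_in_tree(root: str) -> Counter:
    """Sum hook counts over every .tsx file below root."""
    total = Counter()
    for path in Path(root).rglob("*.tsx"):
        total += count_hooks(path.read_text(encoding="utf-8", errors="ignore"))
    return total


# Made-up component source for illustration.
code = "useState(0); useState(1); useEffect(() => {}); user(1); reuseThing()"
print(count_hooks(code).most_common())
```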

Cross-Reference: Runtime ↔ Static

| Runtime Discovery (spa-expert) | Static Discovery (codegraph) | Cross-Reference |
|---|---|---|
| Fiber tree component names | Source component definitions | Match names to source files |
| Hook state values | Hook implementations | Map state shape to hook logic |
| Network API calls | fetch() / axios call sites | Confirm endpoints in source |
| Context provider values | createContext() definitions | Map runtime state to types |
| Service worker routes | Workbox config in source | Validate caching strategy |
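
The first row of the table, matching fiber tree names to source files, can be sketched as a lookup over exported component definitions. `cross_reference` is a hypothetical helper that takes the runtime names and a mapping of file path to file contents. It assumes component names survive in the production build; minified bundles often mangle them, in which case `displayName` values or source maps are needed first.

```python
import re

# Exported function or const whose name starts with an uppercase letter.
COMPONENT_DEF = re.compile(r"export\s+(?:default\s+)?(?:function|const)\s+([A-Z]\w*)")


def cross_reference(runtime_names: set[str], sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each runtime component name to the source files that define it."""
    matches = {name: [] for name in sorted(runtime_names)}
    for path, text in sources.items():
        for name in COMPONENT_DEF.findall(text):
            if name in matches:
                matches[name].append(path)
    return matches


# Made-up inputs: names seen in the fiber walk and two recovered source files.
seen = {"App", "ThreadList", "Anonymous"}
files = {
    "src/App.tsx": "export default function App() {}",
    "src/Thread.tsx": "export const ThreadList = () => null",
}
print(cross_reference(seen, files))
```

Names left with an empty list are the ones worth investigating: they may come from third-party packages or be defined without an export.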

Perplexity.ai SPA Notes

Known Stack

  • Framework: Next.js (React 18+)
  • Bundler: Webpack (via Next.js, not raw Vite — skill covers both for broader SPA RE)
  • State: React hooks + context (observed patterns)
  • Streaming: SSE via fetch() with ReadableStream
  • Auth: Cookie-based (pplx.session-id)

Key DOM Selectors

javascript
// Query input
document.querySelector('textarea[placeholder*="Ask"]')
// Response area
document.querySelector('[class*="prose"]')
// Thread list
document.querySelector('[class*="thread"]')