spa-reverse-engineer
SPA Reverse Engineering: React + Vite + Workbox + CDP
Reverse engineer modern SPAs to extract APIs, intercept service workers, debug runtime state, and build tooling.
When to use
Use this skill when:
- Analyzing perplexity.ai SPA internals (React component tree, state, hooks)
- Intercepting Workbox service worker caching and request strategies
- Using Chrome DevTools Protocol (CDP) to automate browser interactions
- Building Chrome extensions for traffic interception or state extraction
- Debugging Vite-bundled source maps and module graph
- Extracting GraphQL/REST schemas from SPA network layer
- Writing Puppeteer/Playwright scripts for automated API discovery
Instructions
Step 1: Identify SPA Stack
Detect the technology stack of the target SPA:
javascript
// In DevTools Console:
// React detection
window.__REACT_DEVTOOLS_GLOBAL_HOOK__ // React DevTools presence
document.querySelector('#__next') // Next.js
document.querySelector('#root') // Vite/CRA
document.querySelector('#app') // Vue (for comparison)
// Vite detection
document.querySelector('script[type="module"]') // ESM modules
// Check source for /@vite/client or /.vite/ paths
// Workbox / Service Worker
navigator.serviceWorker.getRegistrations() // List SWs
// Check Application → Service Workers in DevTools
// State management
window.__REDUX_DEVTOOLS_EXTENSION__ // Redux
// React DevTools → Components → hooks for Zustand/Jotai/Recoil
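The console probes above can be bundled into one report. The following is a minimal sketch, not part of the original skill: `detectStack` and its result keys are hypothetical names, and the `renderers` map on the DevTools hook is assumed to be populated only when a React renderer has registered itself on the page.

```javascript
// Hypothetical helper: summarize which stack markers are present.
// `win` and `doc` default to the browser globals but can be replaced with stubs.
function detectStack(win = globalThis, doc = globalThis.document) {
  const has = (selector) => Boolean(doc && doc.querySelector(selector));
  const hook = win.__REACT_DEVTOOLS_GLOBAL_HOOK__;
  return {
    reactDevtoolsHook: Boolean(hook),
    // Assumption: a non-empty renderers map means React is actually running.
    reactRenderers: hook && hook.renderers ? hook.renderers.size : 0,
    nextRoot: has('#__next'),
    genericRoot: has('#root'),
    vueStyleRoot: has('#app'),
    esModules: has('script[type="module"]'),
    serviceWorkerApi: Boolean(win.navigator && win.navigator.serviceWorker),
    reduxDevtools: Boolean(win.__REDUX_DEVTOOLS_EXTENSION__),
  };
}
```

In the DevTools console, `console.table(detectStack())` prints the report. None of these markers is conclusive on its own, so treat the output as a starting point for the manual checks.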
Step 2: React Internals Analysis
Component Tree Extraction
javascript
// Get the React fiber attached to a DOM element.
// React 17+ tags the root container with __reactContainer$ and child nodes with __reactFiber$.
function getFiber(element) {
  const key = Object.keys(element).find(k =>
    k.startsWith('__reactFiber$') ||
    k.startsWith('__reactContainer$') ||
    k.startsWith('__reactInternalInstance$') // React 16 and earlier
  );
  return key ? element[key] : null;
}
// Walk fiber tree, logging host tags and named components
function walkFiber(fiber, depth = 0) {
  if (!fiber) return;
  const name = fiber.type?.displayName || fiber.type?.name || fiber.type;
  if (typeof name === 'string') {
    console.log('  '.repeat(depth) + name);
  }
  walkFiber(fiber.child, depth + 1);
  walkFiber(fiber.sibling, depth);
}
// Start from root
const root = document.getElementById('root');
walkFiber(getFiber(root));
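The recursive walk above can exceed the call stack on very deep trees. An iterative variant that returns component names instead of logging them is sketched below; `collectComponentNames` is a hypothetical name, and the sketch only reports fibers whose `type` is a function (function and class components), so `memo` or `forwardRef` wrappers, whose `type` is an object, are skipped.

```javascript
// Collect names of function/class components from a fiber subtree, iteratively.
function collectComponentNames(rootFiber) {
  const names = [];
  const stack = rootFiber ? [rootFiber] : [];
  while (stack.length > 0) {
    const fiber = stack.pop();
    const type = fiber.type;
    // Host elements have a string type ('div'); components have a function type.
    if (typeof type === 'function') {
      names.push(type.displayName || type.name || 'Anonymous');
    }
    // Push the sibling first so the child subtree is visited before it.
    if (fiber.sibling) stack.push(fiber.sibling);
    if (fiber.child) stack.push(fiber.child);
  }
  return names;
}
```

Passing the fiber of any DOM node to `collectComponentNames` returns a flat, depth-first list that is easy to diff between page states.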
State & Props Extraction
javascript
// Extract hook state from a function component's fiber.
// (For class components, fiber.memoizedState is the state object itself, not a hook list.)
function getComponentState(fiber) {
  const state = [];
  let hook = fiber.memoizedState;
  while (hook) {
    state.push(hook.memoizedState);
    hook = hook.next;
  }
  return state;
}
// Find specific component by name
function findComponent(fiber, name) {
  if (!fiber) return null;
  if (fiber.type?.name === name || fiber.type?.displayName === name) {
    return fiber;
  }
  return findComponent(fiber.child, name) || findComponent(fiber.sibling, name);
}
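Hook values returned by `getComponentState` often contain functions (setters, callbacks) and circular references, so `JSON.stringify` can throw on them. A minimal sketch of a sanitizer follows; `toSerializable` and the marker strings are hypothetical choices, not something React provides.

```javascript
// Convert arbitrary values into JSON-safe data.
// Functions and true cycles are replaced with marker strings.
function toSerializable(value, ancestors = new WeakSet()) {
  if (typeof value === 'function') return '[function]';
  if (value === null || typeof value !== 'object') return value;
  if (ancestors.has(value)) return '[circular]';
  ancestors.add(value);
  let out;
  if (Array.isArray(value)) {
    out = value.map((item) => toSerializable(item, ancestors));
  } else {
    out = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = toSerializable(item, ancestors);
    }
  }
  // Remove on the way back up so shared (non-circular) references are kept.
  ancestors.delete(value);
  return out;
}
```

With this in place, `JSON.stringify(toSerializable(getComponentState(fiber)))` gives a snapshot that can be copied out of the console or posted to an extension.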
Step 3: Vite Bundle Analysis
Source Map Extraction
bash
# Find source maps from bundled assets
curl -s https://www.perplexity.ai/ | grep -oP 'src="[^"]*\.js"' | while read -r src; do
  url=$(echo "$src" | grep -oP '"[^"]*"' | tr -d '"')
  echo "Checking: $url"
  curl -sI "https://www.perplexity.ai${url}.map" | head -5
done
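The loop above guesses that each map lives at `<bundle>.map`. Bundles usually declare the real location in a trailing `sourceMappingURL` comment, which may be relative, absolute, or an inline `data:` URL. A sketch of resolving it is shown below; `findSourceMapUrl` is a hypothetical helper and only handles the `//` comment form used in JavaScript files.

```javascript
// Return the absolute source map URL referenced by a bundle, or null if none is declared.
function findSourceMapUrl(bundleText, bundleUrl) {
  // Convention: //# sourceMappingURL=<url> (older tools used //@ instead of //#).
  const matches = [...bundleText.matchAll(/\/\/[#@]\s*sourceMappingURL=(\S+)/g)];
  if (matches.length === 0) return null;
  // Use the last occurrence; concatenated bundles may carry stale comments earlier on.
  const ref = matches[matches.length - 1][1];
  if (ref.startsWith('data:')) return ref; // inline map, already embedded
  return new URL(ref, bundleUrl).href;
}
```

A `null` result does not prove the map is absent: servers can also advertise it through a `SourceMap` response header, which the `curl -sI` call above would show.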
Module Graph
javascript
// In Vite dev mode (if accessible):
// /__vite_module_graph shows dependency graph
// In production — analyze chunks:
// Performance → Network → JS files → Initiator chain
// Sources → Webpack/Vite tree → module paths
Step 4: Service Worker & Workbox Interception
Analyze Caching Strategy
javascript
// List all cached URLs
async function listCaches() {
  const names = await caches.keys();
  for (const name of names) {
    const cache = await caches.open(name);
    const keys = await cache.keys();
    console.log(`Cache: ${name} (${keys.length} entries)`);
    keys.forEach(k => console.log(`  ${k.url}`));
  }
}
// Intercept SW fetch events (from SW scope)
self.addEventListener('fetch', event => {
  console.log('[SW Intercept]', event.request.method, event.request.url);
});
Workbox Strategy Detection
javascript
// Common Workbox strategies to look for in SW source:
// - CacheFirst → Static assets (fonts, images)
// - NetworkFirst → API calls (dynamic data)
// - StaleWhileRevalidate → Frequently updated content
// - NetworkOnly → Always fresh (auth endpoints)
// - CacheOnly → Offline-only content
// Check SW source for workbox patterns:
// workbox.strategies.CacheFirst
// workbox.routing.registerRoute
// workbox.precaching.precacheAndRoute
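Searching the service worker source for these names can be automated. The sketch below counts occurrences of each strategy class name in a source string; `countWorkboxStrategies` is a hypothetical helper, and it assumes the build has not mangled the class names, which is not guaranteed for minified bundles.

```javascript
// Strategy class names listed in the comments above.
const STRATEGIES = ['CacheFirst', 'NetworkFirst', 'StaleWhileRevalidate', 'NetworkOnly', 'CacheOnly'];

// Count how often each strategy name appears in a service worker source string.
function countWorkboxStrategies(swSource) {
  const counts = {};
  for (const name of STRATEGIES) {
    const hits = swSource.match(new RegExp(`\\b${name}\\b`, 'g'));
    if (hits) counts[name] = hits.length;
  }
  return counts;
}
```

The source text can be obtained by fetching the script URL of a registration returned by `navigator.serviceWorker.getRegistrations()`. A count of zero only means the names were not found as plain identifiers.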
Step 5: Chrome DevTools Protocol (CDP)
Automated Interception via CDP
python
import asyncio
from playwright.async_api import async_playwright

async def intercept_with_cdp():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        # Open a CDP session for this page
        cdp = await context.new_cdp_session(page)
        # Intercept network at CDP level (Network.enable also covers WebSocket events)
        await cdp.send('Network.enable')
        cdp.on('Network.requestWillBeSent', lambda params:
            print(f"[CDP] {params['request']['method']} {params['request']['url']}")
        )
        cdp.on('Network.responseReceived', lambda params:
            print(f"[CDP] {params['response']['status']} {params['response']['url']}")
        )
        # Intercept WebSocket frames
        cdp.on('Network.webSocketFrameSent', lambda params:
            print(f"[WS→] {params['response']['payloadData'][:200]}")
        )
        cdp.on('Network.webSocketFrameReceived', lambda params:
            print(f"[←WS] {params['response']['payloadData'][:200]}")
        )
        await page.goto('https://www.perplexity.ai/')
        await page.wait_for_timeout(60000)  # keep capturing for 60 s
        await browser.close()

asyncio.run(intercept_with_cdp())
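Raw `Network.requestWillBeSent` output gets noisy quickly. If the event payloads are collected into an array instead of printed, they can be reduced to an endpoint inventory afterwards. The sketch below assumes each item has the `request.method` and `request.url` fields used above; `summarizeRequests` is a hypothetical name.

```javascript
// Group captured requests by method + origin + path (query strings dropped)
// and return them sorted by how often each endpoint was hit.
function summarizeRequests(events) {
  const inventory = new Map();
  for (const { request } of events) {
    const { origin, pathname } = new URL(request.url);
    const key = `${request.method} ${origin}${pathname}`;
    inventory.set(key, (inventory.get(key) || 0) + 1);
  }
  return [...inventory.entries()]
    .map(([endpoint, count]) => ({ endpoint, count }))
    .sort((a, b) => b.count - a.count || a.endpoint.localeCompare(b.endpoint));
}
```

Dropping the query string is a deliberate simplification: it collapses paginated or parameterized calls into one row, at the cost of hiding parameters that may matter when modeling the API.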
Runtime JS Evaluation via CDP
python
import json

# Execute JS in page context (reuses the `cdp` session from the previous example)
result = await cdp.send('Runtime.evaluate', {
    'expression': 'JSON.stringify(window.__NEXT_DATA__)',
    'returnByValue': True,
})
next_data = json.loads(result['result']['value'])
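The snippet above assumes the evaluation succeeds and yields a string. When the global is missing, `JSON.stringify(undefined)` evaluates to `undefined` and the response carries no `value`; when the expression throws, the response carries `exceptionDetails`. A sketch of handling both cases on a `Runtime.evaluate` response object follows; `parseEvaluateResult` is a hypothetical helper.

```javascript
// Turn a Runtime.evaluate response into parsed JSON, or null if nothing was returned.
function parseEvaluateResult(response) {
  if (response.exceptionDetails) {
    throw new Error(`Evaluation failed: ${response.exceptionDetails.text}`);
  }
  const remote = response.result;
  // A missing global stringifies to undefined, so there is no string to parse.
  if (!remote || remote.type !== 'string') return null;
  return JSON.parse(remote.value);
}
```

Returning `null` rather than throwing for the missing-global case keeps probing scripts simple when they try several candidate globals in turn.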
Step 6: Chrome Extension Development
Manifest v3 Extension for Traffic Capture
json
{
  "manifest_version": 3,
  "name": "pplx-sdk Traffic Capture",
  "version": "1.0",
  "permissions": [
    "webRequest", "activeTab", "storage", "debugger"
  ],
  "host_permissions": ["https://www.perplexity.ai/*"],
  "background": {
    "service_worker": "background.js"
  },
  "content_scripts": [{
    "matches": ["https://www.perplexity.ai/*"],
    "js": ["content.js"],
    "run_at": "document_start"
  }]
}
Background Script: Request Interception
javascript
// background.js
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    // The URL filter below already restricts this listener to /rest/ requests.
    console.log('[pplx-capture]', details.method, details.url);
    const bytes = details.requestBody?.raw?.[0]?.bytes;
    if (!bytes) return;
    const text = new TextDecoder().decode(new Uint8Array(bytes));
    let body;
    try {
      body = JSON.parse(text);
    } catch {
      body = text; // not JSON: keep the raw text
    }
    chrome.storage.local.set({
      [`req_${Date.now()}`]: {
        url: details.url,
        method: details.method,
        body,
        timestamp: Date.now()
      }
    });
  },
  { urls: ["https://www.perplexity.ai/rest/*"] },
  ["requestBody"]
);
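The listener above reads only the first `raw` part of the body. A request body can arrive as several `raw` parts or as parsed `formData`, so a more tolerant decoder is sketched below. `decodeRequestBody` and its `kind` labels are hypothetical; the input shape follows the `requestBody` object used above.

```javascript
// Decode a webRequest requestBody into { kind, value }, or null if there is no body.
function decodeRequestBody(requestBody) {
  if (!requestBody) return null;
  if (requestBody.formData) return { kind: 'form', value: requestBody.formData };
  if (!requestBody.raw) return null;
  const decoder = new TextDecoder();
  let text = '';
  for (const part of requestBody.raw) {
    // Parts that reference files carry a path instead of bytes; skip them.
    if (part.bytes) text += decoder.decode(part.bytes, { stream: true });
  }
  text += decoder.decode(); // flush any buffered partial character
  try {
    return { kind: 'json', value: JSON.parse(text) };
  } catch {
    return { kind: 'text', value: text };
  }
}
```

Streaming decode matters because a multi-byte UTF-8 character can be split across two parts; decoding each part independently would corrupt it.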
Content Script: React State Extraction
javascript
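// content.js: inject into page context
// Note: this only works if the page exposes React as a global and uses class
// components; a strict page CSP may also block inline scripts.
const script = document.createElement('script');
script.textContent = `
  // Hook into React class-component state updates
  if (window.React && window.React.Component) {
    const origSetState = window.React.Component.prototype.setState;
    window.React.Component.prototype.setState = function(state, cb) {
      try {
        window.postMessage({
          type: 'PPLX_STATE_UPDATE',
          component: this.constructor.name,
          // Updater functions are not serializable; report them as null
          state: typeof state === 'function' ? null : JSON.parse(JSON.stringify(state))
        }, '*');
      } catch (e) { /* ignore non-serializable state */ }
      return origSetState.call(this, state, cb);
    };
  }
`;
document.documentElement.appendChild(script);
// Listen for state updates posted by this page only
window.addEventListener('message', (event) => {
  if (event.source === window && event.data?.type === 'PPLX_STATE_UPDATE') {
    chrome.runtime.sendMessage(event.data);
  }
});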
Step 7: Map Discoveries to SDK
| SPA Discovery | SDK Target | Action |
|---|---|---|
| React component state | | Model the state shape |
| API fetch calls | | Add endpoint methods |
| SSE event handlers | | Map event types |
| Service worker cache | | Understand caching behavior |
| Auth token flow | | Token refresh logic |
| WebSocket frames | | New WebSocket transport |
| GraphQL queries | | Query/mutation services |
Step 8: SPA Source Code Graph
After runtime analysis, build a static code graph of the SPA source. Delegate to codegraph for structural analysis.
Source Map Recovery
bash
# Extract original source paths from source maps
curl -s https://www.perplexity.ai/ | grep -oP 'src="\K/[^"]*\.js(?=")' | while read -r url; do
  echo "Checking: $url" >&2
  curl -s "https://www.perplexity.ai${url}.map" 2>/dev/null |
    python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d.get('sources',[])))" 2>/dev/null
done | sort -u
Static Analysis (from recovered source or public repo)
bash
# Component tree from source
grep -rnE "export (default )?function |export const .* = \(" src/ --include="*.tsx" --include="*.jsx"
# Import graph
grep -rn "import .* from " src/ --include="*.ts" --include="*.tsx" |
  awk -F: '{print $1 " → " $NF}' | sort -u
# Hook usage map
grep -rn "use[A-Z][a-zA-Z]*(" src/ --include="*.tsx" |
  grep -oP 'use[A-Z][a-zA-Z]*' | sort | uniq -c | sort -rn
# API call sites (fetch, axios, etc.)
grep -rnE "fetch\(|axios\.|api\.|apiClient\." src/ --include="*.ts" --include="*.tsx"
Cross-Reference: Runtime ↔ Static
| Runtime Discovery (spa-expert) | Static Discovery (codegraph) | Cross-Reference |
|---|---|---|
| Fiber tree component names | Source component definitions | Match names to source files |
| Hook state values | Hook implementations | Map state shape to hook logic |
| Network API calls | | Confirm endpoints in source |
| Context provider values | | Map runtime state to types |
| Service worker routes | Workbox config in source | Validate caching strategy |
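The first row of the table, matching fiber component names to source definitions, can be sketched as a simple join. `crossReference` is a hypothetical helper; it assumes the runtime names come from the fiber walk and the static definitions are a plain object mapping component name to file path. Production builds may minify component names, in which case few names will match.

```javascript
// Join runtime component names against a { name: file } map from static analysis.
function crossReference(runtimeNames, staticDefinitions) {
  const runtimeSet = new Set(runtimeNames);
  const isDefined = (name) => Object.prototype.hasOwnProperty.call(staticDefinitions, name);
  const matched = [];
  const runtimeOnly = [];
  for (const name of runtimeSet) {
    if (isDefined(name)) matched.push({ name, file: staticDefinitions[name] });
    else runtimeOnly.push(name);
  }
  const staticOnly = Object.keys(staticDefinitions).filter((name) => !runtimeSet.has(name));
  return { matched, runtimeOnly, staticOnly };
}
```

Names in `runtimeOnly` usually point at third-party or minified components, while `staticOnly` lists components that were not mounted during the captured session.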
Perplexity.ai SPA Notes
Known Stack
- Framework: Next.js (React 18+)
- Bundler: Webpack (via Next.js, not raw Vite — skill covers both for broader SPA RE)
- State: React hooks + context (observed patterns)
- Streaming: SSE via fetch() with ReadableStream
- Auth: Cookie-based (pplx.session-id)
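Because the streaming item in the list above relies on SSE delivered through `fetch()` and a `ReadableStream`, chunks do not align with event boundaries and have to be buffered. The sketch below shows a minimal incremental parser for the standard `event:` and `data:` fields; `createSseParser` is a hypothetical name, and the event names and payloads the site actually sends are not covered here.

```javascript
// Incremental SSE parser: feed decoded text chunks, receive complete events.
function createSseParser(onEvent) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    // Normalize CRLF/CR to LF, but hold back a trailing CR until the next chunk arrives.
    const tail = buffer.endsWith('\r') ? '\r' : '';
    buffer = buffer.slice(0, buffer.length - tail.length).replace(/\r\n?/g, '\n') + tail;
    let boundary;
    while ((boundary = buffer.indexOf('\n\n')) !== -1) {
      const block = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      let eventName = 'message';
      const dataLines = [];
      for (const line of block.split('\n')) {
        if (line === '' || line.startsWith(':')) continue; // comment or keep-alive
        const colon = line.indexOf(':');
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? '' : line.slice(colon + 1);
        if (value.startsWith(' ')) value = value.slice(1);
        if (field === 'event') eventName = value;
        else if (field === 'data') dataLines.push(value);
      }
      if (dataLines.length > 0) onEvent({ event: eventName, data: dataLines.join('\n') });
    }
  };
}
```

In a page or extension, the `feed` function can be called with each chunk read from `response.body.pipeThrough(new TextDecoderStream())`. The `id:` and `retry:` fields are ignored in this sketch.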
Key DOM Selectors
javascript
// Query input
document.querySelector('textarea[placeholder*="Ask"]')
// Response area
document.querySelector('[class*="prose"]')
// Thread list
document.querySelector('[class*="thread"]')