page-agent-web-automation
Page Agent Web Automation
Skill by ara.so — AI Agent Skills collection.
Page Agent is a JavaScript in-page GUI agent that enables controlling web interfaces through natural language commands. Unlike traditional browser automation tools, it runs directly in the webpage (no browser extension or headless browser required) and uses text-based DOM manipulation instead of screenshots.
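To make the "text-based" idea concrete, here is a minimal sketch of how a page's interactive elements could be flattened into indexed text for an LLM prompt. The `ElementSummary` shape and the `describeElements` function are hypothetical names chosen for illustration; they are not Page Agent's actual internals.

```typescript
// Hypothetical shape for an interactive element (an assumption, not Page Agent's data model).
interface ElementSummary {
  tag: string
  label: string
  visible: boolean
}

// Turn the visible elements into an indexed, text-only description.
function describeElements(elements: ElementSummary[]): string {
  return elements
    .filter((el) => el.visible)
    .map((el, index) => `[${index}] <${el.tag}> ${el.label}`)
    .join('\n')
}

const pageText = describeElements([
  { tag: 'button', label: 'Log in', visible: true },
  { tag: 'input', label: 'Email', visible: true },
  { tag: 'div', label: 'Hidden banner', visible: false },
])
console.log(pageText)
// [0] <button> Log in
// [1] <input> Email
```

A model that receives text like this can answer with an element index and an action, which is far cheaper to produce and parse than a screenshot.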
Installation
NPM Installation
```bash
npm install page-agent
```
CDN (Quick Testing)
For rapid prototyping with a demo LLM (evaluation purposes only):
```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js" crossorigin="true"></script>
```
Add `?autoInit=false` to the script URL to prevent automatic initialization:
```html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js?autoInit=false" crossorigin="true"></script>
<script>
  const agent = new window.PageAgent({...});
</script>
```
Basic Usage
Importing and Initialization
```typescript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
  language: 'en-US',
})
```
Executing Commands
```typescript
// Simple command execution
await agent.execute('Click the login button')

// Form filling
await agent.execute('Fill in the email field with user@example.com')

// Multi-step workflow
await agent.execute('Search for "page agent" and click the first result')

// Navigation
await agent.execute('Go to the settings page')
```
Configuration Options
Basic Configuration
```typescript
const agent = new PageAgent({
  // LLM Configuration
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,

  // Language settings
  language: 'en-US', // or 'zh-CN'

  // Optional: Custom system prompt
  systemPrompt: 'You are a helpful assistant...',
})
```
Advanced Configuration
```typescript
const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
  language: 'en-US',

  // Execution options
  maxSteps: 20, // Maximum execution steps
  timeout: 30000, // Timeout in milliseconds

  // Custom element selector strategy
  elementSelector: {
    includeInvisible: false,
    maxElements: 100,
  },

  // Debug mode
  debug: true,
})
```
Supported LLM Providers
OpenAI
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
})
```
Anthropic Claude
```typescript
const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
})
```
Alibaba Qwen (DashScope)
```typescript
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
})
```
Azure OpenAI
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_API_KEY,
})
```
Common Usage Patterns
SaaS Copilot Integration
```typescript
import { PageAgent } from 'page-agent'

class SaaSCopilot {
  private agent: PageAgent

  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.LLM_BASE_URL,
      apiKey: process.env.LLM_API_KEY,
      language: 'en-US',
    })
  }

  // Run one user command and report success or failure instead of throwing
  async handleUserCommand(command: string) {
    try {
      const result = await this.agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error('Copilot error:', error)
      // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
      const message = error instanceof Error ? error.message : String(error)
      return { success: false, error: message }
    }
  }

  // Fill each field with its value, one command at a time
  async autoFillForm(formData: Record<string, string>) {
    const commands = Object.entries(formData).map(
      ([field, value]) => `Fill ${field} with ${value}`
    )
    for (const command of commands) {
      await this.agent.execute(command)
    }
  }
}

// Usage
const copilot = new SaaSCopilot()
await copilot.handleUserCommand('Create a new project named "Website Redesign"')
```
Form Automation
```typescript
import { PageAgent } from 'page-agent'

async function automateFormFilling() {
  const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: process.env.DASHSCOPE_API_KEY,
  })

  // Smart form filling with natural language
  await agent.execute(`
    Fill out the registration form:
    - First name: John
    - Last name: Doe
    - Email: john.doe@example.com
    - Password: Use a strong password
    - Check the terms and conditions checkbox
    - Click submit
  `)
}
```
Accessibility Enhancement
```typescript
import { PageAgent } from 'page-agent'

class AccessibilityAgent {
  private agent: PageAgent

  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.OPENAI_BASE_URL,
      apiKey: process.env.OPENAI_API_KEY,
      language: 'en-US',
    })
  }

  async handleVoiceCommand(voiceTranscript: string) {
    // Convert voice commands to actions
    await this.agent.execute(voiceTranscript)
  }

  async describeCurrentPage() {
    // Use agent to describe page content for screen readers
    const description = await this.agent.execute(
      'Describe what is visible on this page'
    )
    return description
  }
}
```
Multi-Step Workflow
```typescript
import { PageAgent } from 'page-agent'

async function complexWorkflow() {
  const agent = new PageAgent({
    model: 'claude-3-5-sonnet-20241022',
    baseURL: 'https://api.anthropic.com/v1',
    apiKey: process.env.ANTHROPIC_API_KEY,
  })

  // Execute complex multi-step task
  await agent.execute(`
    1. Navigate to the products page
    2. Filter by category "Electronics"
    3. Sort by price (low to high)
    4. Add the first three items to cart
    5. Go to checkout
  `)
}
```
Error Handling and Retry
```typescript
import { PageAgent } from 'page-agent'

async function executeWithRetry(command: string, maxRetries = 3) {
  const agent = new PageAgent({
    model: 'gpt-4',
    baseURL: process.env.OPENAI_BASE_URL,
    apiKey: process.env.OPENAI_API_KEY,
  })

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error(`Attempt ${attempt} failed:`, error)
      if (attempt === maxRetries) {
        // `error` is `unknown` in strict TypeScript, so narrow it before reading `.message`
        const message = error instanceof Error ? error.message : String(error)
        return { success: false, error: message }
      }
      // Wait before retry; the delay grows linearly (1 s, 2 s, ...)
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt))
    }
  }

  // Only reached when maxRetries is less than 1
  return { success: false, error: 'No attempts were made' }
}
```
Browser Extension (Multi-Page Tasks)
For cross-tab automation, Page Agent provides a Chrome extension:
```typescript
// In your extension background script
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Execute commands across multiple tabs
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'EXECUTE_COMMAND') {
    agent.execute(message.command)
      .then(result => sendResponse({ success: true, result }))
      .catch(error => sendResponse({ success: false, error: error.message }))
    return true // Keep channel open for async response
  }
})
```
MCP Server (Beta)
Page Agent includes an MCP (Model Context Protocol) server for external control:
```bash
# Start MCP server
npx page-agent-mcp
```
Configure in your MCP client (e.g., Claude Desktop):
```json
{
  "mcpServers": {
    "page-agent": {
      "command": "npx",
      "args": ["page-agent-mcp"]
    }
  }
}
```
Programmatic API
Event Listeners
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Listen to execution events
agent.on('step', (stepData) => {
  console.log('Agent step:', stepData)
})

agent.on('complete', (result) => {
  console.log('Execution complete:', result)
})

agent.on('error', (error) => {
  console.error('Agent error:', error)
})
```
Custom Actions
```typescript
import { PageAgent } from 'page-agent'

// Placeholder: sendEmail is your application's own function, not part of Page Agent
declare function sendEmail(to: string, subject: string, body: string): Promise<void>

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Register custom action
agent.registerAction('sendEmail', async (params) => {
  // Custom email sending logic
  await sendEmail(params.to, params.subject, params.body)
  return { success: true }
})

// Use custom action
await agent.execute('Send an email to team@example.com with subject "Update"')
```
Troubleshooting
Agent Not Finding Elements
Problem: Agent fails to locate buttons or form fields.
Solutions:
- Ensure elements have proper labels or accessible names
- Check that elements are visible (not `display: none` or `visibility: hidden`)
- Increase `maxElements` in configuration
- Use more specific descriptions in commands
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  elementSelector: {
    includeInvisible: false,
    maxElements: 200, // Increase if needed
  },
})
```
LLM API Errors
Problem: API key or connection errors.
Solutions:
- Verify that the API key is set, for example with `console.log(Boolean(process.env.OPENAI_API_KEY))` (logging only its presence keeps the key itself out of the console)
- Check that `baseURL` matches your provider
- Ensure the model name is correct for your provider
- Check network connectivity and CORS settings
```typescript
// Debug API configuration
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  debug: true, // Enable debug logging
})
```
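The checks in the list above can be run before constructing the agent. The following is a minimal sketch under stated assumptions: `LlmConfig` and `findConfigProblems` are hypothetical names that mirror the constructor options used in this document, and the helper is not part of Page Agent.

```typescript
// Hypothetical helper (an assumption for illustration, not a Page Agent API).
interface LlmConfig {
  model?: string
  baseURL?: string
  apiKey?: string
}

// Return a list of human-readable problems; an empty list means the basics look fine.
// The API key is only checked for presence and is never printed.
function findConfigProblems(config: LlmConfig): string[] {
  const problems: string[] = []
  if (!config.model) problems.push('model is missing')
  if (!config.apiKey) problems.push('apiKey is missing')
  if (!config.baseURL) {
    problems.push('baseURL is missing')
  } else {
    try {
      const url = new URL(config.baseURL)
      if (url.protocol !== 'https:') problems.push('baseURL should use https')
    } catch {
      problems.push('baseURL is not a valid URL')
    }
  }
  return problems
}

console.log(findConfigProblems({ model: 'gpt-4', baseURL: 'api.openai.com/v1' }))
// [ 'apiKey is missing', 'baseURL is not a valid URL' ]
```

This only catches local mistakes such as an unset environment variable or a URL without a scheme. A wrong model name or a CORS rejection still has to be diagnosed from the provider's response.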
Timeout Issues
Problem: Commands time out before completion.
Solutions:
- Increase timeout value
- Break complex commands into smaller steps
- Optimize page performance
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60000, // 60 seconds
  maxSteps: 30,
})
```
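Breaking a long command into smaller steps can be sketched as a loop with a per-step time limit. The names `Executor`, `withTimeout`, and `runSteps` are hypothetical helpers written for this example. They wrap any function that runs one command, such as `(cmd) => agent.execute(cmd)`, and they do not rely on any Page Agent feature beyond `execute()`.

```typescript
// Any function that runs one natural-language command.
type Executor = (command: string) => Promise<unknown>

// Reject if `task` does not settle within `ms` milliseconds.
// Note: this stops waiting for the task; it does not cancel the underlying work.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
  })
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer))
}

// Run each step in order and stop at the first step that fails or times out.
// Resolves with the number of steps that completed.
async function runSteps(
  execute: Executor,
  steps: string[],
  stepTimeoutMs: number,
): Promise<number> {
  let completed = 0
  for (const step of steps) {
    await withTimeout(execute(step), stepTimeoutMs)
    completed++
  }
  return completed
}

// Example call (assumes an `agent` created as shown earlier):
// await runSteps((cmd) => agent.execute(cmd), [
//   'Navigate to the products page',
//   'Filter by category "Electronics"',
// ], 15000)
```

Smaller steps make it easier to see which part of a workflow is slow, and a failure partway through leaves a clear point from which to resume.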
CSP (Content Security Policy) Issues
Problem: Script blocked by CSP when using CDN.
Solutions:
- Add Page Agent CDN to your CSP header
- Use NPM package instead of CDN
- Update CSP meta tag
```html
<meta http-equiv="Content-Security-Policy"
      content="script-src 'self' https://cdn.jsdelivr.net;">
```
Language/Locale Issues
Problem: Agent responds in wrong language.
Solution: Set explicit language configuration
```typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  language: 'en-US', // or 'zh-CN'
})
```
Best Practices
- Use Environment Variables: Never hardcode API keys
- Specific Commands: More specific natural language commands work better
- Error Handling: Always wrap execute() calls in try-catch
- Rate Limiting: Implement delays between commands if making many requests
- Element Visibility: Ensure target elements are visible before commands execute
- Security: Validate and sanitize user input before passing to agent
- Testing: Test with demo LLM first before production deployment
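Two of the practices above, input validation and rate limiting, can be combined in a small wrapper. This is a sketch under assumptions: `sanitizeCommand`, `runThrottled`, and the 500-character limit are illustrative choices, not Page Agent features. The cleanup is basic input hygiene and should not be treated as a defense against prompt injection.

```typescript
const MAX_COMMAND_LENGTH = 500 // arbitrary limit chosen for this sketch

// Replace control characters, collapse whitespace, and reject empty or oversized input.
function sanitizeCommand(raw: string): string {
  const cleaned = raw
    .replace(/[\u0000-\u001f\u007f]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
  if (cleaned.length === 0) throw new Error('Command is empty')
  if (cleaned.length > MAX_COMMAND_LENGTH) throw new Error('Command is too long')
  return cleaned
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Run commands one at a time, waiting `delayMs` between consecutive commands.
async function runThrottled(
  execute: (command: string) => Promise<unknown>,
  commands: string[],
  delayMs: number,
): Promise<void> {
  let first = true
  for (const command of commands) {
    if (!first) await sleep(delayMs)
    first = false
    await execute(sanitizeCommand(command))
  }
}

// Example call (assumes an `agent` created as shown earlier):
// await runThrottled((cmd) => agent.execute(cmd), userCommands, 500)
```

A fixed delay is the simplest form of rate limiting. If the provider returns explicit rate-limit errors, a backoff like the one in the retry example may be a better fit.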
Resources
- Documentation: https://alibaba.github.io/page-agent/docs/introduction/overview
- GitHub: https://github.com/alibaba/page-agent
- NPM Package: https://www.npmjs.com/package/page-agent
- License: MIT