page-agent-web-automation

Page Agent Web Automation

Skill by ara.so (AI Agent Skills collection).
Page Agent is a JavaScript in-page GUI agent that enables controlling web interfaces through natural language commands. Unlike traditional browser automation tools, it runs directly in the webpage (no browser extension or headless browser required) and uses text-based DOM manipulation instead of screenshots.

Installation

NPM Installation

bash
npm install page-agent

CDN (Quick Testing)

For rapid prototyping with a demo LLM (evaluation purposes only):
html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js" crossorigin="true"></script>
Add ?autoInit=false to the script URL to prevent automatic initialization:
html
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js?autoInit=false" crossorigin="true"></script>
<script>
  const agent = new window.PageAgent({...});
</script>

Basic Usage

Importing and Initialization

typescript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
  language: 'en-US',
})
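A missing key or endpoint usually shows up only later, as a failed request, so it can help to check the settings before constructing the agent. The following is a minimal sketch under assumptions: `AgentSettings` and `requireSettings` are hypothetical names that are not part of page-agent, and the fields simply mirror the options used in the example above.

```typescript
// Hypothetical helper, not part of page-agent: validates the settings used
// in the initialization example before they reach the constructor.
interface AgentSettings {
  model: string
  baseURL: string
  apiKey: string
  language?: string
}

function requireSettings(input: Partial<AgentSettings>): AgentSettings {
  const required = ['model', 'baseURL', 'apiKey'] as const
  const missing = required.filter((key) => {
    const value = input[key]
    return value === undefined || value.trim() === ''
  })
  if (missing.length > 0) {
    throw new Error(`Missing agent settings: ${missing.join(', ')}`)
  }
  // Reject endpoints that are clearly not HTTP(S) URLs.
  if (!/^https?:\/\//.test(input.baseURL as string)) {
    throw new Error('baseURL must start with http:// or https://')
  }
  return {
    model: input.model as string,
    baseURL: input.baseURL as string,
    apiKey: input.apiKey as string,
    language: input.language ?? 'en-US', // default chosen for this sketch
  }
}
```

The validated object could then be passed to the constructor in place of the literal options, for example `new PageAgent(requireSettings(settings))`.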

Executing Commands

typescript
// Simple command execution
await agent.execute('Click the login button')

// Form filling
await agent.execute('Fill in the email field with user@example.com')

// Multi-step workflow
await agent.execute('Search for "page agent" and click the first result')

// Navigation
await agent.execute('Go to the settings page')
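When several commands depend on each other, one option is to run them in order and stop at the first failure. The sketch below assumes only that the agent exposes `execute(command)` returning a promise, as in the calls above; `CommandRunner`, `StepOutcome`, and `runInOrder` are hypothetical names introduced for illustration.

```typescript
// Minimal shape this sketch relies on. A PageAgent instance is assumed to
// satisfy it through the execute() method shown above.
interface CommandRunner {
  execute(command: string): Promise<unknown>
}

interface StepOutcome {
  command: string
  ok: boolean
  result?: unknown
  error?: string
}

// Runs commands one at a time and stops at the first failure, so later
// steps never act on a page that is in an unexpected state.
async function runInOrder(
  runner: CommandRunner,
  commands: string[]
): Promise<StepOutcome[]> {
  const outcomes: StepOutcome[] = []
  for (const command of commands) {
    try {
      const result = await runner.execute(command)
      outcomes.push({ command, ok: true, result })
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      outcomes.push({ command, ok: false, error: message })
      break
    }
  }
  return outcomes
}
```

Calling `runInOrder(agent, ['Go to the settings page', 'Click the login button'])` would return one outcome per attempted command, which makes it easier to report where a workflow stopped.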

Configuration Options

Basic Configuration

typescript
const agent = new PageAgent({
  // LLM Configuration
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
  
  // Language settings
  language: 'en-US', // or 'zh-CN'
  
  // Optional: Custom system prompt
  systemPrompt: 'You are a helpful assistant...',
})

Advanced Configuration

typescript
const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
  language: 'en-US',
  
  // Execution options
  maxSteps: 20, // Maximum execution steps
  timeout: 30000, // Timeout in milliseconds
  
  // Custom element selector strategy
  elementSelector: {
    includeInvisible: false,
    maxElements: 100,
  },
  
  // Debug mode
  debug: true,
})

Supported LLM Providers

OpenAI

typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
})

Anthropic Claude

typescript
const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
})

Alibaba Qwen (DashScope)

typescript
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
})

Azure OpenAI

typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_API_KEY,
})
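Since the provider examples differ only in model name and endpoint, the fixed parts can be collected in one table. This is a sketch, not a page-agent feature: `Provider`, `PRESETS`, and `presetFor` are hypothetical names, and the values are copied from the examples in this section. Azure is left out because its endpoint is deployment-specific.

```typescript
type Provider = 'openai' | 'anthropic' | 'dashscope'

interface ProviderPreset {
  model: string
  baseURL: string
}

// Values copied from the provider examples in this section.
const PRESETS: Record<Provider, ProviderPreset> = {
  openai: { model: 'gpt-4', baseURL: 'https://api.openai.com/v1' },
  anthropic: {
    model: 'claude-3-5-sonnet-20241022',
    baseURL: 'https://api.anthropic.com/v1',
  },
  dashscope: {
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  },
}

// Merges a preset with optional overrides and the caller's API key.
function presetFor(
  provider: Provider,
  apiKey: string,
  overrides: Partial<ProviderPreset> = {}
): ProviderPreset & { apiKey: string } {
  return { ...PRESETS[provider], ...overrides, apiKey }
}
```

The result has the same `model`, `baseURL`, and `apiKey` fields as the constructor options shown above, so switching providers becomes a one-word change.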

Common Usage Patterns

SaaS Copilot Integration

typescript
import { PageAgent } from 'page-agent'

class SaaSCopilot {
  private agent: PageAgent
  
  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.LLM_BASE_URL,
      apiKey: process.env.LLM_API_KEY,
      language: 'en-US',
    })
  }
  
  async handleUserCommand(command: string) {
    try {
      const result = await this.agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error('Copilot error:', error)
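      // `error` is typed `unknown` in a TypeScript catch clause, so narrow it first
      return { success: false, error: error instanceof Error ? error.message : String(error) }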
    }
  }
  
  async autoFillForm(formData: Record<string, string>) {
    const commands = Object.entries(formData).map(
      ([field, value]) => `Fill ${field} with ${value}`
    )
    
    for (const command of commands) {
      await this.agent.execute(command)
    }
  }
}

// Usage
const copilot = new SaaSCopilot()
await copilot.handleUserCommand('Create a new project named "Website Redesign"')

Form Automation

typescript
import { PageAgent } from 'page-agent'

async function automateFormFilling() {
  const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: process.env.DASHSCOPE_API_KEY,
  })
  
  // Smart form filling with natural language
  await agent.execute(`
    Fill out the registration form:
    - First name: John
    - Last name: Doe
    - Email: john.doe@example.com
    - Password: Use a strong password
    - Check the terms and conditions checkbox
    - Click submit
  `)
}

Accessibility Enhancement

typescript
import { PageAgent } from 'page-agent'

class AccessibilityAgent {
  private agent: PageAgent
  
  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.OPENAI_BASE_URL,
      apiKey: process.env.OPENAI_API_KEY,
      language: 'en-US',
    })
  }
  
  async handleVoiceCommand(voiceTranscript: string) {
    // Convert voice commands to actions
    await this.agent.execute(voiceTranscript)
  }
  
  async describeCurrentPage() {
    // Use agent to describe page content for screen readers
    const description = await this.agent.execute(
      'Describe what is visible on this page'
    )
    return description
  }
}

Multi-Step Workflow

typescript
import { PageAgent } from 'page-agent'

async function complexWorkflow() {
  const agent = new PageAgent({
    model: 'claude-3-5-sonnet-20241022',
    baseURL: 'https://api.anthropic.com/v1',
    apiKey: process.env.ANTHROPIC_API_KEY,
  })
  
  // Execute complex multi-step task
  await agent.execute(`
    1. Navigate to the products page
    2. Filter by category "Electronics"
    3. Sort by price (low to high)
    4. Add the first three items to cart
    5. Go to checkout
  `)
}

Error Handling and Retry

typescript
import { PageAgent } from 'page-agent'

async function executeWithRetry(command: string, maxRetries = 3) {
  const agent = new PageAgent({
    model: 'gpt-4',
    baseURL: process.env.OPENAI_BASE_URL,
    apiKey: process.env.OPENAI_API_KEY,
  })
  
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error(`Attempt ${attempt} failed:`, error)
      
      if (attempt === maxRetries) {
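        // `error` is typed `unknown` in a TypeScript catch clause, so narrow it first
        return { success: false, error: error instanceof Error ? error.message : String(error) }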
      }
      
      // Wait before retry
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt))
    }
  }
}
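The retry loop above waits `1000 * attempt` milliseconds, which grows linearly. If requests fail because of rate limits, a capped exponential delay is a common alternative. The helper below is a swapped-in technique and not something the source prescribes; `retryDelayMs` and its default values are assumptions made for this sketch.

```typescript
// Capped exponential backoff: an alternative to the linear
// 1000 * attempt delay used in executeWithRetry.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 10000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be a positive integer')
  }
  // attempt 1 -> baseMs, attempt 2 -> 2 * baseMs, attempt 3 -> 4 * baseMs, ...
  const delay = baseMs * Math.pow(2, attempt - 1)
  // Never wait longer than capMs, however many attempts have been made.
  return Math.min(capMs, delay)
}
```

To try it, the `setTimeout` call in the loop could use `retryDelayMs(attempt)` in place of `1000 * attempt`.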

Browser Extension (Multi-Page Tasks)

For cross-tab automation, Page Agent provides a Chrome extension:
typescript
// In your extension background script
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Execute commands across multiple tabs
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'EXECUTE_COMMAND') {
    agent.execute(message.command)
      .then(result => sendResponse({ success: true, result }))
      .catch(error => sendResponse({ success: false, error: error.message }))
    return true // Keep channel open for async response
  }
})
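The listener above trusts that `message.command` is a string. Because extension messages can come from any script that can reach the runtime, a type guard is a cheap safeguard. The sketch below is illustrative: `ExecuteCommandMessage` and `isExecuteCommandMessage` are hypothetical names, and only the `EXECUTE_COMMAND` type string comes from the example.

```typescript
interface ExecuteCommandMessage {
  type: 'EXECUTE_COMMAND'
  command: string
}

// Narrows an arbitrary message to the shape the listener expects.
function isExecuteCommandMessage(
  message: unknown
): message is ExecuteCommandMessage {
  if (typeof message !== 'object' || message === null) return false
  const candidate = message as Record<string, unknown>
  return (
    candidate.type === 'EXECUTE_COMMAND' &&
    typeof candidate.command === 'string' &&
    candidate.command.trim().length > 0
  )
}
```

Inside the listener, the `message.type === 'EXECUTE_COMMAND'` check could be replaced with `isExecuteCommandMessage(message)`, so malformed messages are ignored before they reach `agent.execute`.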

MCP Server (Beta)

Page Agent includes an MCP (Model Context Protocol) server for external control:
bash
# Start MCP server
npx page-agent-mcp

Configure in your MCP client (e.g., Claude Desktop):
json
{
  "mcpServers": {
    "page-agent": {
      "command": "npx",
      "args": ["page-agent-mcp"]
    }
  }
}

Programmatic API

Event Listeners

typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Listen to execution events
agent.on('step', (stepData) => {
  console.log('Agent step:', stepData)
})

agent.on('complete', (result) => {
  console.log('Execution complete:', result)
})

agent.on('error', (error) => {
  console.error('Agent error:', error)
})
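Rather than logging each event, the three handlers can feed a single record of the run for display or debugging. This sketch assumes the `on(event, handler)` shape and the `step`, `complete`, and `error` event names used above; `AgentEvents`, `RunLog`, and `attachRunLog` are hypothetical names.

```typescript
// Assumed emitter shape, mirroring the on(event, handler) calls above.
interface AgentEvents {
  on(
    event: 'step' | 'complete' | 'error',
    handler: (payload: unknown) => void
  ): void
}

interface RunLog {
  steps: unknown[]
  completed: boolean
  errors: unknown[]
}

// Subscribes to all three events and accumulates them in one object.
function attachRunLog(agent: AgentEvents): RunLog {
  const log: RunLog = { steps: [], completed: false, errors: [] }
  agent.on('step', (data) => {
    log.steps.push(data)
  })
  agent.on('complete', () => {
    log.completed = true
  })
  agent.on('error', (error) => {
    log.errors.push(error)
  })
  return log
}
```

After `const log = attachRunLog(agent)`, the same `log` object fills in as the agent works, so a UI could show `log.steps.length` as a progress indicator.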

Custom Actions

typescript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Register custom action
agent.registerAction('sendEmail', async (params) => {
  // Custom email sending logic; sendEmail is your application's own function, not provided by page-agent
  await sendEmail(params.to, params.subject, params.body)
  return { success: true }
})

// Use custom action
await agent.execute('Send an email to team@example.com with subject "Update"')
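Parameters passed to a custom action are produced by the model, so it seems prudent to treat them as untrusted input and validate them before acting. The sketch below assumes the `to`, `subject`, and `body` fields from the example; `EmailParams` and `parseEmailParams` are hypothetical names, and the address pattern is deliberately loose.

```typescript
interface EmailParams {
  to: string
  subject: string
  body: string
}

// Checks model-produced parameters before they reach the email logic.
function parseEmailParams(params: unknown): EmailParams {
  if (typeof params !== 'object' || params === null) {
    throw new TypeError('params must be an object')
  }
  const { to, subject, body } = params as Record<string, unknown>
  // Loose shape check only; it does not prove the address exists.
  if (typeof to !== 'string' || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(to)) {
    throw new TypeError('params.to must look like an email address')
  }
  if (typeof subject !== 'string' || subject.trim() === '') {
    throw new TypeError('params.subject must be a non-empty string')
  }
  // A missing body is allowed and becomes an empty string.
  return { to, subject, body: typeof body === 'string' ? body : '' }
}
```

The action handler could start with `const email = parseEmailParams(params)` and pass `email.to`, `email.subject`, and `email.body` on to the sending function.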

Troubleshooting

Agent Not Finding Elements

Problem: Agent fails to locate buttons or form fields.
Solutions:
  • Ensure elements have proper labels or accessible names
  • Check that elements are visible (not display: none or visibility: hidden)
  • Increase maxElements in configuration
  • Use more specific descriptions in commands
typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  elementSelector: {
    includeInvisible: false,
    maxElements: 200, // Increase if needed
  },
})

LLM API Errors

Problem: API key or connection errors.
Solutions:
  • Verify the API key is set, without printing the secret itself:
    console.log(Boolean(process.env.OPENAI_API_KEY))
  • Check baseURL matches your provider
  • Ensure model name is correct for your provider
  • Check network connectivity and CORS settings
typescript
// Debug API configuration
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  debug: true, // Enable debug logging
})
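When checking whether a key is configured, it is safer to log a masked summary than the key itself, since console output often ends up in shared logs. The helper below is a small sketch; `describeKey` is a hypothetical name and the masking format is arbitrary.

```typescript
// Summarizes a secret for logging without revealing it in full.
function describeKey(key: string | undefined): string {
  if (!key || key.trim() === '') return 'missing'
  const trimmed = key.trim()
  // Very short values are not partially revealed at all.
  if (trimmed.length <= 8) return `set (${trimmed.length} chars)`
  const head = trimmed.slice(0, 3)
  const tail = trimmed.slice(-2)
  return `set (${head}...${tail}, ${trimmed.length} chars)`
}
```

For example, `console.log('API key:', describeKey(process.env.OPENAI_API_KEY))` confirms that a value is present and roughly which one it is, which is usually enough to spot a wrong or empty variable.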

Timeout Issues

Problem: Commands time out before completion.
Solutions:
  • Increase timeout value
  • Break complex commands into smaller steps
  • Optimize page performance
typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60000, // 60 seconds
  maxSteps: 30,
})
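Independently of the agent's own timeout option, the caller can put an upper bound on how long it waits for a command. The sketch below uses `Promise.race` for that; `withTimeout` is a hypothetical helper. It only stops the caller from waiting and does not cancel whatever the agent is still doing.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// The underlying work keeps running; only the wait is abandoned.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label = 'command'
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    )
  })
  // Whichever promise settles first wins; the timer is always cleared.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer)
  })
}
```

A call such as `await withTimeout(agent.execute('Go to checkout'), 60000)` would then reject with a descriptive error if the command takes longer than a minute.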

CSP (Content Security Policy) Issues

Problem: Script blocked by CSP when using CDN.
Solutions:
  • Add Page Agent CDN to your CSP header
  • Use NPM package instead of CDN
  • Update CSP meta tag
html
<meta http-equiv="Content-Security-Policy" 
      content="script-src 'self' https://cdn.jsdelivr.net;">
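If the policy is generated in code, a small builder keeps the directives readable. This is a sketch under assumptions: `buildCsp` is a hypothetical helper, and the `connect-src` entry reflects the guess that the in-page agent calls the LLM endpoint directly from the browser. That only matters when the policy restricts `connect-src` or `default-src`.

```typescript
// Joins directive names and their sources into a CSP header value.
function buildCsp(directives: Record<string, string[]>): string {
  return Object.entries(directives)
    .filter(([, sources]) => sources.length > 0) // skip empty directives
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ')
}

const policy = buildCsp({
  'script-src': ["'self'", 'https://cdn.jsdelivr.net'],
  // Assumption: the agent sends requests to the LLM API from the page.
  'connect-src': ["'self'", 'https://api.openai.com'],
})
```

The resulting string can be sent as the `Content-Security-Policy` response header or placed in the `content` attribute of the meta tag shown above.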

Language/Locale Issues

Problem: Agent responds in the wrong language.
Solution: Set the language option explicitly.
typescript
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  language: 'en-US', // or 'zh-CN'
})

Best Practices

  1. Use Environment Variables: Never hardcode API keys
  2. Specific Commands: More specific natural language commands work better
  3. Error Handling: Always wrap execute() calls in try-catch
  4. Rate Limiting: Implement delays between commands if making many requests
  5. Element Visibility: Ensure target elements are visible before commands execute
  6. Security: Validate and sanitize user input before passing to agent
  7. Testing: Test with demo LLM first before production deployment
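For the security item above, one possible pre-processing step for user-supplied commands is sketched below. `sanitizeCommand` and `MAX_COMMAND_LENGTH` are hypothetical names and the limit is arbitrary. Normalizing input like this reduces obvious abuse but is not a complete defense against prompt injection.

```typescript
const MAX_COMMAND_LENGTH = 500 // arbitrary limit chosen for this sketch

// Normalizes user text before it is passed to agent.execute().
function sanitizeCommand(raw: string): string {
  const cleaned = raw
    .replace(/[\u0000-\u001f\u007f]/g, ' ') // control characters, incl. newlines
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim()
  if (cleaned.length === 0) {
    throw new Error('Command is empty')
  }
  if (cleaned.length > MAX_COMMAND_LENGTH) {
    throw new Error(`Command exceeds ${MAX_COMMAND_LENGTH} characters`)
  }
  return cleaned
}
```

A handler could then call `agent.execute(sanitizeCommand(userInput))` inside its try-catch, so rejected input is reported the same way as any other failure.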

Resources
