page-agent-web-automation

Page Agent Web Automation

Skill by ara.so — AI Agent Skills collection.

Page Agent is a JavaScript in-page GUI agent that enables controlling web interfaces through natural language commands. Unlike traditional browser automation tools, it runs directly in the webpage (no browser extension or headless browser required) and uses text-based DOM manipulation instead of screenshots.

Installation

NPM Installation

bash

npm install page-agent

CDN (Quick Testing)

For rapid prototyping with a demo LLM (evaluation purposes only):

html

<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js" crossorigin="true"></script>

Add `?autoInit=false` to the script URL to prevent automatic initialization:

html

<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.2/dist/iife/page-agent.demo.js?autoInit=false" crossorigin="true"></script>
<script>
  const agent = new window.PageAgent({...});
</script>

Basic Usage

Importing and Initialization

typescript

import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
  language: 'en-US',
})
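
A missing key or URL typically only surfaces once the first LLM call fails, so it can help to check the options before constructing the agent. The sketch below assumes the option names shown above; `AgentOptions` and `missingOptions` are hypothetical helpers, not part of the page-agent package.

```typescript
// Hypothetical helper types: option names mirror the example above.
interface AgentOptions {
  model?: string
  baseURL?: string
  apiKey?: string
  language?: string
}

// Returns the names of required options that are missing or blank.
function missingOptions(options: AgentOptions): string[] {
  const required: (keyof AgentOptions)[] = ['model', 'baseURL', 'apiKey']
  return required.filter((name) => {
    const value = options[name]
    return typeof value !== 'string' || value.trim() === ''
  })
}

const problems = missingOptions({ model: 'qwen3.5-plus', baseURL: '' })
console.log(problems) // ['baseURL', 'apiKey']
```

An empty result means the options can be passed on to `new PageAgent(...)`.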

Executing Commands

typescript

// Simple command execution
await agent.execute('Click the login button')

// Form filling
await agent.execute('Fill in the email field with user@example.com')

// Multi-step workflow
await agent.execute('Search for "page agent" and click the first result')

// Navigation
await agent.execute('Go to the settings page')
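
When several commands build on each other, each one depends on the page state left by the previous one. A minimal sketch of running them strictly in order and stopping at the first failure follows. Only `execute(command)` is taken from the examples above; the `CommandRunner` interface and `runInOrder` are hypothetical names.

```typescript
// Minimal shape assumed from the examples above: only execute() is used.
interface CommandRunner {
  execute(command: string): Promise<unknown>
}

// Runs commands one at a time and stops at the first failure,
// reporting how many completed and which command failed.
async function runInOrder(
  runner: CommandRunner,
  commands: string[],
): Promise<{ completed: number; failedCommand?: string }> {
  let completed = 0
  for (const command of commands) {
    try {
      await runner.execute(command)
      completed++
    } catch {
      return { completed, failedCommand: command }
    }
  }
  return { completed }
}
```

With a real agent this would be called as `await runInOrder(agent, ['Go to the settings page', 'Click the login button'])`.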

Configuration Options

Basic Configuration

typescript

const agent = new PageAgent({
  // LLM Configuration
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
  
  // Language settings
  language: 'en-US', // or 'zh-CN'
  
  // Optional: Custom system prompt
  systemPrompt: 'You are a helpful assistant...',
})

Advanced Configuration

typescript

const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
  language: 'en-US',
  
  // Execution options
  maxSteps: 20, // Maximum execution steps
  timeout: 30000, // Timeout in milliseconds
  
  // Custom element selector strategy
  elementSelector: {
    includeInvisible: false,
    maxElements: 100,
  },
  
  // Debug mode
  debug: true,
})
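
If several agents share most of these settings, one way to avoid restating them is to merge per-page overrides onto a base object. This is a sketch only: the option names mirror the example above, while the values in `ASSUMED_DEFAULTS` are simply the ones from that example, not documented page-agent defaults, and `withDefaults` is a hypothetical helper.

```typescript
interface ExecutionOptions {
  maxSteps: number
  timeout: number // milliseconds
  elementSelector: { includeInvisible: boolean; maxElements: number }
}

// Assumed baseline, copied from the example above.
const ASSUMED_DEFAULTS: ExecutionOptions = {
  maxSteps: 20,
  timeout: 30000,
  elementSelector: { includeInvisible: false, maxElements: 100 },
}

type Overrides = Partial<Omit<ExecutionOptions, 'elementSelector'>> & {
  elementSelector?: Partial<ExecutionOptions['elementSelector']>
}

// Merges overrides onto the baseline, including the nested selector object,
// so a caller can change one field without restating the rest.
function withDefaults(overrides: Overrides = {}): ExecutionOptions {
  return {
    ...ASSUMED_DEFAULTS,
    ...overrides,
    elementSelector: {
      ...ASSUMED_DEFAULTS.elementSelector,
      ...overrides.elementSelector,
    },
  }
}

const slowPage = withDefaults({ timeout: 60000, elementSelector: { maxElements: 200 } })
console.log(slowPage.elementSelector) // { includeInvisible: false, maxElements: 200 }
```

The result can be spread into the constructor options alongside `model`, `baseURL`, and `apiKey`.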

Supported LLM Providers

OpenAI

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
})

Anthropic Claude

typescript

const agent = new PageAgent({
  model: 'claude-3-5-sonnet-20241022',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
})

Alibaba Qwen (DashScope)

typescript

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.DASHSCOPE_API_KEY,
})

Azure OpenAI

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_API_KEY,
})
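
The provider blocks above differ only in model name, base URL, and which environment variable holds the key, so they could be collected in one lookup table. The sketch below copies its values from those examples; Azure is left out because its endpoint is deployment-specific. `PRESETS` and `optionsFor` are hypothetical conveniences, not a page-agent feature.

```typescript
type Provider = 'openai' | 'anthropic' | 'dashscope'

interface ProviderPreset {
  model: string
  baseURL: string
  apiKeyEnv: string // name of the environment variable holding the key
}

const PRESETS: Record<Provider, ProviderPreset> = {
  openai: {
    model: 'gpt-4',
    baseURL: 'https://api.openai.com/v1',
    apiKeyEnv: 'OPENAI_API_KEY',
  },
  anthropic: {
    model: 'claude-3-5-sonnet-20241022',
    baseURL: 'https://api.anthropic.com/v1',
    apiKeyEnv: 'ANTHROPIC_API_KEY',
  },
  dashscope: {
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKeyEnv: 'DASHSCOPE_API_KEY',
  },
}

// Builds constructor options for a provider from a plain env map,
// failing early when the key is absent.
function optionsFor(provider: Provider, env: Record<string, string | undefined>) {
  const preset = PRESETS[provider]
  const apiKey = env[preset.apiKeyEnv]
  if (!apiKey) {
    throw new Error(`Missing environment variable ${preset.apiKeyEnv}`)
  }
  return { model: preset.model, baseURL: preset.baseURL, apiKey }
}

const options = optionsFor('dashscope', { DASHSCOPE_API_KEY: 'example-key' })
console.log(options.model) // qwen3.5-plus
```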

Common Usage Patterns

SaaS Copilot Integration

typescript

import { PageAgent } from 'page-agent'

class SaaSCopilot {
  private agent: PageAgent
  
  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.LLM_BASE_URL,
      apiKey: process.env.LLM_API_KEY,
      language: 'en-US',
    })
  }
  
  async handleUserCommand(command: string) {
    try {
      const result = await this.agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error('Copilot error:', error)
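      // `error` is typed `unknown` under strict TypeScript, so narrow it first
      return { success: false, error: error instanceof Error ? error.message : String(error) }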
    }
  }
  
  async autoFillForm(formData: Record<string, string>) {
    const commands = Object.entries(formData).map(
      ([field, value]) => `Fill ${field} with ${value}`
    )
    
    for (const command of commands) {
      await this.agent.execute(command)
    }
  }
}

// Usage
const copilot = new SaaSCopilot()
await copilot.handleUserCommand('Create a new project named "Website Redesign"')

Form Automation

typescript

import { PageAgent } from 'page-agent'

async function automateFormFilling() {
  const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: process.env.DASHSCOPE_API_KEY,
  })
  
  // Smart form filling with natural language
  await agent.execute(`
    Fill out the registration form:
    - First name: John
    - Last name: Doe
    - Email: john.doe@example.com
    - Password: Use a strong password
    - Check the terms and conditions checkbox
    - Click submit
  `)
}
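
When the field values come from application data rather than a literal string, the same bulleted prompt can be generated. A minimal sketch, assuming the prompt layout used above; `buildFormPrompt` is a hypothetical helper.

```typescript
// Turns a field map into the bulleted prompt format used above.
// Trailing instructions (checkboxes, submit) are appended as extra bullets.
function buildFormPrompt(
  formName: string,
  fields: Record<string, string>,
  finalSteps: string[] = [],
): string {
  const lines = [
    `Fill out the ${formName}:`,
    ...Object.entries(fields).map(([label, value]) => `- ${label}: ${value}`),
    ...finalSteps.map((step) => `- ${step}`),
  ]
  return lines.join('\n')
}

const prompt = buildFormPrompt(
  'registration form',
  { 'First name': 'John', 'Last name': 'Doe' },
  ['Click submit'],
)
console.log(prompt)
```

The resulting string can be passed to `agent.execute(prompt)` as in the example above.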

Accessibility Enhancement

typescript

import { PageAgent } from 'page-agent'

class AccessibilityAgent {
  private agent: PageAgent
  
  constructor() {
    this.agent = new PageAgent({
      model: 'gpt-4',
      baseURL: process.env.OPENAI_BASE_URL,
      apiKey: process.env.OPENAI_API_KEY,
      language: 'en-US',
    })
  }
  
  async handleVoiceCommand(voiceTranscript: string) {
    // Convert voice commands to actions
    await this.agent.execute(voiceTranscript)
  }
  
  async describeCurrentPage() {
    // Use agent to describe page content for screen readers
    const description = await this.agent.execute(
      'Describe what is visible on this page'
    )
    return description
  }
}

Multi-Step Workflow

typescript

import { PageAgent } from 'page-agent'

async function complexWorkflow() {
  const agent = new PageAgent({
    model: 'claude-3-5-sonnet-20241022',
    baseURL: 'https://api.anthropic.com/v1',
    apiKey: process.env.ANTHROPIC_API_KEY,
  })
  
  // Execute complex multi-step task
  await agent.execute(`
    1. Navigate to the products page
    2. Filter by category "Electronics"
    3. Sort by price (low to high)
    4. Add the first three items to cart
    5. Go to checkout
  `)
}

Error Handling and Retry

typescript

import { PageAgent } from 'page-agent'

async function executeWithRetry(command: string, maxRetries = 3) {
  const agent = new PageAgent({
    model: 'gpt-4',
    baseURL: process.env.OPENAI_BASE_URL,
    apiKey: process.env.OPENAI_API_KEY,
  })
  
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await agent.execute(command)
      return { success: true, result }
    } catch (error) {
      console.error(`Attempt ${attempt} failed:`, error)
      
      if (attempt === maxRetries) {
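        // `error` is typed `unknown` under strict TypeScript, so narrow it first
        return { success: false, error: error instanceof Error ? error.message : String(error) }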
      }
      
      // Wait before retry
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt))
    }
  }
}
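
The loop above uses a linear backoff: after failed attempt n it waits 1000 × n milliseconds, and it does not wait after the final attempt. The sketch below computes that schedule so the total wait can be inspected up front. The `capMs` parameter is an added assumption, not something the loop above has, and `retryDelays` is a hypothetical helper.

```typescript
// Delays between attempts for a linear backoff: attempt n is followed by
// a wait of baseMs * n, and no wait follows the final attempt.
function retryDelays(maxRetries: number, baseMs = 1000, capMs = Infinity): number[] {
  const delays: number[] = []
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(Math.min(baseMs * attempt, capMs))
  }
  return delays
}

console.log(retryDelays(3)) // [1000, 2000]
console.log(retryDelays(5, 1000, 3000)) // [1000, 2000, 3000, 3000]
```

With the defaults, three attempts spend at most 3 seconds waiting, in addition to the time of the attempts themselves.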

Browser Extension (Multi-Page Tasks)

For cross-tab automation, Page Agent provides a Chrome extension:

typescript

// In your extension background script
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Execute commands across multiple tabs
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'EXECUTE_COMMAND') {
    agent.execute(message.command)
      .then(result => sendResponse({ success: true, result }))
      .catch(error => sendResponse({ success: false, error: error.message }))
    return true // Keep channel open for async response
  }
})
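
Messages arriving through `chrome.runtime.onMessage` are untyped, so the listener above trusts that `message.command` is a string. A small type guard can make that check explicit before anything reaches the agent. The message shape is assumed from the listener above; `isExecuteCommand` is a hypothetical helper.

```typescript
// Message shape assumed from the listener above.
interface ExecuteCommandMessage {
  type: 'EXECUTE_COMMAND'
  command: string
}

// Narrows an untyped message so only well-formed commands reach execute().
function isExecuteCommand(message: unknown): message is ExecuteCommandMessage {
  if (typeof message !== 'object' || message === null) return false
  const candidate = message as Record<string, unknown>
  const command = candidate['command']
  return (
    candidate['type'] === 'EXECUTE_COMMAND' &&
    typeof command === 'string' &&
    command.trim().length > 0
  )
}

console.log(isExecuteCommand({ type: 'EXECUTE_COMMAND', command: 'Click the login button' })) // true
console.log(isExecuteCommand({ type: 'EXECUTE_COMMAND' })) // false
```

In the listener, `if (isExecuteCommand(message))` would replace the bare `message.type` comparison.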

MCP Server (Beta)

Page Agent includes an MCP (Model Context Protocol) server for external control:

bash

# Start MCP server
npx page-agent-mcp

Configure in your MCP client (e.g., Claude Desktop):

json

{
  "mcpServers": {
    "page-agent": {
      "command": "npx",
      "args": ["page-agent-mcp"]
    }
  }
}

Programmatic API

Event Listeners

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Listen to execution events
agent.on('step', (stepData) => {
  console.log('Agent step:', stepData)
})

agent.on('complete', (result) => {
  console.log('Execution complete:', result)
})

agent.on('error', (error) => {
  console.error('Agent error:', error)
})
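
Instead of logging each event, the handlers can record into a timeline that is later shown in a UI or attached to a bug report. This sketch takes only the three event names from the example above; payloads are typed `unknown` because their shape is not specified here, and the `Timeline` class is hypothetical.

```typescript
type AgentEvent = 'step' | 'complete' | 'error'

interface TimelineEntry {
  event: AgentEvent
  payload: unknown
  at: number // milliseconds since the epoch
}

class Timeline {
  readonly entries: TimelineEntry[] = []

  // Returns a handler suitable for agent.on(event, handler).
  handlerFor(event: AgentEvent): (payload: unknown) => void {
    return (payload) => {
      this.entries.push({ event, payload, at: Date.now() })
    }
  }

  // Number of recorded entries for one event type.
  count(event: AgentEvent): number {
    return this.entries.filter((entry) => entry.event === event).length
  }
}

const timeline = new Timeline()
// With a real agent: agent.on('step', timeline.handlerFor('step'))
timeline.handlerFor('step')({ index: 1 })
timeline.handlerFor('complete')('done')
console.log(timeline.count('step')) // 1
```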

Custom Actions

typescript

import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
})

// Register custom action
agent.registerAction('sendEmail', async (params) => {
  // Custom email sending logic
  await sendEmail(params.to, params.subject, params.body)
  return { success: true }
})

// Use custom action
await agent.execute('Send an email to team@example.com with subject "Update"')
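
Parameters for a custom action are produced by the LLM, so it seems prudent to validate them before performing a side effect such as sending mail. The sketch below uses the parameter names from the `sendEmail` example above. The validation rules and `parseEmailParams` are illustrative assumptions, and the address check is deliberately loose.

```typescript
interface EmailParams {
  to: string
  subject: string
  body: string
}

// Checks model-supplied parameters before the action runs. Returns typed
// parameters, or throws with a message naming the offending field.
function parseEmailParams(params: unknown): EmailParams {
  if (typeof params !== 'object' || params === null) {
    throw new Error('params must be an object')
  }
  const record = params as Record<string, unknown>
  const to = record['to']
  const subject = record['subject']
  const body = record['body']
  if (typeof to !== 'string' || !/^[^\s@]+@[^\s@]+$/.test(to)) {
    throw new Error('to must look like an email address')
  }
  if (typeof subject !== 'string' || subject.trim() === '') {
    throw new Error('subject must be a non-empty string')
  }
  // The body is optional here because the command above does not supply one.
  return { to, subject, body: typeof body === 'string' ? body : '' }
}

console.log(parseEmailParams({ to: 'team@example.com', subject: 'Update' }))
// { to: 'team@example.com', subject: 'Update', body: '' }
```

Inside the registered action, `parseEmailParams(params)` would run before `sendEmail` is called.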

Troubleshooting

Agent Not Finding Elements

Problem: Agent fails to locate buttons or form fields.

Solutions:

Ensure elements have proper labels or accessible names
Check that elements are visible (not `display: none` or `visibility: hidden`)
Increase `maxElements` in configuration
Use more specific descriptions in commands

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  elementSelector: {
    includeInvisible: false,
    maxElements: 200, // Increase if needed
  },
})

LLM API Errors

Problem: API key or connection errors.

Solutions:

Verify the API key is set without printing the secret itself, for example `console.log(Boolean(process.env.OPENAI_API_KEY))`
Check that baseURL matches your provider
Ensure the model name is correct for your provider
Check network connectivity and CORS settings

typescript

// Debug API configuration
const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  debug: true, // Enable debug logging
})
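
When checking whether a key is set, it is safer to log a description of it than the key itself. A minimal sketch; `describeKey` is a hypothetical helper, and showing the last four characters is an arbitrary choice.

```typescript
// Describes a key without revealing it: presence, length, and at most
// the last four characters.
function describeKey(key: string | undefined): string {
  if (!key) return 'missing'
  const trimmed = key.trim()
  if (trimmed.length !== key.length) return 'present, but has surrounding whitespace'
  if (trimmed.length <= 8) return `present (${trimmed.length} chars)`
  return `present (${trimmed.length} chars, ends with ...${trimmed.slice(-4)})`
}

console.log(describeKey(undefined)) // missing
console.log(describeKey('sk-example-1234567890')) // present (21 chars, ends with ...7890)
```

Stray whitespace from a copied `.env` value is a common cause of authentication errors, which is why it gets its own message.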

Timeout Issues

Problem: Commands time out before completion.

Solutions:

Increase timeout value
Break complex commands into smaller steps
Optimize page performance

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 60000, // 60 seconds
  maxSteps: 30,
})
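
Breaking a complex command into smaller steps can be done mechanically when the task is written as a numbered list. The sketch below splits such a list into one command per step; `splitSteps` is a hypothetical helper, and it assumes steps are numbered like "1." or "1)".

```typescript
// Splits a numbered task list into one command per step.
// Lines without a leading "1." or "1)" style number are ignored.
function splitSteps(task: string): string[] {
  return task
    .split('\n')
    .map((line) => /^\d+[.)]\s+(.*)$/.exec(line.trim()))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => (match[1] ?? '').trim())
    .filter((step) => step.length > 0)
}

const steps = splitSteps(`
  1. Navigate to the products page
  2. Filter by category "Electronics"
  3. Go to checkout
`)
console.log(steps.length) // 3
```

Each step can then be passed to `execute()` in its own call, which, assuming the timeout applies per call, gives every step the full time budget.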

CSP (Content Security Policy) Issues

Problem: Script blocked by CSP when using CDN.

Solutions:

Add Page Agent CDN to your CSP header
Use NPM package instead of CDN
Update CSP meta tag

html

<meta http-equiv="Content-Security-Policy" 
      content="script-src 'self' https://cdn.jsdelivr.net;">
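
Because the agent runs in the page, a restrictive policy probably also needs to allow the requests to the LLM endpoint, which are governed by `connect-src` rather than `script-src`. The sketch below builds both directives from the CDN origin and the configured `baseURL`. It assumes the page calls the LLM endpoint directly; `buildCsp` is a hypothetical helper.

```typescript
// Builds a CSP value that allows the CDN script and the LLM endpoint.
// Only the origin of the base URL is used, since CSP sources are matched
// by origin here and the path would be unnecessarily narrow.
function buildCsp(scriptCdn: string, llmBaseURL: string): string {
  const llmOrigin = new URL(llmBaseURL).origin
  return [
    `script-src 'self' ${scriptCdn}`,
    `connect-src 'self' ${llmOrigin}`,
  ].join('; ')
}

console.log(buildCsp('https://cdn.jsdelivr.net', 'https://api.openai.com/v1'))
// script-src 'self' https://cdn.jsdelivr.net; connect-src 'self' https://api.openai.com
```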

Language/Locale Issues

Problem: Agent responds in wrong language.

Solution: Set explicit language configuration

typescript

const agent = new PageAgent({
  model: 'gpt-4',
  baseURL: process.env.OPENAI_BASE_URL,
  apiKey: process.env.OPENAI_API_KEY,
  language: 'en-US', // or 'zh-CN'
})

Best Practices

Use Environment Variables: Never hardcode API keys; a key passed to an in-page agent is still visible to the browser, so proxy LLM requests through your backend if it must stay secret
Specific Commands: More specific natural language commands work better
Error Handling: Always wrap execute() calls in try-catch
Rate Limiting: Add delays between commands when making many requests
Element Visibility: Ensure target elements are visible before commands execute
Security: Validate and sanitize user input before passing it to the agent
Testing: Test with the demo LLM before production deployment
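
As an illustration of the input-validation point, the sketch below normalizes free-form user input before it is passed to `execute()`. `sanitizeCommand` is a hypothetical helper and the length limit is an arbitrary illustrative value. This only cleans up the text; it does not by itself protect against prompt injection.

```typescript
// Arbitrary illustrative limit, not a page-agent constraint.
const MAX_COMMAND_LENGTH = 500

// Strips control characters, collapses whitespace, and rejects empty or
// oversized input before the text is handed to the agent.
function sanitizeCommand(input: string): string {
  const cleaned = input
    .replace(/[\u0000-\u001f\u007f]+/g, ' ') // control characters, including newlines and tabs
    .replace(/\s+/g, ' ')
    .trim()
  if (cleaned.length === 0) throw new Error('Command is empty')
  if (cleaned.length > MAX_COMMAND_LENGTH) throw new Error('Command is too long')
  return cleaned
}

console.log(sanitizeCommand('  Click\tthe\n"Save"   button ')) // Click the "Save" button
```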

Resources

Documentation: https://alibaba.github.io/page-agent/docs/introduction/overview
GitHub: https://github.com/alibaba/page-agent
NPM Package: https://www.npmjs.com/package/page-agent
License: MIT
