alibabacloud-ehpc-instant-job-skill
Alibaba Cloud E-HPC Instant Job Management Skill
Skill Overview
Manages the full lifecycle of jobs and computing resources on the E-HPC Instant computing platform through the Alibaba Cloud CLI (preferred) or SDK tools.
Interaction Principles
- User-friendly: Keep a professional, friendly tone and address users respectfully.
- Transparent operation: Show all key configuration information to the user and obtain confirmation.
- Security first: Follow the principle of least privilege to avoid accidental operations.
- Error handling: Provide clear error messages and solutions.
- Format specification: Use simple, clear, easy-to-read output formats.
Execution Process
- Step 1: Preload Configuration
- Step 2: Precondition Check
- Step 3: Job Management Execution
- Step 4: Result Output and Follow-up Suggestions
Step 1: Preload Configuration
[MUST] First attempt to read the file `./jobconfig/pre-config.json`. Skipping this check is strictly prohibited.
- File exists and is read successfully: Load the parameter values, tell the user that the pre-configuration has been loaded, display a summary of the key parameters, then proceed to Step 2.
- File does not exist: Tell the user that no pre-configuration file was detected and that the necessary parameters will be collected interactively in Step 2, then proceed to Step 2.
- File exists but reading or parsing fails (for example, a JSON format error or insufficient permissions): Report the specific error to the user and ask whether to continue (skip pre-configuration and enter Step 2) or abort.
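The three outcomes above can be sketched as a small loader. This is only an illustration: the function name `load_pre_config` and the status strings are assumptions, not part of the skill, and the caller is still responsible for the user prompts described in each branch.

```python
import json
from pathlib import Path


def load_pre_config(path="./jobconfig/pre-config.json"):
    """Return (status, payload); status is 'loaded', 'missing', or 'error'."""
    config_file = Path(path)
    if not config_file.exists():
        # No pre-configuration: parameters are collected interactively in Step 2.
        return "missing", None
    try:
        with config_file.open(encoding="utf-8") as handle:
            return "loaded", json.load(handle)
    except (OSError, ValueError) as exc:
        # OSError covers permission problems; ValueError covers JSON and
        # encoding errors. The message is reported to the user verbatim.
        return "error", f"{type(exc).__name__}: {exc}"
```

On `"error"`, the caller would show the message and ask whether to continue to Step 2 without pre-configuration or to abort.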
Step 2: Precondition Check
General Checks for All Job Management Tasks
- Alibaba Cloud CLI configuration: Verify that the CLI is installed and an AccessKey is configured.
  - CLI version requirement >= 3.3.3: Run `aliyun version` to verify that the version is >= 3.3.3. If the CLI is not installed or the version is too low, follow the installation instructions in references/aliyun-cli.md to install or update it.
  - [MUST] Run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
  - [MUST] Run `aliyun plugin update` to ensure installed plugins are always up to date.
  - If no configuration is found, guide the user through interactive installation and configuration.
  - Reference: references/aliyun-cli.md
- [MUST] CLI User-Agent: Every `aliyun` CLI command invocation must include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill`.
- Enable Alibaba Cloud CLI AI Mode: Run the following initialization commands to attach the AI identity information for this Skill execution:

```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill"
aliyun plugin update
```

  Alibaba Cloud CLI provides AI Mode. Once it is enabled, the CLI automatically attaches AI identity information so that the server can identify and optimize the Agent call chain.

  [MUST] Disable AI Mode at every exit point: Before delivering the final response for any reason, disable AI Mode first. This applies to all exit paths: workflow success, workflow failure, error or exception, user cancellation, session end, or any situation in which no more CLI commands will be executed. AI Mode is only for Agent Skill invocations and must remain disabled after the skill stops running.

```bash
aliyun configure ai-mode disable
```

- Region information: Use the region specified by the user, or the default region `cn-shanghai`.
- Permission verification: Ensure the AccessKey has the permissions required for E-HPC Instant.
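One way to honor the "disable AI Mode at every exit point" rule is to wrap the workflow in a context manager whose `finally` clause always runs the disable command. The sketch below uses only the CLI commands quoted in this section; the names `ai_mode` and `run_cli` are hypothetical, and the runner is injectable so the flow can be exercised without the CLI installed.

```python
import subprocess
from contextlib import contextmanager

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill"


def run_cli(args):
    """Run one CLI command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(args, check=True)


@contextmanager
def ai_mode(run=run_cli):
    """Enable AI Mode for the enclosed block and always disable it afterwards."""
    try:
        run(["aliyun", "configure", "ai-mode", "enable"])
        run(["aliyun", "configure", "ai-mode", "set-user-agent",
             "--user-agent", USER_AGENT])
        run(["aliyun", "plugin", "update"])
        yield
    finally:
        # Reached on success, failure, exception, and cancellation alike.
        run(["aliyun", "configure", "ai-mode", "disable"])
```

Placing all job commands inside `with ai_mode():` means a failed job or an interrupted session still leaves AI Mode disabled.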
Special Checks for Job Creation Tasks
- Image preparation: The container or VM image has been added to the E-HPC Instant platform.
  - If no image exists, guide the user to add one interactively.
  - Reference: references/instant-image.md
- Network configuration: A vSwitch has been created and is available.
  - If no vSwitch exists, guide the user to create one interactively.
  - Reference: references/vswitch.md
- Storage configuration: The NAS mount path is prepared (if needed).
  - If NAS is required, guide the user to configure it interactively.
  - Reference: references/storage.md
- Computing resource configuration: Cores/Memory or InstanceType specifications have been specified.
  - Guide the user to select an appropriate resource configuration.
  - Reference: references/resource.md
- Execution command: The job execution command line has been specified.
  - Ensure the command format is correct (especially the JSON array format).
  - Reference: references/job-command.md
Step 3: Job Management Execution
Execute the job management operation that matches the user's request.
Create Job
- Container job

```bash
aliyun ehpcinstant CreateJob \
  --region cn-shanghai \
  --JobName 'container-job' \
  --Tasks '[{
    "TaskSpec": {
      "TaskExecutor": [{
        "Container": {
          "Image": "registry.cn-shanghai.aliyuncs.com/registry/image:tag",
          "Command": "[\"/bin/sh\", \"-c\", \"python /app/main.py\"]"
        }
      }],
      "Resource": {
        "Cores": 2,
        "Memory": 4,
        "Disks": [{"Type":"System","Size":40}]
      },
      "VolumeMount": [
        {
          "MountPath": "/mnt",
          "VolumeDriver": "alicloud/nas",
          "MountOptions": "{\"server\":\"xxx.cn-shanghai.nas.aliyuncs.com\",\"vers\":\"3\",\"path\":\"/\",\"options\":\"nolock,tcp,noresvport\"}"
        }
      ]
    },
    "ExecutorPolicy": {"MaxCount": 1}
  }]' \
  --DeploymentPolicy '{"Network":{"Vswitch":["vsw-xxxx"]},"AllocationSpec":"Standard"}' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```

- VM job

```bash
aliyun ehpcinstant CreateJob \
  --region cn-shanghai \
  --JobName 'longrunning-job' \
  --JobDescription 'Long-running VM job' \
  --Tasks '[{
    "TaskSpec": {
      "TaskExecutor": [{
        "VM": {
          "Image": "m-xxxx",
          "Script": "base64_encoded_script"
        }
      }],
      "Resource": {
        "Disks": [{"Type":"System","Size":50}],
        "InstanceTypes": ["ecs.c6.xlarge"],
        "Cores": 4,
        "Memory": 8
      },
      "VolumeMount": [
        {
          "MountPath": "/mnt",
          "VolumeDriver": "alicloud/nas",
          "MountOptions": "{\"server\":\"xxx.cn-shanghai.nas.aliyuncs.com\",\"vers\":\"3\",\"path\":\"/\",\"options\":\"nolock,tcp,noresvport\"}"
        }
      ]
    },
    "ExecutorPolicy": {"MaxCount": 1}
  }]' \
  --DeploymentPolicy '{"Network":{"Vswitch":["vsw-xxxx"]},"AllocationSpec":"Standard"}' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```
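Hand-written `--Tasks` JSON is easy to get wrong, particularly the `Command` field, which is a JSON array encoded as a string inside the outer JSON. The following sketch builds the value programmatically for a single container task. The helper name `build_container_tasks` and its parameters are assumptions for illustration; the field names are copied from the container example in this section, and the volume mount is omitted for brevity.

```python
import json


def build_container_tasks(image, command, cores, memory, system_disk_size=40):
    """Return the --Tasks value for one container task as a JSON string."""
    if not isinstance(command, (list, tuple)) or not all(
        isinstance(part, str) for part in command
    ):
        raise ValueError("command must be a list of strings")
    task = {
        "TaskSpec": {
            "TaskExecutor": [{
                "Container": {
                    "Image": image,
                    # Command is a JSON array serialized into a string.
                    "Command": json.dumps(list(command)),
                }
            }],
            "Resource": {
                "Cores": cores,
                "Memory": memory,
                "Disks": [{"Type": "System", "Size": system_disk_size}],
            },
        },
        "ExecutorPolicy": {"MaxCount": 1},
    }
    # json.dumps handles the nested escaping and never emits trailing commas.
    return json.dumps([task])
```

For example, `build_container_tasks("registry.cn-shanghai.aliyuncs.com/registry/image:tag", ["/bin/sh", "-c", "python /app/main.py"], 2, 4)` yields a string that can be passed as the `--Tasks` argument.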
Query Job List
```bash
# Basic list query
aliyun ehpcinstant ListJobs --region cn-shanghai \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill

# Pagination query (recommended for a large number of jobs)
aliyun ehpcinstant ListJobs --region cn-shanghai --PageSize 10 --PageNumber 1 \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill

# Conditional filtering (by status, time, etc.)
aliyun ehpcinstant ListJobs --region cn-shanghai --Filter '{"Status":"Running"}' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```
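Because every invocation must carry the same `--user-agent` value, it can help to assemble the argument list in one place rather than repeating the flag by hand. The helper below is a sketch under that assumption; `build_command` is a hypothetical name, and it only constructs the argument list without running anything.

```python
import json

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill"


def build_command(api_name, region="cn-shanghai", **params):
    """Build the argv list for `aliyun ehpcinstant <ApiName>`."""
    args = ["aliyun", "ehpcinstant", api_name, "--region", region]
    for name, value in params.items():
        if value is None:
            continue  # skip options the caller did not set
        if isinstance(value, (dict, list)):
            # Structured options such as --Filter are passed as compact JSON.
            value = json.dumps(value, separators=(",", ":"))
        args += [f"--{name}", str(value)]
    # The user agent is appended unconditionally so it cannot be forgotten.
    args += ["--user-agent", USER_AGENT]
    return args
```

`build_command("ListJobs", PageSize=10, PageNumber=1)` reproduces the pagination query above as a list that can be passed to `subprocess.run` without shell quoting concerns.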
Get Job Details
```bash
aliyun ehpcinstant GetJob --region cn-shanghai --JobId 'job-xxxx' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```
Query Job Logs
```bash
aliyun ehpcinstant DescribeJobResults --region cn-shanghai --JobId 'job-sh145rw47pegztzdpu7s' --TaskName Task0 --ArrayIndex 0 \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```
Delete Job
```bash
# Delete a single job
aliyun ehpcinstant DeleteJobs --region cn-shanghai --JobSpec '[{"JobId":"job-xxxx"}]' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill

# Batch deletion (safe method)
aliyun ehpcinstant DeleteJobs --region cn-shanghai --JobSpec '[{"JobId":"job-xxxx"},{"JobId":"job-yyyy"}]' \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ehpc-instant-job-skill
```

> **[MUST] Idempotent Deletion Principle**: When a user asks to delete a job, the Agent **must still** call the DeleteJobs API with the corresponding JobId, even if a GetJob or ListJobs query indicates that the job no longer exists. The DeleteJobs API is idempotent: it deletes the job if it exists and returns a standard NotFound response if it does not. The Agent is strictly prohibited from skipping the API call on the grounds that the "job does not exist"; the server must confirm the final state.
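The idempotent deletion rule can be sketched as a function that has no pre-check at all: it always issues the delete call and merely interprets a NotFound answer. How the CLI or SDK actually surfaces NotFound is not specified here, so `JobNotFoundError` and the injected `call_delete_api` callable are hypothetical stand-ins.

```python
class JobNotFoundError(Exception):
    """Hypothetical error raised when the server answers NotFound."""


def delete_jobs(job_ids, call_delete_api):
    """Always send DeleteJobs and let the server decide the final state."""
    # Same shape as the --JobSpec argument in the CLI examples.
    job_spec = [{"JobId": job_id} for job_id in job_ids]
    try:
        call_delete_api(job_spec)
    except JobNotFoundError:
        # The call was still made; NotFound is a valid, authoritative outcome.
        return "not-found"
    return "deleted"
```

The absence of any GetJob or ListJobs lookup before the call is deliberate: the function cannot skip the API based on stale query results.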
Step 4: Result Output and Follow-up Suggestions
- Format and output the execution results.
- Suggest follow-up operations (such as monitoring or viewing logs).
- Record key information for future reference.
Job Management Reference Execution Scripts
Path: /home/admin/.openclaw/workspace/skills/ehpcinstant-job-skill/scripts/
CLI scripts:
- create_container_job.sh - Create container job
- create_vm_job_batch.sh - Create VM job
- create_vm_job_longrunning.sh - Create long-running VM job
- list_jobs.sh - List jobs (supports pagination)
- get_job.sh - Get job details
- delete_jobs.sh - Delete jobs
SDK scripts:
- The corresponding Python SDK versions of these scripts are located in the scripts/sdk/ directory.
- create_container_job.py - Create container job
- create_vm_job_batch.py - Create VM job
- create_vm_job_longrunning.py - Create long-running VM job
- list_jobs.py - List jobs (supports pagination)
- get_job.py - Get job details
- delete_jobs.py - Delete jobs
Notes
- CLI command accuracy: E-HPC Instant uses the `aliyun ehpcinstant` command, not `aliyun ehpc`!
- Region consistency: All resources must be in the same region; cross-region operations will fail.
- Quota limits: Check account quota limits to avoid operation failures.
- Cost awareness: Running jobs incurs fees; clean up unneeded resources promptly.
- Least privilege: Configure the AccessKey with the minimum necessary permissions to improve security.
Troubleshooting and Best Practices
Common Issue Solutions
| Issue | Root Cause | Solution |
|---|---|---|
| InvalidCommand error | Incorrect Command parameter format | Use the correct JSON array format, as in the Create Job examples (`["/bin/sh", "-c", "python /app/main.py"]`); refer to references/job-command.md |
| Authentication failed | Incorrect AccessKey configuration or insufficient permissions | Check the AccessKey configuration and its permissions; refer to references/aliyun-cli.md |
| Image does not exist | Incorrect image ID or region mismatch | Confirm the image ID is correct and in the same region; refer to references/instant-image.md |
| Network configuration error | Incorrect or unavailable vSwitch ID | Check the vSwitch ID and security group configuration; refer to references/vswitch.md |
| Insufficient resource quota | Account quota limits | Contact Alibaba Cloud to increase the quota, or adjust the resource configuration |
| Command execution failed | Incorrect command path in the container | Ensure the command is executable in the image environment; use absolute paths |
Best Practices Guide
- Naming conventions:
  - Job name: `application-case-timestamp` (e.g. `prod-training-20260411`)
  - Resource name: Include purpose, environment, and region information.
- Resource configuration optimization:
  - Select instance specifications that match the application's characteristics.
  - Avoid over-provisioning, which wastes resources.
- Cost control:
  - Clean up completed jobs promptly.
  - Use appropriate job timeout settings.
  - Monitor resource usage.
- Reliability assurance:
  - Implement job status monitoring and retry mechanisms.
  - Configure appropriate error handling and alerts.
  - Regularly back up important configurations and data.
- Security enhancement:
  - Use private image repositories.
  - Configure the AccessKey with minimal permissions.
  - Enable network access control and encryption.
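The job naming convention above can be generated rather than typed. The sketch below assumes the timestamp part is a `YYYYMMDD` date, which matches the `prod-training-20260411` example; the function name `make_job_name` is hypothetical.

```python
from datetime import date


def make_job_name(application, case, day=None):
    """Build a job name of the form application-case-YYYYMMDD."""
    day = day or date.today()  # default to the current date
    return f"{application}-{case}-{day:%Y%m%d}"
```

For instance, `make_job_name("prod", "training", date(2026, 4, 11))` returns `"prod-training-20260411"`.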
Reference Materials and Extensions
Complete API Reference
If any job management parameter is still unclear, consult the following:
- references/ehpcinstant.md - Complete E-HPC Instant CLI command manual
- CLI help command: `aliyun ehpcinstant <ApiName> --help`
Special Configuration Guides
- references/aliyun-cli.md - Alibaba Cloud CLI Configuration Guide
- references/instant-image.md - Image Management Guide
- references/vswitch.md - Network vSwitch Management Guide
- references/storage.md - NAS Storage Management Guide
- references/resource.md - Computing Resource Configuration Guide
- references/job-command.md - Job Command Configuration Guide
Official Documentation
- Alibaba Cloud E-HPC Instant Product Documentation: https://help.aliyun.com/product/ehpcinstant.html
- E-HPC Instant API Documentation: https://api.aliyun.com/document/EhpcInstant/2023-07-01/overview
- Alibaba Cloud CLI Official Documentation: https://help.aliyun.com/document_detail/121529.html
Security Policies and Risk Control
Operation Risk Level Classification
| Risk Level | Operation Type | Agent Code of Conduct |
|---|---|---|
| Low risk | Query operations (list, get, check, describe) | Execute directly; no additional confirmation required |
| Medium risk | Configuration operations (configure, modify, update) | Explain the scope of impact and execute after obtaining user confirmation |
| High risk | Creation/deletion operations (create, delete, remove) | Display all configuration information in full and execute only after explicit user confirmation |

[MUST] Do not skip API calls: When performing operations such as deletion, the Agent must not decide on its own that "no operation is needed" and skip the actual API call based on query results (such as GetJob returning NotFound). There is a time window between the query and the change, during which the server state may have changed. The Agent must always execute the target API requested by the user (such as DeleteJobs) and let the server return the authoritative final result.
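The risk table can be encoded as a lookup on the leading verb of an API name. This is a minimal sketch: the verbs come from the table, while treating an unrecognized operation as high risk is an added assumption chosen to err on the side of asking for confirmation.

```python
RISK_BY_VERB = {
    "list": "low", "get": "low", "check": "low", "describe": "low",
    "configure": "medium", "modify": "medium", "update": "medium",
    "create": "high", "delete": "high", "remove": "high",
}


def risk_level(api_name):
    """Classify an API name such as 'ListJobs' by its leading verb."""
    lowered = api_name.lower()
    for verb, level in RISK_BY_VERB.items():
        if lowered.startswith(verb):
            return level
    # Assumption: anything unrecognized is handled as high risk.
    return "high"
```

With this mapping, `ListJobs`, `GetJob`, and `DescribeJobResults` run without confirmation, while `CreateJob` and `DeleteJobs` require the full confirmation flow.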
Security Checklist
- ✅ Region consistency: Ensure all resources are in the same region.
- ✅ Quota check: Verify that the resource quota is sufficient.
- ✅ Dependency verification: Confirm that dependent resources (image, network, storage) exist and are available.
- ✅ Permission verification: Ensure the AccessKey has the necessary permissions.
- ✅ Cost reminder: Warn the user before operations that may incur high fees.
Sensitive Operation Confirmation Process
- Information summary: Display all key configuration parameters.
- Impact description: Explain the scope of impact and potential risks of the operation.
- User confirmation: Wait for explicit user confirmation ("Yes", "Confirm", "Continue", etc.).
- Execute operation: Execute the confirmed operation.
- Result feedback: Provide detailed operation results and follow-up suggestions.
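The confirmation flow above can be sketched as a gate that runs an operation only after an explicit affirmative reply. The accepted replies mirror the English examples in the list; the function name `confirm_and_run`, the prompt wording, and the returned dictionary are assumptions for illustration.

```python
AFFIRMATIVE = {"yes", "confirm", "continue"}


def confirm_and_run(summary, impact, operation, ask=input):
    """Show summary and impact, then run `operation` only if confirmed."""
    prompt = f"{summary}\nImpact: {impact}\nProceed? "
    reply = ask(prompt).strip().lower()
    if reply not in AFFIRMATIVE:
        # Anything other than an explicit confirmation cancels the operation.
        return {"executed": False, "result": None}
    return {"executed": True, "result": operation()}
```

Passing a custom `ask` callable makes the gate usable in non-interactive agents, where the reply comes from the conversation rather than from standard input.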