alibabacloud-oss-media-process


Alibaba Cloud OSS Media Processing


Process images, audio, and video files stored in Alibaba Cloud OSS using native OSS media processing capabilities. Synchronous processing returns immediate results via `x-oss-process`; asynchronous processing handles long-running jobs via `x-oss-async-process` with polling.
Default language: reply in Chinese by default. Only use English when the user explicitly writes in English.

Quick Start


Working directory


All script commands run from the skill package root. Use full absolute paths to invoke scripts:

    python /path/to/skill/scripts/process.py ...

Do not `cd` into the directory and use relative paths. If a script fails with "No such file or directory", use Glob to find `**/alibabacloud-oss-media-process/scripts/process.py` and use its full path.
Set up the workspace output directory (run once per session):

    WORKSPACE_OUTPUT=$(pwd)/outputs && mkdir -p "$WORKSPACE_OUTPUT"

All `--output-path` arguments MUST use `$WORKSPACE_OUTPUT/<filename>`; files saved inside the skill directory will NOT be renderable.
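The output-path rule can be illustrated with a small guard. This is only a sketch for readers, not part of the skill: the helper name `workspace_output_path` and its `skill_root` argument are hypothetical, and it assumes the workspace directory is either passed in, read from `WORKSPACE_OUTPUT`, or defaults to `./outputs`.

```python
import os
from pathlib import Path
from typing import Optional


def workspace_output_path(filename: str, skill_root: str,
                          workspace_output: Optional[str] = None) -> Path:
    """Build an absolute --output-path under the workspace output directory.

    Hypothetical helper: rejects targets that would land inside the skill
    directory, since such files are not renderable.
    """
    base = Path(workspace_output
                or os.environ.get("WORKSPACE_OUTPUT")
                or Path.cwd() / "outputs").resolve()
    target = (base / filename).resolve()
    skill = Path(skill_root).resolve()

    # Refuse anything at or below the skill package root.
    if target == skill or skill in target.parents:
        raise ValueError(f"{target} is inside the skill directory")
    # Refuse filenames such as "../x" that escape the workspace directory.
    if base not in target.parents:
        raise ValueError(f"{target} is outside {base}")

    base.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    return target
```

The returned path is what would be passed as `--output-path`; the check exists only to fail early rather than after a download has already been written to the wrong place.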

Credentials (Aliyun CLI)


This skill uses Aliyun CLI for credential management. Python scripts auto-discover credentials via the `alibabacloud-credentials` default chain (supporting `~/.aliyun/config.json`, environment variables, ECS instance roles, etc.).
Security rules:
  • Never read, echo, print, `cat`, or dump `~/.aliyun/config.json`, credential files, or any raw command output that contains `access_key_id`, `access_key_secret`, `sts_token`, `AccessKeyId`, `AccessKeySecret`, or `SecurityToken` values.
  • Never ask the user to input AK/SK directly in the conversation or on the command line.
  • Guide users to use `aliyun configure` to set up credentials securely.
  • Never write `AccessKeyId`, `AccessKeySecret`, or `SecurityToken` into any temporary Python/Shell script, here-doc, env export, or intermediate file. All credentials must be discovered through Aliyun CLI or the SDK default credential chain.
  • For credential diagnostics, use `aliyun configure list`, `python scripts/load_env.py`, or other non-secret checks. If you must inspect configuration structure, only inspect non-sensitive fields and do not print secret or token values to the transcript.
  • Treat full presigned URLs as sensitive whenever they contain signing parameters such as `OSSAccessKeyId`, `accessKeyId`, `x-oss-credential`, `Signature`, `x-oss-signature`, `security-token`, `SecurityToken`, or `sts_token`. Do not print these full URLs into the conversation transcript, command echo, markdown summary, or ordinary log files.
  • When a signed URL is needed for user consumption, distinguish between delivery and display: it is acceptable to generate a usable signed URL, but unless the runtime provides a secure private-output channel that does not enter the transcript or logs, only display a redacted URL or an OSS path in normal user-facing text.

Prerequisites


| Step | Action | Command |
| --- | --- | --- |
| 1 | Install Aliyun CLI (>=3.3.3) | `curl -fsSL https://aliyuncli.alicdn.com/setup.sh ...` |
| 2 | Configure credentials | `aliyun configure` |
| 3 | Run blocking preflight check 1 | `python scripts/load_env.py` |
| 4 | Run blocking preflight check 2 | `aliyun configure list` |
| 5 | Enable plugins | `aliyun configure set --auto-plugin-install true && aliyun plugin update` |
| 6 | Install Python deps | `pip install -r scripts/requirements.txt` |
| 7 | Set bucket/region (choose one) | `export ALIBABA_CLOUD_OSS_BUCKET=<b> ALIBABA_CLOUD_OSS_REGION=<r>` (add to `~/.bashrc` / `~/.zshrc` for persistence), or pass `--bucket <b> --region <r>` on every command |
Blocking preflight policy:
  • `python scripts/load_env.py` may report missing SDKs, missing credentials, missing bucket/region, or RAM permission problems.
  • `aliyun configure list` must show a usable configured CLI profile.
  • Treat preflight results as stale after any environment or runtime change. If you install Python packages, run `aliyun configure`, change env vars, edit shell profiles, switch users, or otherwise modify credential/runtime state, you must rerun both `python scripts/load_env.py` and `aliyun configure list` before the next `python scripts/process.py ...` command.
  • If either command fails these checks, stop immediately.
  • Do not run `python scripts/process.py ...`.
  • Do not retry media processing.
  • Do not simulate a successful result.
  • Return only configuration guidance until both checks pass.
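The policy above amounts to a gate in front of every media command. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each preflight command signals failure through a non-zero exit code, which the text does not state explicitly.

```python
import subprocess
from typing import Callable, List, Sequence

# The two blocking checks named in the policy above.
PREFLIGHT_COMMANDS = (
    ("python", "scripts/load_env.py"),
    ("aliyun", "configure", "list"),
)


def _run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def blocking_preflight(run: Callable[[Sequence[str]], int] = _run_command) -> List[str]:
    """Run both checks and return the commands that failed.

    An empty list means the gate is open. Assumption: a non-zero exit code
    means the check failed.
    """
    failed = []
    for cmd in PREFLIGHT_COMMANDS:
        if run(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

A caller would proceed to `process.py` only when the returned list is empty, and would call `blocking_preflight()` again after any environment change rather than caching the earlier result.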

AI-Mode


Enable at session start:

    aliyun configure ai-mode enable
    aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-oss-media-process"

Disable on every exit, whether success, failure, error, cancellation, or session end:

    aliyun configure ai-mode disable
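The "disable on every exit" requirement maps naturally onto a `try`/`finally` block. The following is a minimal sketch under that reading: the `ai_mode` context manager and its injectable `run` argument are hypothetical, while the three `aliyun` commands are the ones listed above.

```python
import subprocess
from contextlib import contextmanager

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-oss-media-process"


def _aliyun(*args: str) -> None:
    """Invoke the Aliyun CLI with the given arguments."""
    subprocess.run(["aliyun", *args], check=False)


@contextmanager
def ai_mode(run=_aliyun):
    """Enable AI-mode for the duration of a session, then always disable it."""
    try:
        run("configure", "ai-mode", "enable")
        run("configure", "ai-mode", "set-user-agent", "--user-agent", USER_AGENT)
        yield
    finally:
        # Runs on success, failure, exceptions, and cancellation alike.
        run("configure", "ai-mode", "disable")
```

Wrapping the whole session in `with ai_mode(): ...` means the disable step does not depend on remembering to call it on each error path.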

Preflight then Execute


When the user requests a media operation (resize, detect faces, watermark, etc.), apply the blocking preflight policy above before running any `python scripts/process.py ...` command. `process.py` also performs a runtime dependency preflight and exits with `pip install -r scripts/requirements.txt` guidance if required SDKs are missing. If you change the environment after a failed attempt (for example by installing dependencies, editing env vars, or re-running `aliyun configure`), do not assume the earlier preflight still holds; rerun the full blocking preflight first.

First-time setup


Direct users to run `aliyun configure` to set up credentials, then verify with:

    aliyun configure list

Python scripts use the `alibabacloud-credentials` SDK to auto-discover credentials from the Aliyun CLI config. Bucket and region are read from the `ALIBABA_CLOUD_OSS_BUCKET` / `ALIBABA_CLOUD_OSS_REGION` environment variables, or from the `--bucket` / `--region` CLI flags. `load_env.py` scans shell config files (`~/.bashrc`, `~/.zshrc`) for these exports and loads them into `os.environ`.
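To make the shell-config scan concrete, here is a rough sketch of how such a scan could work. It is not the actual `load_env.py`: the function names and regular expressions are hypothetical, and it assumes that values already present in the environment take precedence over values found in shell files.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, MutableMapping

WANTED = ("ALIBABA_CLOUD_OSS_BUCKET", "ALIBABA_CLOUD_OSS_REGION")
EXPORT_RE = re.compile(r"^\s*export\s+(.*)$")
ASSIGN_RE = re.compile(r"(\w+)=(\"[^\"]*\"|'[^']*'|\S+)")


def find_exports(text: str) -> Dict[str, str]:
    """Return the wanted NAME=value pairs from `export ...` lines."""
    found = {}
    for line in text.splitlines():
        match = EXPORT_RE.match(line)  # commented-out lines do not match
        if not match:
            continue
        for name, value in ASSIGN_RE.findall(match.group(1)):
            if name in WANTED:
                found[name] = value.strip("\"'")
    return found


def load_shell_exports(paths: Iterable[str], environ: MutableMapping[str, str]) -> None:
    """Copy wanted exports from shell config files into `environ`."""
    for path in paths:
        file = Path(path).expanduser()
        if file.is_file():
            exports = find_exports(file.read_text(errors="ignore"))
            for name, value in exports.items():
                # Assumption: an already-set variable is not overridden.
                environ.setdefault(name, value)
```

Calling `load_shell_exports(["~/.bashrc", "~/.zshrc"], os.environ)` would mirror the behaviour described above; only bucket and region names are read, never credential values.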

Recommended Workflow


Follow this numbered workflow for every request:
  1. Prepare. Confirm the bucket and region are available through `--bucket` / `--region` or the `ALIBABA_CLOUD_OSS_BUCKET` / `ALIBABA_CLOUD_OSS_REGION` environment variables. Apply the blocking preflight policy before any media command. Create `$WORKSPACE_OUTPUT` once per session for all local downloads.
  2. Choose the source. Use `--source` for an existing OSS object key. Use `--uri` for a local file path or HTTP(S) URL that should be uploaded temporarily before processing.
  3. Decide the execution path. Use `python scripts/process.py` for all media processing and file operations. If the request involves video, audio, HLS, or image-intelligent features, run `python scripts/imm_admin.py auto-setup --bucket <b> --region <r>` first to ensure IMM bucket binding exists.
  4. Execute. Build exactly one valid operation chain. Prefer `--output-mode download --output-path $WORKSPACE_OUTPUT/<name>` for sync image outputs, `--output-mode save --target-key <key>` for async media outputs, and `--output-mode url` when the result is meant to be consumed remotely.
  5. Verify. Read the returned JSON. Check `success`, `request_id`, `task_id` (async only), `target_key`, local `path`, and any `validation_warnings`.
    • If the command downloaded a local file, present the absolute path to the user. Only report a local output path after the file was actually written to `$WORKSPACE_OUTPUT` and the returned absolute path matches the real downloaded file. Do not claim that a file was saved to `outputs/...` or any other local path unless it truly exists there.
    • If you need to record `task_id`, `request_id`, `target_key`, `generated_keys`, or similar fields in logs, notes, or output files, extract them directly from the `process.py` JSON response. Do not transcribe, rewrite, or manually retype these values.
    • If the user or eval explicitly requires verification of any machine-verifiable output property (for example codec, bitrate, sample rate, channel count, duration, resolution, frame rate, width, height, or format), prefer running one additional read-only verification step against a persisted OSS output object before finalizing the summary. Use `audio/info` or `video/info` for audio/video outputs, and use a separate `--operations info` command for image outputs. Do not download the file locally just for this purpose.
    • For image width, height, format, and file-size verification, treat OSS-side `--operations info` on the saved target object as the default and preferred verification path. `info` is a standalone read-only image metadata operation, not a follow-up segment that should be appended to a basic image processing chain. Do not switch to local image-library inspection when `info` can answer the question.
    • If a read-only verification step was performed and its result differs from the requested value, report the actual verified output value. Do not substitute the request value, and do not claim the request was fully satisfied when the verification result shows otherwise.
    • If no read-only verification step was performed, do not describe machine-verifiable output properties as independently confirmed.
    • Do not assume local verification tools such as `PIL` / `Pillow`, `ffprobe`, or similar utilities are installed. If a tool is unavailable, do not claim that you performed the corresponding local pixel-level or media-property verification.
    • For image-property verification in particular, do not introduce ad hoc local-library checks such as `PIL` / `Pillow` unless the workflow explicitly requires a local-file-only inspection and OSS-side `info` cannot provide the property. In normal skill usage and evals, prefer OSS-side verification and avoid emitting local `PIL` / `Pillow` commands entirely.
    • If the workflow returns only a signed URL and does not persist a reusable OSS target object, do not claim that you performed a follow-up `info` check on the final output object unless such an object actually exists. In that case, either save the result to OSS first and verify the saved object, or state that only the immediate processing result was available and no persisted-object verification was performed.
    • Before sending the final user-facing summary, follow the Language rule in Result Presentation.
  6. Recover. If the command fails, use the Error Recovery table below. Retry only after correcting the concrete cause, such as missing IMM binding, bad parameters, or insufficient RAM permissions.
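The first part of the Verify step, reading the returned JSON and reporting a local path only when the file really exists, can be sketched as follows. The field names come from the workflow above; the helper name `check_result` and the assumption that the response is a flat JSON object are illustrative.

```python
import json
import os


def check_result(raw_json: str) -> dict:
    """Extract the fields worth checking from a process.py JSON response.

    Assumption: the response is a flat JSON object keyed by the field names
    listed in the Verify step.
    """
    result = json.loads(raw_json)
    report = {
        "success": result.get("success") is True,
        "request_id": result.get("request_id"),
        "task_id": result.get("task_id"),          # present for async only
        "target_key": result.get("target_key"),
        "warnings": result.get("validation_warnings") or [],
        "local_path": None,
    }
    # Report a local path only if the file was actually written.
    path = result.get("path")
    if path and os.path.isfile(path):
        report["local_path"] = os.path.abspath(path)
    return report
```

Values such as `request_id` are passed through untouched, which matches the rule against retyping or normalizing identifiers by hand.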


Quick Decision Guide


All processing goes through `process.py`


Image, video, and audio operations MUST be executed via `python scripts/process.py --operations "..."`. The agent must not write its own SDK or CLI calls to bypass `process.py` or `imm_admin.py` for video/audio/image processing. Underlying SDK or API requests triggered internally by these scripts (including IMM requests such as `CreateMediaConvertTask`) are expected implementation behavior and do not count as direct agent-side SDK usage. The only intentional script-level IMM entry points are `imm_admin.py` for project setup and `blindwatermark-extract` for async watermark extraction.
Never create your own Python scripts or wrappers to bypass `process.py`. When `process.py` doesn't support a feature, check SKILL.md and the `references/` documentation, use `--dry-run` to preview, and report to the user if it truly cannot be done.

IMM setup (before IMM-dependent ops)


Before running video, audio, HLS, or image-intelligent operations, first run `imm_admin.py auto-setup` to ensure the bucket is bound to an IMM project. Pass `--imm-project <project_name>` only for `blindwatermark-extract`, or if you intentionally want to override the optional `ALIBABA_CLOUD_IMM_PROJECT` fallback used by that operation.

Source selection


  • OSS object → `--source object-key`
  • Local file or URL → `--uri /path/to/file` (auto-uploads, processes, cleans up)

Sync vs Async (auto-detected)


  • Sync (`x-oss-process`): image ops, `video/snapshot`, `video/info`, `audio/info`, `hls/m3u8`, AI detection
  • Async (`x-oss-async-process`): `video/convert`, `video/animation`, `video/snapshots`, `video/sprite`, `video/concat`, `audio/convert`, `audio/concat`, `blindwatermark-extract`
The script auto-detects async-only operations and handles routing/polling automatically; no `--async` or `--wait` flags are needed.
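The routing table above reduces to a set lookup. The sketch below shows the idea only; it is not how `process.py` is implemented. The function name is hypothetical, and it assumes the operation name is the part of the operation string before the first comma.

```python
# Async-only operations, copied from the list above.
ASYNC_OPERATIONS = frozenset({
    "video/convert", "video/animation", "video/snapshots", "video/sprite",
    "video/concat", "audio/convert", "audio/concat", "blindwatermark-extract",
})


def process_parameter(operation: str) -> str:
    """Return which OSS processing parameter an operation would be routed to.

    Assumption: the operation name precedes the first comma, and anything
    not in the async list is handled synchronously.
    """
    name = operation.split(",", 1)[0].strip()
    if name in ASYNC_OPERATIONS:
        return "x-oss-async-process"
    return "x-oss-process"
```

Note that `video/snapshot` (single frame, sync) and `video/snapshots` (multi-frame, async) differ by one letter, so the lookup uses exact names rather than prefix matching.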

Output rules


| Operation type | Output mode | Command pattern |
| --- | --- | --- |
| Sync (image) | `download` | `--output-mode download --output-path $WORKSPACE_OUTPUT/<file>` |
| Async (video/audio) | `save` then `download` | 1. `--output-mode save --target-key output/<file>` → 2. `--operations download --output-path $WORKSPACE_OUTPUT/<file>` |
| `video/snapshots` | `save` with auto-download | `--output-mode save --target-key output/frames/frame --output-path $WORKSPACE_OUTPUT/`; the script auto-polls and downloads all frames |
| `hls/m3u8` | `url` | `--output-mode url`; returns a signed URL for a browser/player (not a downloadable file) |

All `--output-path` arguments MUST use `$WORKSPACE_OUTPUT/<filename>`; files saved inside the skill directory will NOT be renderable.
No-local-download rule: if the user explicitly says not to download locally, only to save in OSS, or only to return a link/URL, do not pass `--output-path` and do not perform any follow-up download for verification. Use `--output-mode url` for sync results meant to be consumed remotely, and use `--output-mode save --target-key ...` for async media results that should remain in OSS. Never download to `$WORKSPACE_OUTPUT`, `/tmp`, or any local path just to verify success; rely on the `process.py` JSON response instead.
Ambiguous save wording rule: if the user says "保存" (save), "保存下来" (save it), "存起来" (store it), or similar wording but does not explicitly say "下载到本地" (download locally), "本地查看" (view locally), "给我本地文件" (give me a local file), or another clear local-destination phrase, default to saving the result back to OSS with `--output-mode save --target-key ...`. Only use `--output-mode download --output-path ...` when the user explicitly asks for a local file. If the user only wants to inspect the result and does not require a persisted local copy, prefer `--output-mode url` for sync outputs and `--output-mode save` plus the OSS path for async outputs.
Signed-URL delivery rule: the purpose of `--output-mode url` is to make a remote result accessible, not to force the full signed query string into the transcript. In ordinary text responses, prefer an OSS path or a redacted URL. Only provide a full presigned URL when the runtime offers a secure private-output channel that keeps the raw URL out of transcript/log surfaces. If no such channel exists, explain the limitation briefly and avoid printing the full signed query parameters. A redacted URL should keep the path and any non-sensitive query parameters, while replacing sensitive signing values with `***`, for example: `https://bucket.oss-cn-hangzhou.aliyuncs.com/output/result.webp?OSSAccessKeyId=***&x-oss-credential=***&Signature=***&security-token=***&Expires=1700000000`.
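The redaction described above can be done with the standard library alone. This is a minimal sketch, not the skill's own code: the function name `redact_signed_url` is hypothetical, and the sensitive-parameter list is the one given in the security rules, matched case-insensitively.

```python
from urllib.parse import unquote, urlsplit, urlunsplit

# Signing parameters named in the security rules, lower-cased for matching.
SENSITIVE_PARAMS = frozenset({
    "ossaccesskeyid", "accesskeyid", "x-oss-credential", "signature",
    "x-oss-signature", "security-token", "securitytoken", "sts_token",
})


def redact_signed_url(url: str) -> str:
    """Replace sensitive signing values with *** and keep everything else."""
    parts = urlsplit(url)
    redacted = []
    for pair in parts.query.split("&"):
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        if unquote(name).lower() in SENSITIVE_PARAMS:
            redacted.append(f"{name}=***")
        else:
            # Non-sensitive parameters keep their original encoding.
            redacted.append(f"{name}{sep}{value}")
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       "&".join(redacted), parts.fragment))
```

The query string is split by hand rather than parsed and re-encoded, so non-sensitive parameters such as `Expires` come out byte-for-byte unchanged.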
Unique suffix rule: when you need a unique OSS target key suffix for evals, retries, or parallel runs, prefer Python-generated UUIDs or a timestamp-plus-random suffix. Do not rely on `uuidgen` being available. If you must generate a suffix from shell commands, first verify the command exists; otherwise fall back to a timestamp plus random digits. Safe shell example: `SUFFIX=$(python3 -c "import uuid; print(uuid.uuid4().hex[:8])" 2>/dev/null || date +%Y%m%d_%H%M%S_$RANDOM)`.
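In Python the same idea is a one-liner around `uuid.uuid4()`. The sketch below is illustrative: the helper name `unique_target_key` and the choice to insert the suffix before the file extension are assumptions, not behaviour of `process.py`.

```python
import posixpath
import uuid


def unique_target_key(base_key: str, length: int = 8) -> str:
    """Append a short random suffix to an OSS target key.

    The suffix goes before the extension when there is one, so
    "output/result.mp4" becomes "output/result_<suffix>.mp4". Keys without
    an extension (as required for video/snapshots) simply get the suffix
    appended.
    """
    suffix = uuid.uuid4().hex[:length]
    stem, ext = posixpath.splitext(base_key)
    return f"{stem}_{suffix}{ext}"
```

Eight hexadecimal characters are usually enough to keep parallel runs or retries from colliding; a longer `length` can be passed where more headroom is wanted.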

Chaining rules


See the dedicated Chaining Rules section below for full chaining guidelines.


Core Parameter Rules


  1. Only pass parameters the user specifies; do not invent defaults. OSS uses official defaults for unspecified parameters (e.g., keep original width/height, original bitrate, original framerate).
  2. Recipes are examples, not defaults: parameter values in recipe tables (e.g., `w=800`, `vb=2000000`) are for specific scenarios and should NOT be used as defaults.
  3. video/convert, remux vs re-encode: omitting `vcodec` means OSS only does a remux (stream copy without re-encoding). Parameters like `videoslim`, `vb`, `crf`, `s`, `fps` are silently ignored in remux mode. Always specify `vcodec` (default `h264`) when the user says "transcode", "compress", or "slim". Only omit `vcodec` for pure remux (e.g., AVI→MP4 container switch) or audio extraction.
  4. video/concat, when input params differ: if input videos have different resolution, framerate, or codec, you must ask the user which video to align to (option A: first video, B: second video, C: custom params). Never auto-decide.
  5. video/concat, validation scope: `process.py` always performs input compatibility checks before submitting the async task. Additional local `ffprobe` output validation only runs when the command also downloads the result via `--output-path`. If you use `--output-mode save` without a local download path, there is no post-download media validation step.
  6. Snapshots vs snapshot: use `video/snapshots` (async) for multi-frame extraction. Never use multiple `video/snapshot` calls as a workaround. The `video/snapshots` target-key must NOT have a file extension.
  7. For full parameter specifications, see the corresponding reference files in `references/`.
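Rule 3 is easy to get wrong because the ignored parameters fail silently. A small pre-check like the following makes the condition explicit; it is a sketch for illustration, and the function name and the dict representation of the parameters are hypothetical.

```python
# Parameters that rule 3 says are ignored when vcodec is omitted.
REENCODE_ONLY_PARAMS = ("videoslim", "vb", "crf", "s", "fps")


def ignored_in_remux(params: dict) -> list:
    """List the video/convert parameters that would be silently ignored.

    Without `vcodec`, OSS only remuxes, so re-encode parameters have no
    effect. An empty list means the parameter set is consistent.
    """
    if "vcodec" in params:
        return []
    return [name for name in REENCODE_ONLY_PARAMS if name in params]
```

For example, a request to "compress" that carries `vb` but no `vcodec` would return `["vb"]`, which is the cue to add `vcodec` before submitting rather than discovering an unchanged bitrate afterwards.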


Result Presentation


After every successful `process.py` execution, present results in this format:
Language rule: unless the user explicitly requested English, the final user-facing result summary in this section must be written in Chinese. Use a result template that matches the response language. For Chinese responses, use a Chinese lead-in such as `处理结果如下:` and Chinese field labels such as `状态` / `请求 ID` / `任务 ID` / `源文件` / `输出` / `参数` / `文件大小` / `OSS 路径`. For English responses, use `Result summary:` and the corresponding English labels `Status` / `RequestID` / `Task ID` / `Source` / `Output` / `Params` / `File Size` / `OSS Path`.
1. File path: output the local absolute path in a code block (e.g., `/path/to/outputs/snapshot.jpg`). Never use `open` or the Read tool to display files. Only include this section when the file was actually downloaded or written locally. Do not present an `outputs/...` path that was only planned, inferred, or mentioned in a transcript.
2. Result table:

| Item | Detail |
| --- | --- |
| Status | ✅ Completed |
| RequestID | `<request_id>` (or `N/A`) |
| Task ID | `<task_id>` (async only) |
| Source | `source/input.mp4` |
| Output | `output/result.mp4` |
| Params | Dynamic, from your command (e.g., MP4/H.264/2Mbps, or 800x600/JPEG) |
| File Size | From download output |
| OSS Path | `oss://<bucket>/<target-key>` (save mode only) |
Field sourcing rules: `Status` and `Params` must be quoted directly from the `process.py` JSON response. `Status` must come from the returned `success` field, and `Params` must come from the returned `operations` field. Never rewrite, estimate, normalize, or summarize numeric/media values by hand, including confidence scores, bitrate, resolution, dimensions, frame rate, or codec details.
If you need a textual summary, include the original command or process string in a fenced code block and describe it conservatively. Do not invent parameter values or restate them in free-form prose when they are not explicitly present in the `process.py` response.
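One way to honour the sourcing rules is to build the table rows mechanically from the response, so nothing is retyped. The sketch below is an illustration only: the function name, the label dictionaries, and the sample `operations` string are hypothetical, while the field names and labels follow the text above.

```python
LABELS = {
    "zh": {"status": "状态", "request_id": "请求 ID", "task_id": "任务 ID",
           "params": "参数", "oss_path": "OSS 路径", "done": "✅ 已完成"},
    "en": {"status": "Status", "request_id": "RequestID", "task_id": "Task ID",
           "params": "Params", "oss_path": "OSS Path", "done": "✅ Completed"},
}


def result_rows(response: dict, bucket: str, lang: str = "zh") -> list:
    """Build (label, value) rows from a successful process.py response.

    Values are copied from the response as-is; fields that are absent are
    omitted rather than guessed.
    """
    if response.get("success") is not True:
        raise ValueError("only successful responses are summarized")
    labels = LABELS[lang]
    rows = [
        (labels["status"], labels["done"]),
        (labels["request_id"], response.get("request_id") or "N/A"),
    ]
    if response.get("task_id"):          # async only
        rows.append((labels["task_id"], response["task_id"]))
    if "operations" in response:         # Params come from `operations`
        rows.append((labels["params"], str(response["operations"])))
    if response.get("target_key"):       # save mode only
        rows.append((labels["oss_path"],
                     f"oss://{bucket}/{response['target_key']}"))
    return rows
```

The default language is Chinese, matching the language rule; passing `lang="en"` switches only the labels, never the values.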
Final summary constraints:
  • Do not insert fixed English filler such as `Task Completed Successfully`.
  • Numeric values such as sample rate, bitrate, resolution, duration, frame rate, and file count must be copied directly from `process.py` JSON fields or an explicitly performed read-only verification result.
  • If a value was not obtained directly from machine output, omit it instead of rewriting, estimating, rounding, or normalizing it by hand.
  • If an explicitly performed read-only verification result differs from the requested value, report the actual verified output value and describe the request as only partially satisfied when necessary. Do not replace the verified value with the requested one.
  • If no read-only verification result was obtained, do not claim that machine-verifiable output properties were independently confirmed.
If the user forbids local downloads, omit the File path row/section entirely and do not create temporary local files for validation. In that case, present only the JSON-backed metadata returned by `process.py`, such as `success`, `request_id`, `task_id`, `target_key`, `generated_keys`, or `url`.
If `process.py` returns a signed URL, treat the full query string as sensitive output. In normal visible summaries, prefer the OSS path, target key, or a redacted URL. Do not expand raw signing parameters into the final summary unless the runtime has a secure private-output channel for secret delivery.
If independent verification was requested but the workflow returned only a signed URL and did not create a persisted OSS target object, do not claim that a follow-up `info` check was performed on a final output object. Either save the result first and verify the saved object, or state clearly that no persisted-object verification was available.
For image outputs and visual effects such as watermarks, overlays, blur regions, or face redaction, distinguish between metadata verification and visual verification. If the output was not downloaded or rendered locally, do not claim that a visual element was independently confirmed by inspection; state that only the service-reported processing result was verified unless a local render or explicit inspection step was actually performed.
Rules:
  • Do not run `video/info`, `audio/info`, or image `--operations info` after processing for ordinary result reporting. However, if the user explicitly asks you to verify concrete machine-verifiable output properties such as codec, bitrate, sample rate, channel count, duration, resolution, frame rate, width, height, or format, or if the eval/acceptance criteria explicitly require an independent property check, prefer running one additional read-only verification step against a persisted OSS output object and report that verification separately from the main `process.py` result. Use `audio/info` or `video/info` for audio/video outputs, and use a separate `--operations info` command for image outputs.
  • Do not assume local verification libraries or binaries such as `PIL`/`Pillow`, `ffprobe`, or similar tools are preinstalled. Use them only when they are actually available and the workflow genuinely requires a local-file check; otherwise rely on `process.py` JSON output and permitted read-only OSS-side checks.
  • For image width/height/format verification, prefer OSS-side `--operations info` on the saved target object even if a local file is present. Do not use `PIL`/`Pillow` as the default verification method for evals or routine skill runs.
  • Requests to verify image width, height, format, or similar machine-verifiable properties do not by themselves authorize a local download. If the user did not explicitly request a local file, and a saved OSS target object can be verified with `info`, do not switch to `--output-mode download` solely for verification.
  • Do not use `head_object` as a substitute for media-property verification.
  • Avoid `sleep` + retry loops; the script handles async polling internally.
  • All media processing goes through `process.py`; if an operation is unsupported, check `references/` and report it; do not write custom scripts.
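The verification rules above amount to a small decision: run an extra read-only check only when it was requested and a persisted OSS object exists. The following is a minimal sketch of that decision, not part of the skill itself. The function name `pick_verification_operation` and the `media_type` labels are hypothetical; only the operation names (`audio/info`, `video/info`, `info`) come from the rules.

```python
from typing import Optional

# Operation names come from the rules above; the dictionary keys are illustrative labels.
_INFO_OPERATIONS = {"audio": "audio/info", "video": "video/info", "image": "info"}


def pick_verification_operation(media_type: str,
                                verification_requested: bool,
                                has_persisted_object: bool) -> Optional[str]:
    """Return the read-only info operation to run, or None when no check should run."""
    if not verification_requested:
        # Ordinary result reporting: do not run any info operation.
        return None
    if not has_persisted_object:
        # Without a saved OSS object there is nothing stable to verify.
        return None
    return _INFO_OPERATIONS.get(media_type)
```

A `None` result means the summary should report only what `process.py` returned, without claiming independent verification.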

Chaining Rules

Image Operations

  • Basic operations can be freely chained with each other.
  • `blindwatermark-embed` can follow basic operations but must be the last operation.
  • `blindwatermark-extract` must be used alone; it cannot be chained.
  • AI detection (`faces`, `bodies`, `cars`, `codes`, `labels`, `score`) must be used alone.

Video/Audio Operations

  • Video/audio operations cannot be chained with image operations
  • Only one video/audio operation per request (no chaining)
  • For complex workflows, use multiple separate requests
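The chaining rules for image and video/audio operations can be checked before a command is built. The sketch below is an illustration under stated assumptions, not the validation that `process.py` performs: the helper names are hypothetical, an operation's name is assumed to be everything before the first `:`, and `hls/m3u8` is not covered.

```python
from typing import List

# AI detection operation names as listed in the chaining rules.
AI_DETECTION = {"faces", "bodies", "cars", "codes", "labels", "score"}


def operation_name(operation: str) -> str:
    """Return the part before the first ':' (for example 'resize' from 'resize:w=600')."""
    return operation.split(":", 1)[0].strip()


def chain_problems(operations: List[str]) -> List[str]:
    """Return rule violations as text; an empty list means the chain looks valid."""
    names = [operation_name(op) for op in operations]
    problems = []
    has_media = any(n.startswith(("video/", "audio/")) for n in names)
    if has_media and len(names) > 1:
        problems.append("video/audio operations must be the only operation in a request")
    for name in names:
        if (name in AI_DETECTION or name == "blindwatermark-extract") and len(names) > 1:
            problems.append(f"{name} must be used alone")
    if "blindwatermark-embed" in names and names[-1] != "blindwatermark-embed":
        problems.append("blindwatermark-embed must be the last operation")
    return problems
```

When the returned list is non-empty, the workflow should be split into separate requests rather than sent as one chain.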

Credential & Environment Setup

Credentials are managed by Aliyun CLI (`~/.aliyun/config.json`). Python scripts auto-discover them via the `alibabacloud-credentials` SDK default chain. See Prerequisites above for setup steps.
Diagnostic check:
bash
python scripts/load_env.py
This scans for legacy env vars and verifies RAM permissions. Use this if operations fail with access errors.
Runtime dependency preflight: `process.py` checks required Python packages before execution. Basic OSS/file operations require `oss2` and `alibabacloud-credentials`; video/audio/HLS/IMM operations also require the IMM SDK packages from `scripts/requirements.txt`. If any dependency is missing, the command fails fast with an install hint instead of starting a partial execution.
IMM project: usually discovered by `imm_admin.py auto-setup`. `process.py` only consumes `--imm-project` / `ALIBABA_CLOUD_IMM_PROJECT` for `blindwatermark-extract`.

IMM Auto-Setup

Video/audio processing and image-intelligent features require an IMM project bound to the bucket. Follow this workflow for IMM-dependent operations:
Step 1 — Detect IMM project (before any processing command):
bash
python scripts/imm_admin.py auto-setup --bucket <bucket> --region <region>
This ensures the bucket is bound to a usable IMM project and prints the resolved project name.
Step 2 — Execute the media operation:
bash
python scripts/process.py --source video.mp4 \
  --operations "video/convert:f=mp4,vcodec=h264" \
  --output-mode save --target-key output/video.mp4
For `blindwatermark-extract`, append `--imm-project <project_name>` if you do not want to rely on the optional `ALIBABA_CLOUD_IMM_PROJECT` fallback.
Step 3 — Present results per Execution & Output Workflow above.
Operations that require IMM bucket setup: all video/audio/HLS ops, image-intelligent ops (`faces`, `bodies`, `cars`, `codes`, `labels`, `score`, `blindwatermark-embed`, `blindwatermark-extract`), smart crop (`crop:g=auto` / `crop:g=face`), and face blur (`blur:g=face` / `blur:g=faces`). Only `blindwatermark-extract` requires the project name as a direct `process.py` input.
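To decide whether Step 1 is needed for a given operation string, the list above can be encoded as a lookup. This is a rough sketch under stated assumptions: the function names are hypothetical, parameters are assumed to be comma-separated `key=value` pairs, and the sets simply mirror the operations named in this section.

```python
# Image-intelligent operation names listed above as needing an IMM-bound bucket.
IMM_IMAGE_OPERATIONS = {
    "faces", "bodies", "cars", "codes", "labels", "score",
    "blindwatermark-embed", "blindwatermark-extract",
}
# Basic operations that need IMM only with these specific parameters.
IMM_PARAMETERIZED = {"crop:g=auto", "crop:g=face", "blur:g=face", "blur:g=faces"}


def requires_imm(operation: str) -> bool:
    """Return True when the operation needs the bucket to be bound to an IMM project."""
    name, _, raw_params = operation.partition(":")
    name = name.strip()
    if name.startswith(("video/", "audio/", "hls/")):
        return True
    if name in IMM_IMAGE_OPERATIONS:
        return True
    params = [p.strip() for p in raw_params.split(",") if p.strip()]
    return any(f"{name}:{p}" in IMM_PARAMETERIZED for p in params)


def needs_imm_project_argument(operation: str) -> bool:
    """Only blindwatermark-extract takes the project name as a direct input."""
    return operation.partition(":")[0].strip() == "blindwatermark-extract"
```

If `requires_imm` returns `True`, run `imm_admin.py auto-setup` before the processing command.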

Available Operations

Image Processing (Sync)

| Operation | Description | Reference |
| --- | --- | --- |
| `resize`, `crop`, `indexcrop`, `rotate`, `flip` | Basic transformations | `references/image-basic-operations.md` |
| `quality`, `format`, `interlace` | Quality & format | `references/image-basic-operations.md` |
| `watermark`, `blur`, `sharpen`, `bright`, `contrast` | Effects | `references/image-basic-operations.md` |
| `auto-orient`, `circle`, `rounded-corners` | Utilities | `references/image-basic-operations.md` |
| `info`, `average-hue` | Metadata (JSON) | `references/image-basic-operations.md` |

Image-Intelligent (IMM)

| Operation | Mode | Description | Reference |
| --- | --- | --- | --- |
| `blindwatermark-embed` | Sync | Embed invisible watermark. Must be last in chain. | `references/image-imm-operations.md` |
| `blindwatermark-extract` | Async | Extract watermark. Use alone. | `references/image-imm-operations.md` |
| `faces`, `bodies`, `cars` | Sync | Detect faces/bodies/cars (JSON). | `references/image-imm-operations.md` |
| `codes`, `labels`, `score` | Sync | QR/barcode recognition, labels, quality score (JSON). | `references/image-imm-operations.md` |

Video Processing

| Operation | Mode | Description | Reference |
| --- | --- | --- | --- |
| `video/convert` | Async | Transcode video. Must specify `vcodec` for re-encode. | `references/video-operations.md` |
| `video/snapshot` | Sync | Extract single frame. `t` (time in ms) is required. | `references/video-operations.md` |
| `video/info` | Sync | Video metadata (JSON). | `references/video-operations.md` |
| `video/animation` | Async | Video to GIF/WebP. | `references/video-operations.md` |
| `video/snapshots` | Async | Multi-frame extraction. `--target-key` must NOT have an extension. | `references/video-operations.md` |
| `video/sprite` | Async | Sprite sheet. Must specify `num` or `inter`. | `references/video-operations.md` |
| `video/concat` | Async | Concatenate videos (max 11). Must verify input params match. | `references/video-operations.md` |

Audio Processing

| Operation | Mode | Description | Reference |
| --- | --- | --- | --- |
| `audio/convert` | Async | Transcode audio. | `references/audio-operations.md` |
| `audio/concat` | Async | Concatenate audio files. | `references/audio-operations.md` |
| `audio/info` | Sync | Audio metadata (JSON). | `references/audio-operations.md` |

HLS Streaming

| Operation | Mode | Description | Reference |
| --- | --- | --- | --- |
| `hls/m3u8` | Sync | HLS playlist (returns a playlist, not a file; use `--output-mode url`). | `references/video-operations.md` |

File Operations

| Operation | Mode | Description |
| --- | --- | --- |
| `upload` | Sync | Upload local file/URL to OSS. Use with `--uri` and `--target-key`. |
| `download` | Sync | Download OSS object. Use with `--source` and `--output-path`. |

Processing Modes

  • Synchronous (`x-oss-process`): image basic processing, `video/snapshot`, `video/info`, `audio/info`, `hls/m3u8`, AI detection. Results are returned immediately.
  • Asynchronous (`x-oss-async-process`): video/audio transcoding, animation, sprite, snapshots, concat, `blindwatermark-extract`. Async operations are auto-detected and auto-polled until completion.

Usage

bash
python scripts/process.py \
  [--bucket BUCKET_NAME] \
  [--region REGION_ID] \
  (--source OSS_OBJECT_KEY | --uri URI) \
  --operations OPERATION [OPERATION ...] \
  [--output-mode url|download|save] \
  [--expires SECONDS] \
  [--output-path LOCAL_PATH] \
  [--target-key OSS_TARGET_KEY] \
  [--endpoint CUSTOM_ENDPOINT] \
  [--imm-project IMM_PROJECT_NAME] \
  [--dry-run]
`--imm-project` is only consumed by `blindwatermark-extract`; other operations rely on IMM bucket binding, not this flag.
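When the command is assembled from a program rather than typed by hand, building an argument list avoids shell-quoting problems with operation strings. The helper below is a minimal sketch: `build_process_command` is a hypothetical name, it covers only a few of the flags from the synopsis, and it does not run anything by itself.

```python
import sys
from typing import List, Optional


def build_process_command(script_path: str,
                          source: str,
                          operations: List[str],
                          output_mode: Optional[str] = None,
                          target_key: Optional[str] = None,
                          dry_run: bool = False) -> List[str]:
    """Assemble an argv list for process.py using flags from the synopsis."""
    command = [sys.executable, script_path, "--source", source,
               "--operations", *operations]
    if output_mode is not None:
        command += ["--output-mode", output_mode]
    if target_key is not None:
        command += ["--target-key", target_key]
    if dry_run:
        # --dry-run prints the generated process string without contacting OSS.
        command.append("--dry-run")
    return command
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)`, with `script_path` set to the absolute path of `process.py`.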

--uri

Process a file from a local file path or URL (http/https) without pre-uploading. The script auto-uploads to a temp key, processes, and cleans up. `--uri` and `--source` are mutually exclusive.

--dry-run

Prints the generated process string and operation details as JSON to stdout, then exits without connecting to OSS.

Operation String Format

Each operation: `name:key=value,key=value`. No-param operations use just the name (e.g., `info`, `video/info`). Video/audio operations use slash notation: `video/convert`, `audio/convert`.
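The format described above can be illustrated with a small parser. This is a sketch only, not the parser used by `process.py`: `parse_operation` is a hypothetical name, and values that themselves contain commas are deliberately not handled.

```python
from typing import Dict, Tuple


def parse_operation(operation: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into a name and a parameter dict."""
    name, _, raw_params = operation.partition(":")
    params: Dict[str, str] = {}
    for pair in raw_params.split(","):
        if not pair:
            continue  # no-param operations such as 'info' have nothing after the name
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key.strip()] = value
    return name.strip(), params
```

For instance, `parse_operation("video/convert:f=mp4,vcodec=h264")` yields the name `video/convert` and the parameters `f` and `vcodec`.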

End-to-End Example

User request:
text
Resize `images/photo.jpg` in OSS to width 600px, add a bottom-right text watermark `Copyright 2026`, and download the result locally. The bucket is `my-media-bucket` in region `cn-shanghai`.
Command:
bash
python scripts/process.py --bucket my-media-bucket --region cn-shanghai \
  --source images/photo.jpg \
  --operations "resize:w=600" "watermark:text=Copyright 2026,g=se,opacity=60,size=30" \
  --output-mode download \
  --output-path "$WORKSPACE_OUTPUT/photo-watermarked.jpg"
Expected result shape:
json
{
  "success": true,
  "mode": "download",
  "path": "/absolute/path/to/outputs/photo-watermarked.jpg",
  "size": 12345,
  "request_id": "xxxxxx"
}
Interpretation:
  • `success: true` means OSS processing completed successfully.
  • `path` is the local file path you should present to the user.
  • `request_id` is the server-side request trace ID for troubleshooting.
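The fields interpreted above can be copied into a summary without rewriting any values. The following sketch assumes the JSON shape shown in this example; `summarize_result` and the summary keys are hypothetical names chosen for illustration.

```python
import json


def summarize_result(raw_json: str) -> dict:
    """Build a summary dict from process.py JSON output, copying values verbatim."""
    result = json.loads(raw_json)
    summary = {
        "Status": "completed" if result.get("success") is True else "failed",
        "RequestID": result.get("request_id") or "N/A",
    }
    # Only report a local path when the file was actually downloaded.
    if result.get("mode") == "download" and result.get("path"):
        summary["File Path"] = result["path"]
        if "size" in result:
            summary["File Size"] = result["size"]
    return summary
```

Values such as the size are passed through as returned, so nothing is rounded or estimated along the way.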

Additional Examples

bash
# HLS streaming (with IMM auto-setup)
python scripts/imm_admin.py auto-setup --bucket my-bucket --region cn-hangzhou
# → Capture project name from output
python scripts/process.py --bucket my-bucket --region cn-hangzhou \
  --source videos/input.mp4 \
  --operations "hls/m3u8:ss=15000,t=1800000,vcodec=h264,fps=25,s=1280x720,vb=2000000,acodec=aac,ab=128000" \
  --output-mode url

# Upload a local file to OSS
python scripts/process.py --bucket my-bucket --region cn-hangzhou \
  --uri /path/to/report.pdf --operations upload --target-key documents/report.pdf

# Download a file from OSS
python scripts/process.py --bucket my-bucket --region cn-hangzhou \
  --source documents/report.pdf --operations download --output-path "$WORKSPACE_OUTPUT/report.pdf"

Edge Cases

  • `watermark` values that contain commas should be quoted. For example: `preprocess="resize:w=200,text=demo,image/logo.png"`.
  • `video/snapshots` target keys must not include a file extension. Use `output/frames/frame`, not `output/frames/frame.jpg`.
  • `video/concat` always performs input compatibility checks before task submission. Additional local `ffprobe` output validation only runs when the result is also downloaded via `--output-path`.
  • Async media polling defaults to 600 seconds. Override with `--timeout-seconds <n>` or `ALIBABA_CLOUD_ASYNC_TIMEOUT_SECONDS`.
  • `blindwatermark-extract` must run alone. `blindwatermark-embed` can follow basic image operations, but it must be the last operation in the chain.
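Two of these edge cases are easy to check before a command is submitted. The sketch below is illustrative only, and both function names are hypothetical. The precedence of the command-line value over the environment variable is an assumption, since the text does not say which one wins.

```python
import posixpath
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 600  # default polling timeout stated above


def snapshots_target_key_ok(target_key: str) -> bool:
    """Return True when a video/snapshots target key has no file extension."""
    return posixpath.splitext(target_key)[1] == ""


def resolve_timeout(cli_value: Optional[int], env: Mapping[str, str]) -> int:
    """Pick the polling timeout; assumes the CLI value takes precedence over the env var."""
    if cli_value is not None:
        return cli_value
    raw = env.get("ALIBABA_CLOUD_ASYNC_TIMEOUT_SECONDS")
    return int(raw) if raw else DEFAULT_TIMEOUT_SECONDS
```

In practice `resolve_timeout` would be called with `os.environ` as the second argument.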

Error Recovery

| Error | Cause | Recovery |
| --- | --- | --- |
| `AccessDenied` or `InvalidArgument` twice in a row | Configuration or authorization is still unresolved, and blind retries risk a fabricated diagnosis | Stop immediately. Do not simulate output, do not fabricate logs, and do not keep retrying `process.py`. Run `aliyun configure list` to verify the active CLI profile, then check RAM permissions with `python scripts/check_permissions.py` or the relevant RAM policy setup. If you changed dependencies, env vars, or CLI configuration while recovering, rerun `python scripts/load_env.py` and `aliyun configure list` before any next `process.py` attempt. |
| `task_id: null` | IMM project not bound to bucket, or `blindwatermark-extract` missing `--imm-project` / `ALIBABA_CLOUD_IMM_PROJECT` | Run `python scripts/imm_admin.py auto-setup --bucket <b> --region <r>` first; for `blindwatermark-extract`, also pass `--imm-project <project>` if needed |
| `NoSuchKey` | Source file does not exist in OSS | Check the `--source` path, or upload first with `--uri` and the `upload` operation |
| `AccessDenied` / `403` | RAM policy missing required permissions | Run `python scripts/check_permissions.py` for diagnosis |
| `InvalidArgument` | Wrong parameter format or unsupported combination | Check parameter spelling; verify against `references/` docs |
| Async timeout / polling exceeds limit | Job too large or queue backlog | Note the `task_id` and tell the user to retry later; do NOT use `sleep` loops |
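The first row of the table is a stop condition rather than a retry policy. A minimal sketch of that check follows; `should_stop_retrying` is a hypothetical name, and reading "twice in a row" as any two consecutive failures drawn from the two error codes is an interpretation, not something the table states explicitly.

```python
from typing import List

# Error codes that signal unresolved configuration or authorization problems.
STOP_CODES = {"AccessDenied", "InvalidArgument"}


def should_stop_retrying(error_codes: List[str]) -> bool:
    """Return True when the two most recent attempts both failed with a stop code."""
    if len(error_codes) < 2:
        return False
    return all(code in STOP_CODES for code in error_codes[-2:])
```

Once this returns `True`, the next step is diagnosis (`aliyun configure list`, `check_permissions.py`), not another `process.py` attempt.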

Quick References

  • Parameter details: `references/image-basic-operations.md`, `references/image-imm-operations.md`, `references/video-operations.md`, `references/audio-operations.md`
  • RAM Permissions: `references/ram-policies.md`
  • Format Support & Limitations: `references/limitations.md`
  • IMM Administration: `references/imm-admin.md`