ascendc-operator-dev

AscendC Operator End-to-End Development Orchestrator. Used when users need to develop new operators, implement custom operators, or complete the full process from requirements to testing. Keywords: operator development, end-to-end, full process, workflow orchestration, new operator creation.

NPX Install

npx skill4agent add ascend/agent-skills ascendc-operator-dev

AscendC Operator End-to-End Development Orchestration

Skill Type: Process-oriented (seven-stage workflow with serial orchestration of sub-skills)
This skill orchestrates seven sub-skills to drive ascend-kernel operators from scratch to production-ready.

Core Principles

  1. Seven-stage Serial Execution: Project Initialization → Design Documentation → Test Case Generation → Code Generation & Testing → Interface Documentation → Precision Evaluation → Performance Benchmarking, executed in strict order
  2. Sub-skill Execution: Each stage MUST call the corresponding sub-skill; do not implement a stage yourself
  3. Stage Gating: Proceed to the next stage only after all checkpoints of the previous stage are passed
  4. Design-driven Coding: Code generation depends on the Tiling strategy and UB allocation table in the design document
  5. Automated Design: No need for users to provide pre-prepared design documents; the design stage generates them automatically
  6. Unified Test Case Generation: Generate test case documents immediately after design completion for reuse in subsequent precision evaluation and performance benchmarking
  7. Documentation Closure: After passing compilation and testing, MUST generate Chinese interface documents in PyTorch style and display them in the chat interface
  8. Precision Closure: Operators must pass ≥30 comprehensive precision evaluation cases to be considered complete
  9. Performance Closure: Operators must pass msprof performance comparison and benchmarking, with a performance report output
  10. Result Visualization: Results of Phases 4, 5, 6, and 7 MUST be displayed directly in the chat interface in Markdown format; do not output only file paths

Available Sub-skill List

| Skill | Path | Responsibility |
| --- | --- | --- |
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect/create ascend-kernel project, generate operator skeleton directory |
| ascendc-operator-design | ascendc-operator-design/SKILL.md | Analyze operator requirements, generate design document (including Tiling strategy, UB allocation table) |
| ascendc-operator-testcase-gen | ascendc-operator-testcase-gen/SKILL.md | Generate unified test case document based on design document for reuse in precision evaluation and performance benchmarking |
| ascendc-operator-code-gen | ascendc-operator-code-gen/SKILL.md | Generate op_host/op_kernel code, framework adaptation, compilation testing |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install whl package, generate test files, run precision tests (called internally by code-gen) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Extract interface information from source code, generate Chinese API documents in PyTorch style (mandatory stage) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | Generate ≥30 precision test cases, run them and output precision verification report (mandatory stage) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | Use msprof to compare performance between project operators and native operators, output performance benchmarking report (mandatory stage) |

Workflow Overview

Phase 1: Project Init (project-init)
  ──▶ Phase 2: Design Doc (design)
  ──▶ Phase 3: Test Case Gen (testcase-gen)
  ──▶ Phase 4: Code Gen + Framework Adaptation + Compile Test (code-gen → compile-debug)
  ──▶ Phase 5: Interface Doc (doc-gen)
  ──▶ Phase 6: Precision Eval Report (precision-eval)
  ──▶ Phase 7: Performance Benchmark Report (performance-eval)

Input: Operator Name + Function Description
Output: Production-ready Operator + Test Case Doc + Interface Doc + Precision Report + Performance Report

Anti-pattern List (NEVER DO THESE)

  • ❌ Do not skip the design stage and directly write code
  • ❌ Do not skip the test case generation stage; Phase 3 (testcase-gen) must be executed after Phase 2 is passed
  • ❌ Do not implement any operator code yourself; always call the sub-skills
  • ❌ Do not modify framework files (ops.h / register.cpp / CMakeLists.txt) before code generation
  • ❌ Do not run compilation or tests manually; route them through the compile-debug skill
  • ❌ Do not reference non-existent skills
  • ❌ Do not skip checkpoint verification
  • ❌ Do not skip the interface documentation stage; Phase 5 must be executed after Phase 4 is passed
  • ❌ Do not skip the precision evaluation stage; Phase 6 must be executed after Phase 5 is passed
  • ❌ Do not skip the performance benchmarking stage; Phase 7 must be executed after Phase 6 is passed
  • ❌ Do not base performance conclusions on any timing method other than msprof
  • ❌ Do not design test cases for precision evaluation or performance benchmarking yourself; first read the test case document generated by testcase-gen

Phase 0: Requirements Collection

Goal: Confirm the minimum information set required for operator development, including development environment and operator requirements

Step 0.1: Environment Confirmation (MUST be completed before any development action)

The development environment is a prerequisite for all subsequent stages and must be confirmed first.

CANN Environment

Automatic Detection Process:
  1. Check if the environment variable ASCEND_HOME_PATH is set (echo $ASCEND_HOME_PATH)
  2. If set: Use it directly as CANN_PATH without asking the user
  3. If not set: MUST ask the user for the CANN installation path (e.g., /usr/local/Ascend/ascend-toolkit)
Activation Method:
bash
source ${CANN_PATH}/*/set_env.sh
In every Shell session that requires compiling or running operators, this activation command must be executed first.
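
The CANN detection rule above can be sketched in Python. This is a minimal illustration, not part of the skill: the function names `resolve_cann_path` and `activation_command` are hypothetical, and returning `None` stands for "ask the user".

```python
import os
from typing import Mapping, Optional


def resolve_cann_path(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the CANN path taken from ASCEND_HOME_PATH.

    Returns None when the variable is unset or blank, which is the case
    where the user must be asked for the installation path.
    """
    value = env.get("ASCEND_HOME_PATH", "").strip()
    return value or None


def activation_command(cann_path: str) -> str:
    """Build the activation command quoted in this section."""
    return f"source {cann_path}/*/set_env.sh"
```

Passing the environment as a parameter keeps the rule easy to check without touching the real shell environment.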

Conda Environment

Automatic Detection Process:
  1. Check if a conda environment is currently activated (echo $CONDA_DEFAULT_ENV)
  2. If activated (value is not base and not empty): Use the current environment directly without asking the user
  3. If not activated or is base: MUST ask the user for the name of the conda environment to use
Activation Method:
bash
conda activate <env_name>
In every Shell session that requires compiling or running operators, the conda environment must be activated first.
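
The conda rule follows the same pattern. The sketch below is illustrative only; `resolve_conda_env` is a hypothetical helper, and `None` again means the user must be asked for an environment name.

```python
import os
from typing import Mapping, Optional


def resolve_conda_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the active conda environment name if it is usable as-is.

    An empty value or "base" is treated as "not activated", so the
    caller has to ask the user which environment to use.
    """
    name = env.get("CONDA_DEFAULT_ENV", "").strip()
    if not name or name == "base":
        return None
    return name
```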

Environment Confirmation Checkpoints

  • CANN path is confirmed (auto-detected or provided by user)
  • source ${CANN_PATH}/*/set_env.sh can be executed normally
  • Conda environment name is confirmed (auto-detected or provided by user)
  • conda activate <env_name> can be executed normally

Step 0.2: Operator Requirements Collection

Mandatory Information to Confirm

| Information | Format Requirement | Mandatory | Description |
| --- | --- | --- | --- |
| CANN Environment Path | Absolute path | Yes | Auto-detect $ASCEND_HOME_PATH, ask user if not set |
| Conda Environment Name | String | Yes | Auto-detect $CONDA_DEFAULT_ENV, ask user if not activated |
| Operator Name | snake_case | Yes | e.g., acosh, rms_norm, flash_attn |
| Function Description | Text/Mathematical Formula | Yes | e.g., "Inverse hyperbolic cosine acosh(x) = ln(x + sqrt(x²-1))" |

Optional Information (with default values):

| Information | Default Value | Description |
| --- | --- | --- |
| Supported Data Types | float16, float32 | Can be extended to bfloat16 |
| SoC Platform | ascend910b | Auto-obtained via platform API |

Decision Tree

| User Request | Handling Method |
| --- | --- |
| "Generate X operator" / "Develop X operator" | Complete environment confirmation (Step 0.1) first, then infer the function from the operator name, and execute the full process directly after confirmation |
| "Help me develop a new operator" (no specific name) | Complete environment confirmation (Step 0.1) first, then ask for the operator name and function description |
| "Continue operator development" | Complete environment confirmation (Step 0.1) first, then check existing files to determine the stage and resume from the interrupted point |

Acceptance Criteria

  • CANN environment path is confirmed and can be activated
  • Conda environment name is confirmed and can be activated
  • Operator name is confirmed (snake_case format)
  • Function description is clear (including mathematical formula or calculation logic)
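
The snake_case criterion can be checked mechanically. The source does not define snake_case precisely, so the pattern below is one reasonable reading (lowercase letters and digits, single underscores between segments, starting with a letter); `is_snake_case` is a hypothetical helper.

```python
import re

# Assumed definition: segments of lowercase letters/digits joined by single
# underscores, with the first character being a letter.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the operator name matches the assumed snake_case pattern."""
    return _SNAKE_CASE.fullmatch(name) is not None
```

The example names from the table (`acosh`, `rms_norm`, `flash_attn`) all pass, while `RmsNorm` or `rms__norm` would be rejected.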

Phase 1: Project Initialization

Called Skill: ascendc-operator-project-init

Execution Content

MANDATORY: Execute according to the ascendc-operator-project-init skill process:
1. Detect if the ascend-kernel project exists
2. Copy from template if it does not exist
3. Create operator skeleton under csrc/ops/<op_name>/
4. Prompt three registration update points

Checkpoints

  • ascend-kernel project exists (build.sh, CMakeLists.txt, csrc/)
  • csrc/ops/<op_name>/ directory has been created
  • Contains op_host/<op_name>.cpp, op_kernel/<op_name>.cpp, CMakeLists.txt, design.md
All passed → Proceed to Phase 2
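
The Phase 1 skeleton checkpoint amounts to a file-existence check. A minimal sketch, assuming the layout listed in the checkpoints; `missing_skeleton_files` is a hypothetical helper and does not inspect file contents.

```python
from pathlib import Path
from typing import List


def missing_skeleton_files(project_root: Path, op_name: str) -> List[str]:
    """List the skeleton files from the Phase 1 checkpoints that are absent.

    Paths are returned relative to the project root, in POSIX form.
    An empty list means the skeleton checkpoint is satisfied.
    """
    op_dir = project_root / "csrc" / "ops" / op_name
    expected = [
        op_dir / "op_host" / f"{op_name}.cpp",
        op_dir / "op_kernel" / f"{op_name}.cpp",
        op_dir / "CMakeLists.txt",
        op_dir / "design.md",
    ]
    return [p.relative_to(project_root).as_posix() for p in expected if not p.is_file()]
```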

Phase 2: Design Document Generation

Called Skill: ascendc-operator-design

Execution Content

MANDATORY: Execute according to the ascendc-operator-design skill process:
1. Analyze operator requirements (name, function, data types)
2. Determine implementation path (AscendC Kernel / CATLASS / ACLNN)
3. Design Tiling strategy (Block-level + UB-level)
4. Fill in UB allocation table, derive bufferCoefficient
5. Generate complete design document to csrc/ops/<op_name>/design.md

Checkpoints

  • csrc/ops/<op_name>/design.md is complete in content
  • Contains function signature and supported data types
  • Contains calculation logic pseudocode (AscendC API call sequence)
  • Contains UB allocation table (lists all buffers and total coefficients)
  • Contains bufferCoefficient (value for each dtype)
All passed → Proceed to Phase 3
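
This page does not define bufferCoefficient. The sketch below assumes one plausible reading: the total number of UB bytes consumed per element across every buffer in the UB allocation table, so that dividing the UB capacity by it bounds the tile length. The buffer names, the 192 KiB capacity, and both helper functions are illustrative assumptions; the real values come from the design sub-skill and the platform API.

```python
from typing import Dict


def buffer_coefficient(bytes_per_element_by_buffer: Dict[str, int]) -> int:
    """Sum the per-element byte cost of every buffer (assumed definition)."""
    return sum(bytes_per_element_by_buffer.values())


def max_tile_elements(ub_bytes: int, coefficient: int, align_elements: int) -> int:
    """Largest element count that fits in UB, rounded down to the alignment."""
    raw = ub_bytes // coefficient
    return (raw // align_elements) * align_elements


# Hypothetical float16 elementwise operator: double-buffered input and output
# queues (2 buffers x 2 bytes each) plus one float32 temporary (4 bytes).
fp16_table = {"inQueue": 2 * 2, "outQueue": 2 * 2, "tmpBuf": 4}
fp16_coefficient = buffer_coefficient(fp16_table)

# Assumed UB capacity of 192 KiB and 32-byte alignment (16 float16 elements).
fp16_tile = max_tile_elements(192 * 1024, fp16_coefficient, 16)
```

Because the element size changes the byte cost of each buffer, the coefficient differs per dtype, which is consistent with the checkpoint asking for a value for each dtype.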

Phase 3: Test Case Generation

Called Skill: ascendc-operator-testcase-gen

Execution Content

MANDATORY: Execute according to the ascendc-operator-testcase-gen skill process:
1. Read csrc/ops/<op_name>/design.md, extract parameter constraints, supported dtypes, typical shapes
2. Generate TEST_SHAPES (regular shapes), GENERAL_SHAPES (generalized shapes), BOUNDARY_VALUES (boundary values)
3. Generate operator benchmarks (CPU reference implementation, NPU calling method)
4. Output test case document to csrc/ops/<op_name>/test/<op_name>-test-cases.md

Checkpoints

  • csrc/ops/<op_name>/test/<op_name>-test-cases.md has been generated
  • Contains SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, BOUNDARY_VALUES
  • Contains operator benchmarks (NPU calling method + CPU reference implementation)
  • Shapes and parameter values are within the constraints of design.md
All passed → Proceed to Phase 4
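
To make the contents of the test case document concrete, here is a hedged sketch for the acosh example used earlier. The four list names come from the checkpoints; every value in them is made up for illustration, and the CPU reference uses the standard library rather than whatever the testcase-gen sub-skill actually emits.

```python
import math

SUPPORTED_DTYPES = ["float16", "float32"]
TEST_SHAPES = [(1024,), (32, 128)]        # illustrative regular shapes
GENERAL_SHAPES = [(7,), (3, 5, 17)]       # illustrative non-aligned shapes
BOUNDARY_VALUES = [1.0, 1.001, 1.0e4]     # acosh is defined for x >= 1


def acosh_reference(x: float) -> float:
    """CPU reference for acosh(x) = ln(x + sqrt(x^2 - 1)), valid for x >= 1."""
    if x < 1.0:
        raise ValueError("acosh is only defined for x >= 1")
    return math.log(x + math.sqrt(x * x - 1.0))
```

Keeping the boundary values inside the operator's input domain mirrors the checkpoint that shapes and parameter values must respect the constraints in design.md.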

Phase 4: Code Generation + Framework Adaptation + Compile Test

Called Skill: ascendc-operator-code-gen (internally calls ascendc-operator-compile-debug automatically)

Execution Content

MANDATORY: Execute according to the ascendc-operator-code-gen skill process:

Stage 1: Load Reference Documents
  - Read references/GUIDE.md
  - Load corresponding reference according to operator type

Stage 2: Read Design Document
  - Extract function signature, UB allocation table, calculation pseudocode

Stage 3: Select Template and Generate Code
  - Select elementwise / row template
  - Generate op_host/<op_name>.cpp (includes Tiling calculation logic)
  - Generate op_kernel/<op_name>.cpp (includes Compute calculation logic)

Stage 4: Framework Adaptation
  - Update csrc/ops.h (function declaration)
  - Update csrc/register.cpp (m.def + m.impl)
  - Update csrc/CMakeLists.txt (OP_SRCS + ascendc_library)

Stage 5: Compilation, Installation and Testing (call compile-debug skill)
  - Compile via ./build.sh
  - Install via pip install whl
  - Generate tests/test_<op_name>.py
  - Run functional tests and precision tests
  - Debug up to 3 times if compilation/test fails

Checkpoints

  • op_host/<op_name>.cpp uses platform API to obtain hardware parameters
  • op_kernel/<op_name>.cpp contains complete CopyIn → Compute → CopyOut pipeline
  • Function declaration has been added to ops.h
  • m.def and m.impl have been added to register.cpp
  • Host and kernel source files have been added to csrc/CMakeLists.txt
  • Compilation is successful (whl package has been generated)
  • Functional tests pass (exit code 0)
  • All precision tests pass (pytest all green)
All passed → Proceed to Phase 5

Phase 5: Interface Document Generation

Called Skill: ascendc-operator-doc-gen

Execution Content

MANDATORY: Execute according to the ascendc-operator-doc-gen skill process:

Stage 1: Information Extraction
  - Extract Python calling signature (m.def schema) from register.cpp
  - Extract C++ function declaration and return type from ops.h
  - Extract algorithm description, parameter description, dtype support, constraint conditions from design.md
  - Extract TORCH_CHECK constraints from op_host
  - Extract usage examples from tests/test_<op_name>.py

Stage 2: Document Structure Assembly
  - Assemble Chinese interface documents in PyTorch official documentation style
  - Includes: Title Signature + Function Description + Parameter Description + Supported Data Types + Shape + Constraint Conditions + Usage Examples + Return Value

Stage 3: File Generation
  - Generate csrc/ops/<op_name>/README.md

Stage 4: Display complete document content in the interactive interface

Checkpoints

  • Complete interface information has been extracted from source code (signature, parameters, dtype, shape, constraints)
  • README.md contains all 8 sections (title signature + function description + parameter description + supported data types + shape + constraint conditions + usage examples + return value)
  • Python calling signature is consistent with m.def in register.cpp
  • Parameter descriptions use PyTorch documentation style, described in Chinese
  • Code in usage examples is runnable
  • README.md has been written to csrc/ops/<op_name>/README.md
  • Interface document has been fully displayed in the chat interface
All passed → Proceed to Phase 6

Phase 6: Precision Evaluation Report

Called Skill: ascendc-operator-precision-eval

Execution Content

MANDATORY: Execute according to the ascendc-operator-precision-eval skill process:

Stage 1: Load Test Case Document + Information Collection
  - Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output from testcase-gen)
  - Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, BOUNDARY_VALUES, operator benchmarks
  - Supplement and extract information such as precision thresholds from existing code

Stage 2: Test Case Adaptation ((shapes + boundary) × dtypes ≥ 30 cases)
  - Directly reuse TEST_SHAPES and BOUNDARY_VALUES from testcase-gen
  - Traverse all dtypes supported by the operator for each shape / boundary value

Stage 3: Test Script Generation (output to operator directory csrc/ops/<op_name>/test/)
  - Generate test_<op_name>_precision.py (pytest format) based on template
  - Generate run_<op_name>_precision_report.py (report generator) based on template

Stage 4: Execution
  - Run pytest; all tests must pass
  - Run report generator to output JSON

Stage 5: Report Generation
  - Generate <op_name>_precision_report.md (includes regular shape + boundary value table + summary + key findings)
  - Prompt the user for the report path

Checkpoints

  • Number of test cases = (shapes + boundary) × dtypes ≥ 30
  • Each dtype supported by the operator has been tested
  • All pytest precision tests pass
  • JSON report is generated (includes 5 precision metrics: MaxAbsErr / MeanAbsErr / MaxRelErr / MeanRelErr / CosineSim)
  • Markdown report is generated at csrc/ops/<op_name>/test/<op_name>_precision_report.md
  • Precision test results have been displayed in the chat interface in Markdown table format
  • The user has been told the precision report path
All passed → Proceed to Phase 7
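
The JSON report names five precision metrics without defining them. The sketch below uses common definitions (elementwise absolute error, relative error against the reference with a small epsilon, and cosine similarity of the flattened vectors). These definitions and the `precision_metrics` helper are assumptions, not the sub-skill's actual formulas.

```python
import math
from typing import Dict, Sequence


def precision_metrics(
    actual: Sequence[float], expected: Sequence[float], eps: float = 1e-12
) -> Dict[str, float]:
    """Compute the five named metrics over two equally sized flat sequences."""
    if len(actual) != len(expected) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    abs_err = [abs(a - e) for a, e in zip(actual, expected)]
    # eps guards against division by zero when a reference value is 0.
    rel_err = [d / (abs(e) + eps) for d, e in zip(abs_err, expected)]
    dot = sum(a * e for a, e in zip(actual, expected))
    norm_a = math.sqrt(sum(a * a for a in actual))
    norm_e = math.sqrt(sum(e * e for e in expected))
    cosine = dot / (norm_a * norm_e) if norm_a and norm_e else float("nan")
    n = len(actual)
    return {
        "MaxAbsErr": max(abs_err),
        "MeanAbsErr": sum(abs_err) / n,
        "MaxRelErr": max(rel_err),
        "MeanRelErr": sum(rel_err) / n,
        "CosineSim": cosine,
    }
```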

Phase 7: Performance Benchmarking Report

Called Skill: ascendc-operator-performance-eval

Execution Content

MANDATORY: Execute according to the ascendc-operator-performance-eval skill process:

Stage 1: Load Test Case Document + Information Collection
  - Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output from testcase-gen)
  - Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, operator benchmarks
  - Supplement and extract information such as OP Type keywords from existing code

Stage 2: Test Case Adaptation (JSONL format, ≥8 cases)
  - Select representative shapes from TEST_SHAPES + GENERAL_SHAPES of testcase-gen
  - Cover all dtypes supported by the operator
  - Convert to JSONL format

Stage 3: Script Generation (output to operator directory csrc/ops/<op_name>/test/)
  - Generate run_<op_name>_case.py (single case msprof executor) based on template
  - Generate benchmark_<op_name>_msprof.py (master control script) based on template
  - Generate <op_name>_cases.jsonl

Stage 4: Execution and Collection
  - Run the master control script, 20 iterations per case (first 10 for warm-up)
  - Extract Task Duration(us) and hardware metrics from op_summary_*.csv by OP Type
  - Output JSON results

Stage 5: Report Generation
  - Generate <op_name>_perf_report.md (includes result table + summary + brief analysis)
  - Tell the user the report path
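
Stage 2 asks for at least 8 performance cases in JSONL format. A minimal sketch follows; the field names `shape` and `dtype` and both helpers are hypothetical, since the page does not show the actual JSONL schema.

```python
import json
from itertools import product
from typing import Dict, List, Sequence

MIN_PERF_CASES = 8  # threshold stated in this skill


def build_perf_cases(shapes: Sequence[tuple], dtypes: Sequence[str]) -> List[Dict]:
    """Cross representative shapes with every supported dtype."""
    return [{"shape": list(s), "dtype": d} for s, d in product(shapes, dtypes)]


def to_jsonl(cases: Sequence[Dict]) -> str:
    """Serialize one JSON object per line, as JSONL requires."""
    return "".join(json.dumps(case) + "\n" for case in cases)


# Illustrative selection: 4 shapes x 2 dtypes -> 8 cases.
perf_cases = build_perf_cases([(1024,), (32, 128), (7,), (3, 5, 17)], ["float16", "float32"])
perf_jsonl = to_jsonl(perf_cases)
```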

Checkpoints

  • JSONL test cases cover multiple shape × dtype combinations (≥8 cases)
  • Uses msprof for collection, no other timing methods
  • Filters target operators by OP Type (not Op Name)
  • 20 iterations are run per case, with the first 10 treated as warm-up
  • JSON report is generated (includes Task Duration + hardware metrics)
  • Markdown report is generated at csrc/ops/<op_name>/test/<op_name>_perf_report.md
  • Report contains brief analysis (≥3 conclusions)
  • Performance test results have been displayed in the chat interface in Markdown table format
  • The user has been told the performance report path
All passed → Operator development is complete
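
The collection step in Stage 4 (filter by OP Type, discard warm-up iterations, read Task Duration(us)) can be sketched with the standard csv module. The column names are the ones quoted in this section; the helper name and the assumption that rows appear in execution order are mine.

```python
import csv
import io
from statistics import mean


def mean_task_duration_us(csv_text: str, op_type: str, warmup: int = 10) -> float:
    """Average Task Duration(us) for one OP Type, skipping warm-up rows.

    Assumes rows for the operator appear in execution order, so the first
    `warmup` matching rows are the warm-up iterations.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    durations = [float(row["Task Duration(us)"]) for row in reader if row["OP Type"] == op_type]
    measured = durations[warmup:]
    if not measured:
        raise ValueError(f"no measured rows for OP Type {op_type!r}")
    return mean(measured)
```

Matching on OP Type rather than Op Name follows the checkpoint above; in a real run the CSV text would be read from the op_summary_*.csv file produced by msprof.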

Inter-stage Data Flow

Phase 1 Output                             Phase 2 Input
  csrc/ops/<op_name>/              ────▶   Operator name, directory structure
  design.md (placeholder)

Phase 2 Output                             Phase 3 Input
  design.md (complete)             ────▶   Parameter constraints, supported dtypes, typical shapes
                                           → Generate unified test case document

Phase 3 Output                             Phase 4 Input
  <op_name>-test-cases.md          ────▶   design.md (complete)
  (test case document for                  Function signature, UB allocation table → bufferCoefficient
   subsequent reuse)                       Calculation pseudocode → Compute logic
                                           Tiling strategy → Block/UB splitting parameters

Phase 4 Output                             Phase 5 Input
  Installed operator whl           ────▶   register.cpp / ops.h / design.md /
  tests/test_<op_name>.py                  op_host / test files
                                           → Extract interface information to generate documents

Phase 5 Output                             Phase 6 Input
  csrc/ops/<op_name>/README.md     ────▶   <op_name>-test-cases.md (from Phase 3)
  Interface document completed             Operator name, calling method, input domain constraints
                                           All supported dtypes, precision thresholds
                                           → Output to csrc/ops/<op_name>/test/

Phase 6 Output                             Phase 7 Input
  Precision report passed          ────▶   <op_name>-test-cases.md (from Phase 3)
  csrc/ops/<op_name>/test/                 Operator name, project/native calling method
                                           All supported dtypes, OP Type keywords
                                           → Output to csrc/ops/<op_name>/test/

Status Tracking Table

| Phase | Precondition | Called Skill | Key Deliverables |
| --- | --- | --- | --- |
| 0. Requirements Collection | None | - | CANN path + Conda environment + Operator name + Function description |
| 1. Project Initialization | Phase 0 | ascendc-operator-project-init | Operator skeleton directory |
| 2. Design Document | Phase 1 | ascendc-operator-design | design.md (includes Tiling + UB allocation table) |
| 3. Test Case Generation | Phase 2 | ascendc-operator-testcase-gen | <op_name>-test-cases.md (unified test case document) |
| 4. Code & Testing | Phase 3 | ascendc-operator-code-gen, compile-debug | Runnable operator + basic tests passed |
| 5. Interface Document | Phase 4 | ascendc-operator-doc-gen | PyTorch-style Chinese API document (README.md) |
| 6. Precision Evaluation | Phase 5 | ascendc-operator-precision-eval | ≥30 precision test cases + precision report |
| 7. Performance Benchmarking | Phase 6 | ascendc-operator-performance-eval | msprof performance comparison + performance report |

Error Recovery

Resume from Interrupted Point

When the user says "Continue operator development":
| Detection Condition | Determined Stage | Recovery Action |
| --- | --- | --- |
| csrc/ops/<op_name>/ does not exist | Phase 1 not completed | Start from Phase 1 |
| design.md is placeholder or empty | Phase 2 not completed | Start from Phase 2 |
| csrc/ops/<op_name>/test/<op_name>-test-cases.md does not exist | Phase 3 not completed | Start from Phase 3 |
| op_host/ still contains skeleton code | Phase 4 not completed | Start from Phase 4 |
| whl package not generated | Phase 4 compilation not completed | Resume from compilation step |
| Basic tests not passed | Phase 4 testing not completed | Resume from testing step |
| csrc/ops/<op_name>/README.md does not exist | Phase 5 not completed | Start from Phase 5 |
| No precision report in csrc/ops/<op_name>/test/ | Phase 6 not started | Start from Phase 6 |
| Precision report does not exist or precision tests not all passed | Phase 6 not completed | Resume from Phase 6 |
| Precision report exists but performance report does not | Phase 7 not started | Start from Phase 7 |
| <op_name>_perf_report.md does not exist or is incomplete | Phase 7 not completed | Resume from Phase 7 |
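
The resume table can be approximated by checking deliverables in phase order. This sketch only tests file existence (plus an empty design.md); it cannot tell a placeholder design from a complete one or judge whether tests passed, so Phase 4 status is passed in by the caller. The function name and the `phase4_passed` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def detect_resume_phase(project_root: Path, op_name: str, phase4_passed: bool) -> Optional[int]:
    """Return the first phase whose deliverable is missing, or None if all exist."""
    op_dir = project_root / "csrc" / "ops" / op_name
    test_dir = op_dir / "test"
    design = op_dir / "design.md"

    if not op_dir.is_dir():
        return 1
    # An empty design.md is treated as "not completed"; placeholder detection
    # beyond emptiness is out of scope for this sketch.
    if not design.is_file() or not design.read_text(encoding="utf-8").strip():
        return 2
    if not (test_dir / f"{op_name}-test-cases.md").is_file():
        return 3
    if not phase4_passed:
        return 4
    if not (op_dir / "README.md").is_file():
        return 5
    if not (test_dir / f"{op_name}_precision_report.md").is_file():
        return 6
    if not (test_dir / f"{op_name}_perf_report.md").is_file():
        return 7
    return None
```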

Compilation/Test Failure

Handled internally by the ascendc-operator-compile-debug skill, with up to 3 debugging attempts. If the build or tests still fail after the third attempt, stop and report the detailed errors to the user.
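
The bounded retry policy can be sketched as a small loop. This reads "up to 3 debugging attempts" as one initial run followed by at most three fix-and-rerun cycles, which is an interpretation; the callbacks `build_and_test` and `apply_fix` are hypothetical stand-ins for what the compile-debug skill does internally.

```python
from typing import Callable, List, Tuple

MAX_DEBUG_ATTEMPTS = 3  # limit stated in this skill


def run_with_debug_attempts(
    build_and_test: Callable[[], Tuple[bool, str]],
    apply_fix: Callable[[str], None],
    max_attempts: int = MAX_DEBUG_ATTEMPTS,
) -> Tuple[bool, List[str]]:
    """Run once, then debug and rerun up to max_attempts times.

    Returns (success, error_logs). On final failure the collected logs are
    what would be reported to the user.
    """
    errors: List[str] = []
    ok, log = build_and_test()
    attempts = 0
    while not ok and attempts < max_attempts:
        errors.append(log)
        apply_fix(log)          # one debugging attempt
        attempts += 1
        ok, log = build_and_test()
    if not ok:
        errors.append(log)      # keep the last failure for the report
    return ok, errors
```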