Loading...
Loading...
Found 9 Skills
Search products on Chinese e-commerce platforms including Taobao, Tmall, 1688, and AliExpress. Parse product links, get product details, compare prices across platforms. Use when the user asks to find products, get product info by URL, search Chinese suppliers, or compare prices.
Search products across Chinese e-commerce platforms: Taobao, Tmall, XHS (小红书). Zero-config — no API keys needed. Powered by Shopme unified product database. Use when the user asks to find products, get product info by URL or ID, search Chinese suppliers, or compare prices.
Query NVIDIA PTX ISA 9.1, CUDA Runtime API 13.1, Driver API 13.1, Programming Guide v13.1, Best Practices Guide, Nsight Compute, Nsight Systems local documentation. Debug and optimize GPU kernels with nsys/ncu/compute-sanitizer workflows. Use when writing, debugging, or optimizing CUDA code, GPU kernels, PTX instructions, inline PTX, TensorCore operations (WMMA, WGMMA, TMA, tcgen05), or when the user mentions CUDA API functions, error codes, device properties, memory management, profiling, GPU performance, compute capabilities, CUDA Graphs, Cooperative Groups, Unified Memory, dynamic parallelism, or CUDA programming model concepts.
Write, debug, and optimize CUTLASS and CuTeDSL GPU kernels using local source code, examples, and header references. Use when the user mentions CUTLASS, CuTe, CuTeDSL, cute::Layout, cute::Tensor, TiledMMA, TiledCopy, CollectiveMainloop, CollectiveEpilogue, GEMM kernel, grouped GEMM, sparse GEMM, flash attention CUTLASS, blackwell GEMM, hopper GEMM, FP8 GEMM, blockwise scaling, MoE GEMM, StreamK, warp specialization CUTLASS, TMA CUTLASS, or asks about writing high-performance CUDA kernels with CUTLASS/CuTe templates.
Write, debug, and optimize Triton and Gluon GPU kernels using local source code, tutorials, and kernel references. Use when the user mentions Triton, Gluon, tl.load, tl.store, tl.dot, triton.jit, gluon.jit, wgmma, tcgen05, TMA, tensor descriptor, persistent kernel, warp specialization, fused attention, matmul kernel, kernel fusion, tl.program_id, triton autotune, MXFP, FP8, FP4, block-scaled matmul, SwiGLU, top-k, or asks about writing GPU kernels in Python.
Workflow for learning CuTe Python DSL by reading, importing, profiling, and extracting reusable patterns from CUTLASS Blackwell example kernels. Use when: (1) studying CUTLASS CuTe DSL reference implementations, (2) importing CUTLASS examples into the project runtime infrastructure, (3) building CuTe DSL knowledge base entries from profiling experiments, (4) understanding CuTe DSL API patterns, TMA pipelining, warpgroup scheduling, or persistent kernel structure.
Comprehensive guide for creating Telegram Mini Apps with React using @tma.js/sdk-react. Covers SDK initialization, component mounting, signals, theming, back button handling, viewport management, init data, deep linking, and environment mocking for development. Use when building or debugging Telegram Mini Apps with React.
CuTe Python DSL kernel workflow, CuteKernel runtime wrapper, suitability gate, tiling guidance, and CuTe-specific pitfalls. Use when: (1) planning or implementing a kernel in the CuTe Python DSL, (2) the optimization needs more explicit control than cuTile exposes but should remain in a Python-driven workflow, (3) defining package naming for cute-dsl kernels, (4) documenting CuTe Python DSL design choices, (5) recording language-specific knowledge for CuTe Python DSL.
CuTe Python DSL API reference and implementation patterns for NVIDIA GPU kernel programming. Provides execution model, core API table, key constraints, common patterns, and documentation index. Use when: (1) writing or modifying CuTe DSL kernel code, (2) looking up CuTe DSL API syntax, (3) implementing attention/GEMM/MLA patterns in CuTe DSL, (4) understanding CuTe DSL execution model and compilation pipeline, (5) checking what CuTe DSL can and cannot do.