Skill4Agent

AI Agent Skills Directory with categorization, English/Chinese translation, and script security checks.

Contact Us: osulivan147@qq.com

© 2026 Skill4Agent. All rights reserved.

All Skills

Total 39,389 skills

Showing 12 of 39,389 skills

AI & Machine Learning | pepperu96/hyper-mla

mla-analysis

MLA (Multi-head Latent Attention) cost models, regime analysis, and kernel selection guide. Use when: (1) reasoning about which kernel approach to use for a given regime, (2) understanding cost model tradeoffs between FlashMLA, FlashAttention, and MLAvar6+, (3) analyzing roofline behavior across decode/speculative/prefill regimes, (4) setting optimization targets, (5) understanding the MLA math and the absorption trick.

🇺🇸 English | Translated
3

AI & Machine Learning | bbuf/sglang-auto-driven-s...

sglang-prod-incident-triage

Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the job is to turn the problem into a replay plus the right next debug tool.

🇺🇸 English | Translated
3
2 scripts / Attention

AI & Machine Learning | kiterlin/intelligent-dete...

training-llms-megatron

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models with more than 1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, and DeepSeek.

🇺🇸 English | Translated
3

AI & Machine Learning | pepperu96/hyper-mla

optimization-catalog-cute-dsl

Shared optimization guidance plus CuTe Python DSL overlays. Use when: (1) selecting optimizations for a CuTe Python DSL kernel, (2) deciding whether a finding is shared or cute-dsl-specific, (3) recording CuTe Python DSL implementation notes, (4) reviewing the knowledge layout for cute-dsl work, (5) mapping shared patterns to a CuTe Python DSL implementation surface.

🇺🇸 English | Translated
3

AI & Machine Learning | pepperu96/hyper-mla

design-cutile-dsl-kernel

cuTile Python DSL kernel implementation patterns, CtKernel runtime wrapper, suitability gate, and cuTile-specific pitfalls. Use when: (1) creating or modifying a cuTile Python DSL kernel version, (2) implementing an optimization that still fits within cuTile's exposed control surface, (3) deciding whether cuTile is still the right DSL, (4) reviewing cuTile-specific runtime patterns. Always also load /design-kernel for shared naming, versioning, and workflow.

🇺🇸 English | Translated
3

Tools & Utilities | ultimatile/cuda-x-skills

cuda-webdoc-search

Search CUDA-X library documentation (cuBLAS, cuTENSOR, cuTensorNet, cuSOLVER, etc.) to find API symbols, functions, and types. Use when you need to look up CUDA library APIs, discover available functions, or find documentation URLs for specific operations.

🇺🇸 English | Translated
3
12 scripts / Attention

AI & Machine Learning | pepperu96/hyper-mla

optimization-catalog

Compatibility router for the shared optimization knowledge base and the language-specific optimization catalog skills. Use when: (1) selecting which optimization catalog skill to load, (2) the implementation language is not fixed yet, (3) a workflow still references the legacy optimization-catalog skill name, (4) deciding whether a finding is shared or language-specific, (5) updating the generalized knowledge-base structure.

🇺🇸 English | Translated
3

AI & Machine Learning | bbuf/sglang-auto-driven-s...

h100

SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/sgl-workspace/sglang`, and use the ready H100 remote environment for SGLang development and validation. Use when a task needs remote CUDA work, GPU-backed smoke tests, diffusion checks, or a safe remote copy instead of local-only execution.

🇺🇸 English | Translated
3

Backend Development | pepperu96/hyper-mla

design-kernel

Shared kernel design workflow across all supported languages and DSLs. Provides language selection table, naming conventions, versioning rules, KernelPlan structure, composition patterns, clone workflow, implementation workflow, devlog template, and designer output contract. Use when: (1) choosing which language-specific kernel design skill to load, (2) the intended implementation language is not fixed yet, (3) you need naming or versioning guidance before selecting a DSL, (4) you are implementing any kernel regardless of DSL, (5) you are updating docs that refer to kernel design skills.

🇺🇸 English | Translated
3

AI & Machine Learning | kiterlin/intelligent-dete...

ml-paper-writing

Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verification workflows.

🇺🇸 English | Translated
3

AI & Machine Learning | bbuf/sglang-auto-driven-s...

sglang-torch-profiler-analysis

Compact SGLang torch-profiler triage skill. Use when Codex should inspect an existing `trace.json(.gz)` or profile directory, trigger `sglang.profiler` against a live server, and return one compact report with kernel, overlap-opportunity, and fuse-pattern tables. Single-trace triage is enough for quick diagnosis; mapping+formal two-trace triage gives stronger overlap conclusions.

🇺🇸 English | Translated
3
4 scripts / Checked

AI & Machine Learning | kiterlin/intelligent-dete...

weights-and-biases

Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B, a collaborative MLOps platform.

🇺🇸 English | Translated
3