Found 14 Skills
Azure Event Hubs SDK for Python streaming. Use for high-throughput event ingestion, producers, consumers, and checkpointing. Triggers: "event hubs", "EventHubProducerClient", "EventHubConsumerClient", "streaming", "partitions".
Reviews LangGraph code for bugs, anti-patterns, and improvements. Use when reviewing code that uses StateGraph, nodes, edges, checkpointing, or other LangGraph features. Catches common mistakes in state management, graph structure, and async patterns.
How WAL mechanics, checkpointing, concurrency rules, and recovery work in tursodb.
Implements stateful agent graphs using LangGraph. Use when building graphs, adding nodes/edges, defining state schemas, implementing checkpointing, handling interrupts, or creating multi-agent systems with LangGraph.
LangGraph workflow patterns for state management, routing, parallel execution, supervisor-worker, tool calling, checkpointing, human-in-the-loop, streaming, subgraphs, and functional API. Use when building LangGraph pipelines, multi-agent systems, or AI workflows.
Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
This skill should be used for multi-session autonomous agent work requiring progress checkpointing, failure recovery, and task dependency management. Triggers on the '/harness' command, or when a task involves many subtasks needing progress persistence, sleep/resume cycles across context windows, recovery from mid-task failures with partial state, or distributed work across multiple agent sessions. Synthesized from Anthropic and OpenAI engineering practices for long-running agents.
Orchestrate the full paper pipeline end-to-end. Manage state propagation between phases (literature → plan → code → experiments → figures → tables → writing → review), support checkpointing and resumption. Use for assembling a complete paper from components.
Expert GPU optimization for modern consumer GPUs (8-24GB VRAM). Use this skill when you need to optimize GPU training, speed up CUDA code, reduce OOM errors, tune XGBoost for GPU, migrate NumPy to CuPy, make a model faster, manage GPU memory, optimize VRAM usage, or benchmark PyTorch. Covers mixed precision, gradient checkpointing, XGBoost GPU acceleration, CuPy/cuDF migration, vectorization, torch.compile, and diagnostics. NVIDIA GPUs only. PyTorch, XGBoost, and RAPIDS frameworks.
Refactor PyTorch code to improve maintainability, readability, and adherence to best practices. Identifies and fixes DRY violations, long functions, deep nesting, SRP violations, and opportunities for modular components. Applies PyTorch 2.x patterns including torch.compile optimization, Automatic Mixed Precision (AMP), optimized DataLoader configuration, modular nn.Module design, gradient checkpointing, CUDA memory management, PyTorch Lightning integration, custom Dataset classes, model factory patterns, weight initialization, and reproducibility patterns.
LangGraph checkpointing and persistence. Use when implementing fault-tolerant workflows, resuming interrupted executions, debugging with state history, or avoiding re-running expensive operations.