Loading...
Loading...
Found 1,066 Skills
Route low-risk coding tasks to cheaper LLMs while keeping Codex for high-risk decisions, using MCP tools for cost-aware delegation
A minimal teaching framework for understanding AI Agent architecture with core loop, fake LLM interface, and skill discovery system
Structured learning roadmap for AI Agent development from LLM basics to multi-agent systems (bilingual Chinese/English)
Local proxy that lets OpenAI Codex CLI/desktop talk to MiMo, DeepSeek, and other LLMs via Responses API translation
Sync, search, and classify X/Twitter bookmarks locally with full-text search, LLM classification, and agent integration
Control and automate real browser sessions through CDP, preserving login state and cookies for LLM-driven interactions
Generate llms.txt and llms-full.txt files for a website to improve AI discoverability. Use when the user asks to create llms.txt, generate llms.txt, fix llms.txt, make site AI-readable, or mentions llms.txt generation.
Guides research engineering and science on LLM tokens—hypotheses about context use, tokenization, compression, and inference efficiency; rigorous benchmarks (tokens per task, quality–cost Pareto); ablation design; instrumentation and reproducible logs; and research memos that inform product decisions. Use when designing token-efficiency experiments, measuring context utilization, comparing compression or routing methods, analyzing tokenizer effects, or writing technical reports on token/cost trade-offs—not for phased cost roadmaps and owners (ai-token-improvement-plan-engineer), production context pipeline implementation (ai-context-engineer), single-prompt edits (prompt-engineer), general non-token AI research (ai-researcher), or shipping features (ai-engineer).
Guides AI ops leadership—LLM SRE, model/prompt releases, eval/incidents, cost/capacity, vendors, and cross-functional cadence. Use for AI platform ops, LLM SLAs, incidents, rollout governance, unit economics, red-team/eval gates, and team rituals—not memory (ai-memory-developer), context code (ai-context-engineer), security programs (cybersecurity), token roadmaps (ai-token-improvement-plan-engineer), solution architecture (applied-ai-architect-commercial-enterprise), skills portfolio (ai-skill-manager), or vertical AI product eng management (engineering-manager-vertical-ai-products). Prompt/eval team management and golden-set release policy: engineering-manager-agent-prompts-evals. Safeguard inference platform: ml-infrastructure-engineer-safeguards. Safeguard model research: ml-research-engineer-safeguards.
Design, test, and optimize prompts for LLM interactions. Cover prompt patterns (few-shot, chain-of-thought, ReAct), system prompt design, output formatting, prompt evaluation, and prompt optimization techniques. Triggers on "write prompt", "optimize prompt", "design system prompt", "few-shot examples", "chain of thought", "prompt evaluation", "LLM output formatting", "prompt testing", or "prompt patterns".
Adversarial robustness engineering for ML/AI—evasion, poisoning, extraction, membership-inference threat models; robust training, sanitization, detectors; ASR/certified evals; lab model attacks; data-pipeline integrity; production I/O guardrails (classical ML and LLM/multimodal). Use for adversarial examples, robustness suites, poison audits, deploy guardrails—not LLM app red team (ai-redteam), governance (ai-risk-governance), safety classifier R&D (ml-research-engineer-safeguards), safeguard serving (ml-infrastructure-engineer-safeguards), privacy research (privacy-research-engineer-safeguards), AppSec pentest (penetration-tester).
Calculate agreement between human ground truth and machine labels for a text LLM judge metric, then analyze transcripts and reviewer notes to propose an improved metric prompt. One metric at a time.