Found 50 Skills
Guides ML/research engineering for safeguards—safety classifier development, harm benchmarks and eval suites, labeled dataset design, fine-tuning and ablations, calibration and slice analysis, attack-surface research memos, and promotion criteria for new moderation models. Use when building or evaluating guardrail models, designing safety benchmarks, measuring precision/recall on policy categories, comparing mitigation techniques, or writing research reports on classifier improvements—not for production inference gateways (ml-infrastructure-engineer-safeguards), PII/leakage privacy research (privacy-research-engineer-safeguards), red-team attack campaigns (ai-redteam), AI governance policy (ai-risk-governance), general non-safety research (ai-researcher), or token-efficiency studies (research-engineer-scientist-tokens).
Provides calibrated decision analysis using Charlie Munger-style multiple mental models, inversion, incentive mapping, circle-of-competence checks, misjudgment audits, second-order effects, and forecast updates. Use when the user asks for an oracle take, a hard call, a decision memo, a premortem, an outside view, a red-team, or a sanity-check; says "what am I missing" or "think this through"; or wants a strategy, hire, investment, plan, product, partnership, or major life choice analysed. Avoid for simple factual lookups or time-sensitive legal, medical, or market questions without fresh evidence.
Guides proactive threat hunting for advanced SOC—hypothesis-driven hunt campaigns, advanced SIEM/query workflows, baseline and anomaly analysis, MITRE ATT&CK–aligned techniques, threat intel fusion, detection engineering feedback, and hunt reporting with IR handoff. Use for threat hunting, proactive hunt, hypothesis-driven detection, advanced SOC, hunt campaign, detection engineering, MITRE ATT&CK hunt, anomaly hunting—not routine SOC alert triage (soc-analyst), declared incident command (incident-responder), adversary simulation campaigns (red-team-specialist), disk forensics acquisition (digital-forensics-analyst), authorized pentest (penetration-tester), or binary RE lab work (reverse-engineer).
Guides AI ops leadership—LLM SRE, model/prompt releases, eval/incidents, cost/capacity, vendors, and cross-functional cadence. Use for AI platform ops, LLM SLAs, incidents, rollout governance, unit economics, red-team/eval gates, and team rituals—not memory (ai-memory-developer), context code (ai-context-engineer), security programs (cybersecurity), token roadmaps (ai-token-improvement-plan-engineer), solution architecture (applied-ai-architect-commercial-enterprise), skills portfolio (ai-skill-manager), or vertical AI product eng management (engineering-manager-vertical-ai-products). Prompt/eval team management and golden-set release policy: engineering-manager-agent-prompts-evals. Safeguard inference platform: ml-infrastructure-engineer-safeguards. Safeguard model research: ml-research-engineer-safeguards.
Guides information security risk analysis—risk identification and scoring, risk registers, threat/vulnerability/control mapping, treatment recommendations (accept/mitigate/transfer/avoid), third-party and supply-chain risk framing, business impact analysis, KRIs, and risk committee or board narratives. Aligns with ISO 27005 and NIST RMF concepts without full compliance audits. Use for security risk assessment, risk register maintenance, inherent/residual risk scoring, FAIR-style quantitative framing, treatment decisions, third-party risk tiers, or executive risk reporting—not SOC alert triage (soc-analyst), pentest execution (penetration-tester, web-pentester, network-pentester), control implementation (information-security-engineer, cloud-security-engineer), GRC program and audit prep (compliance-specialist), audit evidence automation (compliance-engineer, cloud-compliance-specialist), AI model risk programs (ai-risk-governance), or adversary simulation (red-team-specialist).
Guides SOC operations—alert triage, SIEM/EDR investigation, enrichment, playbook execution, false-positive closure, escalation decisions, and detection tuning feedback. Use when working SOC queues, investigating suspicious alerts, correlating events, documenting analyst notes, or deciding escalate vs close—not for declared incident command, timelines, evidence preservation, or regulatory comms (incident-responder), incident program design (incident-management-engineer), binary/firmware RE (reverse-engineer), red team operations (red-team-specialist), or enterprise security strategy (cybersecurity).
Drafts, reviews, rewrites, and coaches outcome-based OKR sets across team, department, product, or company scopes. Supports five entry modes (Guided default, One-Shot via --oneshot, Sustained Coach, Audit Only, Rewrite). Diagnoses empowered-team context and adjusts framing; refuses to fabricate baselines or targets; refuses to use OKR scores for compensation; reframes feature-delivery KRs into outcome KRs. Use when planning quarterly OKRs, translating strategy into team outcomes, reviewing draft OKRs for quality, or converting roadmap-as-OKR drafts into proper OKR sets.
Use when the user asks to "create an evaluator", "create evals", "create a scenario", "write a test scenario", "design a test case", "test my agent", "build eval coverage", "plan a test suite", "create red team tests", "set up test profiles", "configure conditional actions", "write a conditional action evaluator", "build a deterministic test", "design an IVR test", "IVR navigation test", "write a unit test for a voice agent", "build a regression test", "scripted scenario", "scripted voice test", "structured evaluator", "exact flow test", "sequential conditions", "fixed sequence test", or "run evals". Covers individual evaluator design, suite coverage strategy, test profiles, mock-tool data design, conditional actions (deterministic / unit test / regression / IVR navigation flows), and best practices for workflow / red-team / edge-case / deterministic test types.
Answers AI agent evaluation methodology questions with practical, opinionated guidance grounded primarily in Microsoft's agent evaluation ecosystem (MS Learn, Eval Scenario Library, Triage & Improvement Playbook, Eval Guidance Kit) supplemented by select industry sources.
Linux security mechanism bypass playbook. Use when facing restricted bash/rbash, read-only or noexec filesystems, AppArmor, SELinux, seccomp filters, or audit logging that must be evaded during post-exploitation.
Use when challenging ideas, plans, decisions, or proposals using structured critical reasoning. Invoke to play devil's advocate, run a pre-mortem, red team, or audit evidence and assumptions.
Windows lateral movement playbook. Use when pivoting between Windows hosts via PsExec, WMI, WinRM, DCOM, RDP, pass-the-hash, overpass-the-hash, or pass-the-ticket techniques.