Loading...
Loading...
Found 26 Skills
Answers AI agent evaluation methodology questions with practical, opinionated guidance grounded primarily in Microsoft's agent evaluation ecosystem (MS Learn, Eval Scenario Library, Triage & Improvement Playbook, Eval Guidance Kit) supplemented by select industry sources.
Use when challenging ideas, plans, decisions, or proposals using structured critical reasoning. Invoke to play devil's advocate, run a pre-mortem, red team, or audit evidence and assumptions.
Kerberos attack playbook for Active Directory. Use when targeting AD authentication via AS-REP roasting, Kerberoasting, golden/silver/diamond tickets, delegation abuse, or pass-the-ticket attacks.
Windows lateral movement playbook. Use when pivoting between Windows hosts via PsExec, WMI, WinRM, DCOM, RDP, pass-the-hash, overpass-the-hash, or pass-the-ticket techniques.
Use when conducting authorized penetration tests, performing security assessments, running red team exercises, testing security controls, identifying attack paths, or validating hardening measures
Comprehensively evaluate the overall security of an application from two perspectives: attackers (Red Team) and defenders (Blue Team). Run two agents in parallel → output an integrated report via review-aggregator. Use this when you want to "understand the overall security status of the application", "identify vulnerabilities from an attacker's perspective", or "verify that there are no gaps in the defense system". Use security-hardening for addressing specific vulnerabilities, and security-audit-quick for fast detection of known patterns.
Use when challenging ideas, plans, decisions, or proposals. Invoke to play devil's advocate, run a pre-mortem, red team, stress test assumptions, audit evidence quality, or find blind spots before committing. Do NOT use for building plans, making decisions, or generating solutions — this skill only challenges and critiques.
Analyze arguments, detect biases, evaluate claims, and improve reasoning. Use when asked to fact-check, identify logical fallacies, evaluate arguments, analyze predictions, find root causes, or think adversarially about plans. Triggers include "evaluate this argument", "logical fallacies", "fact check", "analyze the claims", "identify biases", "devil's advocate", "red team this", "root cause".
(Industry standard: Loop Agent / Single Agent) Primary Use Case: Self-contained research, content generation, and exploration where no inner delegation is required. Self-directed research and knowledge capture loop. Use when: starting a session (Orientation), performing research (Synthesis), or closing a session (Seal, Persist, Retrospective). Ensures knowledge survives across isolated agent sessions.
Build and configure a resilient command-and-control infrastructure using BishopFox's Sliver C2 framework with redirectors, HTTPS listeners, and multi-operator support for authorized red team engagements.
MS17-010 (EternalBlue) is a critical vulnerability in Microsoft's SMBv1 implementation that allows remote code execution. Originally discovered by the NSA and leaked by the Shadow Brokers in 2017, it
LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when: - Setting up prompt evaluation or regression testing - Integrating LLM testing into CI/CD pipelines - Configuring security testing (red teaming, jailbreaks) - Comparing prompt or model performance - Building evaluation suites for RAG, factuality, or safety Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing