Loading...
Loading...
Found 3 Skills
AI-powered adversarial UI testing via the browse CLI. Analyzes git diffs to test only what changed, or explores the full app to find bugs. Tests functional correctness, accessibility, responsive layout, and UX heuristics. Use when the user asks to test UI changes, QA a pull request, audit accessibility, or run exploratory testing. Supports local browser (localhost) and remote Browserbase (deployed sites).
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
Find every way users can break your AI before they do. Use when you need to red-team your AI, test for jailbreaks, find prompt injection vulnerabilities, run adversarial testing, do a safety audit before launch, prove your AI is safe for compliance, stress-test guardrails, or verify your AI holds up against adversarial users. Covers automated attack generation, iterative red-teaming with DSPy, and MIPROv2-optimized adversarial testing.