Found 21 Skills
Minimal image-understanding smoke test for Model Studio Qwen VL.
AI rules for writing tests with Stably Playwright SDK. Use this skill when writing or modifying Playwright tests with Stably AI features. Covers when to use Playwright vs Stably methods, plus minimal patterns for aiAssert, extract, getLocatorsByAI, agent.act, Inbox, and Google auth.
Write, refine, run, and QA promptfoo evaluation suites: promptfooconfig.yaml, prompts, providers, vars, tests, assertions, model-graded rubrics, transforms, datasets, exports, and CI gates. Use for non-redteam eval coverage, regression tests, or new eval matrices. Do not use for adversarial redteam plugin or strategy setup.
Full evaluation workflow: launch a run, watch progress, and summarize results. Use for end-to-end agent testing.
Onboard 1-node GitHub MR functional tests for GB200 from existing mr-scoped 2-node tests.
The meta skill. Turn any raw feature into a properly-skilled, tested, resolvable unit of agent capability. Cross-modal eval is the recommended Phase 3 quality gate: 3 frontier models from different providers critique the output, you iterate to quality, THEN write tests that lock in the proven-good behavior.
Scaffolds evaluation suites for the Axiom AI SDK. Generates eval files, scorers, flag schemas, and config from natural-language descriptions. Use when creating evals, writing scorers, setting up flag schemas, or configuring axiom.config.ts.
Orchestrator workflow for running ZeroContext Lab (ZCL) attempts/suites with deterministic artifacts, trace-backed evidence, and fast post-mortems (shim support for "agent only types tool name").
VSCode extension for the Browser DevTools MCP Server, enabling AI-driven browser automation, debugging, and testing via Playwright and the Model Context Protocol.