Found 316 Skills
Browser automation using the agent-browser CLI. Use when user asks to browse websites, open webpages, interact with page elements, take screenshots, fill forms, click buttons, scrape content, or automate browser tasks.
Non-interactive X11 desktop control for AI agents. Use when the task involves controlling a Linux desktop - clicking, typing, reading windows, waiting for UI state, or taking screenshots inside a sandbox or VM.
Analyze, describe, and extract information from images using the MiniMax vision MCP tool. Use when: user shares an image file path or URL (any message containing .jpg, .jpeg, .png, .gif, .webp, .bmp, or .svg file extension) or uses any of these words/phrases near an image: "analyze", "analyse", "describe", "explain", "understand", "look at", "review", "extract text", "OCR", "what is in", "what's in", "read this image", "see this image", "tell me about", "explain this", "interpret this", in connection with an image, screenshot, diagram, chart, mockup, wireframe, or photo. Also triggers for: UI mockup review, wireframe analysis, design critique, data extraction from charts, object detection, person/animal/activity identification. Triggers: any message with an image file extension (jpg, jpeg, png, gif, webp, bmp, svg), or any request to analyze/describe/understand/review/extract text from an image, screenshot, diagram, chart, photo, mockup, or wireframe.
Automated web QA skill: analyzes a website or project, generates end-user use cases, derives a structured test plan, executes tests via Playwright browser automation, and produces a full HTML/Markdown QA report with screenshots and pass/fail results. TRIGGER this skill whenever the user asks to: test a website, run QA on a web app, check if a site works, find bugs on a site, validate a web project, create a test plan for a website, run functional tests, check a landing page, audit a web app for issues, test user flows, or any variation of the Russian phrases "проверить сайт" ("check the site"), "протестировать сайт" ("test the site"), "QA сайта" ("site QA"), "тест веб-приложения" ("web app test"), "найти баги на сайте" ("find bugs on the site"). Even if the user just says "посмотри работает ли всё нормально на сайте" ("see whether everything works properly on the site"), use this skill.
Extract a complete design system from an existing website or screenshot into a DESIGN.md file. Analyses colours, typography, component styles, spacing, and atmosphere through browser automation and HTML inspection. Produces a semantic design system document optimised for consistent page generation. Triggers: 'extract design system', 'design system', 'create DESIGN.md', 'analyse the design', 'what design does this site use', 'extract styles from', 'reverse engineer the design'.
Build and test the longest uncovered user journey from spec.md. Reads the product spec, checks existing journeys, picks the longest untested path, writes a UI test with screenshots at every step, then runs 3 polish rounds (testability → refactor UI test → UI review) until everything is clean. Use when the user says "next journey", "add journey", "test the next flow", "journey builder", or "cover more user paths".
Deep UI walkthrough with screenshot-based analysis across all pages and viewports (desktop + tablet + mobile). Delivers per-page improvement pitches grounded in what you actually see. Use when user says 'review the UI', 'pitch UI improvements', 'how does this look', 'UX audit', 'walk through the app'.
Systematic usability evaluation using established heuristics (Nielsen's 10, Shneiderman's 8, or custom rubrics). Use when reviewing UI designs, screenshots, prototypes, or live products for usability issues. Triggers on "review this design", "what's wrong with this UI", "usability check", "evaluate this interface", or when user shares screenshots/mockups asking for feedback.
Analyze videos, screen recordings, and screenshots to generate structured, actionable notes for coding agents. Supports Loom, YouTube, and local files. Extracts visual context, on-screen text, and audio narration. Use when someone shares a video and you need to understand what it shows.
Watch a bug video, screen recording, or screenshot, understand what's broken, find the relevant code, implement a fix, and raise a PR. Supports Loom, YouTube, and local files. Uses Gemini Flash for video understanding.
[QwenCloud] Understand images and videos with Qwen vision models. TRIGGER when: user wants to analyze, describe, or extract information from images or videos, OCR text extraction, chart/table reading, visual reasoning, multi-image comparison, screenshot understanding, video comprehension, or explicitly invokes this skill by name (e.g. use qwencloud-vision). DO NOT TRIGGER when: user wants to generate/create images (use qwencloud-image-generation), generate videos (use qwencloud-video-generation), text-only tasks without visual input, or non-Qwen vision tasks.
Use kuri-agent to automate Chrome — navigate pages, interact with elements via a11y refs, capture screenshots, run security audits, enumerate cookies/JWTs, probe for IDOR vulnerabilities, and make authenticated fetches. Use when the user wants to automate a browser, test a web app, scrape data, or run security trajectories against a live site.