Found 316 Skills
Turn any concept, idea, or description into a polished static HTML visual, then export it as a PNG or SVG image. Use this skill whenever the user wants to create a visual representation of an idea and needs an image file output (PNG or SVG). This includes: infographics, concept diagrams, flowcharts, comparison charts, process visuals, educational diagrams, social media graphics, data visualizations, posters, cards, badges, icons, logo sketches, or any "make me an image of X" request that can be achieved with HTML/CSS/SVG rather than photographic AI generation. Also trigger when the user has an existing HTML visual and wants to export/convert it to PNG or SVG. Trigger phrases include: "create an image of", "make a visual", "design a graphic", "export as PNG", "save as SVG", "concept to image", "turn this into an image", "screenshot this HTML", "generate an infographic", or any request combining a concept description with image output.
Interact with Excel files (.xlsx, .xlsm, .xlsb, .xls, .ods) using the agent-xlsx CLI for data extraction, analysis, writing, formatting, visual capture, VBA analysis, and sheet management. Use when the user asks to: (1) Read, analyse, or search data in spreadsheets, (2) Write values or formulas to cells, (3) Inspect formatting, formulas, charts, or metadata, (4) Take screenshots or visual captures of sheets, (5) Export sheets to CSV/JSON/Markdown, (6) Manage sheets (create, rename, delete, copy, hide), (7) Analyse or execute VBA macros, (8) List/export embedded objects (charts, shapes, pictures), (9) Check for formula errors, or (10) Any task involving Excel file interaction. Prefer over openpyxl/pandas scripts — faster, structured JSON optimised for AI.
Test website responsiveness across viewport widths using browser automation. Resizes a single session through breakpoints, screenshots each width, and detects layout transitions (column changes, nav switches, overflow). Produces comparison reports showing exactly where layouts break. Trigger with 'responsiveness check', 'check responsive', 'breakpoint test', 'viewport test', 'responsive sweep', 'check breakpoints', or 'test at mobile'.
Automates browser interactions using AgentGo's distributed cloud browser cluster via playwright@1.51.0. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information — running on AgentGo's remote cloud browsers instead of a local browser.
Understand images with Alibaba Cloud Model Studio Qwen VL models (qwen3-vl-plus/qwen3-vl-flash and latest aliases). Use when building image Q&A, visual analysis, OCR-like extraction, chart/table reading, or screenshot understanding workflows.
Collaboration process for UI style modifications. Use when the user requests page style changes, layout adjustments, or UI detail tweaks. The structured process of "Screenshot Localization → Current Status Description → Option Selection → Code Modification → Fine-tuning" reduces miscommunication and avoids wasted tokens.
Extract actionable Linear tickets from ambiguous input — Slack conversations, call transcripts, screenshots, meeting notes, or any unstructured material. Proposes tickets in a scratchpad file for user review, then creates them in Linear on approval. Use when the user wants to turn conversations, transcripts, screenshots, or notes into Linear tickets. Also use when user says "create tickets from this", "send to linear", "make issues from this call/chat", or provides raw material and asks for tickets.
Use when the user wants to A/B test App Store product page elements to improve conversion rate. Also use when the user mentions "A/B test", "product page optimization", "test my screenshots", "test my icon", "conversion rate optimization", "CPP", or "custom product pages". For screenshot design, see screenshot-optimization. For metadata optimization, see metadata-optimization.
Remote KVM control via PiKVM REST API. Use for controlling remote computers through PiKVM - taking screenshots, moving mouse, clicking, typing text, pressing keys, keyboard shortcuts, scrolling, or power management.
Browser automation via Puppeteer CLI scripts (JSON output). Capabilities: screenshots, PDF generation, web scraping, form automation, network monitoring, performance profiling, JavaScript debugging, headless browsing. Actions: screenshot, scrape, automate, test, profile, monitor, debug browser. Keywords: Puppeteer, headless Chrome, screenshot, PDF, web scraping, form fill, click, navigate, network traffic, performance audit, Lighthouse, console logs, DOM manipulation, element selector, wait, scroll, automation script. Use when: taking screenshots, generating PDFs from web, scraping websites, automating form submissions, monitoring network requests, profiling page performance, debugging JavaScript, testing web UIs.
Visual design intelligence and UI aesthetics. Integrates: chrome-devtools, ai-multimodal, media-processing. Capabilities: design analysis, visual hierarchy, color theory, typography, micro-interactions, animation, design systems, accessibility. Actions: analyze, design, create, capture, evaluate, implement UI aesthetics. Keywords: Dribbble, Behance, Mobbin, design inspiration, visual hierarchy, color palette, typography, spacing, animation, micro-interaction, design system, style guide, accessibility, WCAG, contrast ratio, golden ratio, whitespace, visual rhythm. Use when: building beautiful UIs, analyzing design inspiration, implementing visual hierarchy, adding animations/micro-interactions, creating design systems, evaluating aesthetic quality, capturing design screenshots.
Multimodal AI processing via Google Gemini API (2M-token context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.