Found 1,142 Skills
Structured comparison of competing options with weighted scoring matrices, trade-off analysis, decision frameworks, and recommendation templates. Use when evaluating alternatives, making purchase decisions, or comparing strategies.
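For illustration, the core of a weighted scoring matrix is a small computation. A minimal Python sketch, where the criteria, weights, and option scores are placeholders rather than values the skill prescribes:

```python
# Weighted-scoring sketch; criteria, weights, and scores are
# illustrative placeholders, not values the skill prescribes.
criteria_weights = {"cost": 0.4, "performance": 0.35, "ecosystem": 0.25}

options = {
    "Option A": {"cost": 7, "performance": 9, "ecosystem": 6},
    "Option B": {"cost": 9, "performance": 6, "ecosystem": 8},
}

def weighted_score(scores: dict[str, float]) -> float:
    """Sum of each criterion score multiplied by its weight."""
    return sum(scores[c] * w for c, w in criteria_weights.items())

for name, scores in options.items():
    print(f"{name}: {weighted_score(scores):.2f}")
```

The weights sum to 1.0 so the result stays on the same scale as the raw criterion scores, which makes the trade-off between options easy to read off directly.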
Conduct multi-dimensional comparative analysis of user-supplied technical options or project requirements, and output a structured technology selection report. Applicable scenarios: front-end framework selection, back-end technology comparison, database selection, and deployment solution evaluation.
Build automated evaluation suites for AI agents using golden datasets, rubrics, and regression gates.
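One way such a regression gate might look, sketched in plain Python; the golden dataset and the 0.9 threshold are assumptions for illustration, and the agent under test is passed in as a callable:

```python
# Sketch of a regression gate: score an agent on a golden dataset and
# fail if accuracy drops below a fixed bar. Dataset and threshold are
# illustrative placeholders.
from typing import Callable

GOLDEN_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
PASS_THRESHOLD = 0.9  # tune per project

def regression_gate(run_agent: Callable[[str], str]) -> None:
    hits = sum(
        run_agent(case["input"]).strip() == case["expected"]
        for case in GOLDEN_SET
    )
    accuracy = hits / len(GOLDEN_SET)
    assert accuracy >= PASS_THRESHOLD, f"regression: accuracy = {accuracy:.2f}"
```

Wiring this into CI means a model or prompt change that degrades golden-set accuracy fails the build instead of shipping silently.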
Evaluate and improve the user experience of interfaces (CLI, web, mobile).
Automatically collects trending topics in the AI field, or writes AI technical articles on a specified topic in the style of 'Second Brother'. Focuses on hands-on tests of AI coding tools (Claude Code, Qoder, Cursor, TRAE, etc.), engineering with large models (SpringAI, LangChain, RAG, etc.), AI agents and workflow orchestration, reviews of Chinese large models (GLM, Tongyi Qianwen, DeepSeek, MiniMax, Kimi, etc.), and reviews of AI and agent tools in general. Trigger keywords: write an AI article, AI technical article, large model review, AI tool hands-on test, GLM, Claude Code, Qoder, Cursor, TRAE, SpringAI, RAG, Agent, workflow, Chinese large model, collect AI hot topics, AI topic, etc.
Teaches learners to extract transferable design lessons from real-world codebases through critical evaluation and systematic exploration. Use when a learner wants to study existing code to learn patterns, architecture, or design decisions—not just understand what it does. Guides through navigation, pattern recognition, critical evaluation (deliberate choice vs. compromise), and lesson extraction. Triggers on phrases like "learn from this codebase", "study how X is implemented", "understand design patterns in Y", or when a learner wants to improve by reading real code.
Critically review terminal user interfaces for UX quality, responsiveness, visual design, and interactivity. Use when asked to "review my TUI", "test my TUI UX", "audit my terminal UI", "check TUI responsiveness", "review TUI keybindings", "check interactivity", or any request to evaluate the user experience quality of a ratatui/crossterm/ncurses-based terminal application. Launches the TUI in tmux, systematically tests 10 dimensions (responsiveness, input conflicts, visual clarity, navigation, feedback loops, error states, layout, keyboard design, permission flows, visual design & color), and produces a graded report with screenshots and specific findings. Benchmarks against Claude Code, OpenCode, and Codex — the three best-in-class AI terminal UIs.
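The tmux-driven part of that workflow can be scripted. A rough Python sketch using standard tmux subcommands; the session name, binary path, keystroke, and timings are placeholders, not the skill's actual harness:

```python
import subprocess
import time

# Drive a TUI in a detached tmux session and capture its rendered output.
# "./my-tui" and the timings below are illustrative placeholders.
SESSION = "tui-review"

subprocess.run(["tmux", "new-session", "-d", "-s", SESSION, "./my-tui"], check=True)
time.sleep(1)  # let the UI finish its first draw

# Send a keystroke, then wait briefly before sampling the screen
# (a crude responsiveness probe).
subprocess.run(["tmux", "send-keys", "-t", SESSION, "j"], check=True)
time.sleep(0.2)

# Capture the pane contents as plain text for inspection.
screen = subprocess.run(
    ["tmux", "capture-pane", "-p", "-t", SESSION],
    capture_output=True, text=True, check=True,
).stdout
print(screen)

subprocess.run(["tmux", "kill-session", "-t", SESSION], check=True)
```

Capturing the pane before and after each keystroke gives a text diff of the screen, which is enough to check several of the ten dimensions (responsiveness, feedback loops, error states) without a human watching the terminal.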
Evaluate creative work against explicit taste preferences. Use when drafting to align with project aesthetics, when reviewing to surface preference conflicts, or when generating voting options to reflect diverse tastes.
Run verification checks for a task and evaluate results. Use when the user wants to verify a task's acceptance criteria.
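A verification run of this kind often reduces to executing each criterion's check command and collecting pass/fail results. A minimal sketch, where the criteria and commands are made up for illustration:

```python
import subprocess

# Hypothetical acceptance criteria mapped to shell check commands.
CRITERIA = {
    "unit tests pass": ["pytest", "-q"],
    "code is formatted": ["black", "--check", "."],
}

def verify() -> dict[str, bool]:
    """Run each check and record whether it exited cleanly."""
    return {
        name: subprocess.run(cmd).returncode == 0
        for name, cmd in CRITERIA.items()
    }

for name, passed in verify().items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```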
AI situational awareness — internal threat detection for hallucination risk, scope creep, and context degradation. Maps Cooper color codes to reasoning states and OODA loop to real-time decisions. Use during any task where reasoning quality matters, when operating in unfamiliar territory, after detecting early warning signs such as an uncertain fact or suspicious tool result, or before high-stakes output like irreversible changes or architectural decisions.
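One possible encoding of that Cooper-code mapping, sketched in Python; the reasoning-state labels and the escalation rule are illustrative, not the skill's exact terms:

```python
from enum import Enum

# Illustrative mapping of Cooper color codes to reasoning states;
# the labels are placeholders, not the skill's own wording.
class CooperCode(Enum):
    WHITE = "unaware: routine task, no active threat monitoring"
    YELLOW = "relaxed alert: watching for hallucination and scope creep"
    ORANGE = "specific alert: an uncertain fact or suspicious tool result"
    RED = "engaged: stop and verify before any high-stakes output"

def escalate(code: CooperCode) -> CooperCode:
    """Move one level up the alertness ladder (RED stays RED)."""
    order = list(CooperCode)
    return order[min(order.index(code) + 1, len(order) - 1)]
```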
Use this skill for ANY question about creating test or evaluation datasets for LangChain agents. Covers generating datasets from traces (final_response, single_step, trajectory, RAG types), uploading to LangSmith, and managing evaluation data.
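Uploading such a dataset typically goes through the LangSmith Python client. A minimal sketch, assuming a configured LANGSMITH_API_KEY; the dataset name and example data are illustrative:

```python
from langsmith import Client

# Sketch of uploading a final_response-style dataset to LangSmith.
# Requires LANGSMITH_API_KEY in the environment; the dataset name
# and example below are placeholders.
client = Client()

dataset = client.create_dataset(dataset_name="agent-final-response-evals")

client.create_examples(
    inputs=[{"question": "What is the refund policy?"}],
    outputs=[{"answer": "Refunds are accepted within 30 days."}],
    dataset_id=dataset.id,
)
```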
Specialized business logic evaluator for the Evaluate-Loop. Use this for evaluating tracks that implement core product logic — pipelines, dependency resolution, state machines, pricing/tier enforcement, packaging. Checks feature correctness against product rules, edge cases, state transitions, data flow, and user journey completeness. Dispatched by loop-execution-evaluator when track type is 'business-logic', 'generator', or 'core-feature'. Triggered by: 'evaluate logic', 'test business rules', 'verify business rules', 'check feature'.
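Checking state transitions, one of the dimensions listed above, often amounts to validating moves against an allowed-transition table. A small illustrative sketch; the states and rules are placeholders, not any product's actual logic:

```python
# Hypothetical order state machine; states and transitions are placeholders.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "approved": {"shipped"},
    "rejected": set(),
    "shipped": set(),
}

def check_transition(current: str, target: str) -> None:
    """Raise if the product rules forbid moving current -> target."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")

check_transition("draft", "submitted")   # ok
# check_transition("draft", "shipped")   # would raise ValueError
```

Enumerating every (state, input) pair against such a table is also a direct way to surface missing edge cases, since any pair absent from the table is by definition unhandled.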