Found 1,744 Skills
Use this skill when you need to test or evaluate LangGraph/LangChain agents: writing unit or integration tests, generating test scaffolds, mocking LLM/tool behavior, running trajectory evaluation (match or LLM-as-judge), running LangSmith dataset evaluations, and comparing two agent versions with A/B-style offline analysis. Use it for Python and JavaScript/TypeScript workflows, evaluator design, experiment setup, regression gates, and debugging flaky/incorrect evaluation results.
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
Use this skill when users need to evaluate potential co-founders, assess founder compatibility, design equity splits, or navigate co-founder relationships. Activates for "should I work with this person," "co-founder fit," "equity split," or founding team questions.
Evaluates and sharpens content hooks using The Hook Stack™ framework. Use when scoring headlines, refining hooks for video/social/newsletter, or when asked to "evaluate this hook", "run through hook stack", or "score my headline".
Assuming the US dollar (or another currency) loses its reserve status and gold becomes the sole monetary anchor, derive the implied gold price that a central bank's balance sheet can withstand by dividing its monetary liabilities by its gold reserves, then output the leverage level, gap, and ranking for each country or currency.
Evaluate GitHub contributors for MLOps/engineering roles. Use when analyzing candidates, researching GitHub profiles, or updating CONTRIBUTORS.md with hiring assessments.
Use when "evaluating technology", "choosing frameworks", "stack comparison", "technology decisions", or asking about "React vs Vue", "PostgreSQL vs MySQL", "AWS vs GCP", "build vs buy".
Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", "agent benchmarks".
Gather online information about intellectual properties (IPs) and score them across multiple dimensions. Suited to assessing the adaptation value of IPs such as novels and scripts, and to analyzing their market potential and innovative attributes.
Systematic usability evaluation using established heuristics (Nielsen's 10, Shneiderman's 8, or custom rubrics). Use when reviewing UI designs, screenshots, prototypes, or live products for usability issues. Triggers on "review this design", "what's wrong with this UI", "usability check", "evaluate this interface", or when user shares screenshots/mockups asking for feedback.
Discover scientific equations from data using LLM-guided evolutionary search (LLM-SR). Multi-island algorithm with softmax-based cluster sampling, island reset, and LLM-proposed equation mutations. Use for symbolic regression and equation discovery.
Assess business quality, competitive positioning, and sustainability of value creation beyond financial models. Use when the user asks about economic moats, competitive advantages, Porter's Five Forces, management quality, ESG integration, or business model analysis. Also trigger when users mention 'does this company have a moat', 'switching costs', 'network effects', 'brand value', 'management track record', 'capital allocation', 'insider ownership', 'red flags', or ask whether a company's advantage is durable.