Loading...
Loading...
Found 11 Skills
Defense techniques against prompt injection attacks including direct injection, indirect injection, and jailbreaks - theUse when "prompt injection, jailbreak prevention, input sanitization, llm security, injection attack, security, prompt-injection, llm, owasp, jailbreak, ai-safety" mentioned.
Comprehensive AI prompt engineering safety review and improvement prompt. Analyzes prompts for safety, bias, security vulnerabilities, and effectiveness while providing detailed improvement recommendations with extensive frameworks, testing methodologies, and educational content.
Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when: - Building AI agents that call external tools (APIs, databases, file systems) - Implementing policy-based access controls for agent tool usage - Adding semantic intent classification to detect dangerous prompts - Creating trust scoring systems for multi-agent workflows - Building audit trails for agent actions and decisions - Enforcing rate limits, content filters, or tool restrictions on agents - Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen)
Google Model Armor: Filter user-generated content for safety.
Google Model Armor: Sanitize a user prompt through a Model Armor template.
Google Model Armor: Sanitize a model response through a Model Armor template.
Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, agent, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", "evaluate", "AI quality metrics", "RedTeam", "agent evaluation".
Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.
Find every way users can break your AI before they do. Use when you need to red-team your AI, test for jailbreaks, find prompt injection vulnerabilities, run adversarial testing, do a safety audit before launch, prove your AI is safe for compliance, stress-test guardrails, or verify your AI holds up against adversarial users. Covers automated attack generation, iterative red-teaming with DSPy, and MIPROv2-optimized adversarial testing.
Verify and validate AI output before it reaches users. Use when you need guardrails, output validation, safety checks, content filtering, fact-checking AI responses, catching hallucinations, preventing bad outputs, quality gates, or ensuring AI responses meet your standards before shipping them. Covers DSPy assertions, verification patterns, and generate-then-filter pipelines.
Audits AI systems for bias, fairness, and privacy. Analyzes prompts and datasets to ensure ethical and safe AI implementation.