Search Results: ai-safety

Found 11 Skills

Security & Complianceomer-metin/skills-for-ant...

prompt-injection-defense

Defense techniques against prompt injection attacks including direct injection, indirect injection, and jailbreaks - theUse when "prompt injection, jailbreak prevention, input sanitization, llm security, injection attack, security, prompt-injection, llm, owasp, jailbreak, ai-safety" mentioned.

🇺🇸|EnglishTranslated

AI & Machine Learninggoogleworkspace/cli

gws-modelarmor

Google Model Armor: Filter user-generated content for safety.

🇺🇸|EnglishTranslated

11.2k

AI & Machine Learninggoogleworkspace/cli

gws-modelarmor-sanitize-prompt

Google Model Armor: Sanitize a user prompt through a Model Armor template.

🇺🇸|EnglishTranslated

11.0k

AI & Machine Learninggoogleworkspace/cli

gws-modelarmor-sanitize-response

Google Model Armor: Sanitize a model response through a Model Armor template.

🇺🇸|EnglishTranslated

10.9k

AI & Machine Learningmicrosoft/agent-skills

azure-ai-evaluation-py

Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, agent, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", "evaluate", "AI quality metrics", "RedTeam", "agent evaluation".

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learninggithub/awesome-copilot

ai-prompt-engineering-safety-review

Comprehensive AI prompt engineering safety review and improvement prompt. Analyzes prompts for safety, bias, security vulnerabilities, and effectiveness while providing detailed improvement recommendations with extensive frameworks, testing methodologies, and educational content.

🇺🇸|EnglishTranslated

AI & Machine Learninggithub/awesome-copilot

agent-governance

Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when: - Building AI agents that call external tools (APIs, databases, file systems) - Implementing policy-based access controls for agent tool usage - Adding semantic intent classification to detect dangerous prompts - Creating trust scoring systems for multi-agent workflows - Building audit trails for agent actions and decisions - Enforcing rate limits, content filters, or tool restrictions on agents - Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen)

🇺🇸|EnglishTranslated

AI & Machine Learninglebsral/dspy-programming-...

ai-checking-outputs

Verify and validate AI output before it reaches users. Use when you need guardrails, output validation, safety checks, content filtering, fact-checking AI responses, catching hallucinations, preventing bad outputs, quality gates, or ensuring AI responses meet your standards before shipping them. Covers DSPy assertions, verification patterns, and generate-then-filter pipelines.

🇺🇸|EnglishTranslated

AI & Machine Learninglebsral/dspy-programming-...

ai-testing-safety

Find every way users can break your AI before they do. Use when you need to red-team your AI, test for jailbreaks, find prompt injection vulnerabilities, run adversarial testing, do a safety audit before launch, prove your AI is safe for compliance, stress-test guardrails, or verify your AI holds up against adversarial users. Covers automated attack generation, iterative red-teaming with DSPy, and MIPROv2-optimized adversarial testing.

🇺🇸|EnglishTranslated

AI & Machine Learningorchestra-research/ai-res...

constitutional-ai

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

🇺🇸|EnglishTranslated

AI & Machine Learningfamaoai-creator/gemini-sk...

ai-ethics-auditor

Audits AI systems for bias, fairness, and privacy. Analyzes prompts and datasets to ensure ethical and safe AI implementation.

🇺🇸|EnglishTranslated

1 scripts/Checked