Search Results: ai-safety

Found 14 Skills

AI & Machine Learning | orchestra-research/ai-res...

constitutional-ai

Anthropic's method for training a harmless AI assistant through self-improvement. It has two phases: supervised learning on self-critiqued and revised responses, then RLAIF (reinforcement learning from AI feedback). Use it for safety alignment and for reducing harmful outputs without human harmfulness labels. Used in training Claude.

Security & Compliance | pluginagentmarketplace/cu...

safety-filter-bypass

Techniques for testing and bypassing AI safety filters, content moderation systems, and guardrails during security assessments.

1 script | Checked