Search Results: mechanistic-interpretability

Found 3 Skills

AI & Machine Learning | davila7/claude-code-templ...

transformer-lens-interpretability

Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.

AI & Machine Learning | aradotso/trending-skills

obliteratus-abliteration

One-click model liberation toolkit for removing refusal behaviors from LLMs via surgical abliteration techniques.

AI & Machine Learning | davila7/claude-code-templ...

sparse-autoencoder-training

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.
