COSMIC: Generalized Refusal Direction Identification in LLM Activations

The Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025-01-21