COSMIC: Generalized Refusal Identification in LLM Activations

The Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025-01-20