RepIt: Steering Language Models with Concept-Specific Refusal Vectors

The International Conference on Learning Representations (ICLR 2026), 2025-04-28