RepIt: Steering Language Models with Concept-Specific Refusal Vectors

arXiv preprint, 2025-10-17