SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs

The International Conference on Machine Learning (ICML 2026), 2025-06-01