Preference Poisoning Attacks on Reward Model Learning

IEEE Symposium on Security and Privacy (IEEE S&P 2025), 2025-01-01