Preference Poisoning Attacks on Reward Model Learning
arXiv preprint arXiv:2402.01920 (arXiv 2024), 2024-09-01