Publications

SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs

Vincent Siu, Nicholas Crispino, David Park, Nathan W Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026).

paper code huggingface linkedin x

Peer-Preservation in Frontier Models

Yujin Potter, Nicholas Crispino, Vincent Siu, Chenguang Wang, Dawn Song.

In The International Conference on Machine Learning (ICML 2026).

paper code linkedin x blog news

CyberCycle: Scalable Real-World Benchmark for AI Agents’ End-to-End Cybersecurity Capabilities

Tianneng Shi, Robin Rheem, Dongwei Jiang, Mona Wang, Francisco De La Riega, Zhun Wang, Jingzhi Jiang, Alexander Cheung, Sean Tai, Jonah Cha, Jianhong Tu, Gabriel Han, Chenguang Wang, Wenbo Guo, Jingxuan He, Dawn Song.

In The International Conference on Machine Learning (ICML 2026).

Position: Agent Security Needs Redefinition through a Holistic Framework

Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Chenguang Wang, Dawn Song.

In The International Conference on Machine Learning (ICML 2026).

A benchmark of expert-level academic questions to assess AI capabilities

Long Phan, Wenjin Zhang, Nick Crispino, Chenguang Wang, Daofeng Li, Jiawei Shen, Kyle Montgomery, Hannah Szlyk, Ting Wang, Summer Yue, Alexandr Wang, Dan Hendrycks, many others.

In Nature 2026.

SudoBench: A Contextual Authorization Benchmark for LLM Agents

Vincent Siu, Tianneng Shi, Shangding Gu, Zhun Wang, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

Controlling Tool Use with Heading-Specific Activation Steering

Yuqi Chen, Vincent Siu, Yang Liu, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

SafeClawBench: An Operating-System Perspective on Evaluating the Security of Claw-like Agent Systems

Peizhi Niu, Shangding Gu, Wenjie Qu, Tianneng Shi, Yuankai Li, Ahmad Tawaha, Hend Alzahrani, Vincent Siu, Boyi Li, Chenguang Wang, Jiaheng Zhang, Basel Alomair, Ming Jin, Muhao Chen, Chi Wang, Costas Spanos, Dawn Song.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

Component and Dimension Sparsity in Transformer Refusal Mechanisms

Vincent Siu, Glenn Grant-Richards, Vlad Pavlovich, Yizhou Sun, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

Pratibha Revankar, Kargi Chauhan, Jihye Kim, Sadiba Nusrat Nur, Vincent Siu, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

FaultLoc: Evaluating Coding Agents For Fault Localization

Jianhong Tu, Shubham Gaur, Rathik Murtinty, Zhun Wang, Tianneng Shi, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

SkillOptimizer: Agent Skill Optimization Through Subskills Without Task Supervision

Nicholas Crispino, Shubham Gaur, Xuefang Yang, Angela Yu, Berat Ercevik, Clara Sapugay, Yujin Potter, Dawn Song, Chenguang Wang.

In The International Conference on Machine Learning (ICML 2026 AIWILD Workshop).

ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents

Lei Ding, Bin He, Chenguang Wang, Yang Liu.

In The Annual Meeting of the Association for Computational Linguistics (ACL 2026).

FICO: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale

Jianhong Tu, Nicholas Crispino, Kyle Montgomery, Chenguang Wang, Dawn Song.

In The Annual Meeting of the Association for Computational Linguistics (ACL 2026).

RepIt: Steering Language Models with Concept-Specific Refusal Vectors

Vincent Siu, Nathan W Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang.

In The International Conf. on Learning Representations (ICLR 2026).

paper code linkedin x

LLM Agentic System Safety Requires Hybrid Alignment

Vincent Siu, Kyle Montgomery, Yujin Potter, Zhun Wang, Dawn Song, Chenguang Wang.

In The International Conf. on Learning Representations (ICLR 2026 AIWILD Workshop).

VMDT: Decoding the Trustworthiness of Video Foundation Models

Yujin Potter, Zhun Wang, Nicholas Crispino, Kyle Montgomery, Alexander Xiong, Ethan Y. Chang, Francesco Pinto, Yuqi Chen, Rahul Gupta, Morteza Ziyadi, Christos Christodoulopoulos, Bo Li, Chenguang Wang, Dawn Song.

In The Annual Conference on Neural Information Processing Systems (NeurIPS 2025).

Budget-aware Test-time Scaling via Discriminative Verification

Kyle Montgomery, Sijun Tan, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, Chenguang Wang.

In The Annual Conference on Neural Information Processing Systems (NeurIPS 2025 Workshop on Efficient Reasoning).

paper code huggingface linkedin x blog

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess

Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Quincy Davis, Matei Zaharia, Chi Wang, Chenguang Wang.

In The Annual Conference on Neural Information Processing Systems (NeurIPS 2025 Workshop on Foundations of Reasoning).

paper leaderboard code linkedin x blog

Towards an AI biomedical scientist: Accelerating discoveries in neurodegenerative disease

Kaleigh F. Roberts, Eric C. Landsness, Justin Reese, Donald Elbert, Gabrielle Strobel, Elizabeth Wu, Yixin Chen, Albert Lai, Zachary B. Abrams, Mingfang Zhu, Justin Melendez, Srinivas Koutarapu, Sihui Song, Yun Chen, Robert Lazar, Payam Barnaghi, John F. Crary, Sergio Pablo Sardi, Marc D. Voss, Rajaraman Krishnan, Joel W. Schwartz, Ron Mallon, Gustavo A. Jimenez-Maggiora, Chenguang Wang, Thomas Sandmann, Niranjan Bose, Mukta Phatak, Gayle Wittenberg, Yannis G. Kevrekidis, Cassie S. Mitchell, Ludovico Mitchener, Towfique Raj, Luca Foschini, Gregory J. Moore, Randall J. Bateman.

In The Journal of Prevention of Alzheimer's Disease (JPAD 2026).

AGENTVIGIL: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song.

In Proc. 2025 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2025).

COSMIC: Generalized Refusal Direction Identification in LLM Activations

Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang.

In The Annual Meeting of the Association for Computational Linguistics (ACL 2025).

Predicting Task Performance with Context-aware Scaling Laws

Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, Chenguang Wang.

In The Annual Meeting of the Association for Computational Linguistics (ACL 2025 Workshop on Towards Knowledgeable Foundation Models).

paper code huggingface x

MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Jianhong Tu*, Zhuohao Ni*, Nicholas Crispino, Zihao Yu, Michael Bendersky, Beliz Gunel, Ruoxi Jia, Xin Liu, Lingjuan Lyu, Dawn Song, Chenguang Wang.

In The Annual Meeting of the Association for Computational Linguistics (ACL 2025 Workshop on Towards Knowledgeable Foundation Models).

paper code huggingface x

JudgeBench: A Benchmark for Evaluating LLM-based Judges

Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, Ion Stoica.

In The International Conf. on Learning Representations (ICLR 2025).

paper leaderboard code data x

Preference Poisoning Attacks on Reward Model Learning

Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik.

In IEEE Symposium on Security and Privacy (IEEE S&P 2025).

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang.

In The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).

paper code slides poster

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang.

In The Forty-first International Conference on Machine Learning (ICML 2024).

paper code huggingface x blog slides poster

Measuring Vision-Language STEM Skills of Neural Models

Jianhao Shen, Ye Yuan, Srbuhi Mirzoyan, Ming Zhang, and Chenguang Wang.

In The Twelfth International Conf. on Learning Representations (ICLR 2024).

paper leaderboard code data slides poster news news (chinese)

Measuring Social Norms of Large Language Models

Ye Yuan, Kexin Tang, Jianhao Shen, Ming Zhang, and Chenguang Wang.

In 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).

paper code data slides poster

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

Chenguang Wang, Ruoxi Jia, Xin Liu, and Dawn Song.

In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2024 Workshop on Adversarial Machine Learning on Computer Vision).

Evaluating Large Language Models in an Emerging Domain: A Pilot Study in Decentralized Finance

Joshua Carter Pearlson, Xiaoyuan Liu, Kripa Ann George, Dawn Song, and Chenguang Wang.

In The Twelfth International Conf. on Learning Representations DPFM Workshop (ICLR 2024 DPFM Workshop).

Enhancing Global Estimation of Fine Particulate Matter Concentrations by Including Geophysical a Priori Information in Deep Learning

Siyuan Shen, Chi Li, Aaron van Donkelaar, Nathan Jacobs, Chenguang Wang, Randall V. Martin.

In ACS ES&T Air (ACS ES&T Air 2024).

CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models

Zhiyuan Yu, Yuhao Wu, Ning Zhang, Chenguang Wang, Yevgeniy Vorobeychik, Chaowei Xiao.

In Proc. of the 40th International Conf. on Machine Learning (ICML 2023).

Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study

Myeongseob Ko, Ming Jin, Chenguang Wang, Ruoxi Jia.

In International Conf. on Computer Vision (ICCV 2023).

DeepStruct: Pretraining of language models for structure prediction

Chenguang Wang*, Xiao Liu*, Zui Chen*, Haoyun Hong, Jie Tang, and Dawn Song.

In Proc. 2022 Annual Meeting of the Association for Computational Linguistics (ACL 2022).

paper code slides video poster

Joint language semantic and structure embedding for knowledge graph completion

Jianhao Shen, Chenguang Wang*, Linyuan Gong, and Dawn Song.

In Proc. 2022 Int. Conf. on Computational Linguistics (COLING 2022).

paper code slides

IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models

Chenguang Wang, Xiao Liu and Dawn Song.

In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).

PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion

Jianhao Shen, Chenguang Wang*, Ye Yuan, Jiawei Han, Heng Ji, Koushik Sen, Ming Zhang* and Dawn Song*.

In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).

paper code slides

Benchmarking Language Models for Code Syntax Understanding

Da Shen, Xinyun Chen*, Chenguang Wang*, Koushik Sen and Dawn Song.

In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).

paper code slides

Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models

Zhiyuan Zhang, Lingjuan Lyu, Xingjun Ma, Chenguang Wang and Xu Sun.

In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).

Protecting intellectual property of language generation APIs with lexical watermark

Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang.

In Proc. 2022 AAAI Conf. on Artificial Intelligence (AAAI 2022).

Improving representation of the AOD to PM2.5 relationship with a convolutional neural network

Siyuan Shen, Aaron van Donkelaar, Randall V. Martin, Nathan Jacobs, and Chenguang Wang.

In Proc. 2022 Advancing Earth and Space Science (AGU 2022).

Zero-shot information extraction as a unified text-to-triple translation

Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, and Dawn Song.

In Proc. 2021 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2021).

paper code slides video poster

Language models are open knowledge graphs

Chenguang Wang, Xiao Liu, and Dawn Song.

In arXiv preprint arXiv:2010.11967 (arXiv 2020).

paper code slides

GluonCV and GluonNLP: Deep learning in computer vision and natural language processing

Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, and Shuai Zheng.

In Journal of Machine Learning Research (JMLR 2020).

PoD: Positional dependency-based word embedding for aspect term extraction

Yichun Yin, Chenguang Wang, and Ming Zhang.

In Proc. 2020 Int. Conf. on Computational Linguistics (COLING 2020).

Transformer on a diet

Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, and Alexander Smola.

In arXiv preprint arXiv:2002.06170 (arXiv 2020).

Language models with Transformers

Chenguang Wang, Mu Li, and Alexander Smola.

In arXiv preprint arXiv:1904.09408 (arXiv 2019).

paper code slides

From shallow to deep language representations: Pre-training, fine-tuning, and beyond

Aston Zhang, Haibin Lin, Chenguang Wang, Mu Li, and Alexander Smola.

In Proc. 2019 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2019).

Co-occurrent features in semantic segmentation

Hang Zhang, Han Zhang, Chenguang Wang, and Junyuan Xie.

In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2019).

Unsupervised meta-path selection for similarity measure on heterogeneous information networks

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.

In Proc. 2018 Data Mining and Knowledge Discovery (DMKD 2018).

paper code data

Distant meta-path similarities for text-based heterogeneous information networks

Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, and Jiawei Han.

In Proc. 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM 2017).

paper data slides

Crowd-in-the-loop: A hybrid approach for annotating semantic roles

Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, and Anbang Xu.

In Proc. 2017 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2017).

paper data slides

Active learning for black-box semantic role labeling with neural factors

Chenguang Wang, Laura Chiticariu, and Yunyao Li.

In Proc. 2017 Int. Joint Conf. on Artificial Intelligence (IJCAI 2017).

paper data slides

Semi-supervised learning over heterogeneous information networks by ensemble of meta-graph guided random walks

He Jiang, Yangqiu Song, Chenguang Wang, Ming Zhang, and Yizhou Sun.

In Proc. 2017 Int. Joint Conf. on Artificial Intelligence (IJCAI 2017).

Towards re-defining relation understanding in financial domain

Chenguang Wang, Doug Burdick, Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, and Huaiyu Zhu.

In Proc. of 2017 ACM SIGMOD Int. Conf. on Management of Data Workshop (SIGMOD 2017 Workshop).

paper slides video

HINE: Heterogeneous information network embedding

Yuxin Chen and Chenguang Wang.

In Proc. 2017 Int. Conf. on Database Systems for Advanced Applications (DASFAA 2017).

World knowledge as indirect supervision for document clustering

Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han.

In ACM Transactions on Knowledge Discovery from Data (TKDD 2016).

RelSim: Relation similarity search in schema-rich heterogeneous information networks

Chenguang Wang, Yizhou Sun, Yanglei Song, Jiawei Han, Yangqiu Song, Lidan Wang, and Ming Zhang.

In Proc. 2016 SIAM Int. Conf. on Data Mining (SDM 2016).

Text classification with heterogeneous information network kernels

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.

In Proc. 2016 AAAI Conf. on Artificial Intelligence (AAAI 2016).

paper code data slides

KnowSim: A document similarity measure on structured heterogeneous information networks

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.

In Proc. of 2015 IEEE Int. Conf. on Data Mining (ICDM 2015).

paper code data slides

Constrained information-theoretic tripartite graph clustering to identify semantically similar relations

Chenguang Wang, Yangqiu Song, Dan Roth, Chi Wang, Jiawei Han, Heng Ji, and Ming Zhang.

In Proc. 2015 Int. Joint Conf. on Artificial Intelligence (IJCAI 2015).

Incorporating world knowledge to document clustering via heterogeneous information networks

Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han.

In Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2015).

paper code data slides video

Spectral label refinement for noisy and missing text labels

Yangqiu Song, Chenguang Wang, Ming Zhang, Hailong Sun, and Qiang Yang.

In Proc. 2015 AAAI Conf. on Artificial Intelligence (AAAI 2015).

Measuring domain influence in heterogeneous networks

Quan Liu, Chenguang Wang, and Ming Zhang.

In Proc. 2014 ACM Int. Conf. on Web Search and Data Mining Workshop on Diffusion Networks and Cascade Analytics (WSDM 2014 Workshop).

Paraphrasing adaptation for web search ranking

Chenguang Wang, Nan Duan, Ming Zhou, and Ming Zhang.

In Proc. 2013 Annual Meeting of the Association for Computational Linguistics (ACL 2013).

ENGtube: An integrated subtitle environment for ESL

Chi-Ho Li, Shujie Liu, Chenguang Wang, and Ming Zhou.

In MT Summit XIII: the Thirteenth Machine Translation Summit (MTSummit 2011).

A Framework for Formalizing LLM Agent Security

Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song.

In arXiv preprint 2026.

rLLM On-Policy Distillation: Training Smaller Students from Stronger Teachers

Brian Chen, Kyle Montgomery, the rLLM Team.

In rLLM blog 2026.

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, Shanglin Wu, Ruiyao Xu, Liangwei Yang, Rui Yang, Wooseong Yang, Chin-Yuan Yeh, Hanrong Zhang, Haozhen Zhang, Siqi Zhu, Henry Peng Zou, Wanjia Zhao, Song Wang, Wujiang Xu, Zixuan Ke, Zheng Hui, Dawei Li, Yaozu Wu, Langzhou He, Chen Wang, Xiongxiao Xu, Baixiang Huang, Juntao Tan, Shelby Heinecke, Huan Wang, Caiming Xiong, Ahmed A. Metwally, Jun Yan, Chen-Yu Lee, Hanqing Zeng, Yinglong Xia, Xiaokai Wei, Ali Payani, Yu Wang, Haitong Ma, Wenya Wang, Chenguang Wang, Yu Zhang, Xin Wang, Yongfeng Zhang, Jiaxuan You, Hanghang Tong, Xiao Luo, Xue Liu, Yizhou Sun, Wei Wang, Julian McAuley, James Zou, Jiawei Han, Philip S. Yu, Kai Shu.

In arXiv preprint 2026.

paper code linkedin

rLLM v0.2: RL Training over General Agentic Programs

Sijun Tan, Kyle Montgomery, the rLLM Team.

In rLLM blog 2025.

paper code linkedin x

rLLM: A Framework for Post-Training Language Agents

Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, Manan Roongta, Chenguang Wang, Li Erran Li, Raluca Ada Popa, Ion Stoica.

In Notion 2025.

Humanity’s Last Exam

Long Phan, Wenjin Zhang, Nick Crispino, Chenguang Wang, Daofeng Li, Jiawei Shen, Kyle Montgomery, Hannah Szlyk, Ting Wang, Summer Yue, Alexandr Wang, Dan Hendrycks, many others.

In arXiv preprint 2025.

paper code data news