I am an Assistant Professor in the Department of Computer Science and Engineering at Washington University in St. Louis, and the Founder and Director of the WashU Natural Language Processing Group. I am also an affiliate faculty member of the Division of Computational and Data Sciences and the AI for Health Institute.
Previously, I was a postdoc in computer science at UC Berkeley. I was also a research scientist at Amazon AI and a research staff member at IBM Research-Almaden. I completed my Ph.D. at Peking University, and was a visiting Ph.D. student at UIUC.
Research Interests: Natural language processing, machine learning, and security.
Prospective Students and Postdocs: We now have postdoc openings! We are also recruiting PhD students. Please see this page for details.
Selected Awards
- Google Research Scholar Award
- X-Camp Academy Research Award
- ACM China Doctoral Dissertation Award Honorable Mention (one of two national finalists)
- Qualcomm Fellowship Honorable Mention
- Baidu Fellowship Honorable Mention (one of twenty worldwide finalists)
- Sohu Fellowship
Projects
- Creator of AutoGluon, an AutoML toolkit for deep learning
- Co-creator of GluonNLP, a deep learning toolkit for NLP
Preprints
MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models
Jianhong Tu*, Zhuohao Ni*, Nicholas Crispino, Zihao Yu, Michael Bendersky, Beliz Gunel, Ruoxi Jia, Xin Liu, Lingjuan Lyu, Dawn Song, Chenguang Wang.
In arXiv preprint 2024.
paper code data huggingface x
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, Ion Stoica.
In arXiv preprint 2024.
paper leaderboard code data x
Publications
Preference Poisoning Attacks on Reward Model Learning
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik.
In IEEE Symposium on Security and Privacy (IEEE S&P 2025).
paper
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang.
In The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
paper code slides poster
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang.
In The Forty-first International Conference on Machine Learning (ICML 2024).
paper code huggingface x blog slides poster
Measuring Social Norms of Large Language Models
Ye Yuan, Kexin Tang, Jianhao Shen, Ming Zhang, and Chenguang Wang.
In 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
paper code data slides poster
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study
Chenguang Wang, Ruoxi Jia, Xin Liu, and Dawn Song.
In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2024 Workshop of Adversarial Machine Learning on Computer Vision).
paper code
Evaluating Large Language Models in an Emerging Domain: A Pilot Study in Decentralized Finance
Joshua Carter Pearlson, Xiaoyuan Liu, Chengsong Huang, Kripa Ann George, Dawn Song, and Chenguang Wang.
In The Twelfth International Conf. on Learning Representations DPFM Workshop (ICLR 2024 DPFM Workshop).
paper
Enhancing Global Estimation of Fine Particulate Matter Concentrations by Including Geophysical a Priori Information in Deep Learning
Siyuan Shen, Chi Li, Aaron van Donkelaar, Nathan Jacobs, Chenguang Wang, Randall V. Martin.
In ACS ES&T Air (ACS ES&T Air 2024).
paper
CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models
Zhiyuan Yu, Yuhao Wu, Ning Zhang, Chenguang Wang, Yevgeniy Vorobeychik, Chaowei Xiao.
In Proc. of the 40th International Conf. on Machine Learning (ICML 2023).
paper
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study
Myeongseob Ko, Ming Jin, Chenguang Wang, Ruoxi Jia.
In International Conf. on Computer Vision (ICCV 2023).
paper
DeepStruct: Pretraining of language models for structure prediction
Chenguang Wang*, Xiao Liu*, Zui Chen*, Haoyun Hong, Jie Tang, and Dawn Song.
In Proc. 2022 Annual Meeting of the Association for Computational Linguistics (ACL 2022).
paper code slides video poster
Joint language semantic and structure embedding for knowledge graph completion
Jianhao Shen, Chenguang Wang*, Linyuan Gong, and Dawn Song.
In Proc. 2022 Int. Conf. on Computational Linguistics (COLING 2022).
paper code slides
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Chenguang Wang, Xiao Liu and Dawn Song.
In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).
paper poster
PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion
Jianhao Shen, Chenguang Wang*, Ye Yuan, Jiawei Han, Heng Ji, Koushik Sen, Ming Zhang* and Dawn Song*.
In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).
paper code slides
Benchmarking Language Models for Code Syntax Understanding
Da Shen, Xinyun Chen*, Chenguang Wang*, Koushik Sen and Dawn Song.
In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).
paper code slides
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models
Zhiyuan Zhang, Lingjuan Lyu, Xingjun Ma, Chenguang Wang and Xu Sun.
In Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2022).
paper
Protecting intellectual property of language generation APIs with lexical watermark
Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang.
In Proc. 2022 AAAI Conf. on Artificial Intelligence (AAAI 2022).
paper
Improving representation of the AOD to PM2.5 relationship with a convolutional neural network
Siyuan Shen, Aaron van Donkelaar, Randall V. Martin, Nathan Jacobs, and Chenguang Wang.
In Proc. 2022 Advancing Earth and Space Science (AGU 2022).
paper
Zero-shot information extraction as a unified text-to-triple translation
Chenguang Wang, Xiao Liu, Zui Chen, Haoyun Hong, Jie Tang, and Dawn Song.
In Proc. 2021 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2021).
paper code slides video poster
Language models are open knowledge graphs
Chenguang Wang, Xiao Liu, and Dawn Song.
In arXiv preprint arXiv:2010.11967 (arXiv 2020).
paper code slides
GluonCV and GluonNLP: Deep learning in computer vision and natural language processing
Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, and Shuai Zheng.
In Journal of Machine Learning Research (JMLR 2020).
paper code
PoD: Positional dependency-based word embedding for aspect term extraction
Yichun Yin, Chenguang Wang, and Ming Zhang.
In Proc. 2020 Int. Conf. on Computational Linguistics (COLING 2020).
paper
Transformer on a diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, and Alexander Smola.
In arXiv preprint arXiv:2002.06170 (arXiv 2020).
paper code
Language models with Transformers
Chenguang Wang, Mu Li, and Alexander Smola.
In arXiv preprint arXiv:1904.09408 (arXiv 2019).
paper code slides
From shallow to deep language representations: Pre-training, fine-tuning, and beyond
Aston Zhang, Haibin Lin, Chenguang Wang, Mu Li, and Alexander Smola.
In Proc. 2019 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2019).
paper code
Co-occurrent features in semantic segmentation
Hang Zhang, Han Zhang, Chenguang Wang, and Junyuan Xie.
In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2019).
paper
Unsupervised meta-path selection for similarity measure on heterogeneous information networks
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.
In Proc. 2018 Data Mining and Knowledge Discovery (DMKD 2018).
paper code data
Distant meta-path similarities for text-based heterogeneous information networks
Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, and Jiawei Han.
In Proc. 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM 2017).
paper data slides
Crowd-in-the-loop: A hybrid approach for annotating semantic roles
Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, and Anbang Xu.
In Proc. 2017 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2017).
paper data slides
Active learning for black-box semantic role labeling with neural factors
Chenguang Wang, Laura Chiticariu, and Yunyao Li.
In Proc. 2017 Int. Joint Conf. on Artificial Intelligence (IJCAI 2017).
paper data slides
Semi-supervised learning over heterogeneous information networks by ensemble of meta-graph guided random walks
He Jiang, Yangqiu Song, Chenguang Wang, Ming Zhang, and Yizhou Sun.
In Proc. 2017 Int. Joint Conf. on Artificial Intelligence (IJCAI 2017).
paper code
Towards re-defining relation understanding in financial domain
Chenguang Wang, Doug Burdick, Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, and Huaiyu Zhu.
In Proc. of 2017 ACM SIGMOD Int. Conf. on Management of Data Workshop (SIGMOD 2017 Workshop).
paper slides video
HINE: Heterogeneous information network embedding
Yuxin Chen and Chenguang Wang.
In Proc. 2017 Int. Conf. on Database Systems for Advanced Applications (DASFAA 2017).
paper
World knowledge as indirect supervision for document clustering
Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han.
In ACM Transactions on Knowledge Discovery from Data (TKDD 2016).
paper data
RelSim: Relation similarity search in schema-rich heterogeneous information networks
Chenguang Wang, Yizhou Sun, Yanglei Song, Jiawei Han, Yangqiu Song, Lidan Wang, and Ming Zhang.
In Proc. 2016 SIAM Int. Conf. on Data Mining (SDM 2016).
paper slides
Text classification with heterogeneous information network kernels
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.
In Proc. 2016 AAAI Conf. on Artificial Intelligence (AAAI 2016).
paper code data slides
KnowSim: A document similarity measure on structured heterogeneous information networks
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han.
In Proc. of 2015 IEEE Int. Conf. on Data Mining (ICDM 2015).
paper code data slides
Constrained information-theoretic tripartite graph clustering to identify semantically similar relations
Chenguang Wang, Yangqiu Song, Dan Roth, Chi Wang, Jiawei Han, Heng Ji, and Ming Zhang.
In Proc. 2015 Int. Joint Conf. on Artificial Intelligence (IJCAI 2015).
paper slides
Incorporating world knowledge to document clustering via heterogeneous information networks
Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han.
In Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2015).
paper code data slides video
Spectral label refinement for noisy and missing text labels
Yangqiu Song, Chenguang Wang, Ming Zhang, Hailong Sun, and Qiang Yang.
In Proc. 2015 AAAI Conf. on Artificial Intelligence (AAAI 2015).
paper
Measuring domain influence in heterogeneous networks
Quan Liu, Chenguang Wang, and Ming Zhang.
In Proc. 2014 ACM Int. Conf. on Web Search and Data Mining Workshop on Diffusion Networks and Cascade Analytics (WSDM 2014 Workshop).
paper
Paraphrasing adaptation for web search ranking
Chenguang Wang, Nan Duan, Ming Zhou, and Ming Zhang.
In Proc. 2013 Annual Meeting of the Association for Computational Linguistics (ACL 2013).
paper slides
ENGtube: An integrated subtitle environment for ESL
Chi-Ho Li, Shujie Liu, Chenguang Wang, and Ming Zhou.
In MT Summit XIII: the Thirteenth Machine Translation Summit (MTSummit 2011).
paper
Selected Press Coverage
- Language agents help large language models "think" better, cheaper. WashU Record - Top stories. Sep 2024.
- Consistency, trustworthiness in large language models goal of new research. WashU Record. Aug 2024.
- Scientists use STEM datasets to evaluate the foundation of neural network models and accelerate the progress of artificial general intelligence. MIT Technology Review. Apr 2024.
- Analyzing generative AI's copyright crisis. WashU Engineering News. Jul 2023.
- Unsupervisedly Constructed Knowledge Graphs From Pre-Trained Language Models. AI Technology Review. Oct 2020.