I am a postdoc in computer science at UC Berkeley, advised by Dr. Dawn Song.
I was a research scientist at Amazon AI and IBM Research-Almaden. I received my Ph.D. from Peking University, advised by Dr. Ming Zhang. I was a visiting Ph.D. student at UIUC, advised by Dr. Jiawei Han.
News
- Oct, 2020: “Language Models are Open Knowledge Graphs” is on arXiv [paper]. Our code and knowledge graphs will be made publicly available. Stay tuned!
- Sep, 2020: “PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction” is accepted by COLING 2020 [paper].
- Jul, 2020: I will serve as PC for ICLR 2021.
- Feb, 2020: I will serve as PC for ACL 2020.
- Feb, 2020: “GluonCV and GluonNLP: deep learning in computer vision and natural language processing” is accepted by JMLR [paper] [GluonNLP code] [GluonCV code].
Interests
My research interests span NLP, Text Mining, ML Systems, and Security. The goal of my research is to address core challenges of AI, in particular generalization and security. To this end, I work at the intersection of deep text understanding, machine learning with weak supervision, AI systems, and computer security and privacy.
I am enthusiastic about contributing to open source projects. I am the creator of AutoGluon and a co-creator of GluonNLP.
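To give a flavor of these projects, below is a minimal sketch of AutoGluon's tabular API as it appears in recent releases; the CSV paths and the label column name are placeholders, not taken from any specific tutorial.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Load training data from a CSV file (placeholder path).
train_data = TabularDataset("train.csv")

# Train an ensemble of models to predict the "label" column (placeholder
# column name); AutoGluon handles model selection and hyperparameter tuning.
predictor = TabularPredictor(label="label").fit(train_data)

# Score held-out data with the best model found during fitting.
test_data = TabularDataset("test.csv")
predictions = predictor.predict(test_data)
```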
Language Models are Open Knowledge Graphs
Chenguang Wang, Xiao Liu, and Dawn Song.
In arXiv preprint arXiv:2010.11967 (arXiv 2020).
[paper] [slides]

What is the relationship between deep language models (e.g., BERT, GPT-2, GPT-3) and knowledge graphs? Can we use pre-trained language models to construct knowledge graphs? We find that we can. The generated knowledge graphs not only cover the knowledge already present in existing knowledge graphs such as Wikidata, but also contain new, open factual knowledge.
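To illustrate the underlying observation that pre-trained language models store factual knowledge, here is a minimal cloze-style probing sketch using Hugging Face Transformers. It is a simplified illustration only, not the unsupervised construction pipeline proposed in the paper, and the prompt is an invented example.

```python
from transformers import pipeline

# Probe a pre-trained language model with a cloze-style prompt.
# This only illustrates that LMs encode relational facts; it is not the
# paper's knowledge graph construction method.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

# The top predictions can be read as candidate objects of a
# (Dante, born-in, ?) triple.
for candidate in fill_mask("Dante was born in [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```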
Language Models with Transformers
Chenguang Wang, et al.
In arXiv preprint arXiv:1904.09408 (arXiv 2019).
[paper] [code] [slides]

The work has received more than 4.4k blog views and more than 320 likes and retweets on Twitter. Experimental results on PTB, WikiText-2, and WikiText-103 show that the proposed method achieves perplexities between 20.42 and 34.11 across all benchmarks, i.e., an average improvement of 12.0 perplexity points compared to state-of-the-art LSTMs.
Crowd-in-the-loop: A hybrid approach for annotating semantic roles
Chenguang Wang, Alan Akbik, Laura Chiticariu, Yunyao Li, Fei Xia, and Anbang Xu.
In Proc. 2017 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2017).
[paper] [data] [slides]

Our experimental evaluation shows that the proposed approach reduces the workload for experts by over two-thirds and thus significantly lowers the cost of producing SRL annotations, with little loss in quality.