Knowledge is Power: Symbolic Knowledge Distillation, Commonsense Morality, and Multimodal Script Knowledge
University of Washington, Seattle, and Allen Institute for AI
INF Distinguished Lecture Series
Yejin Choi is Brett Helsel Professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a senior research manager at AI2 overseeing the project Mosaic. Her research investigates a wide range of problems including commonsense knowledge and reasoning, neuro-symbolic integration, multimodal representation learning, and AI for social good. She is a co-recipient of the ACL Test of Time award in 2021, the CVPR Longuet-Higgins Prize in 2021, a NeurIPS Outstanding Paper Award in 2021, the AAAI Outstanding Paper Award in 2020, the Borg Early Career Award in 2018, the inaugural Alexa Prize Challenge in 2017, IEEE AI's 10 to Watch in 2016, and the ICCV Marr Prize in 2013. https://homes.cs.washington.edu/~yejin/
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI
Scale appears to be the winning recipe in today's AI leaderboards. And yet, extreme-scale neural models are still brittle to make errors that are often nonsensical and even counterintuitive. In this talk, I will argue for the importance of knowledge, especially commonsense knowledge, and demonstrate how smaller models developed in academia can still have an edge over larger industry-scale models, if powered with knowledge.
First, I will introduce "symbolic knowledge distillation", a new framework to distill larger neural language models into smaller commonsense models, which leads to a machine-authored KB that wins, for the first time, over a human-authored KB in all criteria: scale, accuracy, and diversity. Next, I will present an experimental conceptual framework toward computational social norms and commonsense morality, so that neural language models can learn to reason that “helping a friend” is generally a good thing to do, but “helping a friend spread fake news” is not. Finally, I will discuss an approach to multimodal script knowledge demonstrating the power of complex raw data, which leads to new SOTA performances on a dozen leaderboards that require grounded, temporal, and causal commonsense reasoning.