Recent Developments in Multilingual Natural Language Understanding
Roberto Navigli
Sapienza University of Rome
MPI Colloquium Series Distinguished Speaker
Roberto Navigli is Professor at Sapienza University of Rome, where he leads the Sapienza NLP Group. He has received two ERC grants on multilingual word sense disambiguation and multilingual, language-independent sentence representations, selected by the ERC itself among the "15 projects through which the ERC transformed science". In 2015 he received the META Prize for groundbreaking work in overcoming language barriers with BabelNet, a project also highlighted in The Guardian and Time magazine, and winner of the 2017 Artificial Intelligence Journal Prominent Paper Award. He is the co-founder of Babelscape, a successful spin-off company which enables Natural Language Understanding in dozens of languages. https://www.diag.uniroma1.it/navigli/
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI
Natural Language Processing (NLP) has seen an explosion of interest in recent years, with many industrial applications relying on key technological developments in the field. However, Natural Language Understanding (NLU) – which requires the machine to go beyond processing strings and operate at a semantic level – is particularly challenging due to the pervasive ambiguity of language. In this talk I will present recent developments from Sapienza NLP and discuss current challenges in three key NLU tasks, namely Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing.
Word Sense Disambiguation, the task of associating a word in context with its most appropriate sense from a predefined sense inventory, is one of the hardest tasks in NLP. However, the advent of Deep Learning has led to significant improvements, also thanks to the integration of explicit knowledge in the form of lexical knowledge graphs, pushing performance above the hard-to-beat threshold of 80% F1 on standard test sets.

I will then move to sentence-level semantics, which is also hampered by the scarcity of large-scale annotated data. Semantic Role Labeling, the task of extracting the predicate-argument structure of a sentence and identifying the semantic relationships between a predicate and its arguments, suffers from the existence of different, heterogeneous framesets for each language. Recently, we put forward a unifying neural architecture which, trained on data from different languages, outputs predicate senses and roles according to all the available inventories, enables the use of previously unseen languages, and makes it possible to create a network of predicate-argument meanings.

Finally, I will discuss progress in Semantic Parsing, which moves from the predicate-argument structure to the overall structure of a sentence, typically represented as a semantic graph. I will introduce recent approaches to generative semantic parsing and briefly overview a brand-new language-independent formalism called BabelNet Meaning Representation (BMR), which aims to address the current limits of semantic parsing and provide a representation that abstracts away from individual languages.
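To make the Word Sense Disambiguation task definition concrete, here is a minimal toy sketch in the spirit of the classic gloss-overlap (Lesk-style) approach, with a hypothetical two-sense inventory for "bank". This is purely illustrative of the task setting; the sense inventory and the naming are assumptions, and the neural, knowledge-graph-based systems discussed in the talk work very differently.

```python
# Toy Word Sense Disambiguation via gloss overlap (Lesk-style sketch).
# The sense inventory below is hypothetical, made up for illustration;
# real systems use large inventories such as WordNet or BabelNet.
SENSE_INVENTORY = {
    "bank%finance": "a financial institution that accepts deposits and lends money",
    "bank%river": "sloping land beside a body of water such as a river",
}

def lesk_disambiguate(context_words, inventory):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(w.lower() for w in context_words)

    def overlap(sense):
        gloss_words = set(inventory[sense].split())
        return len(context & gloss_words)

    return max(inventory, key=overlap)

sentence = "she sat on the bank of the river and watched the water".split()
print(lesk_disambiguate(sentence, SENSE_INVENTORY))  # → bank%river
```

The "river" sense wins because its gloss shares more words ("river", "water", "of") with the context than the finance gloss does; this simple counting heuristic is exactly what modern neural disambiguators, enriched with lexical knowledge graphs, improve upon.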