Campus Event Calendar

Event Entry

What and Who

Fact Extraction and Verification for a Low-Resource Language

Mahsa Ghaderan
Sharif University of Technology
PhD Application Talk
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Monday, 23 January 2023
09:30
30 Minutes
Virtual talk
Zoom

Abstract

Training and evaluating automatic fact extraction and verification techniques requires large amounts of annotated data, which might not be available for low-resource languages. Furthermore, verifying many-hop claims (which require evidence from multiple sources) is more challenging and closer to real-world queries than verifying single-hop claims (for which evidence from one source is enough). In this talk, I will present how we tackle these two challenges by gathering ParsFEVER: the first publicly available Farsi dataset for fact extraction and verification. The gathering procedure is inspired by the construction procedure of the standard English dataset for the task, i.e., FEVER, and improved for the case of low-resource languages. Claims are extracted from sentences that are carefully selected to be more informative. The dataset comprises nearly 23K manually annotated claims. Over 65% of the claims in ParsFEVER are many-hop, making the dataset a challenging benchmark (only 13% of the claims in FEVER are many-hop). Also, despite having a smaller training set (around one-ninth of that in FEVER), a model trained on ParsFEVER attains similar downstream performance, indicating the quality of the dataset.

Contact

Jennifer Gerling
+49 681 9325 1801

Virtual Meeting Details

Zoom
passcode not visible
logged in users only

Jennifer Gerling, 01/22/2023 17:39 -- Created document.