Campus Event Calendar

Event Entry

What and Who

Recognizing Textual Entailment

Madhumita
International Max Planck Research School for Computer Science - IMPRS
PhD Application Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
Public Audience
English

Date, Time and Location

Monday, 26 October 2015
11:20
90 Minutes
E1 4
024
Saarbrücken

Abstract

Natural Language Understanding (NLU) is a challenging task due to the complexity of human language. Several NLU tasks require knowledge of the semantic relation between two pieces of text. Prior knowledge of entailment relations between text pairs is useful in a myriad of applications. A text "T" is said to entail a hypothesis "H" if a human reading them can infer H from T.
In this talk, I will propose an Entailment Decision Algorithm (EDA) that takes a (T,H) pair as input and returns an entailment decision as output. This algorithm is incorporated within the Excitement Open Platform (EOP).
The developed algorithm is based on alignment techniques. In our approach, we have developed various lexical alignment algorithms to align fragments of the text and the hypothesis. We have explored approximate distance-based algorithms for token (word), lemma, and chunk alignment. Using a distance-based match makes it possible to handle noise in the data, for example spelling variations and noisy tokenization. We have additionally developed chunk alignment algorithms based on word embedding vectors to capture semantic information. In the embedding-based chunk alignment algorithm, we distinguish between positive and negative alignments with respect to entailment through the use of WordNet and VerbOcean relations. Additionally, we identify alignments between negation words such as "not" and "none", because these words significantly affect the entailment decision.
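To make the idea of approximate, distance-based token alignment concrete, here is a minimal sketch in Python. It is not the EDA or EOP implementation: the function names, the whitespace tokenization, and the `max_ratio` threshold are all assumptions chosen for illustration, and Levenshtein distance stands in for whatever distance measure the actual system uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def align_tokens(text: str, hypothesis: str, max_ratio: float = 0.25):
    """Link each hypothesis token to its closest text token, if close enough.

    max_ratio is a hypothetical threshold: the edit distance divided by the
    length of the longer token must not exceed it.
    """
    text_tokens = text.lower().split()
    links = []
    for h in hypothesis.lower().split():
        best = min(text_tokens, key=lambda t: edit_distance(t, h), default=None)
        if best is None:
            continue  # empty text: nothing to align against
        ratio = edit_distance(best, h) / max(len(best), len(h))
        if ratio <= max_ratio:
            links.append((h, best))
    return links


def coverage(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that found an aligned text token."""
    n = len(hypothesis.split())
    return len(align_tokens(text, hypothesis)) / n if n else 0.0
```

With this threshold, a spelling variant such as "color" in the hypothesis still aligns with "colour" in the text, which is the kind of noise tolerance described above. A coverage score like this one could serve as one alignment feature among several.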
The generated alignments are used as features for classification into "entailing" and "non-entailing" classes using logistic regression with an L2 penalty. We have evaluated the EDA on the standard RTE-3 dataset from the RTE challenge. The dataset consists of 800 (T,H) pairs in total, balanced between entailing and non-entailing cases. The maximum accuracy obtained on this dataset is 65%. In ongoing work, we plan to evaluate the system on the RTE-6 dataset, which is part of TAC 2010. This is a larger dataset, with 15955 pairs for training, and it reflects the realistic distribution of entailment in a corpus, where about 95% of cases are negative.
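The classification step can be sketched as L2-regularized logistic regression trained by plain gradient descent. This is only an illustration under stated assumptions: the two features (an alignment coverage score and a negation-mismatch flag), the toy training pairs, and the hyperparameters are invented for the example and are not taken from the talk or from RTE-3.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_l2(X, y, l2=0.01, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (l2 / 2) * ||w||^2 by batch gradient descent.

    The bias term b is left unregularized, a common convention.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])  # L2 term shrinks w
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> bool:
    """True means 'entailing', False means 'non-entailing'."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Invented toy data: [alignment coverage, negation mismatch flag].
X_toy = [[0.9, 0], [1.0, 0], [0.8, 0], [0.3, 0], [0.9, 1], [0.2, 1]]
y_toy = [1, 1, 1, 0, 0, 0]
w_toy, b_toy = train_logreg_l2(X_toy, y_toy)
```

On this toy data the learned weight for coverage comes out positive and the weight for the negation flag negative, which matches the intuition that well-aligned pairs tend to entail unless a negation word intervenes.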

Contact

Andrea Ruffing

Andrea Ruffing, 10/23/2015 18:58 -- Created document.