Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title:Recognizing Textual Entailment
coming from:International Max Planck Research School for Computer Science - IMPRS
Speakers Bio:
Event Type:PhD Application Talk
Visibility:D1, D2, D3, D4, D5, SWS, RG1, MMCI
Level:Public Audience
Date, Time and Location
Date:Monday, 26 October 2015
Duration:90 Minutes
Building:E1 4
Natural Language Understanding (NLU) is a challenging task due to the complexity of human language. Several NLU tasks require knowledge of the semantic relation between two pieces of text. Prior knowledge of entailment relations between textual pairs is useful in a myriad of applications. A text "T" is said to entail a hypothesis "H" if a human reading them can infer H from T.
In this talk, I will propose an Entailment Decision Algorithm (EDA) that takes a (T,H) pair as input and returns an entailment decision as output. This algorithm is incorporated within the Excitement Open Platform (EOP).
The developed algorithm is based on alignment techniques. In our approach, we have developed various lexical alignment algorithms to align fragments of the text and the hypothesis. We have explored approximate distance-based algorithms for token (word), lemma, and chunk alignment. Using a distance-based match makes it possible to handle noise in the data, for example spelling variations and noisy tokenization. We have additionally developed chunk alignment algorithms based on embedded word vectors to capture semantic information. In the embedded vector-based chunk alignment algorithm, we distinguish between positive and negative alignments with respect to entailment through the use of WordNet and VerbOcean relations. Additionally, we have identified alignments between negation words such as "not" and "none", because these words significantly affect the entailment decision.
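As an illustration of the distance-based token alignment idea, here is a minimal Python sketch. It is not the EDA or EOP implementation: the Levenshtein distance, the whitespace tokenizer, the greedy best-match strategy, the function names, and the 0.25 threshold are all assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def align_tokens(text: str, hypothesis: str, max_ratio: float = 0.25):
    """Align each hypothesis token to its closest text token.

    A pair is kept only if the edit distance, divided by the length of the
    longer token, is at most max_ratio (a hypothetical threshold).
    Returns a list of (hypothesis_token, text_token, ratio) triples.
    """
    t_tokens = text.lower().split()   # naive whitespace tokenizer (assumption)
    h_tokens = hypothesis.lower().split()
    alignments = []
    for h in h_tokens:
        best = None
        for t in t_tokens:
            ratio = edit_distance(h, t) / max(len(h), len(t))
            if ratio <= max_ratio and (best is None or ratio < best[2]):
                best = (h, t, ratio)
        if best is not None:
            alignments.append(best)
    return alignments


def hypothesis_coverage(text: str, hypothesis: str, max_ratio: float = 0.25) -> float:
    """Fraction of hypothesis tokens that found an aligned text token."""
    h_tokens = hypothesis.lower().split()
    if not h_tokens:
        return 0.0
    return len(align_tokens(text, hypothesis, max_ratio)) / len(h_tokens)


if __name__ == "__main__":
    # "color" aligns to "colour" despite the spelling variation.
    for pair in align_tokens("The colour of the car is red", "the color is red"):
        print(pair)
```

A coverage score of this kind could serve as one alignment feature for a classifier; lemma-level or chunk-level variants would apply the same matching to lemmas or chunks instead of surface tokens.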
The generated alignments are used as features for classification into "entailing" and "non-entailing" classes using logistic regression with an L2 penalty. We have evaluated the EDA on the standard RTE-3 data-set from the RTE challenge. The data-set consists of a total of 800 (T,H) pairs, balanced between entailing and non-entailing cases. The maximum accuracy obtained on this data-set is 65%. In ongoing work, we plan to evaluate the system on the RTE-6 data-set, which is part of TAC 2010. It is a considerably larger data-set, with 15955 pairs for training, and reflects the realistic distribution of entailment in a corpus, where about 95% of cases are negative.
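The classification step can be sketched as follows. This is a toy illustration under stated assumptions, not the system evaluated in the talk: the two features (a coverage score and a negation-mismatch flag), the eight training rows, and the hyperparameters are invented for the example, and the model is trained with plain batch gradient descent rather than any particular library.

```python
import math

# Hypothetical alignment features per (T,H) pair:
# [hypothesis coverage in 0..1, negation mismatch flag 0/1]
TOY_X = [[1.0, 0], [0.9, 0], [0.8, 0], [0.9, 0],
         [0.2, 0], [0.3, 0], [0.1, 1], [0.8, 1]]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = entailing, 0 = non-entailing


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_l2(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Minimise mean log loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term b is left unpenalised. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The L2 penalty contributes lam * w[j] to the gradient.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (entailing) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


if __name__ == "__main__":
    w, b = train_logreg_l2(TOY_X, TOY_Y)
    print("weights:", w, "bias:", b)
    print("predictions:", [predict(w, b, x) for x in TOY_X])
```

The penalty strength `lam` trades training fit against weight size; on a skewed data-set such as the RTE-6 one described above, accuracy alone would be a weak measure, since always predicting the negative class already matches the majority of cases.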

Name(s):Andrea Ruffing
Video Broadcast
Video Broadcast:No
To Location:
Tags, Category, Keywords and additional notes
Attachments, File(s):
  • Andrea Ruffing, 10/23/2015 06:58 PM -- Created document.