Campus Event Calendar

Event Entry

New for: D2, D3

What and Who

PhD Application Talk: Text mining for building a biomedical knowledge base on diseases, risk factors, and symptoms

Min Ye
PhD Application Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, MMCI  
MPI Audience
English

Date, Time and Location

Monday, 25 July 2011
09:00
120 Minutes
E1 4
024
Saarbrücken

Abstract

In view of today's information avalanche, well-structured knowledge bases play an important role in simplifying access to knowledge and its further processing. In the biomedical domain, important research results are hidden in publications and online forums in the form of unstructured free text. Extracting relational information and storing it as machine-readable data is therefore crucial to advancing scientific research.


In this talk we introduce a system that provides convenient access to knowledge about the environmental and behavioral factors involved in human diseases, as well as the body parts affected and the symptoms caused by those diseases. The system is capable of automatically extracting relations between these entities from textual Web sources.

Our knowledge base is bootstrapped by integrating entities from hand-crafted and well-organized sources like MeSH, OMIM and UMLS. As these sources contain few relationships between different types of biomedical entities, our system employs flexible and robust pattern learning and constraint-based reasoning methods to automatically extract new relational facts from textual sources, which are then added to the knowledge base.
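
To illustrate the general idea of pattern-based extraction with type constraints, here is a minimal sketch. It is not the system presented in the talk: the entity dictionary, the two surface patterns, and the relation names (riskFactorOf, symptomOf) are hypothetical, and the patterns are hand-written here rather than learned.

```python
import re

# Hypothetical typed entity dictionary (assumption for illustration).
# In the talk's system, entities are seeded from MeSH, OMIM and UMLS.
ENTITY_TYPES = {
    "smoking": "factor",
    "lung cancer": "disease",
    "cough": "symptom",
    "influenza": "disease",
}

# Hypothetical surface patterns: (regex, relation, subject type, object type).
PATTERNS = [
    (re.compile(r"(?P<subj>[\w ]+?) increases the risk of (?P<obj>[\w ]+)"),
     "riskFactorOf", "factor", "disease"),
    (re.compile(r"(?P<subj>[\w ]+?) is a symptom of (?P<obj>[\w ]+)"),
     "symptomOf", "symptom", "disease"),
]


def extract_facts(sentence):
    """Return (subject, relation, object) triples found in one sentence.

    A candidate is kept only if both arguments are known entities whose
    types satisfy the relation's type constraint.
    """
    text = sentence.lower().rstrip(".")
    facts = []
    for regex, relation, subj_type, obj_type in PATTERNS:
        match = regex.fullmatch(text)
        if not match:
            continue
        subj = match.group("subj").strip()
        obj = match.group("obj").strip()
        # Type constraint: reject candidates with wrongly typed arguments.
        if ENTITY_TYPES.get(subj) == subj_type and ENTITY_TYPES.get(obj) == obj_type:
            facts.append((subj, relation, obj))
    return facts


if __name__ == "__main__":
    print(extract_facts("Smoking increases the risk of lung cancer."))
    print(extract_facts("Cough increases the risk of influenza."))
```

In this sketch the second sentence matches a pattern textually but yields no fact, because "cough" is typed as a symptom rather than a factor. This is the kind of filtering a type constraint provides.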

The result is a semantic graph of typed entities and relations between diseases, their symptoms, affected body parts, and determining factors, with emphasis on behavioral and environmental factors, including molecular determinants. The facts stored in our knowledge base are provided to the user in a Web-browser interface.

We validated our approach on four data sets on diseases and their factors obtained from different sources. With our approach, we were able to achieve a precision of >80%, a recall of >75%, and thus an F1 score of >77%.
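
The F1 bound follows from the precision and recall bounds, since F1 is the harmonic mean of the two and increases with both. A minimal check, assuming the standard F1 definition (the function name is ours):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lower bounds quoted in the abstract: precision 0.80, recall 0.75.
lower_bound = f1_score(0.80, 0.75)
print(f"{lower_bound:.3f}")  # prints 0.774
```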

Contact

IMPRS-CS
-1803

Tags, Category, Keywords and additional notes

Please note: The talks will take place in random order!

Heike Przybyl, 07/21/2011 12:29
Heike Przybyl, 07/21/2011 12:13 -- Created document.