Campus Event Calendar

Event Entry

New for: D2, D3

What and Who

PhD Application Talk: A dictionary of chemical names and synonyms, merged from different resources based on 2D graph representations, for the recognition of chemical names in text

Albina Asadulina
University of Bonn
Talk
AG 1, AG 3, AG 5, SWS, AG 2, AG 4, RG1, MMCI  
MPI Audience
English

Date, Time and Location

Monday, 12 July 2010
10:30
120 Minutes
E1 4
024
Saarbrücken

Abstract

Extracting and storing chemical information is essential in medicine, for instance when developing or improving a drug. To acquire such information from the literature, one must first solve the problem of finding chemical names in text. Simple string search is not sufficient here, because the same chemical can appear in text under many different names.

One existing approach to chemical name detection is the look-up approach, which uses a dictionary of term variants and synonyms; this approach was chosen for analysis in the current work. The challenge for this method is that the available chemical databases are incomplete, or sometimes focus on particular types of compounds, such as metabolites or approved drugs. Several resources should therefore be merged to generate a comprehensive dictionary. When merging data sources, criteria for the identity of compounds must be defined, i.e. how to treat structures that differ only in stereochemistry, charges, isotopes, etc.
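To make the look-up approach concrete, here is a minimal sketch of dictionary-based name detection, assuming a simple regex tokenizer and greedy longest-match scanning over token windows. The compound IDs ("C1", "C2") and all function names are hypothetical and not taken from the talk; real chemical tokenization is considerably harder than this.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical tokenizer: alphanumeric runs, or single punctuation characters.
TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def tokenize(text: str) -> List[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def build_dictionary(entries: Dict[str, List[str]]) -> Dict[Tuple[str, ...], str]:
    """Map every tokenized, lower-cased synonym to its compound ID.

    If two compounds share a synonym, the later one wins; a real dictionary
    would need an explicit policy for such ambiguous names.
    """
    lookup: Dict[Tuple[str, ...], str] = {}
    for compound_id, names in entries.items():
        for name in names:
            lookup[tuple(tokenize(name))] = compound_id
    return lookup


def find_mentions(text: str, lookup: Dict[Tuple[str, ...], str]) -> List[Tuple[str, str]]:
    """Greedy longest-match scan: at each position try the longest window first."""
    tokens = tokenize(text)
    max_len = max((len(k) for k in lookup), default=0)
    mentions: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lookup:
                mentions.append((" ".join(key), lookup[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts here
    return mentions


lookup = build_dictionary({
    "C1": ["aspirin", "acetylsalicylic acid"],
    "C2": ["paracetamol", "acetaminophen"],
})
print(find_mentions("Patients took aspirin or acetylsalicylic acid with acetaminophen.", lookup))
# -> [('aspirin', 'C1'), ('acetylsalicylic acid', 'C1'), ('acetaminophen', 'C2')]
```

Both synonyms of C1 resolve to the same ID, which is exactly what plain string search cannot do; the recall of this scheme depends entirely on how complete the merged synonym list is.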
Compounds can be merged based on CAS numbers, InChI identifiers, or synonym overlap. The method proposed here instead merges databases by analyzing the 2D graph representations of the compounds. Direct comparison of structures is a more flexible approach in which no structural information is lost.
The dictionary is created with a workflow that merges databases by comparing the 2D graph representations of the compounds. The user can set the criteria for structure identity according to the research needs.
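The idea of merging on 2D graphs with user-configurable identity criteria can be sketched as follows. This is an illustrative toy, not the workflow from the talk: the `IdentityCriteria` switches, the per-atom `stereo` tag (a drastic simplification of real stereochemistry) and the brute-force backtracking isomorphism test are all assumptions. A production workflow would rely on a cheminformatics toolkit for structure parsing and canonicalization.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Atom:
    element: str
    charge: int = 0
    isotope: Optional[int] = None  # mass number; None means natural abundance
    stereo: Optional[str] = None   # simplified per-atom tag such as "R" or "S"


@dataclass
class Molecule:
    atoms: List[Atom]
    bonds: Dict[FrozenSet[int], int]  # {atom_i, atom_j} -> bond order


@dataclass(frozen=True)
class IdentityCriteria:
    """Hypothetical user-selectable switches for what counts as 'the same compound'."""
    ignore_stereo: bool = False
    ignore_charge: bool = False
    ignore_isotopes: bool = False


def normalize(mol: Molecule, crit: IdentityCriteria) -> Molecule:
    """Blank out the atom features the user chose to ignore."""
    atoms = [
        Atom(a.element,
             0 if crit.ignore_charge else a.charge,
             None if crit.ignore_isotopes else a.isotope,
             None if crit.ignore_stereo else a.stereo)
        for a in mol.atoms
    ]
    return Molecule(atoms, dict(mol.bonds))


def isomorphic(a: Molecule, b: Molecule) -> bool:
    """Exact labelled-graph isomorphism by backtracking (exponential; small graphs only)."""
    n = len(a.atoms)
    if n != len(b.atoms) or len(a.bonds) != len(b.bonds):
        return False
    if sorted(map(repr, a.atoms)) != sorted(map(repr, b.atoms)):
        return False
    mapping: Dict[int, int] = {}

    def extend(i: int) -> bool:
        if i == n:
            return True
        for j in range(n):
            if j in mapping.values() or a.atoms[i] != b.atoms[j]:
                continue
            # The bond (or absence of a bond) to every already-mapped atom must agree.
            if all(a.bonds.get(frozenset((i, k))) == b.bonds.get(frozenset((j, m)))
                   for k, m in mapping.items()):
                mapping[i] = j
                if extend(i + 1):
                    return True
                del mapping[i]
        return False

    return extend(0)


def merge(resources: List[List[Tuple[Molecule, List[str]]]],
          crit: IdentityCriteria) -> List[Tuple[Molecule, Set[str]]]:
    """Group entries from all resources whose normalized graphs are isomorphic."""
    merged: List[Tuple[Molecule, Set[str]]] = []
    for entries in resources:
        for mol, names in entries:
            norm = normalize(mol, crit)
            for rep, synonyms in merged:
                if isomorphic(rep, norm):
                    synonyms.update(names)
                    break
            else:
                merged.append((norm, set(names)))
    return merged


# Toy heavy-atom graphs of ethanol (C-C-O); atom order differs between resources.
CC_O = {frozenset((0, 1)): 1, frozenset((1, 2)): 1}
db1 = [(Molecule([Atom("C"), Atom("C"), Atom("O")], CC_O), ["ethanol", "ethyl alcohol"])]
db2 = [
    (Molecule([Atom("O"), Atom("C"), Atom("C")], CC_O), ["EtOH"]),
    (Molecule([Atom("C", isotope=13), Atom("C"), Atom("O")], CC_O), ["ethanol-2-13C"]),
]
print(len(merge([db1, db2], IdentityCriteria())))                      # 2 groups
print(len(merge([db1, db2], IdentityCriteria(ignore_isotopes=True))))  # 1 group
```

Because normalization followed by exact isomorphism is an equivalence relation, the greedy grouping in `merge` is order-independent in which groups it forms; switching a single criterion changes whether the isotopically labelled entry joins the ethanol synonyms.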
In the course of the work, the criteria for structure identity that best serve text-mining purposes should be defined, i.e. which structural features should be considered or ignored when comparing compounds. The performance of the resulting dictionary should then be compared with that of existing dictionaries.
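One common way to compare dictionaries is precision, recall and F1 against manually annotated mentions. The sketch below assumes exact-span matching, with mentions represented as hypothetical `(document_id, start, end)` tuples; the abstract does not specify which evaluation measure or corpus is used, so both the metric choice and the toy data are illustrative only.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # hypothetical (document_id, start_offset, end_offset)


def evaluate(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted mentions against gold ones."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {("d1", 0, 7), ("d1", 12, 32), ("d2", 5, 18)}
merged_dict = {("d1", 0, 7), ("d1", 12, 32), ("d2", 40, 44)}  # toy output, dictionary A
single_source = {("d1", 0, 7)}                                 # toy output, dictionary B
print(evaluate(merged_dict, gold))    # P = R = F1 = 2/3
print(evaluate(single_source, gold))  # P = 1.0, R = 1/3, F1 = 0.5
```

In this toy case the larger dictionary trades a little precision for much higher recall, which is the kind of trade-off the choice of identity criteria can influence.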

Contact

IMPRS-CS

Tags, Category, Keywords and additional notes

Please note: The talks will take place in random order!

Heike Przybyl, 07/01/2010 15:32 -- Created document.