Biomedical ontologies have grown rapidly in recent years. They contain information about diverse concepts such as diseases, symptoms, and medications. These ontologies are rich in IS-A relations, which form class hierarchies, but incompatibility relationships remain very sparse.
We propose a method for automatically discovering incompatible medical concepts in text corpora. The approach is distantly supervised, based on a seed set of incompatible concept pairs, such as symptoms or conditions that rule each other out. Two concepts are considered incompatible if their definitions match a template and contain an antonym pair derived from WordNet, VerbOcean, or a hand-crafted lexicon. Our method uses the seed pairs to create templates from dependency parse trees of definitional texts. The templates are then applied to a text corpus, and the resulting candidate pairs are categorized and ranked by statistical measures.
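The incompatibility criterion described above (two definitions that share a template and differ by an antonym pair) can be illustrated with a minimal sketch. This is not the method's actual implementation: it replaces dependency-parse templates with a much cruder token-level comparison, and the lexicon `ANTONYMS`, the helper names, and the example definitions are all hypothetical stand-ins chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical mini-lexicon. The method itself draws antonym pairs from
# WordNet, VerbOcean, or a hand-crafted lexicon; these entries are
# illustrative only.
ANTONYMS = {
    ("increased", "decreased"),
    ("high", "low"),
    ("presence", "absence"),
}


def is_antonym_pair(a: str, b: str) -> bool:
    """Return True if (a, b) is listed in the lexicon in either order."""
    return (a, b) in ANTONYMS or (b, a) in ANTONYMS


def tokenize(text: str) -> List[str]:
    """Lowercase the text, drop commas and periods, split on whitespace."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def find_incompatibility(def_a: str, def_b: str) -> Optional[Tuple[str, str]]:
    """Return the antonym pair if the two definitions differ in exactly one
    token position and the differing tokens are antonyms; otherwise None.

    Assumption: requiring token-identical definitions apart from one slot
    is a simplified stand-in for matching a dependency-parse template.
    """
    tokens_a, tokens_b = tokenize(def_a), tokenize(def_b)
    if len(tokens_a) != len(tokens_b):
        return None
    diffs = [(x, y) for x, y in zip(tokens_a, tokens_b) if x != y]
    if len(diffs) == 1 and is_antonym_pair(*diffs[0]):
        return diffs[0]
    return None


# Invented example definitions, not taken from any ontology.
hyper = "Increased concentration of glucose in the blood."
hypo = "Decreased concentration of glucose in the blood."
print(find_incompatibility(hyper, hypo))
```

A token-level comparison like this misses paraphrased definitions, which is one reason templates over dependency parse trees are the more robust choice.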
Because experiments show that the results suffer from semantic ambiguity, we further cluster them into categories. We applied this approach to the concepts in the Unified Medical Language System, the Human Phenotype Ontology, and the Mammalian Phenotype Ontology. Out of 77,496 concepts with definitions, 1,958 pairs were detected as incompatible, with an average precision of 0.80.