In this talk I will describe two state-of-the-art systems for information harvesting that implement the above approaches: 'Espresso', a pattern-based system for extracting semantic relations; and a system for inducing lexical knowledge based on distributional techniques.
I will outline the strengths and weaknesses of the two approaches, and describe the potential offered by integrating the two methods in real NLP tasks, such as Textual Entailment Recognition. I will also briefly introduce how the extraction process can be scaled to the Web, using recently introduced computational paradigms based on large clusters of computers.