Claudia Perlich received her M.Sc. in Computer Science from the University of Colorado at Boulder, her Diplom in Computer Science from Technische Universitaet in Darmstadt, and her Ph.D. in Information Systems from the Stern School of Business, New York University. Her Ph.D. thesis concentrated on probability estimation in multi-relational domains, which capture information about multiple entity types and the relationships between them. Her dissertation was recognized as an additional winner of the International SAP Doctoral Support Award Competition, and her submission placed second in the yearly data mining competition in 2003 (KDD-Cup 03).
Claudia joined the Data Analytics Research group as a Research Staff Member in October 2004. Her research interests are in machine learning for complex real-world domains including marketing, finance and medicine. She and her team have been very successful in data mining competitions. Her recent wins include KDD CUP 2007, 2008 and 2009.
The KDD CUP 2008 was organized by Siemens Medical Solutions ( http://www.kddcup2008.com/ ), which provided mammography-based data for around 1700 patients. Siemens used proprietary software to extract candidate regions from the original digital image data and to characterize each region in terms of 117 normalized numeric features with unknown interpretation. Task 1 was the identification of malignant candidate regions in mammography images, scored with a ranking-based evaluation measure similar to ROC. Task 2 required submitting the longest possible list of healthy patients; any submission with even one false negative was disqualified. Our winning submissions to both tasks exploited a) the properties of the evaluation metrics to improve the model scores from a linear SVM and b) some form of data leakage that resulted in predictive information in the patient identifiers.
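To make the Task 2 criterion concrete, the following is a minimal sketch of how such a list could be scored on labeled validation data. It is an illustration only, not the method used in the winning submission: the function names, the toy data, and the choice to summarize a patient by the maximum score over that patient's candidate regions are all assumptions introduced here.

```python
from collections import defaultdict


def patient_scores(candidates):
    """Aggregate (patient_id, candidate_score) pairs to one score per patient.

    Assumption for illustration: a patient is as suspicious as their
    most suspicious candidate region, so we take the maximum score.
    """
    scores = defaultdict(lambda: float("-inf"))
    for patient_id, score in candidates:
        scores[patient_id] = max(scores[patient_id], score)
    return dict(scores)


def longest_safe_list(scores, malignant_patients):
    """Rank patients from least to most suspicious and cut the list
    just before the first malignant patient, so that the returned
    list contains no false negatives."""
    ranked = sorted(scores, key=lambda p: scores[p])
    safe = []
    for patient_id in ranked:
        if patient_id in malignant_patients:
            break
        safe.append(patient_id)
    return safe


# Hypothetical toy data: (patient identifier, model score for one candidate region).
candidates = [("a", 0.1), ("a", 0.3), ("b", 0.2), ("c", 0.9), ("d", 0.5)]
malignant = {"d"}

safe = longest_safe_list(patient_scores(candidates), malignant)
print(safe, len(safe))
```

In this toy example the healthy patient "c" is lost because it is ranked after the malignant patient "d". This shows why the length of the list depends on the score of the least suspicious malignant patient rather than on average ranking quality.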