Privacy and Anonymity in Text

Christopher Clifton
Purdue University
Friday, 31 August 2007
The increased ability to collect, store, and share information,
combined with new techniques for data analysis, is revolutionary.
Unfortunately, the collection and sharing of data raise significant
privacy concerns. Privacy and data anonymization techniques for
structured data can be used to support data analysis while preventing
breaches of privacy, but anonymizing textual data is more difficult
(as AOL discovered recently).

This talk will discuss recent progress on several problems
related to privacy and anonymization of text. These include
search (can we have "private information retrieval" if the
server doesn't cooperate?), anonymizing text ("John Doe" received
"an award" at "a conference" for defining the need for experimental
evaluation methodology in data management. Is this anonymous?),
and private verification of authenticity / natural language
watermarking. Emphasis will be on challenges and partially-solved


