Campus Event Calendar

Event Entry

What and Who

Thoth: Practical Data Flow Protection in a Search Engine

Aastha Mehta
MMCI
SWS Student Defense Talks - Qualifying Exam
SWS  
Expert Audience
English

Date, Time and Location

Friday, 21 November 2014
15:00
60 Minutes
E1 5
422
Saarbrücken

Abstract

Online data retrieval services like commercial search engines, online social networks, and trading and sharing sites process large volumes of data of different origins and types. A search engine like Bing or Google, for instance, indexes online social network (OSN) data, personal email, corporate documents, public web documents and blogs. Each data item potentially has its own usage policy. For example, email is private, OSN data and blogs may be limited to friends, and corporate documents may be restricted to employees. Furthermore, providers must comply with local laws and court orders, requiring them, for instance, to filter certain data items within a given jurisdiction.

Although data items are subject to different policies, scalability, flexibility, and the need to deliver comprehensive search results dictate that the data be handled within the same system. Ensuring compliance with applicable policies in such a complex system, however, is labor-intensive and error-prone. The policy actually in effect for a data item may depend on access control checks and settings in many components and several layers of a system, making it difficult to verify or reason about. Moreover, any design flaw, bug, or misconfiguration in a large and quickly evolving application codebase could potentially cause a policy violation.

The Open Security Foundation’s database of data losses [1] reports 1458 incidents in 2013, and the Privacy Rights Clearinghouse (PRC) reports more than 860M breached records containing personally identifiable information (PII) in 4347 breaches made public since 2005 [2]. Moreover, the stakes are high: in the case of policy violations, providers stand to lose customer confidence, business, and reputation, and face stiff fines.

In this talk, I will present Thoth, a practical safety net for policy compliance in a search engine. In Thoth, rich confidentiality, integrity, provenance, and declassification policies are stated in a declarative policy language and associated with data conduits, i.e., files and network connections. I will explain a few example policies written in the context of Apache Lucene, an open-source search engine. I will describe the efficiency of the policy evaluation and conclude with a brief discussion of future work.
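
To give a rough feel for what associating a policy with a conduit can mean, here is a minimal sketch in Python. It is not Thoth's policy language or implementation: the names `User`, `PolicyStore`, `private_to`, `friends_of`, `employees_of`, `public`, and `filter_results`, as well as the conduit paths, are all hypothetical, and policies are modeled as plain read predicates with a default-deny rule for conduits that have no policy attached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class User:
    """Hypothetical principal issuing a search query."""
    name: str
    friend_of: frozenset = frozenset()  # owners who list this user as a friend
    employer: str = ""


# In this sketch a policy is a predicate: may this user read the conduit?
Policy = Callable[[User], bool]


def private_to(owner: str) -> Policy:
    # Only the owner may read (e.g. personal email).
    return lambda user: user.name == owner


def friends_of(owner: str) -> Policy:
    # The owner and the owner's friends may read (e.g. OSN posts).
    return lambda user: user.name == owner or owner in user.friend_of


def employees_of(company: str) -> Policy:
    # Only employees of the company may read (e.g. corporate documents).
    return lambda user: user.employer == company


def public() -> Policy:
    # Anyone may read (e.g. public web pages).
    return lambda user: True


class PolicyStore:
    """Maps conduit identifiers (here: path-like strings) to policies."""

    def __init__(self) -> None:
        self._policies: Dict[str, Policy] = {}

    def attach(self, conduit: str, policy: Policy) -> None:
        self._policies[conduit] = policy

    def may_read(self, user: User, conduit: str) -> bool:
        # Default deny: a conduit without a policy is never released.
        policy = self._policies.get(conduit)
        return policy is not None and policy(user)


def filter_results(store: PolicyStore, user: User, hits: List[str]) -> List[str]:
    """Keep only the hits whose conduit policy permits this user to read."""
    return [hit for hit in hits if store.may_read(user, hit)]


store = PolicyStore()
store.attach("mail/alice/inbox", private_to("alice"))
store.attach("osn/alice/posts", friends_of("alice"))
store.attach("corp/acme/roadmap.doc", employees_of("acme"))
store.attach("web/blog/post1", public())

bob = User("bob", friend_of=frozenset({"alice"}), employer="acme")
hits = [
    "mail/alice/inbox",
    "osn/alice/posts",
    "corp/acme/roadmap.doc",
    "web/blog/post1",
    "tmp/unlabeled",
]
visible = filter_results(store, bob, hits)
```

Here `visible` keeps Alice's posts, the corporate document, and the public blog post for `bob`, while Alice's inbox and the unlabeled conduit are dropped. The sketch covers only read checks at the point where results are returned; integrity, provenance, and declassification are not modeled.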

[1] DataLossDB: Open Security Foundation. http://datalossdb.org.

[2] Privacy Rights Clearinghouse. http://privacyrights.org.

Contact

Maria-Louise Albrecht

Maria-Louise Albrecht, 11/26/2014 16:24
Maria-Louise Albrecht, 11/25/2014 16:04 -- Created document.