Max-Planck-Institut für Informatik

MPI-INF or MPI-SWS or Local Campus Event Calendar

What and Who
Title: Thoth: Practical Data flow protection in a search engine
Speaker: Aastha Mehta
coming from: Max Planck Institute for Software Systems
Event Type: SWS Student Defense Talks - Qualifying Exam
Level: Expert Audience
Date, Time and Location
Date: Friday, 21 November 2014
Duration: 60 Minutes
Building: E1 5
Online data retrieval services like commercial search engines, online social networking, and trading and sharing sites process large volumes of data of different origins and types. A search engine like Bing or Google, for instance, indexes online social network (OSN) data, personal email, corporate documents, public web documents and blogs. Each data item potentially has its own usage policy. For example, email is private, OSN data and blogs may be limited to friends, and corporate documents may be restricted to employees. Furthermore, providers must comply with local laws and court orders, requiring them, for instance, to filter certain data items within a given jurisdiction.

Although data items are subject to different policies, considerations of scalability and flexibility, together with the need to deliver comprehensive search results, dictate that the data be handled within the same system. Ensuring compliance with the applicable policies in such a complex system, however, is a labor-intensive and error-prone challenge. The policy actually in effect for a data item may depend on access control checks and settings in many components and several layers of a system, making it difficult to verify or reason about. Moreover, any design flaw, bug, or misconfiguration in a large and quickly evolving application codebase could potentially cause a policy violation.

The Open Security Foundation’s database of data losses [1] reports 1458 incidents in 2013, and Privacy Rights Clearinghouse (PRC) reports more than 860 million breached records containing personally identifiable information (PII) in 4347 breaches made public since 2005 [2]. Moreover, the stakes are high: in the case of policy violations, providers stand to lose customer confidence, business, and reputation, and face stiff fines.

In this talk, I will present Thoth, a practical safety net for policy compliance in a search engine. In Thoth, rich confidentiality, integrity, provenance, and declassification policies are stated in a declarative policy language and associated with data conduits, i.e., files and network connections. I will explain a few example policies written in the context of Apache Lucene, an open-source search engine service. I will describe the efficiency of the policy evaluation and conclude with a brief discussion of future work.
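
As a rough illustration of the idea of attaching a declarative policy to each data conduit, the following Python sketch models the example policies mentioned above (private email, friends-only OSN data, employee-only corporate documents, public web pages) and filters search hits accordingly. It is not Thoth's policy language or implementation: the names `Principal`, `Conduit`, `may_read`, and `filter_results`, the sample paths, and the rule functions are all hypothetical, and only confidentiality (read) rules are modeled, not integrity, provenance, or declassification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Principal:
    """A user issuing a search query (hypothetical model)."""
    name: str
    is_employee: bool = False
    # Owners who count this principal among their friends.
    friends_of: frozenset = frozenset()


# A read rule decides whether `reader` may see data owned by `owner`.
ReadRule = Callable[[Principal, str], bool]


def private(reader: Principal, owner: str) -> bool:
    return reader.name == owner


def friends_only(reader: Principal, owner: str) -> bool:
    return reader.name == owner or owner in reader.friends_of


def employees_only(reader: Principal, owner: str) -> bool:
    return reader.is_employee


def public(reader: Principal, owner: str) -> bool:
    return True


@dataclass(frozen=True)
class Conduit:
    """A data conduit (here: a file path) with its attached read rule."""
    path: str
    owner: str
    read_rule: ReadRule


def may_read(reader: Principal, conduit: Conduit) -> bool:
    """Evaluate the policy attached to the conduit for this reader."""
    return conduit.read_rule(reader, conduit.owner)


def filter_results(reader: Principal, hits: List[Conduit]) -> List[str]:
    """Keep only the hits whose attached policy admits the reader."""
    return [c.path for c in hits if may_read(reader, c)]


# Hypothetical sample data: one conduit per policy class.
alice = Principal("alice")
bob = Principal("bob", friends_of=frozenset({"alice"}))
carol = Principal("carol", is_employee=True)

hits = [
    Conduit("mail/alice/inbox", "alice", private),
    Conduit("osn/alice/post", "alice", friends_only),
    Conduit("corp/roadmap.doc", "acme", employees_only),
    Conduit("web/blog.html", "someone", public),
]
```

In this sketch the policy travels with the conduit rather than living in application code, so a single check at the point where results leave the system (`filter_results`) is enough to enforce it, regardless of how the hits were computed.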

[1] DataLossDB: Open Security Foundation.

[2] Privacy Rights Clearinghouse.

Name(s): Maria-Louise Albrecht
Video Broadcast
Video Broadcast: No
  • Maria-Louise Albrecht, 11/26/2014 04:24 PM
  • Maria-Louise Albrecht, 11/25/2014 04:04 PM -- Created document.