Although data items are subject to different policies, scalability, flexibility, and the need to deliver comprehensive search results dictate that the data be handled within the same system. Ensuring compliance with applicable policies in such a complex system, however, is a labor-intensive and error-prone challenge. The policy actually in effect for a data item may depend on access control checks and settings in many components and several layers of a system, making it difficult to verify or reason about. Moreover, any design flaw, bug, or misconfiguration in a large and quickly evolving application codebase could cause a policy violation.
The Open Security Foundation’s database of data losses [1] reports 1458 incidents in 2013, and Privacy Rights Clearinghouse (PRC) reports more than 860M breached PII records in 4347 breaches made public since 2005 [2]. The stakes are high: in the case of policy violations, providers stand to lose customer confidence, business, and reputation, and to face stiff fines.
In this talk, I will present Thoth, a practical safety net for policy compliance in a search engine. In Thoth, rich confidentiality, integrity, provenance, and declassification policies are stated in a declarative policy language and associated with data conduits, i.e., files and network connections. I will explain a few example policies written in the context of Apache Lucene, an open-source search engine service. I will describe the efficiency of the policy evaluation and conclude with a brief discussion of future work.
[1] DataLossDB: Open Security Foundation. http://datalossdb.org.
[2] Privacy Rights Clearinghouse. http://privacyrights.org.