Punnoose, Roshan; Crainiceanu, Adina; Rapp, David
Rya: A Scalable RDF Triple Store for the Clouds
Roshan Punnoose, Proteus Technologies
Adina Crainiceanu, US Naval Academy
David Rapp, Laboratory for Telecommunication Sciences

ABSTRACT

Resource Description Framework (RDF) was designed with the initial goal of developing metadata for the Internet. While the Internet is a conglomeration of many interconnected networks and computers, most of today's best RDF storage solutions are confined to a single node. Working on a single node has significant scalability issues, especially considering the magnitude of modern-day data. In this paper we introduce a scalable RDF data management system that uses Accumulo, a Google Bigtable variant. We introduce storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes, while providing fast and easy access to the data through conventional query mechanisms such as SPARQL. Our performance evaluation shows that in most cases, our system outperforms existing distributed RDF solutions, even systems much more complex than ours.

Categories and Subject Descriptors: H.3.2 Information Storage, H.3.3 Information Search and Retrieval, H.3.4 Systems and Software - Distributed Systems, H.2.4 Systems - Distributed Databases, Query Processing

General Terms: Algorithms, Management, Performance.

Keywords: RDF triple store, distributed, scalable.