Scalable infrastructure for Science using OpenStack and Ceph
Time: Tuesday, July 24, 6:30pm - 8:30pm
Description: In recent years, the computational needs of research scientists have grown rapidly. This trend is likely to continue as the collection and analysis of ever larger datasets becomes a critical component of scientific inquiry. Managing and maintaining computational resources has become an increasingly demanding part of running a lab. To minimize management overhead, maximize hardware value, and facilitate future expansion, we have built a small computer cluster using OpenStack and Ceph. Our solution combines the advantages of cloud-like resource management with those of more traditional shared-environment computer clusters.
Data storage in a traditional shared-environment cluster typically consists of a conventional mounted file system. More modern big data storage solutions typically revolve around object stores, which have lower overhead because they require less metadata. However, the workflow of most scientists is still based on the conventional file system model. We therefore created a new software package that allows researchers familiar with conventional file systems to work easily with the Ceph object store from within a scientific computing environment such as IPython. Our software is called cottoncandy. It is a Python-based tool for accessing and storing NumPy array data in object stores (e.g., AWS S3, Google Drive) with minimal disk and network overhead. This combination of hardware management and software tools has allowed us to implement a highly extensible cluster that grows with the computing and storage needs of researchers while providing high-throughput access to computational and storage resources.
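The idea of working with an object store through path-like names can be illustrated with a minimal sketch. This is not cottoncandy's actual API: the `InMemoryObjectStore` class and the `upload_array` and `download_array` helpers are hypothetical names invented for illustration, an in-memory dictionary stands in for a Ceph or S3 bucket, and the standard-library `array.array` stands in for a NumPy array so that the example has no dependencies. Under those assumptions, array data is serialized in memory and stored under a key that looks like a file path, without ever touching local disk.

```python
import array
import gzip
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for a bucket: a flat map from key to (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body, metadata):
        self._objects[key] = (bytes(body), dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # Keys are flat strings; a "directory" is only a shared prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


def upload_array(store, key, values, shape, typecode="d"):
    """Serialize a flat sequence of numbers in memory and store it under `key`."""
    count = 1
    for n in shape:
        count *= n
    if count != len(values):
        raise ValueError("shape does not match number of values")
    payload = gzip.compress(array.array(typecode, values).tobytes())
    # Shape and element type travel as small per-object metadata.
    store.put(key, payload, {"typecode": typecode, "shape": json.dumps(list(shape))})


def download_array(store, key):
    """Fetch an object and rebuild the flat array plus its shape."""
    body, meta = store.get(key)
    flat = array.array(meta["typecode"])
    flat.frombytes(gzip.decompress(body))
    return flat, tuple(json.loads(meta["shape"]))


store = InMemoryObjectStore()
upload_array(store, "subject01/run1/responses", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3))
upload_array(store, "subject01/run2/responses", [0.5, 0.5, 0.5, 0.5], (2, 2))

flat, shape = download_array(store, "subject01/run1/responses")
run1_keys = store.list("subject01/run1")
```

Because keys are plain strings, listing by prefix gives a directory-like view of the data even though the store itself is flat, which is what lets a file-system-oriented workflow carry over to an object store.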