Sunday, November 16, 2014

OpenStack Series: Part 16 – Ceph in OpenStack

On the Ceph home page, Ceph is described as "a unified, distributed storage system designed for excellent performance, reliability and scalability."

It is easy to understand "distributed", but what about "unified"?

Unified means Ceph is able to deliver object, block and file storage in one system using commodity hardware.  A piece of commodity hardware is usually referred to as a node, and a cluster is a collection of nodes.

A glossary of Ceph terminology can be found here.

Ceph is open source software-defined storage.  Inktank is the commercial company that delivers enterprise-ready Ceph; Inktank was bought by Red Hat in May 2014.  DreamHost is also a major contributor to the Ceph open source software.

As a side note, the name Ceph comes from "cephalopod", and the name Inktank is related because cephalopods can squirt ink.  Fittingly, the management and monitoring system for Ceph is called Calamari.

The attraction of Ceph is its ability to scale with commodity hardware, together with built-in resiliency and high availability.

Ceph is deployed as a Storage Cluster built on RADOS (Reliable Autonomic Distributed Object Store).  Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to determine how and where to store data within the storage cluster, so clients can compute an object's location instead of consulting a central lookup table.
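To make the object store concrete, here is a minimal sketch using the librados Python bindings; it assumes a running cluster, client credentials in /etc/ceph/ceph.conf, and an existing pool named 'data' (all assumptions, not details from this post):

    import rados

    # Connect to the cluster using the local Ceph configuration file.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on the 'data' pool (assumed to exist).
    ioctx = cluster.open_ioctx('data')

    # Write and read back an object.  CRUSH deterministically maps the
    # object name to placement groups and OSDs, so no central lookup
    # table is ever consulted.
    ioctx.write_full('hello-object', b'Hello, RADOS!')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()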


A Ceph Storage Cluster consists of two types of daemons:
  • A Ceph Monitor maintains a master copy of the cluster map.  A cluster of Ceph Monitors ensures high availability should a monitor daemon fail.  Storage cluster clients retrieve a copy of the cluster map from the Ceph Monitor.
  • A Ceph OSD Daemon checks its own state and the state of other OSDs and reports back to the monitors.

If CephFS is used, there is also the Ceph Metadata Server (MDS).
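To illustrate the monitors' role, the sketch below queries the monitor quorum for the cluster status through the librados Python bindings (admin credentials in /etc/ceph/ceph.conf are assumed):

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # The monitors hold the authoritative cluster map, so status and
    # health queries are answered by the monitor quorum.
    cmd = json.dumps({"prefix": "status", "format": "json"})
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    print(json.loads(outbuf)['health'])

    cluster.shutdown()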
 
Ceph Architecture
image source: http://ceph.com/docs/master/_images/stack.png

This diagram shows that RADOS is the base of a Ceph Storage Cluster.  On top of RADOS sit the following components (a short RBD example follows the list):
  • LIBRADOS
  • RADOSGW
  • RBD (RADOS Block Device)
  • CephFS
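To illustrate the RBD layer, here is a minimal sketch using the rbd Python bindings; it assumes a running cluster, client credentials in /etc/ceph/ceph.conf, and the default 'rbd' pool, and the image name is a placeholder:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')  # default RBD pool, assumed to exist

    # Create a 1 GiB block device image; RBD images are thin-provisioned,
    # so no space is consumed until data is actually written.
    rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)

    ioctx.close()
    cluster.shutdown()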

Ceph and OpenStack

Ceph was integrated into OpenStack in the Folsom release.

Being a unified storage provider, Ceph is a natural storage choice for an OpenStack infrastructure.

The diagram below shows how OpenStack interfaces with Ceph:


image source: http://www.inktank.com/wp-content/uploads/2013/03/Diagram_v3.0_CEEPHOPENTSTACK11-1024x455.png

The Inktank blog has a good description of how Ceph fits into an OpenStack environment:

Block Storage for OpenStack

  • Ceph serves as a native Cinder block provider for images and volumes, and integrates with the virtualization infrastructure to connect the block devices to the VMs (see the sample cinder.conf after this list).
  • The RADOS Block Device (RBD) enables instant thin provisioning and cloning of images and volumes used by OpenStack Nova.
  • This makes booting new VMs with highly available, fault-tolerant disks fast, easy, and efficient.
  • Volumes can also be cloned from volume snapshots.
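As a concrete illustration of the Cinder integration, a minimal cinder.conf fragment for the RBD driver might look like the sketch below; the pool name, CephX user and libvirt secret UUID are placeholder assumptions, not values from the original post:

    [DEFAULT]
    # Use the Ceph RBD driver for Cinder volumes.
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_pool = volumes                        # assumed pool dedicated to Cinder
    rbd_user = cinder                         # assumed CephX client name
    rbd_secret_uuid = SECRET_UUID             # placeholder libvirt secret UUID
    rbd_flatten_volume_from_snapshot = false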

Object Storage for OpenStack
  • Ceph Object Gateway (RGW) provides complete compatibility with the Swift API, integrates with Keystone for authentication, and can be used as a backend for Glance.
  • RGW also offers full compatibility with the Amazon S3 API, a more scalable and easier-to-manage architecture, and the ability to run a single system for both object and block storage (a short S3 example follows this list).
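For illustration, the sketch below talks to RGW through its S3-compatible API using the boto library; the gateway host name and the credentials are placeholder assumptions:

    import boto
    import boto.s3.connection

    # Connect to the Ceph Object Gateway via its S3-compatible endpoint.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',           # placeholder credentials
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',                   # placeholder RGW host
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    # Create a bucket and store an object, exactly as one would against S3.
    bucket = conn.create_bucket('demo-bucket')
    key = bucket.new_key('hello.txt')
    key.set_contents_from_string('Hello from RGW!')
    print(key.get_contents_as_string())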

Ceph at CERN
CERN is a large nuclear research organization in Europe.  CERN deploys OpenStack in its production environment, and it received the "OpenStack Superuser Award" at the OpenStack Summit in Paris.  Check out their cloud infrastructure here.  Being a research institute, storage is important to CERN.  Ceph is used by CERN for image processing, for storing and archiving research data, and for quick data retrieval.  It has a 3 PB (petabyte) Ceph cluster in production.

Note: 1 PB = 1,000,000,000,000,000 bytes = 10^15 bytes = 1000 terabytes.

image source: https://pbs.twimg.com/media/B0a6e1CCAAED1Pq.png:large


This is not exactly a reference architecture for Ceph, but the example shows that Ceph has a lot of potential to be used alongside OpenStack.

Ceph use cases
Ceph runs on the same Linux cluster that KVM runs on.  With OpenStack Heat for autoscaling, it has all the right ingredients to be made into a hyperconverged unit.  Recently, Nutanix and SimpliVity have been gaining momentum in the hyperconvergence space.  Applications of hyperconvergence include VDI (Virtual Desktop Infrastructure) and the big data market.

According to Mirantis, OpenStack Sahara is planning to have native Ceph support in the Kilo release.

It seems to me that, because Ceph is able to support object, block and file system storage, it has huge potential for many different applications and use cases.

Related Post:
OpenStack Series Part 1: How do you look at OpenStack?
OpenStack Series Part 2: What's new in the Juno Release?
OpenStack Series Part 3: Keystone - Identity Service
OpenStack Series Part 4: Nova - Compute Service
OpenStack Series Part 5: Glance - Image Service
OpenStack Series Part 6: Cinder - Block Storage Service
OpenStack Series Part 7: Swift - Object Storage Service
OpenStack Series Part 8: Neutron - Networking Service
OpenStack Series Part 9: Horizon - a Web Based UI Service
OpenStack Series Part 10: Heat - Orchestration Service
OpenStack Series Part 11: Ceilometer - Monitoring and Metering Service
OpenStack Series Part 12: Trove - Database Service
OpenStack Series Part 13: Docker in OpenStack
OpenStack Series Part 14: Sahara - Data Processing Service
OpenStack Series Part 15: Messaging and Queuing System in OpenStack
OpenStack Series Part 17: Congress - Policy Service
OpenStack Series Part 18: Network Function Virtualization in OpenStack
OpenStack Series Part 19: Storage Polices for Object Storage
OpenStack Series Part 20: Group-based Policy for Neutron

Reference:
"Architecture¶." Architecture — Ceph Documentation. N.p., n.d. Web. 31 Oct. 2014.
"Home Ceph." Ceph Home Comments. N.p., n.d. Web. 31 Oct. 2014. 
"Ceph for OpenStack." Inktank Ceph for OpenStack Comments. N.p., n.d. Web. 31 Oct. 2014.
