Thursday, November 6, 2014

OpenStack Series: Part 7 – Swift – Object Storage Service

Last time, when I blogged about OpenStack Cinder, I mentioned that there are 3 main categories of storage. I have found a good comparison of the 3 categories:
image source: http://cdn.ttgtmedia.com/rms/onlineImages/sidebyside_comparison.png

Introduction to Object Storage
OpenStack Documentation defines Object Storage as a robust, highly scalable and fault tolerant storage platform for unstructured data such as objects. Objects are stored bits, accessed through a RESTful, HTTP-based interface. You cannot access data at the block or file level. Object Storage is commonly used to archive and back up data, with use cases in virtual machine image, photo, video and music storage.

If you want to know more about object storage, take a look at this TechTarget SearchCloudStorage article.


OpenStack Swift Architecture
Swift is the Object Storage service in OpenStack. I think one of the best features of object storage is its built-in redundancy and high availability: objects are stored on multiple nodes. We can also think of Swift as a replication system.

Object storage fits right into the current huge demand for storing web-based content. Everything stored in Swift is an object, and each object can be referenced by a URI, which makes the stored content easy to retrieve.

By far the best article on OpenStack Swift is this one by SwiftStack which has tons of useful information about Swift.


SwiftStack offers a commercial version of Swift and is a major contributor to and promoter of the OpenStack Swift project. On Oct 30 SwiftStack received $16M in Series B funding.

A simple way to look at Swift is as a 2-tier architecture:
image source: https://swiftstack.com/images/posts/swift-global-replication/replica-overview.png

Swift is composed of two types of nodes:
  • Proxy Nodes. These are the nodes that interface with "Swift clients" and handle all the requests and processing. Clients interact only with the Proxy nodes.
  • Storage Nodes. These are the nodes that host the storage for the objects.
To set up a Swift cluster we need to configure both the Proxy and the Storage nodes. This article shows how to configure these 2 types of nodes.

OpenStack Swift Terminology
image source: https://developers.seagate.com/download/attachments/1769521/SwiftStack%20Current%20Architecture.jpg?version=1&modificationDate=1381516991000&api=v2
  • Partitions - A complete and non-overlapping set of key ranges such that each object, container and account is a member of exactly one partition as per the value of its key
  • Ring - Maps each partition to a set of physical devices
  • Objects - Key-value entries in the object store
  • Containers - Groups of objects
  • Accounts - Groups of containers
  • Object/Storage Server - stores, retrieves and deletes objects on local devices
  • Container Server - stores the listing of objects in a SQLite database
  • Account Server - similar to the container server, but it stores the listing of containers
  • Proxy Server - scalable API request handler; determines the storage node distribution of objects based on the URL
  • Replicator - utility process that handles data replication
  • Updater - handles updates that were not performed successfully, so as to maintain the integrity of the data in the Swift cluster
  • Auditor - runs on each node to check the integrity of the object, container and account information
From the above list we can see that the terms fall into 4 categories:
  1. Data access: ring and partition
  2. Data representation: account, container and objects
  3. Server types: proxy, object, container and account server
  4. Utility process: replicator, updater and auditor
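
To make the auditor's role a little more concrete, here is a minimal sketch in Python of an integrity check. It assumes each object carries an MD5 checksum recorded at upload time; the function names and the in-memory `objects` dictionary are hypothetical and only for illustration, not Swift's actual code.

```python
import hashlib


def audit_object(data, stored_etag):
    """Return True if the data still matches the checksum recorded at upload."""
    return hashlib.md5(data).hexdigest() == stored_etag


def audit_all(objects):
    """objects maps name -> (data, stored_etag); return the names that fail."""
    return sorted(
        name
        for name, (data, etag) in objects.items()
        if not audit_object(data, etag)
    )


# Hypothetical store: one healthy object and one whose bytes were corrupted.
good = b"holiday photo"
store = {
    "photo.jpg": (good, hashlib.md5(good).hexdigest()),
    "song.mp3": (b"corrupted bytes", hashlib.md5(b"original bytes").hexdigest()),
}
print(audit_all(store))
```

In a real cluster, a failed check would lead to the damaged copy being set aside so that the replicator can restore it from a healthy replica.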


 image source: https://www.mirantis.com/blog/object-storage-openstack-cloud-swift-ceph/

The above diagram shows the relationship between the Proxy server and the Account, Object and Container servers via the Ring. For a more detailed description of the various servers in Swift, visit this blog post.

All objects in Swift can be accessed via the RESTful API. The SwiftStack blog has this description of the API format:
  • /account
    • The account storage location is a uniquely named storage area that contains the metadata (descriptive information) about the account itself as well as the list of containers in the account.
    • Note that in Swift, an account is not a user identity. When you hear account, think storage area.
  • /account/container
    • The container storage location is the user-defined storage area within an account where metadata about the container itself and the list of objects in the container will be stored.
  • /account/container/object
    • The object storage location is where the data object and its metadata will be stored.
 
 image source: https://swiftstack.com/static/global/images/swift_architecture_aco.jpg



A Swift API call has this format:

GET /v1/{account}/{container}/{object}
or
PUT /v1/{account}/{container}/{object}

Each object is represented by a URI: https://abc.com/v1/account/container/object_name
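
The URI format above can be sketched in Python using only the standard library. This is a rough illustration, not an official client: the helper names `object_url` and `build_request` and the token value are hypothetical, and the endpoint is the placeholder from the example URI. `X-Auth-Token` is the header Swift expects for an authenticated request.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint, account, container, obj):
    """Build {endpoint}/v1/{account}/{container}/{object}."""
    # Percent-encode each path segment; object names may contain "/".
    parts = [
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj, safe="/"),
    ]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


def build_request(method, url, token, body=None):
    """Prepare (but do not send) an authenticated request."""
    return Request(url, data=body, method=method,
                   headers={"X-Auth-Token": token})


url = object_url("https://abc.com", "account", "container", "object_name")
download = build_request("GET", url, token="hypothetical-token")
upload = build_request("PUT", url, token="hypothetical-token", body=b"hello")
print(download.get_method(), download.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it; that step is left out here because it needs a live Swift endpoint and a valid token.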

Swift is deployed as a cluster. Each cluster is made up of nodes. A node is a Linux machine on which the Swift processes run. Each cluster can be logically divided into regions, and each region into zones.
image source: https://swiftstack.com/static/global/images/swift_architecture_regions.jpg

Usually a Region represents a geographic location. Zones are also called Availability Zones; they are defined to isolate failures.
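
To show why regions and zones matter for isolating failures, here is a small Python sketch that spreads the replicas of one piece of data across as many different regions and zones as possible. This is a simplified greedy selection written for illustration only; it is not the placement algorithm Swift itself uses, and the `Device` tuple and device names are hypothetical.

```python
from collections import namedtuple

Device = namedtuple("Device", ["name", "region", "zone"])


def pick_replica_devices(devices, replicas=3):
    """Greedily pick devices, preferring regions and zones not used yet."""
    chosen = []
    remaining = list(devices)
    while remaining and len(chosen) < replicas:
        used_regions = {d.region for d in chosen}
        used_zones = {(d.region, d.zone) for d in chosen}
        # False sorts before True, so unused regions and zones win.
        best = min(
            remaining,
            key=lambda d: (d.region in used_regions,
                           (d.region, d.zone) in used_zones),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


cluster = [
    Device("d1", region=1, zone=1),
    Device("d2", region=1, zone=1),
    Device("d3", region=1, zone=2),
    Device("d4", region=2, zone=1),
]
print([d.name for d in pick_replica_devices(cluster)])
```

With this layout, losing a single zone (or even a whole region) still leaves at least one replica reachable.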

The heart of Swift is the Ring

A ring is a static data structure in which an object name is mapped to a partition using a "modified MD5" hashing algorithm. Each partition maps to a list of physical devices.
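
The two lookups described above (name to partition, partition to devices) can be sketched in Python as follows. This is a simplified model under my own assumptions: the salt value, the device names, the tiny partition count and the way the table is filled are all hypothetical, and a real ring is built by Swift's own tooling rather than by a formula like this.

```python
import hashlib
import struct

PART_POWER = 4      # 2**4 = 16 partitions; real clusters use far more
REPLICAS = 3
DEVICES = ["sdb1", "sdc1", "sdd1", "sde1", "sdf1"]  # hypothetical devices

# REPLICA2PART2DEV[r][p] is the index of the device that holds
# replica r of partition p. The table is static once built.
REPLICA2PART2DEV = [
    [(p + r) % len(DEVICES) for p in range(2 ** PART_POWER)]
    for r in range(REPLICAS)
]


def get_partition(account, container, obj, salt=b"hypothetical-salt"):
    """Hash the salted object path and keep the top PART_POWER bits."""
    path = "/{}/{}/{}".format(account, container, obj).encode("utf-8")
    digest = hashlib.md5(path + salt).digest()
    # Interpret the first 4 bytes as a big-endian unsigned integer.
    return struct.unpack_from(">I", digest)[0] >> (32 - PART_POWER)


def get_devices(partition):
    """Return the devices that hold each replica of a partition."""
    return [DEVICES[row[partition]] for row in REPLICA2PART2DEV]


part = get_partition("account", "container", "object_name")
print(part, get_devices(part))
```

Because the mapping depends only on the name and the static table, any proxy node holding the same ring can find an object's devices without asking a central lookup service.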

image source: https://julien.danjou.info/media/images/blog/2012/riak-ring.png

Melissa Palmer (@vmiss33) has a nice article on the Swift ring.

Swift Storage Policy
New in the Juno release is the Storage Policy. This makes OpenStack more "enterprise" ready by allowing users and application developers to decide how they want to store, replicate and access data across different backends and geographical regions. The Juno release says this about the Swift Storage Policy:

Storage policies give users more control over cost and performance in terms of how they want to replicate and access data across different backends and geographical regions.

The Ring is the key to making the Storage Policy work without much change to the existing API.
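
To illustrate how policies tie back to the Ring, here is a Python sketch that reads storage policy sections in the style of `swift.conf` and works out which object ring file each policy would use. The policy names "gold" and "silver" are made up for this example, and the `load_policies` helper is hypothetical; as far as I understand, policy 0 uses `object.ring.gz` and policy N uses `object-N.ring.gz`.

```python
import configparser

# Sample policy definitions; the names are illustrative assumptions.
SAMPLE_SWIFT_CONF = """
[storage-policy:0]
name = gold
default = yes

[storage-policy:1]
name = silver
"""


def load_policies(conf_text):
    """Map each policy name to its index, default flag and ring file."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    policies = {}
    for section in parser.sections():
        if not section.startswith("storage-policy:"):
            continue
        index = int(section.split(":", 1)[1])
        # Each policy gets its own object ring.
        if index == 0:
            ring_file = "object.ring.gz"
        else:
            ring_file = "object-{}.ring.gz".format(index)
        policies[parser.get(section, "name")] = {
            "index": index,
            "default": parser.getboolean(section, "default", fallback=False),
            "ring_file": ring_file,
        }
    return policies


for name, info in load_policies(SAMPLE_SWIFT_CONF).items():
    print(name, info)
```

A client picks a policy when it creates a container, by sending the `X-Storage-Policy` header with the policy name; objects in that container are then placed using that policy's ring, and the rest of the API call looks the same as before.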

Related Posts:
OpenStack Series Part 1:  How do you look at OpenStack?
OpenStack Series Part 2:  What's new in the Juno Release?
OpenStack Series Part 3:  Keystone - Identity Service
OpenStack Series Part 4:  Nova - Compute Service
OpenStack Series Part 5:  Glance - Image Service
OpenStack Series Part 6:  Cinder - Block Storage Service
OpenStack Series Part 8:  Neutron - Networking Service  
OpenStack Series Part 9:  Horizon - a web based UI Service
OpenStack Series Part 10: Heat - Orchestration Service 
OpenStack Series Part 11: Ceilometer - Monitoring and Metering Service
OpenStack Series Part 12: Trove - Database Service
OpenStack Series Part 13: Docker in OpenStack
OpenStack Series Part 14: Sahara - Data Processing Service
OpenStack Series Part 15: Messaging and Queuing System in OpenStack
OpenStack Series Part 16: Ceph in OpenStack
OpenStack Series Part 17: Congress - Policy Service  
OpenStack Series Part 18: Network Function Virtualization in OpenStack

OpenStack Series Part 19: Storage Policies for Object Storage
OpenStack Series Part 20: Group-based Policy for Neutron

Reference:
"SwiftStackBlog." A Globally Distributed OpenStack Swift Cluster. N.p., n.d. Web. 06 Nov. 2014.
"OpenStack Swift Architecture." Software Defined Storage. N.p., n.d. Web. 06 Nov. 2014.
"OpenStack Swift - Kinetic - Developers.seagate.com." OpenStack Swift - Kinetic - Developers.seagate.com. N.p., n.d. Web. 06 Nov. 2014.
