The OpenStack documentation uses the term "data-intensive application cluster"; I think most people would say "big data processing" or "data analytics", where a huge amount of computing power is required to process the raw data collected.
The common impression of Sahara is that it runs Hadoop on OpenStack. At this time, Hadoop is the main application cluster that Sahara supports. Spark is also being worked on, and there is a Spark plugin in Sahara. I think it all depends on which Big Data applications are popular in the enterprise environments where OpenStack is used.
Hadoop 2.0 by itself is much like OpenStack: it is a set of services, in this case for data processing. It has two pillars:
- YARN - Yet Another Resource Negotiator
- HDFS - Hadoop Distributed File System
image source: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWlmlf5oMIdBHlX3M9HbdH2bedW4n6G8sgfQRPH6vMKrLXXUzwgKSJm7Xwf47qJmPsf_9d6MJfwG_BkOziXfGtK6BjqFUE_zNY21-cr7olgz5beB2AlewxoQdbh3j6G8_MWg-0U_58e0I/s1600/YARN.png
Another view of Hadoop:
image source: http://www.rosebt.com/uploads/8/1/8/1/8181762/5829807_orig.jpg
This one shows the Hadoop ecosystem with MapReduce, Pig, Hive, HBase, etc.
image source: http://hortonworks.com/wp-content/uploads/2013/10/HDP2.0Stack.png
Sahara Architecture
image source: http://docs.openstack.org/developer/sahara/_images/sahara-architecture.png
The OpenStack documentation describes the various components of Sahara as:
- Auth component - responsible for client authentication & authorization, communicates with Keystone
- DAL - Data Access Layer, persists internal models in DB
- Provisioning Engine - component responsible for communication with Nova, Heat, Cinder and Glance
- Vendor Plugins - pluggable mechanism responsible for configuring and launching Hadoop on provisioned VMs; existing management solutions like Apache Ambari and Cloudera Management Console could be utilized for that matter
- EDP - Elastic Data Processing (EDP) responsible for scheduling and managing Hadoop jobs on clusters provisioned by Sahara
- REST API - exposes Sahara functionality via REST
- Python Sahara Client - similar to other OpenStack components Sahara has its own python client
- Sahara pages - GUI for the Sahara is located on Horizon
Sahara Use Cases
Sahara supports two key use cases:
- on-demand cluster provisioning
- on-demand Hadoop tasks execution (Elastic Data Processing)
Cluster
- Consists of node groups
- Three types of node groups: Master, Core Workers and Workers
- Two kinds of templates: node group templates and cluster templates
- Users can override the parameters of the templates via the API
- Templates are Hadoop distribution specific because each distribution uses different parameters
- Plugins are responsible for provisioning a Hadoop cluster
- One plugin for each specific Hadoop distribution (Apache Hadoop, Hortonworks)
- A list of available plugins can be found here
- To support cluster provisioning, pre-built images with an installed OS are needed
- The image registry helps filter out images during cluster creation
- This page explains how to work with the image registry
image source: http://docs.openstack.org/developer/sahara/_images/hadoop-cluster-example.jpg
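To make the relationship between node group templates, cluster templates and user overrides more concrete, here is a minimal Python sketch that only builds the request bodies as plain dictionaries. The field names are modeled on the Sahara v1.1 REST API as I understand it, and the plugin name, Hadoop version, flavor ID, process names and template IDs are illustrative assumptions, so check them against the API reference for your release before relying on them.

```python
import copy
import json


def node_group_template(name, plugin_name, hadoop_version, flavor_id, node_processes):
    """Build a request body describing one node group template."""
    return {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        "flavor_id": flavor_id,
        "node_processes": list(node_processes),
    }


def cluster_template(name, plugin_name, hadoop_version, node_groups):
    """Build a cluster template body.

    node_groups is a list of (group name, node group template id, count).
    """
    return {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        "node_groups": [
            {"name": group, "node_group_template_id": template_id, "count": count}
            for group, template_id, count in node_groups
        ],
    }


def with_overrides(template, overrides):
    """Return a copy of the template with top-level parameters replaced."""
    merged = copy.deepcopy(template)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Assumed values: "vanilla" plugin, Hadoop 2.4.1, flavor "2".
    master = node_group_template(
        "master", "vanilla", "2.4.1", "2", ["namenode", "resourcemanager"]
    )
    # Core workers store data and run tasks; plain workers only run tasks.
    core = node_group_template(
        "core-worker", "vanilla", "2.4.1", "2", ["datanode", "nodemanager"]
    )
    worker = node_group_template("worker", "vanilla", "2.4.1", "2", ["nodemanager"])

    # The template IDs below are hypothetical placeholders.
    cluster = cluster_template(
        "demo-cluster",
        "vanilla",
        "2.4.1",
        [
            ("master", "MASTER_TEMPLATE_ID", 1),
            ("core-worker", "CORE_TEMPLATE_ID", 3),
            ("worker", "WORKER_TEMPLATE_ID", 2),
        ],
    )

    # A user overriding one template parameter, here a bigger flavor.
    bigger_worker = with_overrides(worker, {"flavor_id": "4"})
    print(json.dumps(cluster, indent=2))
    print(json.dumps(bigger_worker, indent=2))
```

The sketch does not contact Sahara. In practice these bodies would be sent through the REST API or the Python Sahara client, and the cluster template would reference the IDs returned when the node group templates are created.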
Elastic Data Processing
- Workflow management
- Similar to Amazon Web Services Elastic MapReduce (EMR), which is heavily used in the public cloud
- Jobs can be launched either via the OpenStack Dashboard or the CLI
- An API to launch jobs without the user having to know the underlying Hadoop infrastructure
image source: https://wiki.openstack.org/wiki/File:EDP_diagram.png
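As a rough illustration of launching a job through the API without touching the Hadoop cluster directly, the sketch below builds the path and body of a job execution request. The path layout and field names are modeled on the Sahara v1.1 REST API as I understand it, and every ID and configuration value is a hypothetical placeholder, so treat this as a sketch under those assumptions, not as the definitive request format.

```python
import json


def execute_path(tenant_id, job_id):
    """Path for launching a job (assumed v1.1 layout)."""
    return "/v1.1/{}/jobs/{}/execute".format(tenant_id, job_id)


def job_execution_request(cluster_id, input_id=None, output_id=None,
                          configs=None, args=None, params=None):
    """Build the body of a job execution request.

    input_id and output_id refer to data sources registered beforehand.
    They are optional here because not every job type uses them.
    """
    body = {
        "cluster_id": cluster_id,
        "job_configs": {
            "configs": dict(configs or {}),
            "args": list(args or []),
            "params": dict(params or {}),
        },
    }
    if input_id is not None:
        body["input_id"] = input_id
    if output_id is not None:
        body["output_id"] = output_id
    return body


if __name__ == "__main__":
    # All IDs below are hypothetical placeholders.
    path = execute_path("TENANT_ID", "JOB_ID")
    body = job_execution_request(
        "CLUSTER_ID",
        input_id="INPUT_DATA_SOURCE_ID",
        output_id="OUTPUT_DATA_SOURCE_ID",
        configs={"mapreduce.job.reduces": "1"},
    )
    print("POST", path)
    print(json.dumps(body, indent=2))
```

The caller only names a cluster, a job and its data sources. Where the job runs inside the cluster is left to Sahara, which is the point of the EDP abstraction.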
Related Posts:
OpenStack Series Part 1: How do you look at OpenStack?
OpenStack Series Part 2: What's new in the Juno Release?
OpenStack Series Part 3: Keystone - Identity Service
OpenStack Series Part 4: Nova - Compute Service
OpenStack Series Part 5: Glance - Image Service
OpenStack Series Part 6: Cinder - Block Storage Service
OpenStack Series Part 7: Swift - Object Storage Service
OpenStack Series Part 8: Neutron - Networking Service
OpenStack Series Part 9: Horizon - a Web Based UI Service
OpenStack Series Part 10: Heat - Orchestration Service
OpenStack Series Part 11: Ceilometer - Monitoring and Metering Service
OpenStack Series Part 12: Trove - Database Service
OpenStack Series Part 13: Docker in OpenStack
OpenStack Series Part 15: Messaging and Queuing System in OpenStack
OpenStack Series Part 16: Ceph in OpenStack
OpenStack Series Part 17: Congress - Policy Service
OpenStack Series Part 18: Network Function Virtualization in OpenStack
OpenStack Series Part 19: Storage Policies for Object Storage
OpenStack Series Part 20: Group-based Policy for Neutron
Reference:
"OpenStack." Architecture — Sahara. N.p., n.d. Web. 28 Oct. 2014.
"OpenStack." Getting Started — Sahara. N.p., n.d. Web. 29 Oct. 2014.