Sunday, November 30, 2014

From DevOps to Puppet Part 2

In Part 1, we looked at DevOps, Configuration Management Tools and a little bit of Puppet.  In this post we will continue to look into other aspects of Puppet.

Declarative vs Imperative
One important thing to know about Puppet is that it is a declarative language.  This means that the user only specifies the end state of the system being configured.  The user does not need to know the system-specific details.  A Puppet manifest can be applied any number of times, and if the system is already in the desired state, nothing will happen.

Another equally popular Configuration Management Tool, Chef, is usually described as imperative: the user writes out the steps needed to achieve the end result.

Puppet Terminology
Puppet looks at everything as a resource, where each resource describes some aspect of a system.
  • The block of Puppet code that describes a resource is called a Resource Declaration.
  • Each resource has a type, a title and a set of attributes.
Examples of resources are:
  • Linux packages such as MongoDB
  • Files
  • Users on a system
  • Services, /etc/hosts entries, network interfaces or Windows registry keys
Example of a resource declaration:
As we can see, there are 3 resources of type "file", but the titles and attributes are different.  The first example creates a file /tmp/test1 with the content "Hi".  The second one creates a file /tmp/test2 with the mode set to 0644, and the third example creates a symlink /tmp/test3 which links to the file /tmp/test1.

As mentioned before, Puppet is a declarative language.  If the file /tmp/test1 already exists with the content "Hi", nothing will happen even if the user applies the manifest 100 times, because the desired end state has already been reached.
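To make the "apply it 100 times" idea concrete, here is a minimal sketch in Python (not Puppet code). The function name ensure_file and the scratch directory are my own assumptions for illustration; the point is only that the function checks the current state first and changes something only when the state differs from the desired one.

```python
import os
import tempfile


def ensure_file(path, content):
    """Bring `path` to the desired state; return True only if a change was made."""
    if os.path.isfile(path):
        with open(path) as f:
            if f.read() == content:
                return False  # already in the desired state: do nothing
    with open(path, "w") as f:
        f.write(content)
    return True


# Hypothetical stand-in for /tmp/test1, kept in a scratch directory.
target = os.path.join(tempfile.mkdtemp(), "test1")

# Apply the same desired state 100 times and record whether anything changed.
changes = [ensure_file(target, "Hi") for _ in range(100)]
print("runs that changed the system:", changes.count(True))
```

Only the first run should report a change; the other 99 find the file already in the desired state.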

There are built-in types for Puppet:


If you do not find what you need among the built-in types, take a look at the Puppet Forge.  Chances are that someone else has already done the job for you and developed a type that you can use, or one that is similar to what you need.

A Puppet Manifest is a Puppet program with the .pp file extension that contains the resource declarations.  These are some of the commands that we can run on a Puppet Manifest:

puppet parser validate configWebSrv.pp

puppet apply --noop configWebSrv.pp

puppet apply configWebSrv.pp

puppet-lint configWebSrv.pp

The "--noop" option is very useful: it lets us test the manifest without actually performing any action, and Puppet will report what would change if the manifest were applied.

Puppet can operate on a standalone device, or it can operate in a "Master and Agent" model where the Puppet agent pulls from the Puppet Master at a regular interval.  The default polling interval is 30 minutes.  To apply a new configuration urgently, the user can trigger the poll manually, and I believe Puppet Enterprise has the option to trigger it from the console.

A manifest is compiled into a catalog, which is in a format that the Puppet Agent understands.  Puppet then configures the resources according to the content of the catalog to bring the system to the desired state.

In a standalone model it works this way:

In a "Master and Agent" model, the manifests are stored on the Puppet Master, and the agent periodically polls the Puppet Master to see if there are any configuration changes.  If necessary, we can trigger the poll manually instead of waiting for the next poll interval.

The difference between the standalone and the "Master and Agent" model is where the manifests are stored and where the catalog is compiled.

Currently under preview is the Puppet Server.  It is meant to replace the Puppet Master and, being a complete redesign, is expected to bring many advantages.

Maybe in Part 3 we can look into this new Puppet Server.

Saturday, November 29, 2014

From DevOps to Puppet Part 1.

DevOps is an emerging trend in IT departments.  At various conferences you will find a DevOps track or sessions with standing-room-only audiences trying to listen to the speaker or to take pictures of the PowerPoint slides.

DevOps is a new word derived from development and operations.

                      Development + Operations = DevOps

Why put these 2 together?

Thanksgiving 2014 just went by, yesterday was "Black Friday", and in 2 days it will be "Cyber Monday".  Demand on eCommerce increases tremendously during these few days, and then comes Christmas, which is also a high season for online shopping (although the true meaning of Christmas is not to give gifts).  Consumers of IT services are demanding faster service that adjusts to change.  Users are also demanding faster resolution of problems.

The response to these new IT demands is to put Development and Operations into one team that works together to get things resolved.  Developers are rewarded for putting in new features.  Operations is rewarded for providing a stable system.  Oftentimes the goals of a developer and an operations person conflict with each other.  This affects how fast a new feature can be rolled out and how fast a problem found in production can be resolved.

DevOps provides a new methodology in which problems are broken into smaller problems so that they can be resolved faster.  It also provides a consistent environment for development and production, and this is where Configuration Management Tools come into the picture.

There are 2 important concepts that we have to know about DevOps.  The first one is that DevOps is a culture.  It is about people's mentality more than a fixed methodology or checklist to follow.  Each organization is unique and thus how DevOps is implemented will be different.  The second one is that there is no such position as a "DevOps Engineer".  There will continue to be Development Engineers and there will continue to be Operations staff.  DevOps only puts them together to work as a team and changes how things are done: breaking problems into smaller problems and providing continuous integration, feedback and delivery of new features and bug fixes in a timely manner.


I highly recommend the book "The Phoenix Project", which covers the subject of DevOps in novel form.  I finished the book over one weekend.  I could not put it down because it was so interesting.  Instead of theory or opinion about DevOps, the book tells the story of a company's IT problems and how different people solved them with the practice of DevOps.  It also shows different people objecting to the idea of DevOps and eventually seeing its value.  After reading this book you will feel that you have gone through implementing DevOps practices in your company's IT department, with first-hand experience and perspective.  My wife thought something was wrong with me that weekend.

Also, if you are interested, check out the "10 Must read DevOps books" article or the "DevOps Reading List" to see whether you want to read some of these books on the subject of DevOps.  Or check out the free "DevOps for Dummies", compliments of IBM.

Configuration Management Tools
Configuration Management Tools are an essential part of DevOps.

They are used to automate tedious and/or repetitive tasks.  This helps to avoid human errors.  Because the configuration is written down as code, Configuration Management Tools work hand in hand with version control and change history.  When things go wrong we can easily roll back to the previous working version.  With the change history we can also trace back to the person who made a change and see why the change was necessary.

Another advantage of Configuration Management Tools is that they serve as up-to-date documentation of the entire system.  A new member of the Operations team can just look at the Configuration Management Tool and grasp what devices are in the system and how they are configured.

Configuration Management Tools can also shield the user from the details of different computing platforms such as Red Hat and Ubuntu.

Popular Configuration Management Tools are Puppet, Chef, Ansible and Salt.

What is Puppet?
Puppet is an open source Configuration Management Tool (CMT) used mostly to configure Linux systems.  In the beginning, it was not very practical to use Puppet to configure a Windows system because the Windows system needed to be rebooted several times.  Recently this has been getting better, and Puppet can work with DSC (Desired State Configuration).  Puppet also has good support for configuring VMs in Microsoft's cloud offering, Azure.  Puppet can also be used to configure Docker containers and networking devices such as those from F5, Juniper or Arista Networks.

Note: some of these features may only be available in the commercial version of Puppet, Puppet Enterprise from PuppetLabs.

PuppetLabs is based in Portland, Oregon, with more than 300 employees.  It packages the open source Puppet into a commercial product called Puppet Enterprise.  As the name indicates, this commercial product targets enterprises and adds enterprise-oriented features such as a graphical user interface, Role Based Access Control for security, and task orchestration.  The first 10 nodes of Puppet Enterprise are FREE.  As of this writing Puppet Enterprise is at version 3.7.


Puppet Enterprise also has a reporting capability, so that the user does not have to manually go through the log files to see what went wrong when there is a failure.  It can also generate an inventory of the systems that are managed by Puppet Enterprise.

We will continue to look into other aspects of Puppet in the next blog post (Part 2).

Friday, November 28, 2014

Information Security Basics Part 4: Public Key Infrastructure (PKI)

In my last post, I talked about cryptography.  One of the types of cryptographic algorithms is Public Key, where a pair of keys is generated.

I am sure everyone has seen this in their browser:

Usually, the user will just click "I Understand the Risks" and move on.  User education is a major part of security in an enterprise, or as a matter of fact in any organization.  An organization can have a state-of-the-art firewall and IPS/IDS, but the users are always the weakest link in security.  Nowadays, USB drives come in 4 GB, 8 GB or 16 GB.  Way back when most USB drives were 256 MB, hackers would leave a 1 GB USB drive (with a virus) in a corporation's parking lot, hoping some employee would pick it up and plug it into their PC at work to check out what was on it.  It is very possible to gain access to a corporation's network this way.

Anyway, back to the topic of this post.  In a web browser, when we use https we are relying on a digital certificate to prove the identity of the web server.  Digital certificates are usually issued by a Public Key Infrastructure (PKI).  Since we come into contact with digital certificates on a daily basis, I think it will be interesting to take a look at PKI.

PKI is a big topic and I can only touch on the most important elements so we can have a general overview of what PKI is.  

Digital Certificate
Wikipedia defines Digital Certificate as "an electronic document used to prove ownership of a public key. The certificate includes information about the key, information about its owner's identity, and the digital signature of an entity that has verified the certificate's contents are correct. If the signature is valid, and the person examining the certificate trusts the signer, then they know they can use that key to communicate with its owner."

Digital certificates are in the X.509 format, which has a data section and a signature section.

The above diagram breaks down the different sections of a digital certificate.  For a more detailed description of digital certificates, take a look at this IBM article.

One important piece of information included in the certificate is the public key.  When a certificate is generated, a private key and a public key are issued.  The private key is kept by the owner of the certificate while the public key is included in the digital certificate, and thus we have the name "Public Key" Infrastructure.

While online banking and eCommerce are the most common use cases for digital certificates, there are others, such as a VPN (Virtual Private Network) that enables remote employees to gain access to corporate computing resources.  Company-issued digital certificates are installed on the remote employee's laptop.  With a company-issued, trusted certificate, the device is able to prove its identity and gain access from home or from a hotel securely over a certificate-based VPN.  Even SSL-based VPNs use certificates.

Public Key Infrastructure (PKI)
For banking and eCommerce, the digital certificates are bought from well-known digital certificate vendors such as Verisign, GeoTrust or Thawte.

It is also common for an organization to set up its own Public Key Infrastructure for internal use.

When we see the word "infrastructure", we think of a complicated system such as OpenStack as a cloud infrastructure.  According to this article, PKI at its core is about certificates:
  • How they are created
  • What information they contain
  • How they are used
  • What is the level of trust 
  • What to do when the certificate is lost
Planning and writing up the security procedures are an essential part of setting up a Public Key Infrastructure.  Written documents are particularly important for security audits and for complying with regulatory requirements.

When we think of PKI we think of a PKI hierarchy.  In a PKI hierarchy, there is the concept of the Root CA and the Issuing CA.

In its simplest form, a PKI has one server acting as the Certification Authority (CA) to issue and revoke certificates.  In a more complex environment, there is a Root CA and there are subordinate CAs.  This is useful for a big corporation, which can configure a subordinate CA to handle certificate issuance for a particular division.

This is the concept of CA tiering.  In general there are 3 types of tiering design.

Single/One Tier Hierarchy
A single machine handles all the operations concerning certificates.  As mentioned before, in a PKI hierarchy there is the Root CA and there is the Issuing CA.  In the case of a single machine, it performs both functions.  While this is the simplest design, it is not a secure one.  The user will have to decide whether a single tier hierarchy is sufficient to serve the organization.

Two Tier Hierarchy
In this model, the Root CA and the Issuing CA are on different machines.  The Root CA is kept offline so as to protect its private key.  There can be multiple Issuing CAs, and they can be distributed according to geographic or departmental needs.


Three Tier Hierarchy
In this model there is a new type of CA, the Policy CA, in between the Root CA and the Issuing CA.  The purpose of the Policy CA is to issue certificates to the Issuing CAs according to administrative boundaries and restrictions.  Each Policy CA has its own Issuing CAs.  As with the Root CA, once the PKI is set up the Policy CA is taken offline for security purposes.

Another advantage of this model is that if some certificate is compromised, the user can revoke a single Policy CA's certificate without affecting the certificates under a different Policy CA.  For example, suppose a Policy CA is configured for each remote office based on its geographic location.  If the certificates for Branch A are compromised, we just revoke the certificate of the Policy CA for Branch A.  All other remote locations are not affected.
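The Branch A example can be sketched with a toy model in Python.  This involves no real cryptography or certificate parsing; the CA and device names are made up for illustration, and "trust" here simply means that nothing in the chain up to the root has been revoked.

```python
# Toy model: each certificate names its issuer (hypothetical names throughout).
issuer_of = {
    "Policy CA A": "Root CA",
    "Policy CA B": "Root CA",
    "Issuing CA A1": "Policy CA A",
    "Issuing CA B1": "Policy CA B",
    "laptop-a": "Issuing CA A1",
    "laptop-b": "Issuing CA B1",
}

# Branch A is compromised, so only its Policy CA certificate is revoked.
revoked = {"Policy CA A"}


def chain(cert):
    """Return the path from `cert` up to the root of the hierarchy."""
    path = [cert]
    while path[-1] in issuer_of:
        path.append(issuer_of[path[-1]])
    return path


def is_trusted(cert):
    """A certificate is trusted only if nothing in its chain is revoked."""
    return not any(c in revoked for c in chain(cert))


print(chain("laptop-a"))
print("laptop-a trusted:", is_trusted("laptop-a"))
print("laptop-b trusted:", is_trusted("laptop-b"))
```

Revoking one Policy CA certificate cuts off everything issued under it, while the chain for the other branch never passes through the revoked certificate.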


There is a lot more to say about PKI.  As I am interested in security, I will blog about this subject again in the near future.

Related Post:
Information Security Basics Part 1: Security Models
Information Security Basics Part 2: Defense in Depth 
Information Security Basics Part 3: Cryptography

"Public Key Certificate." Wikipedia. Wikimedia Foundation, 15 Nov. 2014. Web. 24 Nov. 2014.
"Getting Started with Public Key Infrastructure." Networklore. N.p., n.d. Web. 24 Nov. 2014.

Thursday, November 27, 2014

Information Security Basics Part 3: Cryptography


In a nutshell, this cartoon depicts what cryptography is: hidden writing.

Cryptography has 2 distinct operations:
  • Encryption
  • Decryption
When we hide a message, we need a way to return it to its original form.  A cipher is an algorithm that performs the encryption and decryption.

Cryptography History
Wikipedia has a very comprehensive article on the history of cryptography.  As far back as the ancient Egyptian kingdom around 1900 B.C., a very simple form of cryptography was being used.  In the first century B.C., Julius Caesar used the Caesar Cipher to encrypt secret messages to his generals.  His cipher is a simple substitution method that shifts each character by 3.
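The shift-by-3 substitution is easy to try out.  Below is a minimal Python sketch; the function name caesar and the sample message are my own, and only the letters A-Z and a-z are shifted while everything else is left alone.

```python
import string


def caesar(text, shift):
    """Shift every ASCII letter in `text` by `shift` positions, wrapping around."""
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    s = shift % 26
    table = str.maketrans(
        upper + lower,
        upper[s:] + upper[:s] + lower[s:] + lower[:s],
    )
    return text.translate(table)


secret = caesar("ATTACK AT DAWN", 3)   # encrypt with a shift of 3
plain = caesar(secret, -3)             # decrypt by shifting back
print(secret)  # DWWDFN DW GDZQ
print(plain)   # ATTACK AT DAWN
```

With only 25 useful shifts, anyone can break this by trying them all, which is why it is of historical interest only.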

As time progressed, more and more complicated ciphers were used.  In the 16th century, Vigenère introduced a primitive form of encryption key.  At the beginning of the 20th century, machines began to be used as ciphers.

Of course the most famous cryptography machine is the Enigma machine used by the Germans.  The movie U-571 is still one of my favorite war movies.


As a side note, the U.S. military was very smart in using Navajo Indians, speaking their native language, to communicate with each other in the Pacific theater.  In essence, the Navajo code talker was the cipher machine.  A U.S. Marine was assigned to protect this asset and sadly, if necessary, the Marine had to "destroy" this unique cipher machine rather than let it fall into the hands of the enemy, the Japanese.

Modern Cryptography is based on complex mathematical algorithms.

Cryptography Goal
What we have discussed so far is that cryptography is the way to encrypt and decrypt messages.  In fact there are altogether 4 goals in cryptography:
  • Confidentiality
  • Authentication
  • Integrity of Data
  • Non-repudiation
I think confidentiality, authentication and data integrity are self-explanatory.  As for non-repudiation, it means the sender cannot deny sending the message.  For example, suppose my 4-year-old son is playing with my laptop and he manages to send you an Email saying I will pay you $1,000, signed with my private key.  In the eyes of the court, I am obligated to pay you the $1,000 because the Email is signed with my private key, which no one else should have.

Cryptography Types
There are 3 general types of Cryptographic algorithms:
  • Secret Key
  • Public Key
  • Hashing
Secret Key
It is also called symmetric key cryptography because the message is encrypted and decrypted with the same key.  The advantage is the speed of encryption and decryption.  It is vulnerable to brute force attacks in which hackers crack the key.  Key creation and distribution to the various parties is also a point where the key can be compromised.

The longer the key, the more difficult it is, or the longer it takes, for hackers to crack.  Commonly used symmetric encryption algorithms are AES (Advanced Encryption Standard), DES (Data Encryption Standard) and Triple DES.  Sometimes in movies or TV shows we see that 1024-bit or 2048-bit encryption is used and yet the hacker is able to gain access to a certain computer system or network.

Note: Double DES is not used because encrypting twice does not significantly increase the effective key strength; a meet-in-the-middle attack makes it only slightly harder to break than single DES.

Public Key
It is also called asymmetric key cryptography because there are 2 keys: one for encrypting the message and the other for decrypting it.  One key is called the private key, which should be kept secret, and the other is public.  When I send a message to you, I can encrypt (sign) it with my private key and you will use my public key to decrypt it; this proves the message came from me, but it does not keep it secret because anyone can have my public key.  When you reply, you will use my public key to encrypt the message and then I will use my private key to decrypt it, so only I can read it.  The drawback of this method is that it is slow due to the complex mathematics involved.

As mentioned in the previous section on non-repudiation, when a message is encrypted with a private key, the person holding that private key cannot dispute having sent the message.

An example of an asymmetric-key system is the Diffie-Hellman Key Exchange.

One application of this method is to use the asymmetric keys to exchange a symmetric key between 2 parties; after that, all communication between these 2 parties is encrypted with the symmetric key, which is much faster.
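To get a feel for how 2 parties can agree on a shared key over a public channel, here is a toy Diffie-Hellman exchange in Python.  The prime 23 and generator 5 are textbook-sized assumptions that are far too small for real use, and the names Alice and Bob are just the usual placeholders.

```python
import secrets

# Toy public parameters (assumption: far too small for real use).
p = 23  # prime modulus, known to everyone
g = 5   # generator, known to everyone

# Each side picks a private value and never sends it.
a = secrets.randbelow(p - 2) + 1  # Alice's private value, in 1..p-2
b = secrets.randbelow(p - 2) + 1  # Bob's private value, in 1..p-2

# Only these public values travel over the network.
A = pow(g, a, p)  # Alice sends A to Bob
B = pow(g, b, p)  # Bob sends B to Alice

# Each side combines its own private value with the other's public value.
alice_secret = pow(B, a, p)
bob_secret = pow(A, b, p)

print("shared secrets match:", alice_secret == bob_secret)
```

Both sides end up with the same number without ever sending it, and that number can then be used to derive the symmetric key for the rest of the conversation.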

Hashing
Hashing is different from symmetric and asymmetric key cryptography.  It is a one-way function.  Plain text goes through a hashing function and becomes a hash value (digest).  This hash value, however, cannot be converted back to its original form.

If it is not possible to convert the hash value back to the original plain text, what is the use of this method?  It is good for data integrity.  While encryption keeps the message confidential, a hashing function can show that the message has not been altered, either intentionally by another party or unintentionally due to hardware or communication errors.

Examples of hashing functions are MD5 and SHA-256/SHA-384/SHA-512.
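As a small illustration of the integrity check, the Python sketch below hashes a message with SHA-256 from the standard hashlib module and shows that even a one-character change produces a different digest.  The message text is made up for the example.

```python
import hashlib

message = b"Pay Bob $100"
digest = hashlib.sha256(message).hexdigest()  # sender publishes this value

# The receiver recomputes the hash over what actually arrived.
received_ok = b"Pay Bob $100"
received_bad = b"Pay Bob $900"  # one character altered in transit

intact = hashlib.sha256(received_ok).hexdigest() == digest
altered = hashlib.sha256(received_bad).hexdigest() != digest

print("digest length in hex characters:", len(digest))
print("unaltered message verifies:", intact)
print("altered message detected:", altered)
```

Note that a bare hash only detects changes; an attacker who can replace the message can also replace the hash, which is why hashes are usually combined with a key or a signature in practice.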

Cryptography Use Cases
The most common use case for cryptography is the Secure Sockets Layer (SSL).  Everyone uses the web, and some websites choose to use https instead of http for the communication between the web browser and the web server; https is simply http running on top of SSL.

In OpenStack, as I blogged about in the messaging and queuing post of my OpenStack series, SSL is used to secure RabbitMQ or Qpid.

In a Microsoft infrastructure, Kerberos is used to allow users to log on and gain access to the various compute resources in the infrastructure without having to perform authentication and authorization repeatedly.

Related Post:
Information Security Basics Part 1: Security Models
Information Security Basics Part 2: Defense in Depth
Information Security Basics Part 4: Public Key Infrastructure (PKI)

Wednesday, November 26, 2014

Information Security Basics Part 2: Defense in Depth

Defense in Depth is originally a military term in which multiple layers of defense are used to make it more difficult for the enemy to attack the target.  The best example is the castle.
There are multiple things used for defense: a drawbridge, water (maybe with crocodiles), a heavy iron gate.

What are we defending against?
The castle shown above is used to defend against enemy attack.  What about in the Information Security world?  Who is the enemy?

You may say the enemy is the hacker.  While this is true, the more exact term is threat.  Threat by itself is a subjective word.  A threat can be remote; it can be big or it can be small.

In the Information Security world, 3 terms are used together:
  • Risk
  • Threat
  • Vulnerability

In fact there is a formula for these 3 terms:

          Risk = Threat X Vulnerability

For example, keeping your front door unlocked is a vulnerability, but if you live in a safe neighborhood your threat is low and thus your risk will not be high.  On the other hand, if you keep your front door unlocked and you live in a high crime area, then your risk is very high.
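To put numbers on the front door example, here is a tiny Python sketch of the formula.  The 1-to-10 scoring scale and the specific scores are my own assumptions for illustration; the formula itself says nothing about units.

```python
def risk(threat, vulnerability):
    """Risk = Threat x Vulnerability, using hypothetical 1-10 scores."""
    return threat * vulnerability


unlocked_door = 9  # high vulnerability in both cases

safe_neighborhood = risk(threat=1, vulnerability=unlocked_door)
high_crime_area = risk(threat=9, vulnerability=unlocked_door)

print("safe neighborhood:", safe_neighborhood)  # 9
print("high crime area:", high_crime_area)      # 81
```

The same unlocked door gives a very different risk depending on the threat, and locking the door (lowering the vulnerability score) would reduce the risk in both places.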

Note: Security is all about mitigation of risk.  We can never be 100% secure.  The objective of information security is to reduce all known risk factors to a minimum.

Defense in Depth
Defense in Depth is a security best practice.  I have heard that the late CEO of Apple Inc., Steve Jobs, had a security guard stay within a few feet of his laptop at all times when he was speaking at conferences.  I am sure his laptop was both password protected and encrypted, but adding another layer of defense does not hurt.

There are 4 kinds of Defense in Depth:
  • Uniform Protection
  • Protected Enclaves
  • Information Centric
  • Vector-Oriented
Uniform Protection
The easiest and most common form is uniform protection, where all the resources or data are treated as equally important.  This approach is more vulnerable to a malicious insider: because everything is treated as equal, a development engineer can gain access to the data in the HR department.

Protected Enclaves
With this approach, resources and data are segmented.  It enforces the principle of least privilege so that users can only access what they need to access.  So in this case a development engineer cannot gain access to the data in the HR department.  The Pentagon has a classified network and an unclassified network.  One time I was there for on-site support, and since I did not have security clearance, I had to step out of the room when we debugged the classified network, tell the guy who had security clearance which debug commands to type in, and have him read the output to me.  That was quite an experience, debugging the classified network.

Information Centric
Data or assets are tagged with different values.  We can envision an onion, which has different layers.  The most important assets are in the center, where more protection is deployed.  Each layer has its own security implemented with this defense in depth concept.

Vector-Oriented
This approach identifies the attack vectors through which a threat can arrive.  It is similar to the information centric approach, but the emphasis is on attack vectors such as thumb drives or smartphones that can take pictures.

Role-Based Access Control
While this is not usually looked at as a defense in depth model, in principle it is a form of having multiple levels of access control.  With the data and resources segmented, after a user logs into the system with the proper credentials, the user is assigned a role.  This role, which can take the form of an access token, determines what resources the user is able to access, giving an additional level of access control.  Microsoft's Kerberos and OpenStack use this Role-Based Access Control.
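The role lookup can be sketched in a few lines of Python.  The user names, role names and resources below are hypothetical, and a real system would carry the role in a signed token rather than a plain dictionary; this only shows the extra check that happens after login.

```python
# Hypothetical roles and the resources each role may access.
ROLE_PERMISSIONS = {
    "developer": {"source_code"},
    "hr_staff": {"hr_records"},
}

# Hypothetical role assignment made after a successful login.
USER_ROLES = {"alice": "developer", "bob": "hr_staff"}


def can_access(user, resource):
    """Allow access only if the user's role includes the resource."""
    role = USER_ROLES.get(user)
    return resource in ROLE_PERMISSIONS.get(role, set())


print("alice -> source_code:", can_access("alice", "source_code"))
print("alice -> hr_records:", can_access("alice", "hr_records"))
print("unknown user:", can_access("mallory", "source_code"))
```

Being logged in is not enough on its own: the development engineer still cannot read the HR data, and an unknown user gets nothing by default.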

Related Post:
Information Security Basics Part 1: Security Models
Information Security Basics Part 3: Cryptography
Information Security Basics Part 4: Public Key Infrastructure (PKI) 

Tuesday, November 25, 2014

Information Security Basics Part 1: Security Models


Security means different things to different people.

To a home user, security is antivirus protection.  The objective is to keep the electronic device free of malware or rootkits so that the device can operate “normally”.  It can also mean safe access to the web sites of banks and/or other financial institutions so that their financial accounts will not be compromised.  Or in some cases, celebrities may want to protect their private pictures or videos.  Failure to protect the user’s device or financial accounts can lead to monetary loss.  Identity theft is also a major concern for home computer users.

To the government, security, I think, means the protection of sensitive data and the continuous operation of the various departments and agencies.  Around November 16, 2014, the U.S. State Department website was compromised.  It was the 4th U.S. government agency to announce a breach of its computer systems within a few weeks’ time.  Failure to protect the government’s computer systems can lead to loss of human lives.

To the private sector, security is the protection of data, which can be intellectual property or customers’ financial and private information such as health history or social security numbers.  Oftentimes a company has the office of Chief Security Officer, responsible for the “security” of the company and its compliance with security rules such as HIPAA for the health sector or the Gramm-Leach-Bliley Act for the financial sector.

Regardless of what security means to us, we can always look at security with the following model.

The CIA Triad
This is the most common security model for information systems.  This model is used to develop security policy, identify areas of security risk and, most of all, deploy measures to mitigate the identified risks.

CIA stands for:
  • Confidentiality
  • Integrity 
  • Availability

I like this diagram because besides showing the 3 elements of the CIA triad, it shows data in the middle.  Most of the time security is applied to protect data.  A Social Security number is a form of data, a bank account is a form of data, and intellectual property is another form of data.  Conceptually, these 3 elements are applied to data.  In other words, we should write them as:
  • Confidentiality of the data
  • Integrity of the data
  • Availability of the data
Confidentiality means data can only be accessed by authorized entities.  The owner of the data decides who can gain access to the data.  Access means to read, to modify or to delete the data.  The most basic form of providing confidentiality is password protection.  To gain access to a personal computer or device we need to provide a user name and password.  A credential can be as simple as an 8-character text string or it can be an X.509 certificate.  Also, there can be multiple levels/factors of authentication where, besides a password, the user has to provide an authorized token.  For multi-factor authentication the user has to provide:
  • Something you know - password
  • Something you have - RSA Token
  • Something you are - biometric

Another way to provide confidentiality is encryption.  To satisfy regulatory requirements, some companies require company-issued personal devices to have encryption turned on, so that in case the device is lost there is one level of safeguard for the data on it.  For BYOD, some companies can provide the ability to wipe the data on a device remotely.

Integrity means data cannot be modified by unauthorized entities, and it also covers the reliability of the electronic device that stores the data.  The flip of a single bit in a storage device can make a bank account reflect the wrong amount of money.  With the object storage Swift, data is by default stored on 3 different devices and there is an audit task to make sure the data is intact.  Data redundancy techniques such as RAID (either software or hardware) are another way to help ensure the integrity of the data.

One basic form of data integrity check is to use a hashing function.  In networking, an Ethernet frame has a CRC value at the end so that when the frame is received, it can be checked to see if the frame was altered in transit.  System Administrators are familiar with the MD5 hash value of an ISO image.
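The bit-flip scenario can be tried with the CRC-32 function in Python's standard zlib module.  This is only a sketch of the idea: the payload is made up, and a real Ethernet adapter computes its frame check sequence in hardware over the actual frame.

```python
import zlib

frame = b"hello, ethernet"     # hypothetical payload
checksum = zlib.crc32(frame)   # sender appends this value to the frame

# Simulate a single bit flipping in transit (lowest bit of the first byte).
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print("clean frame passes:", zlib.crc32(frame) == checksum)
print("corrupted frame detected:", zlib.crc32(corrupted) != checksum)
```

A CRC is good at catching accidental errors like this one, but it is not designed to stop someone who alters the data on purpose; for that a cryptographic hash is the better tool.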

Password protection and encryption also help to prevent the data from being modified by unauthorized entities.

Availability means data can be accessed when needed.  Have you ever hit the wrong button on your computer and deleted all the Emails in your inbox?  Or have you accidentally deleted the files in a directory?  In these cases backup comes to the rescue.  It is important to test the recovery of the backup data.  Very often data is backed up, but we still have to make sure the backup tape is not empty or overwritten by later backups.  If you are a System Administrator you must know this famous line: "Test your backup regularly!"

Most companies have a disaster recovery plan so that if data is lost due to fire, earthquake or terrorist attack, it can be restored according to the expected Recovery Time Objective (RTO) and Recovery Point Objective (RPO).  It is very important to test the recovery plan, just like testing the recovery of the backup data.

Deletion of data by an unauthorized entity is of course the basic form of attack on availability.  Another form is the denial-of-service (DoS) attack.  We can easily imagine what would happen to a company if consumers were not able to access its online shopping website before Christmas because some attacker launched a DoS attack on that website.

Which one is more important?
While all 3 elements of the CIA triad are important, different organizations will have a different element as the most critical area.  For example, in the health care industry confidentiality will be the most important element.  In banks or financial institutions, integrity will be the most important element.  Lastly, as stated, for online shopping/e-commerce based organizations, availability will be the most important element.

Related Post:
Information Security Basics Part 2: Defense in Depth
Information Security Basics Part 3: Cryptography 
Information Security Basics Part 4: Public Key Infrastructure (PKI)

Monday, November 24, 2014

Amazon Web Services Part 3: EC2 Container Service

At the AWS re:Invent conference, Amazon announced a new feature, "EC2 Container Service" (ECS).

Wait, if my compute instance is Linux based, I can already install Docker on that instance, so what does this new feature do for me?  In fact, users can already create and manage Docker containers in AWS Elastic Beanstalk.

If we look into this, we find that the new feature is also described as "Container Management for the AWS Cloud".  Deploying containers on the cloud is easy, and that is exactly why we need a management system to keep things under control and to provide additional benefits for customers deploying container-based applications.  As container technology becomes more and more mature with the help of Docker, we need to have management tools in place.  In my opinion, as with virtual machines, we will later need complete monitoring and orchestration tools to provide autoscaling functionality.  And as the trend goes, policy will be defined for containers as well, just like what OpenStack Congress does.

On November 13, 2014, I blogged about Docker in OpenStack, where Heat is used to manage containers.  Both Google and Microsoft use the open source Kubernetes to manage containers in their respective cloud offerings.

ECS Benefits
During the product announcement in the AWS re:Invent conference keynote, there was a slide showing the benefits of this new EC2 Container Service:
If you cannot see the image, the 4 benefits are:

  • Native Docker support for AWS customers
  • Significantly easier to manage Docker apps
  • Integrated with Docker Hub
  • Enable app portability
ECS Terminology
On the Amazon blog, Jeff Barr (@jeffbarr) has an article with a list of terms to help us understand EC2 Container Service:

  • Cluster - A cluster is a pool of EC2 instances in a particular AWS Region, all managed by EC2 Container Service. One cluster can contain multiple instance types and sizes, and can reside within one or more Availability Zones.
  • Scheduler - A scheduler is associated with each cluster. The scheduler is responsible for making good use of the resources in the cluster by assigning containers to instances in a way that respects any placement constraints and simultaneously drives as much parallelism as possible, while also aiming for high availability.
  • Container - A container is a packaged (or "Dockerized," as the cool kids like to say) application component. Each EC2 instance in a cluster can serve as a host to one or more containers.
  • Task Definition - A JSON file that defines a Task as a set of containers. Fields in the file define the image for each container, convey memory and CPU requirements, and also specify the port mappings that are needed for the containers in the task to communicate with each other.
  • Task - A task is an instantiation of a Task Definition consisting of one or more containers, defined by the work that they do and their relationship to each other.
  • ECS-Enabled AMI - An Amazon Machine Image (AMI) that runs the ECS Agent and dockerd. We plan to ECS-enable the Amazon Linux AMI and are working with our partners to similarly enable their AMIs.
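
To illustrate the Task Definition term above, here is a rough sketch of what such a JSON document might look like, built as a Python dictionary.  The field names (`family`, `containerDefinitions`, `portMappings` and so on) follow my understanding of the ECS task definition format and should be checked against the current AWS documentation; the image names and resource values are made up.

```python
import json

# Hypothetical two-container task: a web front end and a cache.
task_definition = {
    "family": "demo-web",  # assumed name for this group of containers
    "containerDefinitions": [
        {
            "name": "web",
            "image": "example/web:latest",  # made-up image name
            "cpu": 256,                      # CPU units requested
            "memory": 512,                   # memory in MiB
            "portMappings": [{"containerPort": 80, "hostPort": 8080}],
        },
        {
            "name": "cache",
            "image": "example/cache:latest",  # made-up image name
            "cpu": 128,
            "memory": 256,
            "portMappings": [{"containerPort": 6379, "hostPort": 6379}],
        },
    ],
}


def total_requirements(definition):
    """Sum the CPU units and memory requested by all containers in the task."""
    containers = definition["containerDefinitions"]
    return (
        sum(c["cpu"] for c in containers),
        sum(c["memory"] for c in containers),
    )


# The JSON text is what would be saved to a file and registered with the service.
task_json = json.dumps(task_definition, indent=2)
```

A Task, in the terminology above, would then be one running instantiation of this definition, with both containers placed together.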
ECS Function

According to the official Amazon Web Services website, EC2 Container Service is a highly scalable, high performance container management service that supports Docker containers and allows users to:

  • Easily run distributed applications on a managed cluster of Amazon EC2 instances.
  • Launch and stop container-enabled applications with simple API calls, query the state of your cluster from a centralized service, and access many familiar Amazon EC2 features like security groups, EBS volumes and IAM roles.
  • Schedule the placement of containers across your cluster based on your resource needs, isolation policies, and availability requirements.
  • Eliminate the need to operate your own cluster management and configuration management systems or worry about scaling your management infrastructure.
The smallest unit for EC2 Container Service to manage is a cluster.  From the terminology section above, a cluster is defined as a pool of EC2 instances in a particular AWS Region.  When we look at the product details of ECS, it is described as a tool for "complete visibility and control of your cluster from creating and terminating Docker containers to viewing detailed cluster state information".
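
To give a feel for what "scheduling the placement of containers across your cluster based on your resource needs" involves, here is a toy placement routine.  It is only a sketch under my own assumptions (a greedy "most free memory first" rule, with hypothetical `Instance` and `place` names) and is not how the ECS scheduler is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A cluster member with its remaining CPU units and memory (MiB)."""
    name: str
    free_cpu: int
    free_memory: int
    containers: list = field(default_factory=list)


def place(containers, instances):
    """Assign each (name, cpu, memory) container to an instance, or None if nothing fits."""
    placement = {}
    for name, cpu, memory in containers:
        # Only instances with enough spare CPU and memory are candidates.
        candidates = [
            i for i in instances
            if i.free_cpu >= cpu and i.free_memory >= memory
        ]
        if not candidates:
            placement[name] = None
            continue
        # Prefer the instance with the most free memory, which spreads the load.
        target = max(candidates, key=lambda i: (i.free_memory, i.free_cpu))
        target.free_cpu -= cpu
        target.free_memory -= memory
        target.containers.append(name)
        placement[name] = target.name
    return placement
```

A real scheduler also has to respect isolation policies and availability requirements, for example spreading the containers of one application across Availability Zones, which this sketch ignores.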

Future Direction

In my opinion, the ability to meter and monitor is an important aspect of a cloud, especially a public cloud where resources are charged for.  Amazon did not announce anything on this at AWS re:Invent.  As of this writing the feature is still in preview and is free.  As container technology in Amazon Web Services becomes more mature, it is very possible that it will become a paid service.  After all, the purpose of AWS is to make money.

Another area where container technology has the potential to grow is PaaS.  Red Hat is using container technology for its PaaS offering, and I think AWS will be catching up in this area as well.

Network Function Virtualization with containers is a hot topic these days, but it seems AWS is not doing much in the networking area.

Related Post:
Amazon Web Services Part 1: Do you know all of these icons?
Amazon Web Services Part 2: Security Offerings

"Amazon EC2 Container Service (ECS) - Container Management for the AWS Cloud." Amazon EC2 Container Service (ECS) - Container Management for the AWS Cloud. N.p., n.d. Web. 17 Nov. 2014.
"AWS | Amazon EC2 Container Service." Amazon Web Services, Inc. N.p., n.d. Web. 17 Nov. 2014.