Application Virtualization
In a world where a Hot New Thing In Tech is manufactured on demand by marketing departments for every annual trade show, there's something stirring up interest all by itself (though it has its own share of marketing help). The idea of application containers in general, and Docker specifically, has become a big deal in the software development industry over the last year. I'm generally pretty skeptical of the hype that surrounds emerging tech and concepts (see DevOps), but I think Docker has the potential to be "disruptive" in the business sense of forcing people to re-think how they do things in light of new (and not yet well understood) possibilities.
In the next few posts (which I hope to have in rapid succession now) I plan to go into more detail about how to create Docker containers which are suitable for composition into a working service within a Kubernetes cluster. The application I'm going to use is Pulp, a software repository mirror with life-cycle management capabilities. It's not really the ideal candidate because of some of the TBD work remaining in Docker and Kubernetes, but it is a fairly simple service that uses a database, a messaging service and shared storage. Each of these brings out capabilities and challenges intrinsic to building containerized services.
TL;DR.
Let me say at the outset that this is a long post, more philosophical than technical. I'm going to get to the guts of these tools in all their gooey glory, but I want to set some context before I start. If you want to get right to tearing open the toys, you can go straight to the sites for Docker and Kubernetes:
- Docker - Containerized applications - http://www.docker.com
- Kubernetes - Clustering container hosts - https://github.com/GoogleCloudPlatform/kubernetes
The Obligatory History Lesson
For 15 years, since the introduction of VMWare Workstation in 1999 [1], the primary mover of cloud computing has been the virtual machine. Once the idea was out there, a number of other hardware virtualization methods were created: Xen and KVM for Linux, and Microsoft Hyper-V on Windows. Virtualizing hardware in software caused some problems on the real hardware, so in 2006 both Intel and AMD introduced processors with special features to improve the performance and behavior of virtual machines running on their real machines. [2]
All of these technologies have similar characteristics. They also have similar benefits and gotchas.
The computer which runs all of the virtual machines (henceforth: VMs) is known as the host. Each of the VM instances is known as a guest. Each guest uses one or more (generally very large) files in the host disk space which contain the entire filesystem of the guest. While each guest is running, it typically consumes a single (again, very large) process on the host. Various methods are used to grant the guest VMs access to the public network, both for traffic out of and into the VM. The VM process simulates an entire computer, so that for most reasonable purposes it looks and behaves as if it's a real computer.
This is very different from what has become known as multi-tenant computing, the traditional model in which each computer has accounts and users can log into their accounts to share (and compete for) the disk space and CPU resources. They also often have access to shared security information. The root account is special, and it's a truism among sysadmins that if you can log into a computer you can gain root access if you try hard enough.
Sysadmins have to work very hard in multi-tenant computing environments to prevent both malicious and accidental conflicts between their users' processes and resource use. If, instead of an account on the host, you give each user a whole VM, the VM provides a nice (?) clean (?) boundary (?) between each user and the sensitive host OS.
Because VMs are just programs, it is also possible to automate the creation and management of user machines. This is what has made possible modern commercial cloud services. Without virtualization, on-demand public cloud computing would be unworkable.
There are a number of downsides to using VMs to manage user computing. Because each VM is a separate computer, each one must have an OS installed and then applications installed and configured. This can be mitigated somewhat by creating and using disk images, the equivalent of the ancient practice of creating a "gold disk" and cloning it to create new machines. Still, each VM must be treated as a complete OS, requiring all of the monitoring and maintenance by a qualified system administrator that a bare-metal host needs. It also contains the entire filesystem of a bare-metal server and requires comparable memory from its host.
Docker
For the buzzword-savvy, Docker is a software containerization mechanism. Explaining what that means takes a bit of doing. It also rather misses the point, because the enabling technology is not what matters; what matters is what it allows us to do. But first, for the tech weenies among you....
Docker Tech: Cgroups, Namespaces and Containers
Docker takes advantage of cgroups and kernel namespaces to manipulate the view that a process has of its surroundings. A container is a view of the filesystem and operating system which is a carefully crafted subset of what an ordinary process would see. Processes in a container can be made almost totally unaware of the other processes running on the host. The container presents a limited file system tree which can entirely replace what the process would see if it were not in a container. In some ways this is like a traditional chroot environment but the depth of the control is much more profound.
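You can get a feel for what a kernel namespace does without Docker at all. Here's a minimal sketch using the unshare tool from util-linux (the flags below assume a reasonably recent version, such as the one shipped with Fedora 20); it gives a shell its own PID namespace so that, like a process in a container, it sees itself as PID 1:

sudo unshare --fork --pid --mount-proc /bin/sh
# inside the new namespace:
ps -ef    # shows only the shell and ps, much like the Docker shell example later in this post

Docker layers cgroups plus mount, network, UTS and IPC namespaces (and an image format) on top of this basic trick.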
So far, this does look a lot like Solaris Containers[3], but that's just the tech, there's more.
The Docker Ecosystem
The really significant contribution of Docker is the way in which containers and their contents are defined and then distributed.
It would take a Sysadmin Superman to manually create the content and environmental settings to duplicate what Docker does with a few CLI commands. I know some people who could do it, but frankly it probably wouldn't be worth the time spent even for them. Even I don't really want to get that far into the mechanics (though I could be convinced if there's interest). What you can do with it though is pretty impressive.
Note: Other people describe better than I could how to install Docker and prepare it for use. Go there, do that, come back.
Hint: on Fedora 20+ you can add your user to the "docker" line in /etc/group and avoid a lot of calls to sudo when running the docker command.
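If you'd rather not edit /etc/group by hand, something like this should have the same effect (a sketch; the group name is as shipped on Fedora and may differ on other distributions):

sudo usermod -aG docker $USER   # add your user to the docker group
newgrp docker                   # or just log out and back in to pick up the new group
docker info                     # should now work without sudo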
To run a Docker container you just need to know the name of the image and any arguments you want to pass to the process inside. The simplest images to run are the ubuntu and fedora images:
docker run fedora /bin/echo "Hello World"
Unable to find image 'fedora' locally
Pulling repository fedora
88b42ffd1f7c: Download complete
511136ea3c5a: Download complete
c69cab00d6ef: Download complete
Hello World
Now honestly, short of a Java app that's probably the heaviest weight "Hello World" you've ever done. What happened was, your local docker system looked for a container image named "fedora" and didn't find one. So it went to the official Docker registry at docker.io and looked for one there. It found it, downloaded it and then started the container and ran the shell command inside, returning the STDOUT to your console.
Now look at those three lines following the "Pulling repository" output from the docker run command.
A docker "image" is a fiction. Nearly all images are composed of a number of layers. The base layer or base image usually provides the minimal OS filesystem content, libraries and such. Then layers are added for application packages or configuration information. Each layer is stored as a tarball with the contents and a little bit of metadata which indicates, among other things, the list of layers below it.. Each layer is given an ID based on a hash of the tarball so that each can be uniquely identified. When an "image" is stored on the Docker registry, it is given a name and possibly a label so that it can be retrieved on demand.
In this case Docker downloaded three image layers and then composed them to make the fedora image and then ran the container and executed /bin/echo inside it.
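You can inspect those layers yourself. The exact columns in the output vary between Docker versions, so take this as a sketch, but the layer IDs should match the ones reported during the pull:

docker history fedora    # lists each layer's ID, when it was created and how big it is
docker images            # lists the composed images available on this host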
You can view the containers that are or have been run on your system with docker ps.
docker ps -l
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS                      PORTS               NAMES
612bc60ede7a        fedora:20           /bin/echo 'hello wor   7 minutes ago       Exited (0) 7 minutes ago                        naughty_pike
Your output will very likely be wrapped around unless you have a very wide terminal screen open. The -l switch tells docker to print information only about the last container created.
docker run -it fedora /bin/sh
sh-4.2# ls
bin  etc   lib    lost+found  mnt  proc  run   srv  tmp  var
dev  home  lib64  media       opt  root  sbin  sys  usr
sh-4.2# ps -ef
PID TTY          TIME CMD
  1 ?        00:00:00 sh
  8 ?        00:00:00 ps
sh-4.2# df -k
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/mapper/docker-8:4-2758071-97e6230110ded813bff36c0a9a397d74d89af18718ea897712a43312f8a56805
                10190136   429260   9220204   5% /
tmpfs           24725556        0  24725556   0% /dev
shm                65536        0     65536   0% /dev/shm
/dev/sda4      132492664 21656752 104082428  18% /etc/hosts
tmpfs           24725556        0  24725556   0% /proc/kcore
sh-4.2# exit
That's three simple commands inside the container. The file system at / seems to be fairly ordinary for a complete (though minimal) operating system. It shows that there appear to be only two processes running in the container, though, and the mounted filesystems are a much smaller set than you would expect.
Now that you have this base image, you can use it to create new images by adding layers of your own. You can also register with docker.io so that you can push the resulting images back out and make them available for others to use.
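As a taste of what's coming in later posts, here's a minimal sketch of adding a layer of your own. The account and image names (myaccount, fedora-httpd) are invented for illustration:

# Dockerfile - add a web server layer on top of the fedora base image
FROM fedora
MAINTAINER Your Name <you@example.com>
RUN yum install -y httpd && yum clean all
EXPOSE 80
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

Then build the new layered image and, after a docker login, push it back out:

docker build -t myaccount/fedora-httpd .
docker push myaccount/fedora-httpd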
These are the two aspects of Docker that make it truly significant.
From Software Packaging to Application Packaging
Tarballs to RPMs (and Debs)
Back in the old days we used to pass around software using FTP and tarballs. We built it ourselves with a compiler. compress, gzip, configure and make made that a lot faster, but not much easier. At least for me, Solaris introduced software packages: bundles of pre-compiled software which included dependency information, so that you could just ask for LaTeX and you'd get all of the stuff you needed for it to work without having to either rebuild it or chase down all the broken loose ends.
Now, many people have problems with package management systems. Some people have favorites or pets, but I can tell you from first hand experience, I don't care which one I have, but I don't want not to have one. (yes, I hear you Gentoo, no thanks)
For a long time software binary packages were the only way to deliver software to an OS. You still had to install and configure the OS. If you could craft the perfect configuration and you had the right disks you could clone your working OS onto a new disk and have a perfect copy. Then you had to tweak the host and network configurations, but that was much less trouble than a complete re-install.
Automated OS Installation
Network boot mechanisms like PXE and software installation tools, Jumpstart, Kickstart/Anaconda, AutoYAST and others made the Golden Image go away. They let you define the system configuration and then would automate the installation and configuration process for you*. You no longer had to worry about cloning and you didn't have to do a bunch of archaeology on your golden disk when it was out of date and you needed to make a new one. All of your choices were encapsulated in your OS config files. You could read them, tweak them and run it again.
* yes, I didn't mention Configuration Management, but that's really an extension of the boot/install process in this case, not a fundamentally different thing.
In either case though, if you wanted to run two applications on the same host, the possibility existed that they would collide or interfere with each other in some way. Each application also presented a potential security risk to the others. If you crack the host using one app you could fairly surely gain access to everything else on the host. Even inadvertent interactions could cause problems that would be difficult to diagnose and harder to mitigate.
Virtual Disks and the Rebirth of the Clones
With the advent of virtual machines, the clone was back, but now it was called a disk image. You could just copy the disk image to a host and boot it in a VM. If you wanted more, you made copies and tweaked them after boot time.
So now we had two different delivery mechanisms: Packages for software to be installed (either on bare metal or in a VM) and disk images for completed installations to be run in a VM. That is: unconfigured application software or fully configured operating systems.
You can isolate applications on one host by placing them into different VMs. But this means you have to configure not one, but three operating systems to build an application that requires two services. That's three ways to get reliability and security wrong. Three distinct moving parts that require a qualified sysadmin to manage them and at least two things which the Developer/Operators will need to access to make the services work.
Docker offers something new. It offers the possibility of distributing just the application. *
* Yeah, there's more overhead than just the application, but nowhere near a complete VM, and layers can be shared.
Containerization: Application Level Software Delivery
Docker offers the possibility of delivering software in units somewhere between the binary package and the disk image. Docker containers have isolation characteristics similar to apps running in VMs without the overhead of a complete running kernel in memory and without all of the auxiliary services that a complete OS requires.
Docker also offers the capability for developers of reasonable skill to create and customize the application images and then to compose them into complex services which can then be run on a single host, or distributed across many.
The docker registry presents a well-known central location for developers to push their images and name them so that consumers can find them, download them and use them without additional interaction. Because the application has been tested in the container, the developer can be sure that she's identified all of the configuration information that might need to be passed in and out. She can explicitly document that, removing many opportunities for misconfiguration or adverse interactions between services on the same host.
It's the dawning of a new day.
If only it were that easy.
Here There Be Dragons
When a new day dawns on an unfamiliar landscape it slowly reveals a new vista to the eye. If you're in a high place you might see far off, but nearer things could be hidden under the canopy of trees or behind a fold in the land, so that when you actually step down and begin exploring you encounter surprises.
Whenever a new technology appears, people tend to try to use it the same way they're used to using their older tools. It generally takes a while to figure out the best way to use a new tool and to come to terms with its differentness. There's often a lot of exploring and a fair number of false starts and retraced steps before the real best uses settle out.
Docker does have some youthful shortcomings.
Docker is marvelously good at pulling images from a specific repository (known to the world as the Docker.io Registry) and running them on a specific host to which you are logged on. It's also good at pushing new images to the docker registry. These are both very localized point-to-point transactions.
Docker has no awareness of anything other than the host it is running on and the docker registry. It's not aware of other docker hosts nearby. It's not aware of alternate registries. It's not even aware of the resources in containers on the same host that other containers might want to share.
The only way to manage a specific container is to log onto its host and run the docker command to examine and manipulate it.
The first thing anyone wants to do when they create a container of any kind is to punch holes in it. What good is a container where you can't reach the contents? Sometimes people want to see in. Other times people want to insert things that weren't there in the first place. And they want to connect pipes between the containers, and from the containers to the outside world.
Docker does have ways of exposing specific network ports from a container to the host or to the host's external network interfaces.
It can import a part of the host filesystem into a container. It also has ways to share storage between two containers on the same host. What it doesn't have is a way to identify and use storage which can be shared between hosts. If you want to have a cluster of docker hosts where the containers can share storage, this is a problem.
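For the curious, the single-host versions of those mechanisms look something like this; the image name and paths are placeholders:

# expose container port 80 on the host as port 8080
docker run -d -p 8080:80 myaccount/fedora-httpd

# import part of the host filesystem into a container (read-only here)
docker run -it -v /srv/data:/data:ro fedora /bin/sh

# share one container's volumes with another container on the same host
docker run -d --name web -v /var/www myaccount/fedora-httpd
docker run -it --volumes-from web fedora /bin/sh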
It also doesn't have a means to get secret information safely from... well, anywhere... out of its hidey hole and into the container. Since it's trivial for anyone to push an image to the public registry, it's really important not to put secret information into any image, even one that's going to be pushed to a private registry.
As noted, Docker does what it does really well. The developers have been very careful not to over-reach, and I agree with most of their decisions. The issues I listed above are not flaws in Docker; they are mostly tasks that are outside Docker's scope. This keeps the Docker development effort focused on the problems they are trying to solve, so they can solve them well.
To use Docker on anything but a small scale you need something else. Something that is aware of clusters of container hosts, the resources available to each host and how to bind those resources to new containers regardless of which host ends up holding the container. Something that is capable of describing complex multi-container applications which can be spread across the hosts in a cluster and yet be properly and securely connected.
Read on.
Kubernetes
Who might want to run vast numbers of containerized applications spread over multiple enormous host clusters without regard to network topology or physical geography? Who else? Google.
Kubernetes is Google's response to the problem of managing Docker containers on a scale larger than a couple of manually configured hosts. Like Docker, it is a young project and there are an awful lot of TBDs, but there's a working core and a lot of active development. Google and the other partners that have joined the Kubernetes effort have very strong motivation to make this work.
Kubernetes is made up of two service processes that run on each Docker host (in addition to the dockerd). The etcd daemon binds the hosts into a cluster and distributes the configuration information. The kubelet daemon is the active agent on each container host which responds to requests to create, monitor and destroy containers. In Kubernetes parlance, a container host is known as a minion.
The etcd service is taken from CoreOS, an attempt at application-level software packaging and system management that predates Docker. CoreOS seems to be adopting Docker as its container format.
There is one other service process, the Kubernetes apiserver, which acts as the head node for the cluster. The apiserver accepts commands from users and forwards them to the minions as needed. Any host running the Kubernetes apiserver process is known as a master.
Clients communicate with the masters using the kubecfg command.
A little more terminology is in order.
As noted, container hosts are known as minions. Sometimes several containers must be run on the same minion so that they can share local resources. Kubernetes introduces the concept of a pod of containers to represent a set of containers that must run on the same host. You can't access individual containers within a pod at the moment (there are lots more caveats like this. It is a REALLY young project).
Installing Kubernetes is a bit more intense than Docker. Both Docker and Kubernetes are written in Go. Docker is mature enough that it is available as binary packages for both RPM and DEB packaged Linux distributions (see your local package manager for docker-io and its dependencies).
The simplest way to get Kubernetes right now is to run it in VirtualBox VMs managed by Vagrant. I recommend the Kubernetes Getting Started Guide for Vagrant. There's a bit of assembly required.
Hint: once it's built, I create an alias for cluster/kubecfg.sh so I don't have to put it in my path or type it out every time.
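Something like this, assuming the kubernetes tree is checked out in your home directory (adjust the path to wherever you cloned it):

alias kubecfg=~/kubernetes/cluster/kubecfg.sh
kubecfg list minions    # same as running cluster/kubecfg.sh list minions from the source tree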
I'm not going to show very much about Kubernetes yet. It doesn't really make sense to run any interactive containers like the "Hello World" or the Fedora 20 shell using Kubernetes. It's really for running persistent services. I'll get into it deeply in a coming post. For now I'll just walk through the Vagrant startup and simple queries of the test cluster.
$ vagrant up
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'minion-1' up with 'virtualbox' provider...
Bringing machine 'minion-2' up with 'virtualbox' provider...
Bringing machine 'minion-3' up with 'virtualbox' provider...
==> master: Importing base box 'fedora20'...
...
    master:
==> master: Summary
==> master: -------------
==> master: Succeeded: 44
==> master: Failed:     0
==> master: -------------
==> master: Total:     44
==> master:
==> minion-1: Importing base box 'fedora20'...
Progress: 90%
...
==> minion-3: Complete!
==> minion-3:  * INFO: Running install_fedora_stable_post()
==> minion-3: disabled
==> minion-3: ln -s '/usr/lib/systemd/system/salt-minion.service' '/etc/systemd/system/multi-user.target.wants/salt-minion.service'
==> minion-3: INFO: Running install_fedora_check_services()
==> minion-3: INFO: Running install_fedora_restart_daemons()
==> minion-3:  * INFO: Salt installed!
At this point there are only three interesting commands. They show the set of minions in the cluster, the running pods and the services that are defined. The last two aren't very interesting because there aren't any pods or services.
$ cluster/kubecfg.sh list minions
minions
----------
10.245.2.2
10.245.2.3
10.245.2.4

$ cluster/kubecfg.sh list pods
Name                Image(s)            Host                Labels
----------          ----------          ----------          ----------

$ cluster/kubecfg.sh list services
Name                Labels              Selector            Port
----------          ----------          ----------          ----------
We know about minions and pods. In Kubernetes a service is actually a proxy for a TCP port. This allows Kubernetes to place service containers arbitrarily while still allowing other containers to connect to them by a well-known IP address and port. Containers that wish to accept traffic for that port use labels matching the selector value to indicate that. The service will then forward traffic to those containers.
Right now Kubernetes accepts requests and prints reports in structured data formats, JSON or YAML. To create a new pod or service, you describe the new object using one of these data formats and then submit the description with a "create" command.
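To give a flavor of it, here's roughly what a pod description looks like. This is a sketch against the v1beta1 API that kubecfg spoke at the time; the id, image and label values are invented, and the schema has been a moving target:

{
  "id": "hello-web",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "labels": { "name": "hello-web" },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "hello-web",
      "containers": [
        {
          "name": "hello-web",
          "image": "myaccount/fedora-httpd",
          "ports": [ { "containerPort": 80, "hostPort": 8080 } ]
        }
      ]
    }
  }
}

Saved as hello-web.json, it would be submitted with something like:

cluster/kubecfg.sh -c hello-web.json create pods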
Summary
I think software containers in general and Docker in particular have a very significant future. I'm not a big bandwagon person but I think this one is going to matter.
Docker's going to need some more work itself and it's going to need a lot of infrastructure around it to make it suitable for the enterprise and for public cloud use. Kubernetes is one piece that will make using Docker on a large scale possible.
See you soon.
References
- [1] VMWare - https://en.wikipedia.org/wiki/Vmware#History
- [2] X86 Hardware Virtualization - https://en.wikipedia.org/wiki/X86_virtualization
- [3] Solaris Containers - https://en.wikipedia.org/wiki/Solaris_Containers
Comments

I'm really tempted to remove much of the historical exposition. I'm afraid that it is really superfluous and that it will discourage people from reading all the way through.
Thanks for writing this article, it's a good primer and with that, the history lesson should stay. It seems every conversation about emerging tech always has a history component and it helps bridge the gap. Bare metal to virt to cloud to containers, everyone has their own personal entry point. Identifying this point helps get new adopters comfortable and more willing to try. I know that's a major oversimplification of the tech, but just how I feel about leaving that part in :) Cheers
Thanks for the feedback! I was worried that the length might put people off. If the context helps you stay, I'm happy.
This is great, Mark. Thanks! If you remove the history, please consider moving it to another blog post. I found it a very succinct description of how we got to where we are, and I have recommended it to a few folks to read. I can't wait for the Pulp post. I have been toying with a puppetmaster/puppetdb/foreman/etc. containerized setup for a while, and it is a little daunting.
ReplyDeleteI think "daunting" describes many of the new things coming down the pike. Part of the reason I write what I do is that the common urge is to frost over the pointy bits and hope no one bites down hard.
Thanks for the comment. I get on a roll and never know if what ends up in print is useful. You folks are the nearest thing I have to an editor, so all comments are welcome.
I'm really glad some of this is useful.
The "pulp post" is more likely going to be a series. There are a number of moving parts and a lot of aspects to be addressed.
Great post. I especially liked the "historical exposition".
On package management:
ReplyDelete"Now, many people have problems with package management systems. Some people have favorites or pets, but I can tell you from first hand experience, I don't care which one I have, but I don't want not to have one. (yes, I hear you Gentoo, no thanks)"
I have never agreed with you more on any topic :-)