Under The Hood of Cloud Computing<br />
<br />
<h1>
A beginner's perspective on OpenStack Designate</h1>
I'm at the <a href="https://www.openstack.org/summit/barcelona-2016">OpenStack Summit</a> this week in Barcelona. Beautiful place, but a conference center is a conference center.<br />
<br />
My first session today was an introduction and discussion of Designate, the OpenStack DNS control module.<br />
<br />
I've been working with OpenStack for VM instances and with OpenShift to run container services in the cloud. One major issue that always gets back-burnered in discussions is DNS. I refer to this as <i>publication</i>, an unusual term but I think the best one to describe a critical aspect of IaaS and PaaS cloud services.<br />
<br />The point of these services is self-serve computing resources, most often services you can offer to others. If you have no way of telling others where to find your services... they can't. DNS is the way you tell people where to find your stuff.<br />
<br />
Historically the DNS service has been managed by an IT "priesthood" who are rightfully protective. DNS is the first and most critical service on any modern network. It's largely invisible and it works so well that most sysadmins don't actually understand how it works. DNS is one of the last services to fall to the self-service mind-set of cloud computing and that's with good reason.<br />
<br />
I was under the misconception that Designate would be solely a dynamic DNS service that would be used to publish new instances or containers within the service. I also thought perhaps it had its own front end to respond to queries. It quickly became clear that Designate is not very useful without those external front-line services.<br />
<br />
Listening to the talk it became clear that the Designate developers also see this conservatism as a barrier to adoption. A significant portion of the talk was dedicated to creating roll-out plans that build confidence slowly, absorbing more and more of the wild barnyard of existing services.<br />
<br />
Designate seems to be more of a control plane and database for DNS services than an actual front-line server responding to queries. You continue to run BIND or Active Directory/DNS or Infoblox to respond to queries, but the database is stored in the OpenStack service (with a back end DB?) and the database propagates to the caching or front end DNS services.<br />
<br />
This leads to the idea of Designate eventually taking over control of all of the DNS services in an enterprise. It has the capability to define roles for users, allowing fine-grained control of what actions users can take, while offering for the first time a true kiosk self-service IP naming mechanism.<br />
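<br />
As a sketch of what that kiosk workflow might look like from the command line (assuming the python-designateclient plugin for the openstack CLI; the zone, e-mail address and IP below are made up):<br />
<br />
<pre class="brush:bash ; title: 'Hypothetical Designate self-service workflow'"># Create a zone I own, then publish a record for a new service
openstack zone create --email dns-admin@example.com apps.example.com.
openstack recordset create apps.example.com. www --type A --record 203.0.113.10
openstack recordset list apps.example.com.
</pre>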
<br />
I know how I plan to use Designate in my OpenShift and OpenStack service deployments. It appears I may still need to create backing DNS servers, but I'll at least get WebUI and API change management for the masses. I've used <span style="font-family: Courier New, Courier, monospace;">nsupdate</span><span style="font-family: Arial, Helvetica, sans-serif;"> to create dynamic DNS zones before, but it always seemed to scare other people off. With Designate I'm going to be able to deploy both my new services and containers within them and publish them with the short turn-around a cloud service demands.</span><br />
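<br />
For comparison, here is roughly what the <span style="font-family: Courier New, Courier, monospace;">nsupdate</span> approach looks like (a sketch; the server, zone and key file are placeholders):<br />
<br />
<pre class="brush:bash ; title: 'Dynamic DNS update with nsupdate (sketch)'"># Push a single A record into a dynamic zone using a TSIG key
nsupdate -k /etc/named/apps.key <<'EOF'
server ns1.example.com
zone apps.example.com
update add myapp.apps.example.com 300 A 203.0.113.20
send
EOF
</pre>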
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<br />
<ul>
<li><span style="font-family: Arial, Helvetica, sans-serif;"><a href="http://docs.openstack.org/developer/designate/">OpenStack Designate</a></span></li>
</ul>
<h1>
Storage Concepts in Docker: Network and Cloud Storage</h1>
This is the third and final post on storage in Docker. It's also going to be the most abstract, since much of what it covers is still wishful thinking.<br />
<br />
<ul>
<li><a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker.html">Shared Storage</a></li>
<li><a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker-persistent.html">Persistent (local) storage</a></li>
<li>Network and Cloud Storage (this post)</li>
</ul>
<div>
<br /></div>
The previous two posts dealt with shared internal storage and persistent host storage in Docker. These two mechanisms allow you to share storage on a single host. While this has its uses, very quickly people find that they need more than local storage.<br />
<br />
<h2>
Types of Storage in Containers</h2>
<div>
Lots of people are talking about storage in Docker containers. Not many are careful to qualify what they mean by that. Some of the conversation is getting confused because different people have different goals for storage in containers.<br />
<br />
<h3>
Docker Internal Storage</h3>
</div>
<div>
This is the simplest form of storage in Docker. Each container has its own space on the host. This is inside the container and it is temporary, being created when the container is instantiated and removed some time after the container is terminated. When two containers reside on the same host they can share this docker-internal storage.<br />
<br /></div>
<h3>
Host Storage</h3>
<div>
Containers can be configured to use host storage. The space must be allocated and configured on the host so that the processes within the containers will have the necessary permissions to read and write to the host storage. Again, containers on the same host can share storage.<br />
<br /></div>
<h3>
Network Storage</h3>
<div>
Or "Network Attached Storage" (NAS) in which I slovenly include Storage Area Networks (SAN).<br />
I'm also including modern storage services like Gluster and Ceph. For container purposes these are the same thing: Storage which is not directly attached via the SCSI or SATA bus, but rather over an IP network but which, once mounted appears to the host as a block device.<br />
<br />
If you are running your minions in an environment where you can configure NAS <u>universally</u> then you may be able to use network storage within your Kubernetes cluster.<br />
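<br />
A minimal sketch of that pattern, assuming an NFS export that every minion can reach (the server and paths are placeholders):<br />
<br />
<pre class="brush:bash ; title: 'NFS mounted identically on every minion (sketch)'"># On every minion: the same export on the same mount point
sudo mount -t nfs filer.example.com:/export/pulp /mnt/pulp

# Any container on any minion can then import the same tree
docker run -d --volume /mnt/pulp:/var/lib/pulp markllama/pulp-content
</pre>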
<br />
Remember that Docker runs as root on each minion. You may find that there are issues related to differences in the user database between the containers, minions and storage. Until the cgroup user namespace work is finished and integrated with Docker, unifying UID/GID maps will be a problem that requires attention when building containers and deploying them.<br />
<br /></div>
<h3>
Cloud Storage</h3>
<div>
Cloud storage is... well, not the other kinds. It's generally offered in a "storage as a service" model. Most people think of Amazon AWS storage (EBS and S3) but Google is growing its cloud storage and OpenStack offers the possibility of creating on-premise cloud storage services as well.<br />
<br />
Cloud storage generally takes two forms. The first is good old-fashioned block storage. The other is newer and is known as object storage. They have different behaviors and use characteristics.<br />
<br /></div>
<h4>
Block Storage</h4>
<div>
Once it is attached to a host, cloud block storage is indistinguishable from direct attached storage. You can use disk utilities to partition it and create filesystems. You can mount it so that the filesystem appears within the host file tree.<br />
<br />
Block storage requires very low latency. This means that it is generally limited to relatively local networks. It works fine within the infrastructure of a cloud service such as AWS or OpenStack, but running block storage over wide area networks is often difficult and prone to failure.<br />
<br />
Block storage is attached to the host and then the docker VOLUME mechanism is used to import the storage tree into one or more containers. If the storage is mounted automatically and uniformly on every minion (and that information is public) then it is possible to use block storage in clusters of container hosts.<br />
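<br />
As a sketch, with OpenStack Cinder (the server name, volume ID and paths are placeholders):<br />
<br />
<pre class="brush:bash ; title: 'Cloud block storage into a container (sketch)'"># Attach the Cinder volume to the instance that will run the container
nova volume-attach minion-01 VOLUME_ID /dev/vdb

# On the minion the attached volume behaves like any local disk
sudo mkfs -t ext4 /dev/vdb
sudo mkdir -p /srv/pulp
sudo mount /dev/vdb /srv/pulp

# Import it into a container with the usual volume mapping
docker run -d --volume /srv/pulp:/var/lib/pulp markllama/pulp-content
</pre>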
<br /></div>
<h4>
Object Storage</h4>
<div>
Object storage is a relatively new idea. For files with a long life that do not change often and can be retrieved as a unit, object storage is often a good fit. It's also a good place to keep configuration information which is too large or sensitive to be placed in an environment variable or CLI argument.<br />
<br />
OpenStack Swift, AWS S3 and Google Cloud Storage are examples of open source and commercial object stores.</div>
<div>
<br /></div>
<div>
The usage characteristics of object storage make it so that latency is not the kind of issue that it is with block storage.</div>
<div>
<br /></div>
<div>
One other characteristic of object storage makes it really suited to use in containers. Object storage is usually accessed by retrieval over HTTP using a RESTful protocol. This means that the container host does not need to be involved in accessing the contents. So long as the container has the software and the access information for the storage, processes within the container can retrieve it. All that is required is that the container be able to reach the storage service through the host network interface(s). This makes object storage a strong choice for container storage wherever the other characteristics are acceptable.<br />
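<br />
A sketch of what that looks like from inside a container (the endpoints, bucket and paths are placeholders):<br />
<br />
<pre class="brush:bash ; title: 'Fetching content from object storage inside a container (sketch)'"># Plain HTTP(S) retrieval; the host is not involved at all
curl -o /etc/pulp/server.conf https://objects.example.com/v1/AUTH_demo/config/server.conf

# or with the AWS CLI, using credentials passed in as environment variables
aws s3 cp s3://my-config-bucket/server.conf /etc/pulp/server.conf
</pre>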
<br /></div>
<div>
<h2>
Storage and Kubernetes</h2>
</div>
<div>
Pretty much every application will need storage in some form. To build large scale containerized applications it will be essential for Kubernetes to make it possible for the containers to access and share persistent storage. The form that the storage takes will depend on the character of the application and the environment of the cluster.</div>
<div>
<br />
With all of the forms of NAS (remember, I'm being slovenly) the host is involved in accessing and mounting the storage so that it appears to Docker as if it is normal host storage. This means that one of three conditions must be met on the host:<br />
<br />
<ol>
<li>All of the available storage is mounted on all minions before any containers start</li>
<li>The host is configured to automount the storage on the first attempt to access a path</li>
<li>The host is able to accept and act on mount requests from Kubernetes</li>
</ol>
<div>
<br /></div>
<div>
This third option also requires modifications to Kubernetes so that the user can specify the path to the required storage and provide any access/authentication information that will be required by the host.</div>
<div>
<br /></div>
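<div>
Option #1, by contrast, is purely a matter of host configuration. A standing fstab entry on every minion would do (a sketch; the server, export and mount point are placeholders):</div>
<div>
<br />
<pre class="brush:bash ; title: 'Pre-mounting NFS on every minion (sketch)'"># In place before Docker starts any containers
echo "filer.example.com:/export/pulp  /mnt/pulp  nfs  defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
</pre>
</div>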
<div>
For cloud block storage the only option is #3 from above. Google has added a mechanism to mount Google Compute Engine Persistent Disk volumes into Kubernetes clusters. The current mechanism (as of 20-Oct-2014) is hard coded. The developers understand that they will need a plugin mechanism to allow adding AWS EBS, OpenStack Cinder and others. I don't think work on any other cloud storage services has begun yet.</div>
<div>
<br /></div>
<div>
Object storage is the shining light. While it has limited use cases, those cases are really common and really important. Object storage access can be built into the image and the only thing the Kubernetes cluster must provide is network access to the object store service. </div>
<div>
<br /></div>
<h2>
Summary</h2>
<div>
Generalized shared and cloud storage within Kubernetes clusters (or any cluster of container hosts) is, at this time, an unsolved problem. Everyone knows it is a top priority and everyone working on the idea of clustered container hosts is thinking about it and experimenting with solutions. I don't think it will be long before some solutions become available and I'm confident that there will be working solutions within the timeframe of <strike>*mumble*</strike>. </div>
<div>
<br /></div>
<div>
For Kubernetes, there is <a href="https://github.com/GoogleCloudPlatform/kubernetes/pull/1515">an open issue</a> discussing persistent storage options and how to design them into the service, both on the back end and the front end (how does one tell Kubernetes how to access storage for containers?)</div>
<div>
<br /></div>
<div>
I'm going to be playing with a few of the possibilities because I'm going to need them. Until they are available, I can create a Pulp service in Kubernetes, but I can't make it persistent. Since the startup cost of creating an RPM mirror is huge, it's not much use except as a demonstrator until persistent storage is available.</div>
<div>
<br /></div>
<h2>
References</h2>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Network-attached_storage">Network Attached Storage</a></li>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Storage_area_network">Storage Area Network</a></li>
<li><a href="https://en.wikipedia.org/wiki/Network_File_System">Network File System</a> (NFS)</li>
<li><a href="http://www.gluster.org/">Gluster</a></li>
<li><a href="http://ceph.com/">Ceph</a></li>
<li><a href="https://en.wikipedia.org/wiki/ISCSI">iSCSI</a></li>
</ul>
<li><a href="http://www.openstack.org/software/openstack-storage/">OpenStack Cloud Storage</a> </li>
<ul>
<li><a href="https://wiki.openstack.org/wiki/Cinder#OpenStack_Block_Storage_.28.22Cinder.22.29">Cinder</a> - block storage</li>
<li><a href="https://wiki.openstack.org/wiki/Swift#OpenStack_Object_Storage_.28.22Swift.22.29">Swift </a>- object storage</li>
</ul>
<li>AWS Cloud Storage</li>
<ul>
<li><a href="https://aws.amazon.com/ebs/">EBS</a> - block storage</li>
<li><a href="https://aws.amazon.com/s3/">S3</a> - object storage</li>
</ul>
<li><a href="https://cloud.google.com/storage/docs">Google Cloud Storage</a></li>
<ul>
<li>Google Compute Engine<a href="https://cloud.google.com/compute/docs/disks#persistentdisks"> Persistent Disks</a> - block storage</li>
<li><a href="https://cloud.google.com/storage/">Google Storage</a> - object storage</li>
</ul>
</ul>
</div>
<h1>
Storage Concepts in Docker: Persistent Storage</h1>
This is the second of three posts on storage management in <a href="http://www.docker.com/">Docker</a>:<br />
<br />
<ul>
<li><a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker.html">Shared Storage and the VOLUME directive</a></li>
<li>Persistent Storage: the --volume CLI option (this post)</li>
<li>Storage in Kubernetes</li>
</ul>
<div>
<br /></div>
<div>
This is a side trip on my way to creating a containerized <a href="http://pulpproject.org/">Pulp</a> content mirroring service using <a href="http://www.docker.com/">Docker</a> and <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a>. The storage portion is important (and confusing) enough to warrant special attention.</div>
<div>
<br />
<h2>
Persistent Storage</h2>
<div>
In the previous post I talked about the mechanisms that Docker offers for sharing storage between containers. This kind of storage is limited to containers on the same host and it does not survive after the last connected container is destroyed.</div>
<div>
<br /></div>
<div>
If you're running a long-lived service like a database or a file repository you're going to need storage which exists outside the container space and has a life span longer than the container which uses it.</div>
<div>
The Dockerfile VOLUME directive is the mechanism to define where external storage will be mounted inside a container.<br />
<br />
NOTE: I'm only discussing single host local storage. The issues around network storage are still wide open and beyond the scope of a single post.</div>
<div>
<br /></div>
<h2>
Container Views and Context</h2>
<div>
Containers work by providing two different views of the resources on the host. Outside the container, the OS can see everything, but the processes inside are fooled into seeing only what the container writer wants them to see. The problem is not just what they see though, but how they see it.</div>
<div>
<br /></div>
<div>
There are a number of resources which define the view of the OS. The most significant ones for file storage are the user and group databases (in <span style="font-family: Courier New, Courier, monospace;">/etc/passwd</span> and <span style="font-family: Courier New, Courier, monospace;">/etc/group</span>). The OS uses numeric UID and GID values to identify users and decide how to apply permissions. These numeric values are mapped to names using the passwd and group files. The host and containers each have their own copies of these files and the entries in these files will almost certainly differ between the host and the container. The ownership and permissions on the external file tree must be set to match the expectations of the processes which will run in the container.<br />
<br />
SELinux also controls access to file resources. The SELinux labels on the file tree on the host must be set so that system policy will allow the processes inside the container to operate on them as needed.<br />
<br />
In this post most of my effort will be spent looking at the view from inside and adjusting the settings on the file tree outside to allow the container processes to do their work.<br />
<br /></div>
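<div>
A quick way to compare the two views (a sketch; the image name is just an example and the numeric IDs will differ from host to host):<br />
<br />
<pre class="brush:bash ; title: 'Comparing host and container user databases (sketch)'"># The host's idea of a user name and its numeric ID
getent passwd mongodb

# The same name as seen inside an image, by overriding the entrypoint to run id
docker run --rm --entrypoint /usr/bin/id some/mongodb-image mongodb
</pre>
</div>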
<h2>
Dockerfile VOLUME directive Redux</h2>
</div>
<div>
As noted in the previous post, the VOLUME directive defines a boundary in the filesystem within a container. That boundary can be used as a handle to export a portion of the container file tree. It can also be used to mark a place to mount an external filesystem for import to the container.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-orFO83KfxSI/VDbeCP3IuWI/AAAAAAAAFDg/3kFXEo-Yudc/s1600/MongoDB%2BImage%2B-%2BVolumes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-orFO83KfxSI/VDbeCP3IuWI/AAAAAAAAFDg/3kFXEo-Yudc/s1600/MongoDB%2BImage%2B-%2BVolumes.png" height="172" width="320" /></a></div>
<br /></div>
<div>
When used with the Docker CLI <span style="font-family: Courier New, Courier, monospace;">--volumes-from</span><span style="font-family: inherit;"> option it is possible to create containers that share storage from one container to any number of others. The mount points defined in the VOLUME directives are mapped one-to-one from the source container to the destinations.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<h2>
<span style="font-family: inherit;">Importing Storage: The --volume CLI option</span></h2>
<div>
<span style="font-family: inherit;">When starting a Docker container I can cause Docker to map any external path to an internal path using the </span><span style="font-family: Courier New, Courier, monospace;">--volume</span><span style="font-family: inherit;"> (or </span><span style="font-family: Courier New, Courier, monospace;">-v</span><span style="font-family: inherit;">) option. This option takes two paths separated by a colon (:). The first path is the host file or directory to be imported into the container. The second is the mount point within the container.</span></div>
<div>
<br />
<span style="background-color: #073763; color: #f3f3f3; font-family: Courier New, Courier, monospace;">docker run --volume <host path>:<container path> ...</span><br />
<span style="background-color: magenta; color: white; font-family: Courier New, Courier, monospace;"><br /></span></div>
<h2>
Example: MongoDB persistent data</h2>
<div>
Say I want to run a database on my host, but I don't want to have to install the DB software into the system. Docker makes it possible for me to run my database in a container and not have to worry about which version the OS has installed. However, I do want the data to persist if I shut the container down and restart it, whether for maintenance or to upgrade the container.<br />
<br />
The Dockerfile for my MongoDB container looks like this:<br />
<br />
<script src="https://gist.github.com/markllama/829690622aacee395836.js"></script>
<br />
<ul>
<li>Lines 1 and 2 are the boilerplate you've seen to define the base image and the maintainer information.</li>
<li>Line 7 installs the MongoDB server package</li>
<li>Lines 9 - 11 create the directory for the database storage and ensure that it will not be pruned by placing a hidden file named <span style="font-family: Courier New, Courier, monospace;">.keep</span><span style="font-family: inherit;"> inside. They also set the permissions for that directory <i>in the context of the container view</i> to allow the mongodb user to write to the directory.</span></li>
<li>Line 15 specifies the location of the imported volume.</li>
<li>Line 17 opens the firewall for inbound connections to the MongoDB</li>
<li>Lines 19 and 20 set the user that will run the primary process and the location where it will start.</li>
<li>Lines 22 and 23 define the binary to execute when the container starts and the default arguments</li>
</ul>
<br />
To run a shell in the container, use the <span style="font-family: Courier New, Courier, monospace;">--entrypoint</span><span style="font-family: inherit;"> option. Arguments to the </span><span style="font-family: Courier New, Courier, monospace;">docker run</span><span style="font-family: inherit;"> command will be passed directly to the mongod process, overriding the defaults.</span><br />
<br />
<h3>
What works?</h3>
<div>
<br /></div>
<div>
I know that this image works when I just use the default internal Docker storage. I know that file ownership and permissions will be an issue, so the first thing to do is to look inside a working container and see what the ownership and permissions look like.</div>
<div>
<br />
<pre class="brush:bash ; title: 'A Mongodb container with no storage attached' ; highlight: [1,3,6]">docker run -it --name mongodb --entrypoint /bin/sh markllama/mongodb
sh-4.2$ id
uid=184(mongodb) gid=998(mongodb) groups=998(mongodb)
sh-4.2$ ls -ldZ /var/lib/mongodb
drwxr-xr-x. mongodb mongodb system_u:object_r:docker_var_lib_t:s0 /var/lib/mongodb
</pre>
</div>
Now I know the UID and GID which the container process uses (UID = 184, GID = 998). I'll have to make sure that this user/group can write to the host directory which I map into the container.<br />
<br />
I know that the default permissions are 755 (rwx, r-x, r-x), which is fairly common.<br />
<br />
I also see that the directory has a special SELinux label: <span style="font-family: Courier New, Courier, monospace;">docker_var_lib_t</span><span style="font-family: inherit;">. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Together, the directory ownership/permissions and the SELinux policy could prevent access by the container process to the host files. Both are going to require root access on the host to fix.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Interesting Note: When inside the container, attempts to determine the process SELinux context are met with a message indicating that SELinux is not enabled. Apparently, from the view point inside the container, it isn't.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h2>
<span style="font-family: inherit;">Preparing the Host Directory</span></h2>
</div>
<div>
<span style="font-family: inherit;">I could just go ahead and create a directory with the right ownership, permissions and label and attach it to my MongoDB container and say "Voila!". What fun would that be? Instead I'm going to create the target directory and mount it into a container and try writing to it from inside. When that fails I'll update the ownership, permissionad and label, (from outside) each time checking the view and capabilities (from inside) to see how it changes.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">I am going to disable SELinux temporarily so I can isolate the file ownership/permissions from the SELinux labeling.</span></div>
<div>
<br />
<pre class="brush:bash ; title: 'Disable SELinux: Create MongoDB data directory (outside)'; highlight: 4 ">sudo setenforce 0
mkdir ~/mongodb
ls -ldZ ~/mongodb
drwxrwxr-x. mlamouri mlamouri unconfined_u:object_r:user_home_t:s0 /home/mlamouri/mongodb
</pre>
<br />
I'm also going to create a file inside the directory (from the view of the host) so that I can verify (from the view in the container) that I've mounted the correct directory.
<br />
<br />
<pre class="brush:bash ; title: 'Create a file for reference (outside)' ; highlight: 3">touch ~/mongodb/from_outside
ls -lZ ~/mongodb/from_outside
-rw-rw-r--. mlamouri mlamouri unconfined_u:object_r:user_home_t:s0 /home/mlamouri/mongodb/from_outside
</pre>
<br />
Note the default ownership and permissions on that file.
</div>
<div>
<br />
Now I'm ready to try mounting that into the mongodb container (knowing that write access will fail)
<br />
<br />
<h2>
Starting the Container with a Volume Mounted</h2>
</div>
<div>
I want to be able to examine the runtime environment inside the container before I let it fly with a mongod process. I'll set the entrypoint on the CLI to run a shell instead and use the <span style="font-family: Courier New, Courier, monospace;">-it</span><span style="font-family: inherit;"> options so it runs interactively and terminates when I exit the shell.</span></div>
<div>
<br /></div>
<div>
The external path to the volume is <span style="font-family: Courier New, Courier, monospace;">/home/mlamouri/mongodb</span> and the internal path is <span style="font-family: Courier New, Courier, monospace;">/var/lib/mongodb</span>.</div>
<div>
<br />
<pre class="brush:bash ; title: 'Run Container with volume imported' ; highlight: [1,3,6,9,12,15]">docker run -it --name mongodb --volume ~/mongodb:/var/lib/mongodb --entrypoint /bin/sh markllama/mongodb
sh-4.2$ id
uid=184(mongodb) gid=998(mongodb) groups=998(mongodb)
sh-4.2$ pwd
/var/lib/mongodb
sh-4.2$ ls
from_outside
sh-4.2$ ls -ld /var/lib/mongodb
drwxrwxr-x. 2 15149 15149 4096 Oct 9 21:04 /var/lib/mongodb
sh-4.2$ touch from_inside
touch: cannot touch 'from_inside': Permission denied
</pre>
<br />
As expected, from inside the container I can't write to the mounted volume. I can read it (with SELinux disabled) because I have the directory permissions open to the world for read and execute. Now I'll change the ownership of the directory from the outside.
<br />
<br />
<h2>
Adjusting The Ownership</h2>
<pre class="brush: bash ; title: 'Set directory ownership (outside)' ; highlight: [1,2]">sudo chown 184:998 ~/mongodb
ls -ld ~/mongodb
drwxrwxr-x. 2 mongodb polkitd 4096 Oct 9 20:46 /home/bos/mlamouri/mongodb
</pre>
<br />
It turns out I have the mongo-server package installed on my host and it has assigned the same UID to the mongodb user as the container has. However, the group for mongodb inside the container corresponds to the polkitd group on the host.<br />
<br />
Now I can try writing a file there from the inside again. From the (still running) container shell:<br />
<br />
<pre class="brush:bash; title: 'Try to create a file (inside)' ; highlight: [1,5,7,12]">sh-4.2$ ls -l
total 0
-rw-rw-r--. 1 mongodb mongodb 0 Oct 9 20:46 from_outside
sh-4.2$ touch from_inside
sh-4.2$ ls -l
total 0
-rw-r--r--. 1 mongodb mongodb 0 Oct 10 01:55 from_inside
-rw-rw-r--. 1 mongodb mongodb 0 Oct 9 20:46 from_outside
sh-4.2$ ls -Z
-rw-r--r--. mongodb mongodb system_u:object_r:user_home_t:s0 from_inside
-rw-rw-r--. mongodb mongodb unconfined_u:object_r:user_home_t:s0 from_outside
</pre>
<br /></div>
<div>
<br />
<h2>
Re-Enabling SELinux (and causing fails again)</h2>
</div>
<div>
There are two access control barriers for files. The Linux file ownership and permissions are one. The second is SELinux and I have to turn it back on. This will break things again until I also set the SELinux label on the directory on the host.<br />
<br />
<pre class="brush:bash ; title: 'Re-enable SELinux' ; highlight: 1">sudo setenforce 1
</pre>
<br />
Now when I try to read the directory inside the container or create a file, the request is rejected with permission denied.
<br />
<br />
<pre class="brush:bash ; title : 'Test access with SELinux enforcing' ; highlight: [1,3]">sh-4.2$ ls
ls: cannot open directory .: Permission denied
sh-4.2$ touch from_inside_with_selinux
touch: cannot touch 'from_inside_with_selinux': Permission denied
</pre>
<br /></div>
<div>
Just to refresh, here's the SELinux label for the directory as seen from the host:
<br />
<pre class="brush:bash ; title: 'Default SELinux label (outside)' ; highlight: 1">ls -dZ mongodb
drwxrwxr-x. mongodb polkitd unconfined_u:object_r:user_home_t:s0 mongodb
</pre>
<br />
<h2>
SELinux Diversion: What's Happening?</h2>
<div>
In the end I'm just going to apply the SELinux label which I found on the volume directory when I used Docker internal storage. I'm going to step aside for a second here though and look at how I can find out more about what SELinux is rejecting.</div>
<div>
<br /></div>
When SELinux rejects a request it logs that request. The logs go into <span style="font-family: Courier New, Courier, monospace;">/var/log/audit/audit.log</span><span style="font-family: inherit;">. These are definitely cryptic and can be daunting but they're not entirely inscrutable.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">First I can use what I know to filter out things I don't care about. I know I want AVC messages (AVC is an abbreviation for <i>Access Vector Cache</i>. Yeah. Not useful). These messages are indicated by <i>type=AVC</i> in the logs. Second, I know that I am concerned with attempts to access files labeled <i>user_home_t</i>. These two will help me narrow down the messages I care about.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">These are very long lines so you may have to scroll right a bit to see the important parts.</span></div>
<div>
<br />
<pre class="brush:bash ; title: 'AVC records for user_home_t' ; highlight: 1">sudo grep type=AVC /var/log/audit/audit.log | grep user_home_t
type=AVC msg=audit(1412948687.224:8235): avc: denied { add_name } for pid=11135 comm="touch" name="from_inside" scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=dir permissive=1
type=AVC msg=audit(1412948687.224:8235): avc: denied { create } for pid=11135 comm="touch" name="from_inside" scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=1
type=AVC msg=audit(1412948876.731:8257): avc: denied { write } for pid=12800 comm="touch" name="mongodb" dev="sda4" ino=7749584 scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=dir permissive=0
type=AVC msg=audit(1412948898.965:8258): avc: denied { write } for pid=11108 comm="sh" name=".bash_history" dev="sda4" ino=7751785 scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=0
type=AVC msg=audit(1412948898.965:8259): avc: denied { append } for pid=11108 comm="sh" name=".bash_history" dev="sda4" ino=7751785 scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=0
type=AVC msg=audit(1412948898.965:8260): avc: denied { read } for pid=11108 comm="sh" name=".bash_history" dev="sda4" ino=7751785 scontext=system_u:system_r:svirt_lxc_net_t:s0:c687,c763 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=0
type=AVC msg=audit(1412949007.595:8289): avc: denied { read } for pid=14158 comm="sh" name=".bash_history" dev="sda4" ino=7751785 scontext=system_u:system_r:svirt_lxc_net_t:s0:c184,c197 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=0
type=AVC msg=audit(1412949674.712:8307): avc: denied { write } for pid=14369 comm="touch" name="mongodb" dev="sda4" ino=7749584 scontext=system_u:system_r:svirt_lxc_net_t:s0:c184,c197 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=dir permissive=0
</pre>
<br />
I found something I hadn't really expected. Every time I type a command in the shell within the container, the shell tries to write to the <span style="font-family: Courier New, Courier, monospace;">.bash_history</span><span style="font-family: inherit;"> file. This is only an issue when I'm testing the container with a shell. Remember in the Dockerfile I set the WORKDIR directive to the top of the MongoDB data directory. That means when I start the shell in the container, the current working directory is </span><span style="font-family: Courier New, Courier, monospace;">/var/lib/mongodb</span><span style="font-family: inherit;">, which is the directory I'm trying to import. This won't matter when I'm running the daemon properly as there won't be any shell.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The important thing this shows me is the SELinux context of the shell process within the container: </span><span style="font-family: Courier New, Courier, monospace;">system_u:system_r:svirt_lxc_net_t:s0</span><span style="font-family: inherit;"> . (note that I dropped off the MVC context, the "cc87,c763" on the end). That is the process which is being denied access to the working directory.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Given that list of AVCs I can feed them to </span><span style="font-family: Courier New, Courier, monospace;">audit2allow</span><span style="font-family: inherit;"> and get a big-hammer policy change to stop the AVCs.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush:bash ; title: 'audit2allow for user_home_t AVCs' ; highlight: 1">sudo grep type=AVC /var/log/audit/audit.log | grep user_home_t| audit2allow
#============= svirt_lxc_net_t ==============
allow svirt_lxc_net_t user_home_t:dir { write remove_name add_name };
allow svirt_lxc_net_t user_home_t:file { write read create unlink open append };
</pre>
<br /></div>
<div>
This is a nice summary of what is happening and what fails. You could use this output to create a policy module which would allow this activity. <b>DON'T DO IT.</b> It's tempting to use <span style="font-family: Courier New, Courier, monospace;">audit2allow</span><span style="font-family: inherit;"> to just open things up when SELinux prevents things. Without understanding what you're changing and why, you risk creating holes you didn't mean to.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Instead I'm going to proceed by assigning a label to the directory tree which indicates what I mean to use it for (content for Docker containers). That is, by labeling the directory to allow Docker to mount and write it, it becomes evident to someone looking at it later what I meant to do.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h2>
<span style="font-family: inherit;">Labeling the MongoDB directory for use by Docker</span></h2>
</div>
<div>
<span style="font-family: inherit;">The processes running within Docker appear to have the SELinux context </span><span style="font-family: Courier New, Courier, monospace;">system_u:system_r:svirt_lxc_net_t</span><span style="font-family: inherit;">. From the example using the Docker internal storage for </span><span style="font-family: Courier New, Courier, monospace;">/var/lib/mongodb</span><span style="font-family: inherit;"> I know that the directory is labled </span><span style="font-family: Courier New, Courier, monospace;">system_u:object_r:docker_var_lib_t:s0</span><span style="font-family: inherit;">. If I apply that label to my working directory, the processes inside the container should be able to write to the directory and its children.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">The SELinux tool for updating object (file) labels is </span><span style="font-family: Courier New, Courier, monospace;">chcon</span><span style="font-family: inherit;"> (for <i>change context</i>). It works much like </span><span style="font-family: Courier New, Courier, monospace;">chown</span><span style="font-family: inherit;"> or </span><span style="font-family: Courier New, Courier, monospace;">chmod</span><span style="font-family: inherit;">. Because I'm changing security labels that I don't own, I need to use sudo to make the change.</span></div>
<div>
<br />
<pre class="brush:bash ; title: 'Set SELinux labels (outisde)' ; highlight: [1,3,6,10]">sudo chcon -R system_u:object_r:docker_var_lib_t:s0 ~/mongodb
ls -dZ mongodb/
drwxrwxr-x. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 mongodb/
ls -Z mongodb/
-rw-r--r--. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 from_inside
-rw-rw-r--. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 from_outside
getenforce
Enforcing
</pre>
<br />
Now the directory and all its contents have the correct ownership, permissions and SELinux label. SELinux is enforcing. I can try writing from inside the container again.
<br />
<br />
<pre class="brush: bash ; title: 'Write after labeling (inside)' ; highlight: [1,2]">sh-4.2$ touch from_inside_with_selinux
sh-4.2$ ls -l
total 0
-rw-r--r--. 1 mongodb mongodb 0 Oct 10 13:44 from_inside
-rw-r--r--. 1 mongodb mongodb 0 Oct 10 15:54 from_inside_with_selinux
-rw-rw-r--. 1 mongodb mongodb 0 Oct 9 20:46 from_outside
</pre>
<br />
That's it. Time to try running mongod inside the container.<br />
<br />
<h2>
Running the Mongodb Container</h2>
<div>
First I shut down and remove my existing mongod container. Then I can start one up for real. I switch from interactive (<span style="font-family: Courier New, Courier, monospace;">-it</span>) to daemon (<span style="font-family: Courier New, Courier, monospace;">-d</span>) mode and remove the <span style="font-family: Courier New, Courier, monospace;">--entrypoint</span><span style="font-family: inherit;"> argument.</span></div>
<div>
<br /></div>
<pre class="brush: bash ; title: 'Clean up test container and start mongodb' ; highlight: [1,4,7]">sh-4.2$ exit
exit
docker rm mongodb
mongodb
docker run -d --name mongodb --volume ~/mongodb:/var/lib/mongodb markllama/mongodb
9e203806b4f07962202da7e0b870cd567883297748d9fe149948061ff0fa83f0
</pre>
<br />
I should now have a running mongodb container<br />
<br />
<pre class="brush:bash ; title: 'Check for running container' ; highlight: 1">docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9e203806b4f0 markllama/mongodb:latest "/usr/bin/mongod --c 34 seconds ago Up 33 seconds 27017/tcp mongodb
</pre>
<br />
<br />
I can check the container logs to see if the process is running and indicates a good startup.<br />
<br />
<pre class="brush: bash ; title: 'View startup logs' ; highlight: [1]">docker logs mongodb
note: noprealloc may hurt performance in many applications
Fri Oct 10 16:01:25.560 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/var/lib/mongodb 64-bit host=9e203806b4f0
Fri Oct 10 16:01:25.562 [initandlisten]
Fri Oct 10 16:01:25.562 [initandlisten] ** WARNING: You are running on a NUMA machine.
Fri Oct 10 16:01:25.562 [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
Fri Oct 10 16:01:25.562 [initandlisten] ** numactl --interleave=all mongod [other options]
Fri Oct 10 16:01:25.562 [initandlisten]
Fri Oct 10 16:01:25.562 [initandlisten] db version v2.4.6
Fri Oct 10 16:01:25.562 [initandlisten] git version: nogitversion
Fri Oct 10 16:01:25.562 [initandlisten] build info: Linux buildvm-12.phx2.fedoraproject.org 3.10.9-200.fc19.x86_64 #1 SMP Wed Aug 21 19:27:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Fri Oct 10 16:01:25.563 [initandlisten] allocator: tcmalloc
Fri Oct 10 16:01:25.563 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", nohttpinterface: "true", noprealloc: "true", quiet: true, smallfiles: "true" }
Fri Oct 10 16:01:25.636 [initandlisten] journal dir=/var/lib/mongodb/journal
Fri Oct 10 16:01:25.636 [initandlisten] recover : no journal files present, no recovery needed
Fri Oct 10 16:01:27.469 [initandlisten] preallocateIsFaster=true 27.58
Fri Oct 10 16:01:29.329 [initandlisten] preallocateIsFaster=true 28.04
</pre>
<br />
It looks like the daemon is running.<br />
<br />
I can use <span style="font-family: Courier New, Courier, monospace;">docker inspect</span><span style="font-family: inherit;"> to find the assigned IP address for the container. With that I can connect the mongo client to the service and test database access.</span><br />
<br />
<pre class="brush: bash ; title: 'Connect to database service' ; highlight: [1,4]">docker inspect --format '{{.NetworkSettings.IPAddress}}' mongodb
172.17.0.110
echo show dbs | mongo 172.17.0.110
MongoDB shell version: 2.4.6
connecting to: 172.17.0.110/test
local 0.03125GB
bye
</pre>
<br />
<br />
I know the database is running and answering queries. The last check is to look inside the directory I created for the database. It should have the test files I'd created as well as the database and journal files which mongod will create on startup.<br />
<br />
<pre class="brush: bash ; title: 'List contents of mongodb directory (outside)' ; highlight: [1]">ls -lZ ~mongodb
-rw-r--r--. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 from_inside
-rw-r--r--. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 from_inside_with_selinux
-rw-rw-r--. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 from_outside
drwxr-xr-x. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 journal
-rw-------. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 local.0
-rw-------. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 local.ns
-rwxr-xr-x. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 mongod.lock
drwxr-xr-x. mongodb polkitd system_u:object_r:docker_var_lib_t:s0 _tmp
</pre>
<br />
There they are.<br />
<br /></div>
<div>
<h2>
Summary
</h2>
</div>
<div>
It took a little work to get a Docker container running a system service using persistent host storage for the database files.<br />
<br />
I had to get the container running without extra storage first and examine the container to see what it expected. The file ownership, permissions and the SELinux context all affect the ability to write files.<br />
<br />
<h3>
Tweaking for Storage</h3>
<br />
On the host I had to create a directory with the right characteristics. The UID and GID on the host may not match those inside the container. If the container service creates a user and group they will almost certainly not exist on a generic Docker container host.<br />
<br />
The Docker service uses a special set of SELinux contexts and labels to run. Docker runs as root and it does lots of potentially dangerous things. The SELinux policies for Docker are designed to prevent contained processes from escaping, at least through the resources SELinux can control.<br />
<br />
Setting the directory ownership and the SELinux context require root access. This isn't a really big deal as Docker also requires root (or at least membership in the docker group) but it's another wart. It does mean that the ideal of running service containers in user space is an illusion. Once the directory is set up and running it will require root access to remove it as well. It's probably best not to place it in a user home directory as I did.<br />
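<br />
Pulled together, the host-side preparation from this post amounts to something like the following (a sketch using a directory under /srv rather than a home directory, per the note above; the UID/GID and SELinux label are the ones discovered earlier):<br />
<br />
<pre class="brush:bash ; title: 'Host-side preparation, condensed (sketch)'"># Create the directory, give it the container's UID/GID and the Docker label
sudo mkdir -p /srv/mongodb
sudo chown 184:998 /srv/mongodb
sudo chcon -R system_u:object_r:docker_var_lib_t:s0 /srv/mongodb

# Then hand it to the container
docker run -d --name mongodb --volume /srv/mongodb:/var/lib/mongodb markllama/mongodb
</pre>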
<br />
<h3>
Scaling up: Multiple Hosts and Network Storage?</h3>
<br />
It is possible to run Docker service containers with persistent external storage from the host. This won't scale up to multiple hosts. Kubernetes has no way of making the required changes to the host. It might be possible to use network filesystems like NFS, Gluster or Ceph so long as the user accounts are made consistent.<br />
<br />
The other possibility for shared storage is cloud storage. I'll talk about that some in the next post, though it's not ready for Docker and Kubernetes yet.<br />
<br />
<h3>
Pending Features: User Namespaces (SELinux Namespaces?)</h3>
<br />
The user mapping may be resolved by a pending feature addition to Linux namespaces and Docker: User namespaces. This would allow a UID inside a container to be mapped to a different UID on the host. The same would be true for GIDs. This would allow me to run a container which uses the mongodb UID inside the container but is able to access files owned by my UID on the host. I don't have a timeline for this feature and the developers still raise their eyebrows in alarm when I ask about it, but it is work in progress.<br />
<br />
A feature which does not exist to my knowledge is SELinux namespaces. This is the idea that an SELinux label inside a container might be mapped to a different label outside. This would allow the docker_var_lib_t label inside to be mapped to user_home_t outside. I suspect this would break lots of things and open up nasty holes so I don't expect it soon.<br />
<br />
<h3>
Next Up: Network (Cloud) Storage</h3>
Next up is some discussion (but not any demonstration at all) of the state of network storage.</div>
<div>
<h2>
References
</h2>
</div>
<div>
<br />
<ul>
<li><a href="http://www.docker.com/">Docker</a></li>
<ul>
<li><a href="https://docs.docker.com/reference/builder/#user">USER directive</a></li>
<li><a href="https://docs.docker.com/reference/builder/#volume">VOLUME directive</a></li>
<li><a href="https://docs.docker.com/reference/builder/#workdir">WORKDIR directive</a></li>
</ul>
<li><a href="http://mongodb.org/">MongoDB</a></li>
<li><a href="http://selinuxproject.org/page/Main_Page">SELinux</a></li>
<ul>
<li><a href="https://fedoraproject.org/wiki/SELinux/getenforce">getenforce</a></li>
<li><a href="https://fedoraproject.org/wiki/SELinux/setenforce">setenforce</a></li>
<li><a href="https://fedoraproject.org/wiki/SELinux/setenforce">ls</a> -Z</li>
<li><a href="http://man7.org/linux/man-pages/man1/ps.1.html">ps</a> -Z</li>
<li><a href="https://fedoraproject.org/wiki/SELinux/chcon">chcon</a></li>
<li><a href="https://fedoraproject.org/wiki/SELinux/audit2allow">audit2allow</a></li>
</ul>
</ul>
<br />
<br /></div>
<h1>
Storage Concepts in Docker: Shared Storage and the VOLUME directive</h1>
In the next few posts I'm going to take a break from the concrete work of creating images for Pulp in Docker. The next step in my project requires some work with storage and it's going to take a bit of time for exploration and then some careful planning. Note that when I get to moving them to Kubernetes I'll have to revisit some of this, as Kubernetes Pods place some constraints (and provide some capabilities) that Docker alone doesn't.<br />
<br />
This is going to take at least three posts:<br />
<br />
<ul>
<li>Shared Storage in Docker (this post)</li>
<li><a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker-persistent.html">Persistent Storage in Docker</a></li>
<li><a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker-network-and.html">Persistent Storage in Kubernetes</a></li>
</ul>
<div>
<br /></div>
<div>
It could take a fourth post for Persistent Storage in Kubernetes, but that would be a fairly short post because the answer right now is "you really can't do that yet". People are working hard to figure out how to get persistent storage into a Kubernetes cluster, but it's not ready yet.</div>
<div>
<br /></div>
<div>
For now I'm going to take them one large bite at a time.</div>
<br />
<h2>
Storage in a Containerized World</h2>
<div>
<br /></div>
<div>
The whole point of containers is that they don't leak. Nothing should escape or invade. The storage that is used by each container has a life span only as long as the container itself. Very quickly though one finds that truly closed containers aren't very useful. To make them do real work you have to punch some holes.<br />
<br />
The most common holes are network ports, both inbound and out. A daemon in a container listens for connections and serves responses to queries. It may also initiate new outbound queries to gather information to do its job. Network connections are generally point-to-point and ephemeral. New connections are created and dropped all the time. If a connection fails during a transaction, no problem, just create a new connection and resend the message. Sometimes though, what you really need is something that lasts.<br />
<br />
<h2>
Shared and Persistent Storage</h2>
<br />
Sometimes a process doesn't just want to send messages to other processes. Sometimes it needs to create an artifact and put it someplace that another process can find and use it. In this case network connections aren't really appropriate for trading that information. It needs disk storage. Both processes need access to the same bit of storage. The storage must be <i>shared.</i></div>
<div>
<br />
Another primary characteristic of Docker images (and containerized applications in general) is that they are 100% reproducible. This also makes them disposable. If it's trivial to make arbitrary numbers of copies of an image, then there's no problem throwing one away. You just make another.<br />
<br />
When you're dealing with shared storage the life span of a container can be a problem too. If the two containers which share the storage both have the same life span then the storage can be "private", shared just between them. When either container dies, they both do and the storage can be reclaimed. If the contents of the storage have a life span longer than the containers, or if the container processes have different life spans, then the storage needs to be <i>persistent.</i><br />
<i><br /></i>
<br />
<h2>
Pulp and Docker Storage</h2>
<i><br /></i>
The purpose of the Pulp application is to act as a repository for long-term storage. The payload consists of files mirrored from remote repositories and offered locally. This can minimize long-haul network traffic and allow for network boundary security (controlled proxies) which might prohibit normal point-to-point connections between a local client and a remote content server.<br />
<br />
Two processes work with the payload content directly. The Pulp worker process is responsible for scanning the remote repositories, detecting new content and initiating a sync to the local mirror. The Apache process publishes the local content out to the clients which are the customers for the Pulp service. It consumes the local mirror content that has been provided by the Pulp workers. These two processes must both have access to the same storage to do their jobs.<br />
<br />
For demonstration purposes, shared storage is sufficient. The characteristics of shared storage in Docker and Kubernetes are complex enough to start with, without trying to solve the problem of persistence as well. In fact, persistent storage is still largely an unsolved problem. This is because local persistent storage isn't very useful as soon as you try to run containers on different hosts. At that point you need a SAN/NAS or some other kind of network storage like OpenStack Cinder or AWS/EBS or Google Cloud Storage.<br />
<br />
So, this post is about the care and feeding of shared storage in Docker applications.<br />
<br /></div>
<h2>
Docker Image: the VOLUME directive</h2>
<div>
<br />
The Dockerfile has a number of directives which specify ways to poke holes in containers. <a href="https://docs.docker.com/reference/builder/#volume">The VOLUME directive</a> is used to indicate that a container wants to use external or shared storage.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://2.bp.blogspot.com/-nz-1tr6ruUM/VDM4bLLlhMI/AAAAAAAAFBQ/PxadSiOosa0/s1600/Pulp_Image_Volumes.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://2.bp.blogspot.com/-nz-1tr6ruUM/VDM4bLLlhMI/AAAAAAAAFBQ/PxadSiOosa0/s1600/Pulp_Image_Volumes.png" height="246" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Dockerfile: VOLUME directive</td></tr>
</tbody></table>
<br /></div>
<div>
The diagram above shows the effect of a VOLUME directive when creating a new image. It indicates that this image has two mount points which can be attached to external (to the container) storage.<br />
<br />
Here's the complete Dockerfile for the pulp-content image.<br />
<br />
<script src="https://gist.github.com/markllama/f411916593003fa1ce2f.js"></script>
<br />
<br />
Here's where the window metaphor breaks down. The VOLUME directive indicates a node in a file path where an external filesystem may be mounted. It's a dividing line, inside and outside. What happens to the files in the image that would fall on the outside?<br />
<br />
Docker places those files into their own filesystem as well. If the container is created without specifying an external volume to mount there, this default filesystem is mounted. The VOLUME directive defines a place where files can be imported <i>or exported.</i><br />
<br />
So what happens if you just start a container with that image, but don't specify an external mount?<br />
<br />
<h2>
Defaulted Volumes</h2>
<div>
<br /></div>
<div>
To continue with the flawed metaphor, every window has two sides. The VOLUME directive only specifies the boundary. It says "some filesystem may be provided to mount <i>here".</i> But if I don't provide a file tree to mount there (using the -v option) Docker mounts the file tree that was inside the image when it was built. I can run the <span style="font-family: Courier New, Courier, monospace;">pulp-content</span><span style="font-family: inherit;"> image with a shell and inspect the contents. I'll look at it both from the inside and the outside.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">I'm going to start an interactive </span><span style="font-family: Courier New, Courier, monospace;">pulp-content</span><span style="font-family: inherit;"> container with a shell so I can inspect the contents.</span></div>
<div>
<br /></div>
<div>
<pre class="brush:bash ; title: 'Container Mounts with Volumes' ; highlight: [1,12,13]">docker run -it --name volume-demo markllama/pulp-content /bin/sh
sh-4.2# mount
/dev/mapper/docker-8:4-2758071-77b5c9ba618358600e5b59c3657256d1a748aac1c14e2be3d9c505adddc92ce3 on / type ext4 (rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c585,c908",discard,stripe=16,data=ordered)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,context="system_u:object_r:svirt_sandbox_file_t:s0:c585,c908",mode=755)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c585,c908",size=65536k)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c585,c908",gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime,seclabel)
/dev/sda4 on /etc/resolv.conf type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda4 on /etc/hostname type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda4 on /etc/hosts type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda4 on /var/lib/pulp type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda4 on /var/www type ext4 (rw,relatime,seclabel,data=ordered)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,context="system_u:object_r:svirt_sandbox_file_t:s0:c585,c908",mode=755)
</pre>
<br /></div>
<div>
So there is a filesystem mounted on those two mount points. But what's in them?
<br />
<br />
<pre class="brush: bash ; title: 'Default Volume Contents' ; highlight: [1,7]">sh-4.2# find /var/www
/var/www
/var/www/pub
/var/www/html
/var/www/cgi-bin
sh-4.2# find /var/lib/pulp
/var/lib/pulp
/var/lib/pulp/published
/var/lib/pulp/published/yum
/var/lib/pulp/published/yum/https
/var/lib/pulp/published/yum/http
/var/lib/pulp/published/puppet
/var/lib/pulp/published/puppet/https
/var/lib/pulp/published/puppet/http
/var/lib/pulp/uploads
/var/lib/pulp/celery
/var/lib/pulp/static
/var/lib/pulp/static/rsa_pub.key
</pre>
</div>
<br />
That's what it looks like on the inside. But what's the view from outside? I can find out using <span style="font-family: Courier New, Courier, monospace;"><a href="https://docs.docker.com/reference/commandline/cli/#inspect">docker inspect</a></span><span style="font-family: inherit;">.</span><br />
<br />
<pre class="brush: bash ; title: 'Docker Volume Configuration' ; highlight: [1]">docker inspect --format '{{.Config.Volumes}}' volume-demo
map[/var/lib/pulp:map[] /var/www:map[]]
</pre>
<br />
First I ask what the volume configuration is for the container. That result tells me that I didn't provide any mapping for the two volumes. Next I check what volumes are actually provided.<br />
<br />
<pre class="brush: bash ; title: 'Docker Volume Information' ; highlight: [1]">docker inspect --format '{{.Volumes}}' volume-demo
map[
/var/lib/pulp:/var/lib/docker/vfs/dir/3a11750bd3c31a8025f0cba8b825e568dafff39638fa1a45a17487df545b0f6a
/var/www:/var/lib/docker/vfs/dir/0a86bd1085468f04feaeb47cc32cfdb0c05fd10e5c7b470790042107d9c02b70
]
</pre>
<br />
These are the volumes that are actually mounted on the container. I can see that <span style="font-family: Courier New, Courier, monospace;">/var/lib/pulp</span> and <span style="font-family: Courier New, Courier, monospace;">/var/www</span> have something mounted on them and that the volumes are actually stored in the host filesystem under <span style="font-family: Courier New, Courier, monospace;">/var/lib/docker/vfs/dir</span>. Graphically, here's what that looks like:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-gPZs_G2HdNY/VDPiT51wIII/AAAAAAAAFBo/9cBKuLa-sRY/s1600/Pulp%2BContent%2BContainer%2B-%2BNo%2Bexternal%2BVolumes%2B(1).png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://4.bp.blogspot.com/-gPZs_G2HdNY/VDPiT51wIII/AAAAAAAAFBo/9cBKuLa-sRY/s1600/Pulp%2BContent%2BContainer%2B-%2BNo%2Bexternal%2BVolumes%2B(1).png" height="219" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Default mounts with VOLUME directive</td></tr>
</tbody></table>
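<br />
If you have root on the Docker host you can also peek at those backing directories directly. This is only a sketch: the hash-named directory will be different on your system, and the contents shown are just the ones the <span style="font-family: Courier New, Courier, monospace;">find</span> output above turned up.<br />
<br />
<pre class="brush: bash ; title: 'Host view of a default volume (sketch)' ; highlight: [1]">sudo ls /var/lib/docker/vfs/dir/3a11750bd3c31a8025f0cba8b825e568dafff39638fa1a45a17487df545b0f6a
celery  published  static  uploads
</pre>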
<br />
So now I have a container running with some storage that is, in a sense, "outside" the container. I need to mount that same storage into another container. This is where the Docker <span style="font-family: Courier New, Courier, monospace;">--volumes-from</span> option comes in.<br />
<br />
<h2>
Shared Volumes in Docker</h2>
</div>
<div>
<br /></div>
<div>
Once a container exists with marked volumes it is possible to mount those volumes into other containers. Docker provides an option which allows all of the volumes from an existing container to be mounted one-to-one into another container.<br />
<br />
For this demonstration I'm going to just create another container from the <span style="font-family: Courier New, Courier, monospace;">pulp-content</span><span style="font-family: inherit;"> image, but this time I'm going to tell it to mount the volumes from the existing container:</span><br />
<br />
<pre class="brush:bash ; title: 'A container using --volumes-from' ; highlight: 1">docker run -it --name volumes-from-demo --volumes-from volume-demo markllama/pulp-content /bin/sh
</pre>
<br />
If you're following along you can use <span style="font-family: Courier New, Courier, monospace;">mount</span><span style="font-family: inherit;"> to show the internal mount points, and observe that they match those of the original container. From the outside I can use </span><span style="font-family: Courier New, Courier, monospace;">docker inspect</span><span style="font-family: inherit;"> to show that both containers are sharing the same volumes.</span></div>
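<div>
<br />
For the inside check, this is roughly what to run in the new container's shell (a sketch; the mount lines should match the ones shown for the original container above):<br />
<br />
<pre class="brush: bash ; title: 'Checking the shared mounts from inside (sketch)' ; highlight: [1]">mount | grep -e /var/lib/pulp -e /var/www
</pre>
</div>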
<div>
<br />
<pre class="brush:bash ; title: 'A container using --volumes-from' ; highlight: [1,7]">docker inspect --format '{{.Volumes}}' volume-demo
map[
/var/lib/pulp:/var/lib/docker/vfs/dir/3a11750bd3c31a8025f0cba8b825e568dafff39638fa1a45a17487df545b0f6a
/var/www:/var/lib/docker/vfs/dir/0a86bd1085468f04feaeb47cc32cfdb0c05fd10e5c7b470790042107d9c02b70
]
docker inspect --format '{{.Volumes}}' volumes-from-demo
map[
/var/lib/pulp:/var/lib/docker/vfs/dir/3a11750bd3c31a8025f0cba8b825e568dafff39638fa1a45a17487df545b0f6a
/var/www:/var/lib/docker/vfs/dir/0a86bd1085468f04feaeb47cc32cfdb0c05fd10e5c7b470790042107d9c02b70
]
</pre>
<br />
These two containers have the same filesystems mounted on their declared volume mount points.<br />
<br /></div>
<div>
<h2>
Shared Storage and Pulp</h2>
</div>
<div>
The next two images I need to create for a Pulp service are going to require shared storage. The Pulp worker process places files in <span style="font-family: Courier New, Courier, monospace;">/var/lib/pulp</span><span style="font-family: inherit;"> and symlinks them into </span><span style="font-family: Courier New, Courier, monospace;">/var/www</span><span style="font-family: inherit;"> to make them available to the web server. The Apache server needs to be able to read both the web repository in </span><span style="font-family: Courier New, Courier, monospace;">/var/www</span><span style="font-family: inherit;"> and the Pulp content in </span><span style="font-family: Courier New, Courier, monospace;">/var/lib/pulp</span><span style="font-family: inherit;"> so that it can resolve the symlinks and serve the content to clients. I can build the images using the VOLUME directive to create the "windows" I need and then use a content image to hold the files. Both the worker and apache containers will use the </span><span style="font-family: Courier New, Courier, monospace;">--volumes-from</span><span style="font-family: inherit;"> directive to mount the storage from the content container.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
Here's what that will look like in Docker:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-j2l192udbVI/VDPt96d2ncI/AAAAAAAAFCA/L-ck4aMjj48/s1600/Pulp%2BWorker%2B%2B%2BApache%2B%2B%2BContent%2B(1).png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://3.bp.blogspot.com/-j2l192udbVI/VDPt96d2ncI/AAAAAAAAFCA/L-ck4aMjj48/s1600/Pulp%2BWorker%2B%2B%2BApache%2B%2B%2BContent%2B(1).png" height="554" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pulp Content Storage (Docker)</td></tr>
</tbody></table>
The content container will be created first. The content image uses <span style="font-family: Courier New, Courier, monospace;">pulp-base</span><span style="font-family: inherit;"> as its parent, so the file structure, ownership and permissions for the volume content will be initialized correctly. The worker and Apache containers will get their volumes from the content container.</span><br />
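<div>
<br />
As a sketch of the Docker commands involved (the <span style="font-family: Courier New, Courier, monospace;">pulp-worker</span> and <span style="font-family: Courier New, Courier, monospace;">pulp-apache</span> image names here are placeholders for images I haven't built yet):<br />
<br />
<pre class="brush: bash ; title: 'Sharing Pulp content between containers (sketch)' ; highlight: [2,5,6]"># The content container is created first; its default volumes hold the files.
docker run --name pulp-content markllama/pulp-content /bin/true
# The worker and Apache containers mount the same volumes from it.
# --volumes-from works even when the content container is not running.
docker run -d --name pulp-worker --volumes-from pulp-content markllama/pulp-worker
docker run -d --name pulp-apache --volumes-from pulp-content markllama/pulp-apache
</pre>
</div>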
<div>
<br /></div>
<div>
<h2>
Summary</h2>
<div>
<br /></div>
<div>
In this post I learned what Docker does with a VOLUME directive if no external volume is provided for the container at runtime. I also learned how to share storage between two or more running containers.</div>
<div>
<br /></div>
<div>
In the next post I'll show <a href="http://cloud-mechanic.blogspot.com/2014/10/storage-concepts-in-docker-persistent.html">how to mount (persistent) host storage into a container</a>.</div>
<div>
<br /></div>
<div>
In the final post before going back to building a Pulp service I'll demonstrate how to create a pod with storage shared between the containers and if there's space, how to mount host storage into the pod as well.</div>
<div>
<br /></div>
<h2>
<span style="font-family: inherit;">References</span></h2>
<div>
<ul>
<li><a href="http://www.docker.com/">Docker</a></li>
<ul>
<li>Dockerfile <a href="https://docs.docker.com/reference/builder/#volume">VOLUME directive</a></li>
<li>Docker CLI <a href="https://docs.docker.com/reference/commandline/cli/#run">run command</a> (see --volume and --volumes-from options)</li>
<li>Docker CLI <a href="https://docs.docker.com/reference/commandline/cli/#inspect">inspect command</a></li>
</ul>
<li><a href="http://pulpproject.org/">Pulp</a></li>
</ul>
</div>
<div>
<span style="font-family: inherit;"><br /></span></div>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-64423837018827885552014-09-29T17:51:00.000-07:002014-09-30T05:50:43.410-07:00Docker: Re-using a custom base image - Pulp Resource Manager image.Here's the next step in the ongoing saga of containerizing the Pulp service in <a href="http://www.docker.com/">Docker</a> for use with <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a>.<br />
<br />
<a href="http://cloud-mechanic.blogspot.com/2014/09/docker-building-and-using-base-image.html">In the last post</a> I spent a bunch of effort creating a base image for a set of Pulp service components. Then I only implemented one, the <a href="http://www.celeryproject.org/">Celery</a> beat server. In this (hopefully much shorter) post I'll create a second image from that base. This one is going to be the <a href="http://pulp-user-guide.readthedocs.org/en/latest/server.html#resource-manager">Pulp Resource Manager</a> service.<br />
<br />
A couple of recap pieces to start.<br />
<br />
The <a href="http://www.pulpproject.org/">Pulp service</a> is made up of <a href="http://pulp-user-guide.readthedocs.org/en/latest/server.html#components">several independent processes</a> that communicate using <a href="http://amqp.org/">AMQP</a> messaging (through a <a href="http://qpid.apache.org/">QPID message bus</a>) and by access to a MongoDB database. The QPID services and the <a href="http://mongodb.org/">MongoDB</a> services are entirely independent of the Pulp service processes and communicate only over TCP/IP. There are also a couple of processes that are tightly coupled, both requiring access to shared data. These will come later. What's left is the Pulp Resource Manager process and the Pulp Admin REST service.<br />
<br />
I'm going to take these in two separate posts to make them a bit more digestible than the last one was.<br />
<br />
<h2>
Extending the Base - Again</h2>
<div>
<br /></div>
<div>
As in the case with the Pulp Beat service, the Resource Manager process is a singleton. Each pulp service has exactly one. (Discussions of <a href="https://en.wikipedia.org/wiki/High_availability">HA</a> and <a href="https://en.wikipedia.org/wiki/Single_point_of_failure">SPOF</a> will be held for later). The Resource Manager process communicates with the other components solely through the QPID message broker and the MongoDB over TCP. There is no need for persistent storage.</div>
<div>
<br /></div>
<div>
In fact the only difference between the Beat service and the Resource Manager is the invocation of the Celery service. This means that the only difference between the Docker specifications is the name and two sections of the <i><span style="font-family: Courier New, Courier, monospace;">run.sh</span></i> file.</div>
<div>
<br /></div>
<div>
The Dockerfile is in fact identical in content to that for the Pulp Beat container:</div>
<div>
<br />
<script src="https://gist.github.com/markllama/5de9f000546de9304955.js"></script>
</div>
<div>
Now to the <span style="font-family: Courier New, Courier, monospace;">run.sh</span> script.<br />
<br />
<script src="https://gist.github.com/markllama/2acebf3bd58add6a1085.js"></script>
<br />
The first difference in the <span style="font-family: Courier New, Courier, monospace;">run.sh</span> is simple. The Beat service is used to initialize the database. The Resource Manager doesn't have to do that.</div>
<div>
<br /></div>
<div>
The second is also pretty simple: the exec line at the end starts the Celery service using the <i>resource_manager</i> entry point instead of the beat service.</div>
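<div>
<br />
For comparison, here are the two <span style="font-family: Courier New, Courier, monospace;">exec</span> lines as they appear in the container startup logs (the beat one from the previous post, the resource manager one from the log later in this post):<br />
<br />
<pre class="brush: bash ; title: 'The only functional difference between the two run.sh scripts' ; highlight: []"># pulp-beat
exec runuser apache -s /bin/bash -c '/usr/bin/celery beat --workdir=/var/lib/pulp/celery --scheduler=pulp.server.async.scheduler.Scheduler -f /var/log/pulp/celerybeat.log -l INFO'

# pulp-resource-manager
exec runuser apache -s /bin/bash -c '/usr/bin/celery worker -c 1 -n resource_manager@pulp.example.com --events --app=pulp.server.async.app --umask=18 --loglevel=INFO -Q resource_manager --logfile=/var/log/pulp/resource_manager.log'
</pre>
</div>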
<div>
<br /></div>
<div>
I do have one other note to myself. It appears that the <span style="font-family: Courier New, Courier, monospace;">wait_for_database()</span> function will be needed in every derivative of the pulp-base image. I should probably refactor that but I'm not going to do it yet.</div>
<div>
<br /></div>
<div>
<h2>
One Image or Many?</h2>
</div>
<div>
<br /></div>
<div>
So, if I hadn't been using shell functions, this really would come down to two lines different between the two. Does it really make sense to create two images? It is possible to pass a mode argument to the container on startup. Wouldn't that be simpler?<br />
<br />
It actually might be. It is possible to use the same image and pass an argument. The example from which mine are derived used that method.<br />
<br />
I have three reasons for using separate images. One is for teaching and the other two are development choices. Since one of my goals is to show how to create custom base images and then use derived images to create customizations I used this opportunity to show that.<br />
<br />
The deeper reasons have to do with human nature and the software development life cycle.<br />
<br />
People expect to be able to compose services by grabbing images off the shelf and plugging them together. Adding modal switches to the images means that they are not strongly differentiated by function. You can't just say "Oh, I need 5 functional parts, let me check the bins". You have to know more about each image than just how it connects to others. You have to know that this particular image can take more than one role within the service. I'd like to avoid that if I can. Creating images with so little difference feels like inefficiency, but only when viewed from the standpoint of the person producing the images. To the consumer it maintains the usage paradigm. Breaks in the paradigm can lead to mistakes or confusion.<br />
<br />
The other reason to use distinct images has to do with what I expect and hope will be a change in the habits of software developers.<br />
<br />
Developers of complex services currently feel a tension, when they are creating and packaging their software, between putting all of the code, binaries and configuration templates into a single package and splitting them across several. You only create a new package if the function is strongly <i>different. </i>This makes it simpler to install the software and configure it once. On traditional systems, where all of the process components would be running on the same host, there was no good reason to separate the code for distinct processes based on their function. There are clear cases where the separation does happen in host software packaging, notably in client and server software, which clearly will run on different hosts. Other cases, though, are not clear cut.<br />
<br />
The case of the Pulp service is in a gray area. Much of the code is common to all four Celery based components (beat, resource manager, worker and admin REST service). It is likely possible to refactor the unique code into separate packages for the components, though the value is questionable at this point.<br />
<br />
I want to create distinct images because it's not very expensive, and it allows for easy refactoring should the Pulp packaging ever be decomposed to match the actual service components. Any changes would happen when the new images are built, but the consumer would not need to see any change. This is a consideration to keep in mind whenever I create a new service with different components from the same service RPM.<br />
<br />
<h2>
Running and Verifying the Resource Manager Image</h2>
<div>
<br /></div>
<div>
The Pulp Resource Manager process makes the same connections that the Pulp Beat process does. It's a little harder to detect the Resource Manager access to the database since the startup doesn't make radical changes like the DB initialization. I'm going to see if I can find some indications that the resource manager is running though. The QPID connection will be much easier to detect. The Resource Manager creates its own set of queues which will be easy to see.</div>
<div>
<br /></div>
<div>
The resource manager requires the database service and an initialized database. Testing this part will start where the previous post left off, with running QPID and MongoDB and with the Pulp Beat service active.</div>
<div>
<br /></div>
<div>
NOTE: there's currently (20140929) a bug in Kubernetes where, during the period between the image download and the actual container start, <span style="font-family: Courier New, Courier, monospace;">kubecfg list pods</span><span style="font-family: inherit;"> will indicate that the pods have terminated. If you see this, give it another minute for the pods to actually start and transition to the running state.</span></div>
<div>
<br /></div>
<h3>
Testing in Docker</h3>
<div>
<br />
All I need to do using Docker directly is to verify that the container will start and run. The visibility in Kubernetes still isn't up to general dev and debugging.<br />
<br />
<pre class="brush: bash ; title: 'Start the pulp-resource-manager manually'; highlight: [1,2,3,4,5,6]">docker run -d --name pulp-resource-manager \
-v /dev/log:/dev/log \
-e PULP_SERVER_NAME=pulp.example.com \
-e SERVICE_HOST=10.245.2.2 \
markllama/pulp-resource-manager
0e8cbc4606cf8894f8be515709c8cd6a23f37b3a58fd84fecf0d8fca46c64eed
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0e8cbc4606cf markllama/pulp-resource-manager:latest "/run.sh" 9 minutes ago Up 9 minutes pulp-resource-manager
</pre>
<br />
Once it's running I can check the logs to verify that everything has started as needed and that the primary process has been executed at the end.<br />
<br />
<pre class="brush: bash ; title: 'Show the pulp-resource-manager logs' ; highlight: 1">docker logs pulp-resource-manager
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ start_resource_manager
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery worker -c 1 -n resource_manager@pulp.example.com --events --app=pulp.server.async.app --umask=18 --loglevel=INFO -Q resource_manager --logfile=/var/log/pulp/resource_manager.log'
</pre>
<br />
If it fails to start, especially with "file not found" or "no access" errors, check the /dev/log volume mount and the SERVICE_HOST value.<br />
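<br />
Two quick checks for that case (a sketch; <span style="font-family: Courier New, Courier, monospace;">--format</span> just pulls a single field out of the inspect output):<br />
<br />
<pre class="brush: bash ; title: 'Sanity checks for a failed start (sketch)' ; highlight: [2,4]"># Did the /dev/log bind mount make it into the container?
docker inspect --format '{{.Volumes}}' pulp-resource-manager
# Were the environment variables set the way I expected?
docker inspect --format '{{.Config.Env}}' pulp-resource-manager
</pre>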
<br />
I also want to check that the QPID queues have been created.<br />
<br />
<br />
<pre class="brush: bash ; title: '' ; highlight: [1,11,12,13]">qpid-config queues -b guest@10.245.2.4
Queue Name Attributes
======================================================================
04f58686-35a6-49ca-b98e-376371cfaaf7:1.0 auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:1.0 auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:2.0 auto-del excl
0c72a9c9-e1bf-4515-ba4b-0d0f86e9d30a:1.0 auto-del excl
celeryev.ed1a92fd-7ad0-4ab1-935f-6bc6a215f7d3 auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e70d72aa-7b9a-4083-a88a-f9cc3c568e5c:0.0 auto-del excl
e7e53097-ae06-47ca-87d7-808f7042d173:1.0 auto-del excl
resource_manager --durable --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.celery.pidbox auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.dq --durable auto-del --argument passive=False --argument exclusive=False --argument arguments=None
</pre>
<br />
Line 8 looks like the Celery Beat service queue and lines 11, 12, and 13 are clearly associated with the resource manager. So far, so good.<br />
<br /></div>
<h3>
Testing in Kubernetes</h3>
<div>
<br /></div>
<div>
I had to reset the database between starts to test the Pulp Beat container. This image doesn't change the database structure, so I don't need to reset. I can just create a new pod definition and try it out.<br />
<br />
Again, the differences from the Pulp Beat pod definition are pretty trivial.<br />
<br />
<script src="https://gist.github.com/markllama/648671cd28e60e0b88e3.js"></script>
</div>
So here's what it looks like when I start the pod:<br />
<br />
<pre class="brush:bash ; title: '' ; highlight: [1,7,15]">kubecfg -c pods/pulp-resource-manager.json create pods
I0930 00:00:24.581712 16159 request.go:292] Waiting for completion of /operations/14
ID Image(s) Host Labels Status
---------- ---------- ---------- ---------- ----------
pulp-resource-manager markllama/pulp-resource-manager / name=pulp-resource-manager Waiting
kubecfg list pods
ID Image(s) Host Labels Status
---------- ---------- ---------- ---------- ----------
pulpdb markllama/mongodb 10.245.2.2/10.245.2.2 name=db Running
pulpmsg markllama/qpid 10.245.2.2/10.245.2.2 name=msg Running
pulp-beat markllama/pulp-beat 10.245.2.4/10.245.2.4 name=pulp-beat Terminated
pulp-resource-manager markllama/pulp-resource-manager 10.245.2.4/10.245.2.4 name=pulp-resource-manager Terminated
kubecfg get pods/pulp-resource-manager
ID Image(s) Host Labels Status
---------- ---------- ---------- ---------- ----------
pulp-resource-manager markllama/pulp-resource-manager 10.245.2.4/10.245.2.4 name=pulp-resource-manager Running
</pre>
There are two things of note here. Line 13 shows the pulp-resource-manager pod as <u>terminated</u>. Remember the bug note from above. The pod isn't actually terminated; it's in the window between the pause container downloading the image and the new container starting to execute.<br />
<br />
On line 15 I requested the information for that pod by name using the <i>get</i> command, rather than listing them all. This time it shows <u>running</u>, as it should.<br />
<br />
When you use <i>get</i>, all you get by default is a one-line summary. If you want details you have to consume them as JSON, but they're complete. In fact they use the same schema as the JSON used to create the pods in the first place (with a bit more detail filled in). While this can be hard for humans to swallow, it makes it AWESOME to write programs and scripts to process the output. Every command should offer some form of structured data output. Meanwhile, I wish Kubernetes would offer a --verbose option with nicely formatted plaintext. It will come (or I'll write it if I get frustrated enough).<br />
<br />
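Pulling a single field back out is a one-liner. This is just a sketch; the field names are the ones you'll see in the dump below.<br />
<br />
<pre class="brush: bash ; title: 'Picking a field out of the pod JSON (sketch)' ; highlight: [1]">kubecfg --json get pods/pulp-resource-manager | python -m json.tool | grep '"status"'
</pre>
<br />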
Get ready... Here it comes.<br />
<br />
<pre class="brush: jscript ; title: '' ; highlight: 1">kubecfg --json get pods/pulp-resource-manager | python -m json.tool
{
"apiVersion": "v1beta1",
"creationTimestamp": "2014-09-30T00:00:24Z",
"currentState": {
"host": "10.245.2.4",
"hostIP": "10.245.2.4",
"info": {
"net": {
"detailInfo": {
"Args": null,
"Config": null,
"Created": "0001-01-01T00:00:00Z",
"Driver": "",
"HostConfig": null,
"HostnamePath": "",
"HostsPath": "",
"ID": "",
"Image": "",
"Name": "",
"NetworkSettings": null,
"Path": "",
"ResolvConfPath": "",
"State": {
"ExitCode": 0,
"FinishedAt": "0001-01-01T00:00:00Z",
"Paused": false,
"Pid": 0,
"Running": false,
"StartedAt": "0001-01-01T00:00:00Z"
},
"SysInitPath": "",
"Volumes": null,
"VolumesRW": null
},
"restartCount": 0,
"state": {
"running": {}
}
},
"pulp-resource-manager": {
"detailInfo": {
"Args": null,
"Config": null,
"Created": "0001-01-01T00:00:00Z",
"Driver": "",
"HostConfig": null,
"HostnamePath": "",
"HostsPath": "",
"ID": "",
"Image": "",
"Name": "",
"NetworkSettings": null,
"Path": "",
"ResolvConfPath": "",
"State": {
"ExitCode": 0,
"FinishedAt": "0001-01-01T00:00:00Z",
"Paused": false,
"Pid": 0,
"Running": false,
"StartedAt": "0001-01-01T00:00:00Z"
},
"SysInitPath": "",
"Volumes": null,
"VolumesRW": null
},
"restartCount": 0,
"state": {
"running": {}
}
}
},
"manifest": {
"containers": null,
"id": "",
"restartPolicy": {},
"version": "",
"volumes": null
},
"podIP": "10.244.3.4",
"status": "Running"
},
"desiredState": {
"host": "10.245.2.4",
"manifest": {
"containers": [
{
"env": [
{
"key": "PULP_SERVER_NAME",
"name": "PULP_SERVER_NAME",
"value": "pulp.example.com"
}
],
"image": "markllama/pulp-resource-manager",
"name": "pulp-resource-manager",
"volumeMounts": [
{
"mountPath": "/dev/log",
"name": "devlog",
"path": "/dev/log"
}
]
}
],
"id": "pulp-resource-manager",
"restartPolicy": {
"always": {}
},
"uuid": "c73a89c0-4834-11e4-aba7-0800279696e1",
"version": "v1beta1",
"volumes": [
{
"name": "devlog",
"source": {
"emptyDir": null,
"hostDir": {
"path": "/dev/log"
}
}
}
]
},
"status": "Running"
},
"id": "pulp-resource-manager",
"kind": "Pod",
"labels": {
"name": "pulp-resource-manager"
},
"resourceVersion": 20,
"selfLink": "/api/v1beta1/pods/pulp-resource-manager"
}
</pre>
<br />
So there you go.<br />
<br />
I won't repeat the QPID queue check here because if everything's going well it looks the same.<br />
<br />
<h2>
Summary</h2>
<div>
<br /></div>
<div>
As designed there isn't really much to say. The only real changes were to remove the DB setup and change the exec line to start the resource manager process. That's the idea of cookie cutters.<br />
<br />
The next one won't be as simple. It uses the Pulp software package, but it doesn't run a Celery service. Instead it runs an Apache daemon and a WSGI web service to offer the Pulp Admin REST protocol. It connects to the database and the messaging service. It also needs SSL and a pair of external public TCP connections.</div>
<br />
<h2>
References</h2>
<br />
<ul>
<li><a href="http://www.docker.com/">Docker</a><br />Containerized Applications</li>
<li><a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a><br />Orchestration for Docker applications</li>
<li><a href="http://pulpproject.org/">Pulp</a><br />Enterprise OS and configuration content management</li>
<li><a href="http://celeryproject.com/">Celery</a><br />A distributed job management framework</li>
<li><a href="http://qpid.apache.org/">QPID</a><br />AMQP Message service</li>
<li><a href="http://mongodb.org/">MongoDB</a><br />NoSQL Database</li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-6025456341423314022014-09-26T09:09:00.001-07:002014-10-20T19:31:04.742-07:00Docker: Building and using a base image for Pulp services in KubernetesMy stated goal in this series of posts is to create a working containerized <a href="http://www.pulpproject.org/">Pulp</a> service running in a <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a> cluster. After, what is it, 5 posts, I'm finally actually ready to do something with pulp itself.<br />
<br />
The Pulp service proper is made up of a single <a href="http://www.celeryproject.org/">Celery</a> beat process, a single resource manager process, and some number of pulp worker processes. These together do the work of Pulp, mirroring and managing the content that is Pulp's payload. The service also requires at least one Apache HTTP server to deliver the payload but that comes later.<br />
<br />
All of the Pulp processes are actually built on Celery. They all require the same set of packages and much of the same configuration information. They all need to use the <a href="http://www.mongodb.org/">MongoDB</a> and <a href="https://qpid.apache.org/">QPID</a> services. The worker processes all need access to some shared storage, but the beat and resource manager do not.<br />
<br />
To build the <a href="http://www.docker.com/">Docker</a> images for these different containers, rather than duplicating the common parts, the best practice is to put those parts into a <i>base image</i> and then add one last layer to create each of the variations.<br />
<br />
In this post I'll demonstrate creating a shared base image for Pulp services and then I'll create the first image that will consume the base to create the Pulp beat service.<br />
<br />
The real trick is to figure out what the common parts are. Some are easy though so I'll start there.<br />
<br />
<h2>
Creating a Base Image</h2>
<div>
<br /></div>
<div>
For those of you who are coders, a base image is a little like an abstract class. It defines some important characteristics that are meant to be re-used, but it leaves others to be resolved later. The Docker community already provides a set of base images like the Fedora:20 image which have been hand-crafted to provide a minimal OS. Docker makes it easy to use the same mechanism for building our own images.</div>
<div>
<br /></div>
<div>
The list below enumerates the things that all of the Pulp service images will share. When I create the final images I'll add the final tweaks. Some of these will essentially be stubs to be used later.</div>
<div>
<br /></div>
<ul>
<li>Pulp Repo file<br />Pulp is not yet standard in the RHEL, CentOS or Fedora distributions</li>
<li>Pulp Server software</li>
<li>Communications Software (MongoDB and QPID client libraries)</li>
<li>Configuration tools: <a href="http://augeas.net/">Augeas</a></li>
</ul>
<br />
<div>
There is also some configuration scripting that will be required by all the pulp service containers:</div>
<div>
<br /></div>
<div>
<ul>
<li>A script to apply the customization/configuration for the execution environment</li>
<li>A test script to ensure that the database is available before starting the celery services</li>
<li>A test script to ensure that the message service is available</li>
</ul>
</div>
Given that start, here's what I get for the Dockerfile:<br />
<br />
<script src="https://gist.github.com/markllama/8ffb6e5148f6f766d22b.js"></script>
<br />
Lines 1 and 2 should be familiar already. There are no new directives here but a couple of things need explaining.<br />
<br />
<br />
<ul>
<li>Line 1: The base image</li>
<li>Line 2: Contact information</li>
<li>Line 4: A usage comment<br />Pulp uses <i>syslog</i>. For a process inside a container to write to syslog you either have to have a syslogd running or you have to have write access to the host's <span style="font-family: Courier New, Courier, monospace;">/dev/log</span> file. I'll show how this gets done when I create a real app image from this base and run it.</li>
<li>Line 6: Create a yum repo for the Pulp package content.<br />You can add files using a URL for the source.</li>
<li>Lines 9-12: Install the Pulp packages, QPID client software and Augeas to help configuration.</li>
<li>Lines 15-17: COMMENTED: Install and connect the Docker content plugin<br />This is commented out at the moment. It hasn't been packaged yet and there are some issues with dependency resolution. I left it here to remind me to put it back when the problems are resolved.</li>
<li>Line 20: Add an Augeas lens definition to manage the Pulp <span style="font-family: Courier New, Courier, monospace;">server.conf</span> file<br />Augeas is well suited for managing config values when a lens exists. More detail below.</li>
<li>Line 23: Add a script to execute the configuration<br />This will be used by the derived images, but it works the same for all of them</li>
<li>Line 27: Add a script which can test for access to the MongoDB<br />Pulp will just blindly try to connect, but will just hang if the DB is unavailable. This script allows me to decide to wait or quit if the database isn't ready. If I quit, Kubernetes will re-spawn a new container to try again.</li>
</ul>
<div>
<br /></div>
<h3>
The Pulp Repo</h3>
<div>
<br /></div>
<div>
The Pulp server software is not yet in the standard Fedora or EPEL repositories. The packages are available from the contributed repositories on the Fedora project. The repo file is also there, accessible through a URL.</div>
<div>
<br /></div>
<div>
The Dockerfile <a href="https://docs.docker.com/reference/builder/#add">ADD directive</a> can take a URL as well as a local relative file path.</div>
<div>
<br /></div>
<div>
Line 4 pulls the Pulp repo file down and places it so that it can be used in the next step.</div>
<br />
<h3>
Pulp Packages (dependencies and tools)</h3>
<div>
<br /></div>
<div>
The Pulp software is most easily installed as a YUM group. I use a Dockerfile RUN directive to install the Pulp packages into the base image. This will install most of the packages needed for the service, but there are a couple of additional packages that aren't part of the package group.</div>
<div>
<br /></div>
<div>
Pulp can serve different types of repository mirrors. These are controlled by content plugins. I add the RPM plugin, python-pulp-rpm-common. I also add a couple of Python QPID libraries. However, you can't run both <i><a href="https://docs.fedoraproject.org/en-US/Fedora/14/html/Software_Management_Guide/ch05s15s03.html">groupinstall</a></i> and the normal package<i> install</i> command in the same invocation, so the additional Python QPID libraries are installed in a second command.</div>
<div>
<br /></div>
<div>
I also want to install <a href="http://augeas.net/">Augeas</a>. This is a tool that enables configuration editing using a structured API or CLI command.<br />
<br /></div>
<h3>
Augeas Lens for Pulp INI files</h3>
<br />
<a href="http://augeas.net/">Augeas</a> is an attempt to wrangle the flat file databases that make up the foundation of most *NIX application configuration. It offers a way to access individual key/value pairs within well known configuration files without resorting to tools like <i>sed</i> or <i>perl</i> and <i>regular expressions</i>. With augeas each key/value pair is assigned a path and can be queried and updated using that path. It offers both API and CLI interfaces though it's not nearly as commonly used as it should be.<br />
<br />
The downside of Augeas is that it doesn't include a description (<i><a href="http://augeas.net/docs/lenses.html">lens</a></i> in Augeas terminology) for Pulp config files. Pulp is too new. The upside is that the Pulp config files are fairly standard INI format, and it's easy to adapt the stock <a href="http://augeas.net/docs/references/lenses/files/inifile-aug.html">IniFile lens</a> for Pulp.<br />
<br />
I won't include the lens text inline here, but I <a href="https://gist.github.com/cda21dbfac1dd8605945">put it in a gist</a> if you want to look at it.<br />
<br />
The ADD directive on line 20 of the Dockerfile places the lens file in the Augeas library where it will be found automatically.<br />
<br />
<h3>
Pulp Server Configuration Script</h3>
<div>
<br />
All of the containers that use this base image will need to set a few configuration values for Pulp. These reside in <span style="font-family: Courier New, Courier, monospace;">/etc/pulp/server.conf</span><span style="font-family: inherit;"> which is an <a href="https://en.wikipedia.org/wiki/INI_file">INI formatted</a> text file. These settings indicate the identity of the pulp service itself and how the pulp processes communicate with the database and message bus.</span><br />
<span style="font-family: inherit;"><br />If you are starting a Docker container by hand you could either pass these values in as environment variables using the </span><span style="font-family: Courier New, Courier, monospace;">-e</span><span style="font-family: inherit;"> (</span><span style="font-family: Courier New, Courier, monospace;">--env</span><span style="font-family: inherit;">) option or by accepting additional positional arguments through the CMD. You'd have to establish the MongoDB and QPID services then get their IP addresses from Docker and feed the values into the Pulp server containers.</span><br />
<br />
Since Kubernetes is controlling the database and messaging pods and has the Service objects defined, it knows how to tell the Pulp containers where to find these services. It sets a few environment variables for every new container that starts after the service object is created. A new container can use these values to reach the external services it needs.<br />
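An easy way to see what actually arrives, from a shell inside any container the kubelet started after the service objects existed (a sketch; the DB_ and MSG_ prefixes come from the service objects defined earlier in this series, and the values shown are the ones from my Vagrant cluster):<br />
<br />
<pre class="brush: bash ; title: 'Kubernetes service variables inside a container (sketch)' ; highlight: [1]">env | grep -E '(DB|MSG)_SERVICE_(HOST|PORT)'
DB_SERVICE_HOST=10.245.2.2
DB_SERVICE_PORT=27017
MSG_SERVICE_HOST=10.245.2.2
MSG_SERVICE_PORT=5672
</pre>
<br />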
<br />
Line 23 of the Dockerfile adds a short shell script which can accept the values from the environment variables that Kubernetes provides and configure them into the Pulp configuration.<br />
<br />
The script gathers the set of values it needs from the variables (or sets reasonable defaults) and then, using augtool (the CLI tool for Augeas), it updates the values in the <span style="font-family: Courier New, Courier, monospace;">server.conf</span> file.<br />
<br />
This is the snippet from the beginning of the <span style="font-family: Courier New, Courier, monospace;">configure_pulp_server.sh</span> script which sets the environment variables.<br />
<br />
<pre class="brush:bash ; title: 'Pulp Environment Variables from Kubernetes Services' ; highlight: []"># Take settings from Kubernetes service environment unless they are explicitly
# provided
PULP_SERVER_CONF=${PULP_SERVER_CONF:=/etc/pulp/server.conf}
export PULP_SERVER_CONF
PULP_SERVER_NAME=${PULP_SERVER_NAME:=pulp.example.com}
export PULP_SERVER_NAME
SERVICE_HOST=${SERVICE_HOST:=127.0.0.1}
export SERVICE_HOST
DB_SERVICE_HOST=${DB_SERVICE_HOST:=${SERVICE_HOST}}
DB_SERVICE_PORT=${DB_SERVICE_PORT:=27017}
export DB_SERVICE_HOST DB_SERVICE_PORT
MSG_SERVICE_HOST=${MSG_SERVICE_HOST:=${SERVICE_HOST}}
MSG_SERVICE_PORT=${MSG_SERVICE_PORT:=5672}
MSG_SERVICE_USER=${MSG_SERVICE_USER:=guest}
export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
</pre>
<br />
These are the values that the rest of the script will set into /etc/pulp/server.conf<br />
<br />
<span style="background-color: #999999; color: white;"><b>UPDATE:</b> As of the middle of October 2014 the SERVICE_HOST variable has been removed. Now each service gets its own IP address, so the generic SERVICE_HOST variable no longer makes sense. Each service variable must be provided explicitly when testing. Also, for testing the master host will provide a proxy to the service. However, as of this update the mechanism isn't working yet. I'll update this post when is working properly. If you are building from git source you can use a commit prior to 10/14/2014 and you can still use SERVICE_HOST test against the minions.</span><br />
<span style="background-color: #999999; color: white;"><br /></span></div>
<h3>
Container Startup and Remote Service Availability</h3>
<div>
<br /></div>
<div>
When the Pulp service starts up it will attempt to connect to a MongoDB and to a QPID message broker. If the database isn't ready, the Pulp service may just hang.<br />
<br />
Using Kubernetes it's best not to assume that the containers will arrive in any particular order. If the database service is unavailable, the pulp containers should just die. Kubernetes will notice and attempt to restart them periodically. When the database service is available, the next client container will connect successfully and... not die.<br />
<br />
I have added a check script to the base container which can be used to test the availability (and the correct access information) for the MongoDB. It also uses the environment variables provided by Kubernetes when the container starts.<br />
<br />
This script merely returns a shell <i>true</i> (return value: 0) if the database is available and <i>false</i> (return value: 1) if it fails to connect. This allows the startup script for the actual pulp service containers to check before attempting to start the pulp process and to cleanly report an error if the database is unavailable before exiting.<br />
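The calling pattern in the run scripts is a simple retry loop. This sketch is reconstructed from the startup logs later in this post rather than copied from the script, but the try count and poll rate match the values you'll see there:<br />
<br />
<pre class="brush: bash ; title: 'wait_for_database (sketch)' ; highlight: []">wait_for_database() {
    DB_TEST_TRIES=12
    DB_TEST_POLLRATE=5
    TRY=0
    while [ $TRY -lt $DB_TEST_TRIES ] ; do
        # test_db_available.py exits 0 as soon as MongoDB answers
        /test_db_available.py && return 0
        sleep $DB_TEST_POLLRATE
        TRY=$(($TRY + 1))
    done
    # Give up cleanly; Kubernetes will re-spawn the container and try again
    echo "ERROR: no connection to MongoDB after $DB_TEST_TRIES tries" >&2
    exit 1
}
</pre>
<br />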
<br />
I haven't included a script to test the QPID connectivity. So far I haven't seen a pulp service fail to start because the QPID service was unavailable when the client container starts.<br />
<br />
<h3>
Scripts are not executed in the base image</h3>
</div>
<div>
<br /></div>
<div>
The scripts listed above are provided in the base image, but the base image has no ENTRYPOINT or CMD directives. It is not meant to be run on its own.</div>
<div>
<br /></div>
<div>
Each of the Pulp service images that uses this base will need to have a run script which will call these common scripts to set up the container environment before invoking the Pulp service processes. That's next.</div>
<div>
<br /></div>
<h2>
Using a Base Image: The Pulp-Beat Component</h2>
<div>
<br /></div>
<div>
The Pulp service is based on Celery. Celery is a framework for creating distributed task-based services. You extend the Celery framework to add the specific tasks that your application needs.</div>
<div>
<br /></div>
<div>
The task management is controlled by a "beat" process. Each Celery based service has to have exactly one beat server which is derived from the Celery scheduler class.</div>
<div>
<br /></div>
<div>
The beat server is a convenient place to do some of the service setup. Since there can only be one beat server and because it must be created first, I can use the beat service container startup to initialize the database.</div>
<div>
<br /></div>
<div>
The Docker development best-practices encourage image composition by layering. Creating a new layer means creating a new build space with a Dockerfile and any files that will be pulled in when the image is built.<br />
<br />
In the case of the pulp-base image all of the content is there. The customizations for the pulp-beat service are just the run script which configures and initializes the service before starting. The Dockerfile is trivially simple:<br />
<br />
<script src="https://gist.github.com/markllama/4d895bca1d7b11ed2170.js"></script>
</div>
<br />
The real meat is in the run script, though even that is pretty anemic:<br />
<br />
<script src="https://gist.github.com/markllama/509b61f1acbb4111f94f.js"></script>
The main section starts at line 44 and it's really just four steps. Two are defined in the base image scripts and two more are defined here.<br />
<br />
<ol>
<li>Apply the configuration customizations from the environment<br />These include setting the PULP_SERVER_NAME and the access parameters for the MongoDB and QPID services</li>
<li>Verify that the MongoDB is up and accessible<br />With Kubernetes you can't be dependent on ordering of the pod startups. This check allows some time for the DB to start and become available. Kubernetes will restart the beat pod if this fails, but the checks here prevent some thrashing.</li>
<li>Initialize the MongoDB<br />This should only happen once. Within a pulp service the beat server is a singleton. I put the initialization step here so that it won't be confused later.</li>
<li>Execute the master process<br />This is a celery beat process customized with the Pulp master object</li>
</ol>
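<div>
<br />
Boiled down, the main section amounts to little more than this (a sketch, not the verbatim script; the function names are the ones you'll see echoed in the startup log below):<br />
<br />
<pre class="brush: bash ; title: 'run.sh main section (sketch)' ; highlight: []">. /configure_pulp_server.sh   # 1. apply configuration from the environment
wait_for_database             # 2. poll until MongoDB answers (or give up)
initialize_database           # 3. one-time pulp-manage-db run - beat only
run_celerybeat                # 4. exec the celery beat process
</pre>
</div>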
<div>
Even though the script line for each operation is fairly trivial I still put them into their own functions. This makes it easier for a reader to understand the logical progression and intent before going back to the function and examining the details. It also makes it easier to comment out a single function for testing and debugging.</div>
<div>
<br /></div>
<h2>
Testing the Beat Image (stand-alone)</h2>
<div>
<br /></div>
<div>
Since Kubernetes currently gives so little access to debug information for the container startup process, I'm going to test the Pulp beat container first as a regular Docker container. I have my Kubernetes cluster running in Vagrant and I know the IP addresses of the MongoDB and QPID services.<br />
<br />
The other reason to test in plain Docker is that I want to manually verify the code which picks up and uses the configuration environment variables. There are four variables that will be required and two others that will likely default.<br />
<ul>
<li>PULP_SERVER_NAME</li>
<li>SERVICE_HOST</li>
<li>DB_SERVICE_HOST</li>
<li>MSG_SERVICE_HOST</li>
</ul>
<div>
The defaulted ones will be</div>
<div>
<ul>
<li>DB_SERVICE_PORT</li>
<li>MSG_SERVICE_PORT</li>
</ul>
<div>
DB_SERVICE_HOST and MSG_SERVICE_HOST can be provided directly or can pick up the value of SERVICE_HOST. I want to test both paths.</div>
</div>
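<div>
<br />
The explicit form of that second path would look something like this sketch (same image and addresses as the SERVICE_HOST run shown below, just with the per-service variables spelled out):<br />
<br />
<pre class="brush: bash ; title: 'Explicit DB and MSG service variables (sketch)' ; highlight: []">docker run -d --name pulp-beat -v /dev/log:/dev/log \
    -e PULP_SERVER_NAME=pulp.example.com \
    -e DB_SERVICE_HOST=10.245.2.2 \
    -e MSG_SERVICE_HOST=10.245.2.2 \
    markllama/pulp-beat
</pre>
</div>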
<div>
<br /></div>
<div>
To test this I'm going to be running the Kubernetes Vagrant cluster on VirtualBox to provide the MongoDB and QPID servers. Then I'll run the Pulp beat server in Docker on the host. I know how to tell the beat server how to reach the services in the Kubernetes cluster (on 10.245.2.{2-4}).</div>
<div>
<br /></div>
<div>
I'm going to assume that both the pulp-base and pulp-beat images are already built. I'm also going to start the container the first time using /bin/sh so I can manually start the run script and observe what it does.</div>
<br />
<pre class="brush: bash ; title: 'Start the pulp-beat container manually' ; highlight: 1">docker run -d --name pulp-beat -v /dev/log:/dev/log \
> -e PULP_SERVER_NAME=pulp.example.com \
> -e SERVICE_HOST=10.245.2.2 markllama/pulp-beat
f16a6f2278e20e0b039cb665bc5f55de39b13a1045f00e25cdab5219652f1d80
</pre>
<br />
This starts the container as a daemon and mounts /dev/log so that syslog will work. It also sets the PULP_SERVER_NAME and SERVICE_HOST variables.
<br />
<br />
<pre class="brush:bash ; title: 'Docker logs for pulp-beat startup' ; highlight: 1">docker logs pulp-beat
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ initialize_database
+ runuser apache -s /bin/bash /bin/bash -c /usr/bin/pulp-manage-db
Loading content types.
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Applying pulp.server.db.migrations version 1
Migration to pulp.server.db.migrations version 1 complete.
...
Applying pulp_rpm.plugins.migrations version 16
Migration to pulp_rpm.plugins.migrations version 16 complete.
Database migrations complete.
+ run_celerybeat
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery beat --workdir=/var/lib/pulp/celery --scheduler=pulp.server.async.scheduler.Scheduler -f /var/log/pulp/celerybeat.log -l INFO'
</pre>
<br />
This shows why I set the -x at the beginning of the run script. It causes the shell to emit each line as it is executed. You can see the environment variables as they are set. Then they are used to configure the pulp server.conf values. The database is checked and then initialized. Finally it executes the celery beat process which replaces the shell and continues executing.<br />
<br />
When this script runs it should have several side effects that I can check. As noted, it creates and initializes the pulp database. It also connects to the QPID server and creates several queues. I can check them in the same way I did when I created the MongoDB and QPID images in the first place.<br />
<br />
The database has been initialized<br />
<br />
<pre class="brush: bash ; title: 'Check DB presence' ; highlight: 1">echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
pulp_database 0.03125GB
bye
</pre>
<br />
And the celery beat service has added a few queues to the QPID service<br />
<br />
<pre class="brush: bash ; title: 'Check QPID Queues' ; highlight: 1">qpid-config queues -b guest@10.245.2.4
Queue Name Attributes
======================================================================
0b78268e-256f-4832-bbcc-50c7777a8908:1.0 auto-del excl
411cc98f-eed3-45f9-b455-8d2e5d333262:0.0 auto-del excl
aaf61614-919e-49ea-843f-d83420e9232f:1.0 auto-del excl
celeryev.de500902-4c88-4d5c-90f4-1b4db366613d auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
</pre>
<br />
<h3>
But what if I do it wrong?</h3>
You can see that the output from a correct startup is pretty lengthy. When I'm happy that the image is stable I'll remove the shell -x setting (and make it either an argument or environment switch for later). There are several other paths to test.<br />
<br />
<br />
<ol>
<li>Fail to provide Environment Variables</li>
<ol>
<li>PULP_SERVER_NAME</li>
<li>SERVICE_HOST</li>
<li>DB_SERVICE_HOST</li>
<li>MSG_SERVICE_HOST</li>
</ol>
<li>Fail to import /dev/log volume</li>
</ol>
<div>
Each of these will have slightly different failure modes. I suggest you try each of them and observe how it fails. Think of others; I'm sure I've missed some.</div>
<div>
<br /></div>
<div>
For the purposes of this post I'm going to treat these as exercises for the reader and move on.<br />
<br /></div>
</div>
<h2>
Testing the Beat Image (Kubernetes)</h2>
<div>
<br /></div>
<div>
Now things get interesting. I have to craft a Kubernetes pod description that creates the pulp-beat container, gives it access to logging and connects it to the database and messaging services.<br />
<br />
<h3>
Defining the Pulp Beat pod</h3>
<div>
<br /></div>
Because of the way I crafted the base image and run scripts, this isn't actually as difficult or as complicated as you might think. It turns out that the only environment variable I have to actually pass in is the PULP_SERVER_NAME. The rest of the environment values are going to be provided by the kubelet as defined by the Kubernetes service objects (and served by the MongoDB and QPID containers behind them).<br />
<div>
<br /></div>
<br />
<br /></div>
The only really significant thing here is the volume imports.<br />
<br />
Pulp uses the python logging mechanism and that in turn by default requires the <i>syslog</i> service. On Fedora 20, syslog is no longer a separate process. It's been absorbed into the systemd suite of low level services and is known now as <i>journald</i>. (cat flamewars/systemd/{pro,con} >/dev/null).<br />
<br />
For me this means that for Pulp to run properly it needs the ability to write syslog messages. In Fedora 20 this amounts to being able to write to a special file <span style="font-family: Courier New, Courier, monospace;">/dev/log</span>. This file isn't available in containers without some special magic. For Docker that magic is <span style="font-family: Courier New, Courier, monospace;">-v /dev/log:/dev/log</span>. This imports the host's <span style="font-family: Courier New, Courier, monospace;">/dev/log</span> into the container at the same location. For Kubernetes this is a little bit more involved.<br />
<br />
The Kubernetes pod construct has some interesting side-effects. The purpose of pods is to allow the creation of sets of containers that share resources. The JSON reflects this in how the shared resources are declared.<br />
<br />
In the pod spec, <a href="https://gist.github.com/markllama/81c4333b298522ce6507#file-kubernetes_pulp_beat_pod-L14-L17">lines 14-17</a> are inside the container hash for the container named <i>pulp-beat</i>. They indicate that a volume named "devlog" (line 15) will be mounted read/write (line 16) on <span style="font-family: Courier New, Courier, monospace;">/dev/log</span> inside the container (line 17).<br />
<br />
Note that this section does not define the named volume or indicate where it will come from. That's defined at the pod level not the container.<br />
<br />
Now look at <a href="https://gist.github.com/markllama/81c4333b298522ce6507#file-kubernetes_pulp_beat_pod-L20-L23">lines 20-23</a>. These are at the pod level (the list of containers has been closed on line 19). The <i>volumes</i> array contains a set of volume definitions. I only define one, named "devlog" (line 21), and indicate that it comes from the host and that the source path is <span style="font-family: Courier New, Courier, monospace;">/dev/log</span><span style="font-family: inherit;">.</span><br />
<br />
<span style="font-family: inherit;">All that to replace the docker argument </span><span style="font-family: Courier New, Courier, monospace;">-v /dev/log:/dev/log</span><span style="font-family: inherit;">.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Right now this seems like a lot of work for a trivial action. Later this distinction will become very important. The final pod for Pulp will be made up of at least two containers. The pod will import two different storage locations from the host and both containers will mount them.</span><br />
<span style="font-family: inherit;"><br /></span>
One last time for clarity: the <span style="font-family: Courier New, Courier, monospace;">volumes</span><span style="font-family: inherit;"> list is at the pod level. It defines a set of external resources that will be made available to the containers in the pod. The </span><span style="font-family: Courier New, Courier, monospace;">volumeMounts</span><span style="font-family: inherit;"> list is at the container level. It maps entries from the </span><span style="font-family: Courier New, Courier, monospace;">volumes</span><span style="font-family: inherit;"> section in the pod to mount points inside the container using the value of the <i>name</i> as the connecting handle.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h3>
Starting the Pulp Beat Pod</h3>
<div>
<br /></div>
<div>
Starting the pulp beat pod is just like starting the MongoDB and QPID pods. At this point it does require that the Service objects have been created and that the service containers are running, so if you're following along and haven't done that, go do it. Since I'd run my pulp beat container manually and it had modified the MongoDB, I also removed the pulp_database before proceeding.</div>
<div>
<br />
<pre class="brush:bash ; title: 'clean up pulp_database' ; highlight: [1,6]">echo 'db.dropDatabase()' | mongo 10.245.2.2/pulp_database
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/pulp_database
{ "dropped" : "pulp_database", "ok" : 1 }
bye
echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
bye
</pre>
<br />
To start the pulp beat pod we go back to kubecfg (remember, I aliased <span style="font-family: Courier New, Courier, monospace;">kubecfg=~/kubernetes/cluster/kubecfg.sh</span>).<br />
<br />
<pre class="brush:bash ; title: 'start pulp beat pod' ; highlight: [1,6]">kubecfg -c pods/pulp-beat.json create pods
ID Image(s) Host Labels Status
---------- ---------- ---------- ---------- ----------
pulp-beat markllama/pulp-beat / name=pulp-beat Waiting
kubecfg get pods/pulp-beat
ID Image(s) Host Labels Status
---------- ---------- ---------- ---------- ----------
pulp-beat markllama/pulp-beat 10.245.2.2/10.245.2.2 name=pulp-beat Waiting
</pre>
<br />
Now that I know the pod has been assigned to 10.245.2.2 (minion-1), I can log in there directly and examine the Docker container.
<br />
<br />
<pre class="brush:bash ; title: 'verify pulp beat pod' ; highlight: [1,3,5]">vagrant ssh minion-1
Last login: Fri Dec 20 18:02:34 2013 from 10.0.2.2
sudo docker ps | grep pulp-beat
2515129f2c7e markllama/pulp-beat:latest "/run.sh" 54 seconds ago Up 53 seconds k8s--pulp_-_beat.a6ba93e9--pulp_-_beat.etcd--d2a60369_-_458d_-_11e4_-_b682_-_0800279696e1--0b799f3d
sudo docker logs 2515129f2c7e
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ initialize_database
+ runuser apache -s /bin/bash /bin/bash -c /usr/bin/pulp-manage-db
Loading content types.
Content types loaded.
Ensuring the admin role and user are in place.
Admin role and user are in place.
Beginning database migrations.
Applying pulp.server.db.migrations version 1
Migration to pulp.server.db.migrations version 1 complete.
...
Applying pulp_rpm.plugins.migrations version 16
Migration to pulp_rpm.plugins.migrations version 16 complete.
Database migrations complete.
+ run_celerybeat
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery beat --workdir=/var/lib/pulp/celery --scheduler=pulp.server.async.scheduler.Scheduler -f /var/log/pulp/celerybeat.log -l INFO'
</pre>
<br />
If this is the first time running the image it may take a while for Kubernetes/Docker to pull it from the Docker Hub. There may also be an extra delay the first time through while the Kubernetes <i>pause</i> container image is pulled.
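<br />
If you want to watch the pull happen, a couple of quick checks work; this is a sketch and assumes the pod landed on minion-1 as above.<br />
<br />
<pre class="brush: bash ; title: 'watch the image pull (sketch)' ; highlight: [1,3]">vagrant ssh minion-1 -c 'sudo docker images | grep -E "pulp-beat|pause"'
# from the workstation, poll until the Status column changes from Waiting to Running
kubecfg get pods/pulp-beat
</pre>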
<br />
<br />
I can now run the same tests I did earlier on the MongoDB and QPID services to reassure myself that the pulp beat service is connected.<br />
<br />
<pre class="brush: bash ; title: 'verify pulp beat connectivity'; highlight: [1,8]">echo show dbs | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.2/test
local 0.03125GB
pulp_database 0.03125GB
bye
qpid-config queues -b guest@10.245.2.4
Queue Name Attributes
======================================================================
613f4b89-e63e-4230-9620-e932f5a777e5:0.0 auto-del excl
c990ea7b-3d7f-4603-80e5-176ebc649ff1:1.0 auto-del excl
celeryev.ffbc537b-1161-4049-b425-723487135fc2 auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e0155372-12ee-4c9a-9c4d-8f4863601b3a:1.0 auto-del excl
</pre>
<br />
After all that thought and planning the end result is actually kinda boring. Just the way I like it.<br />
<br />
<h2>
What's next?</h2>
<br />
The pulp-beat service is just the first real Pulp component. It runs in isolation from the other components, communicating only through the messaging and database services. There is another component like that, the <i>pulp-resource-manager</i>. This is another Celery process and it is created, started and tested just like the pulp-beat service. I'm going to do one much-shorter post on that for completeness before tackling the next level of complexity.<br />
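<br />
As a preview, the resource manager goes through the identical create-and-check cycle; the pod file name here is hypothetical since its JSON isn't shown yet.<br />
<br />
<pre class="brush: bash ; title: 'preview: pulp resource manager pod (sketch)' ; highlight: [1,2]">kubecfg -c pods/pulp-resource-manager.json create pods
kubecfg get pods/pulp-resource-manager
</pre>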
<br />
The two remaining components are the content pods, which require shared storage and which will have two cooperating containers running inside the pod. One will manage the content mirroring and the other will serve the content out to clients.<br />
<br />
I think before that though I will tackle the Pulp Admin service. This is a public facing REST service which accepts pulp admin commands to create and manage the content repositories.<br />
<br />
Both of these will require the establishment of encryption, which means placing x509 certificates within the containers. These are the upcoming challenges.</div>
<h2>
<br />
References</h2>
<div>
<div>
<ul>
<li><a href="http://www.docker.com/">Docker</a> - Containerized applications</li>
<li><a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a> - Orchestration for creating containerized services</li>
<li><a href="https://www.mongodb.org/">MongoDB</a> - A Non-relational database</li>
<li><a href="https://qpid.apache.org/">QPID</a> - an AMQP messaging service</li>
</ul>
</div>
<ul>
<li><a href="http://www.pulpproject.org/">Pulp</a> - An enterprise OS content mirroring system</li>
<li><a href="http://www.celeryproject.org/">Celery</a> - A Distributed Task Queue Framework</li>
<li><a href="http://augeas.net/">Augeas</a> - Structured queries and updates to (largely) unstructured configurations</li>
<li><a href="https://en.wikipedia.org/wiki/INI_file">INI Files</a> - A simple format for simple configurations</li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-29919572608976120482014-09-23T13:19:00.001-07:002014-09-23T13:19:16.380-07:00Kubernetes Under The Hood: EtcdKubernetes is an effort which originated within Google to provide an orchestration layer above Docker containers. Docker operation is limited to actions on a single host. Kubernetes attempts to provide a mechanism to manage large sets of containers on a cluster of container hosts. Above that will eventually be job management services like Mesos or Aurora.<br />
<br />
<h2>
Anatomy of Kubernetes Cluster</h2>
<div>
<br /></div>
A Kubernetes cluster is made up of three major active components:<br />
<br />
<ol>
<li>Kubernetes <i>app-service</i></li>
<li>Kubernetes <i>kubelet</i> agent</li>
<li><i>etcd</i> distributed key/value database</li>
</ol>
<div>
<br /></div>
<div>
The <i>app-service</i> is the front end of the Kubernetes cluster. It accepts requests from clients to create and manage containers, services and replication controllers within the cluster. This is the control interface of Kubernetes.</div>
<div>
<br /></div>
<div>
The <i>kubelet</i> is the active agent. It resides on a Kubernetes cluster member host. It polls for instructions or state changes and acts to execute them on the host.</div>
<div>
<br /></div>
<div>
The <i>etcd</i> services are the communications bus for the Kubernetes cluster. The app-service posts cluster state changes to the etcd database in response to commands and queries. The kubelets read the contents of the etcd database and act on any changes they detect.<br />
<br />
There's also a <i>kube-proxy</i> process which does the Service network proxy work but that's not relevant to the larger operations.</div>
<div>
<br /></div>
<div>
This post is going to describe and play with the etcd.</div>
<div>
<br /></div>
<h2>
OK, so what is Etcd?</h2>
<div>
<br /></div>
<div>
<a href="https://coreos.com/using-coreos/etcd/">Etcd (or etcd) is a service</a> created by the <a href="https://coreos.com/">CoreOS</a> team to create a shared distributed configuration database. It's a replicated key/value store. The data are accessed using ordinary HTTP(S) GET and PUT queries. The status, metadata and payload are returned as members of a JSON data structure.</div>
<div>
<br />
Etcd has a companion CLI client for testing and manual interaction. This is called <i>etcdctl</i>. Etcdctl is merely a wrapper that hides the HTTP interactions and the raw JSON that is used as status and payload.<br />
<br /></div>
<h2>
Installing and Running Etcd</h2>
<div>
<br />
Etcd (and etcdctl, the CLI client) aren't yet available in RPM format from the standard repositories, or if they are they're very old. If you're running on 64 bit Linux you can pull the most recent binaries from<a href="https://github.com/coreos/etcd/releases"> the Github repository for CoreOS</a>. Download them, unpack the tar.gz file and place the binaries in your path.<br />
<br />
<br />
<pre class="brush: bash ; title: 'download and unpack etcd binaries' ; highlight: [1,7]">curl -s -L https://github.com/coreos/etcd/releases/download/v0.4.6/etcd-v0.4.6-linux-amd64.tar.gz | tar -xzvf -
etcd-v0.4.6-linux-amd64/
etcd-v0.4.6-linux-amd64/etcd
etcd-v0.4.6-linux-amd64/etcdctl
etcd-v0.4.6-linux-amd64/README-etcd.md
etcd-v0.4.6-linux-amd64/README-etcdctl.md
cd etcd-v0.4.6-linux-amd64
</pre>
<br />
<br />
Once you have the binaries, check out the <a href="https://github.com/coreos/etcd">Etcd</a> and <a href="https://github.com/coreos/etcdctl">Etcdctl</a> github pages for basic usage instructions. I'll duplicate here a little bit just to get moving.<br />
<br />
Etcd doesn't run as a traditional daemon. It remains connected to STDOUT and logs activity. I'm not going to demonstrate here how to turn it into a proper daemon. Instead I'll run it in one terminal session and use another to access it.<br />
<br />
NOTE 1: Etcd does not use standard longopts conventions. All of the options use single leading hyphens.<br />
NOTE 2: Etcdctl <u>does</u> follow the longopt conventions. Go figure.<br />
<br />
<pre class="brush:bash ; title: 'Starting an etcd' ; highlight: 1">./etcd
[etcd] Sep 23 10:36:04.655 WARNING | Using the directory myhost.etcd as the etcd curation directory because a directory was not specified.
[etcd] Sep 23 10:36:04.656 INFO | myhost is starting a new cluster
[etcd] Sep 23 10:36:04.658 INFO | etcd server [name myhost, listen on :4001, advertised url http://127.0.0.1:4001]
[etcd] Sep 23 10:36:04.658 INFO | peer server [name myhost, listen on :7001, advertised url http://127.0.0.1:7001]
[etcd] Sep 23 10:36:04.658 INFO | myhost starting in peer mode
[etcd] Sep 23 10:36:04.658 INFO | myhost: state changed from 'initialized' to 'follower'.
[etcd] Sep 23 10:36:04.658 INFO | myhost: state changed from 'follower' to 'leader'.
[etcd] Sep 23 10:36:04.658 INFO | myhost: leader changed from '' to 'myhost'.
</pre>
<br />
As you can see the daemon listens by default to the localhost interface on port 4001/TCP for client interactions and on port 7001/TCP for clustering communications. See the output of <span style="font-family: Courier New, Courier, monospace;">etcd -help</span><span style="font-family: inherit;"> for detailed options. You can also see the process whereby the new daemon attempts to connect to peers and determine its place within the cluster. Since there are no peers, this one elects itself leader.</span><br />
<br />
That output looks as if the etcd is running. I can check by querying the daemon version and some other information.<br />
<br />
<pre class="brush:bash ; title: 'Starting an etcd' ; highlight: 1">curl -s http://127.0.0.1:4001/version
etcd 0.4.6
</pre>
<br />
I can also get some stats from the daemon directly as well:<br />
<br />
<pre class="brush:bash ; title: 'Etcd stats' ; highlight: 1">curl -s -L http://127.0.0.1:4001/v2/stats/self | python -m json.tool
{
"leaderInfo": {
"leader": "myhost",
"startTime": "2014-09-23T10:37:04.839453766-04:00",
"uptime": "5h10m13.053046076s"
},
"name": "myhost",
"recvAppendRequestCnt": 0,
"sendAppendRequestCnt": 0,
"startTime": "2014-09-23T10:37:04.83945236-04:00",
"state": ""
}
</pre>
<br />
So now I know it's up and responding.<br />
<br /></div>
<div>
<div>
<h2>
Playing with Etcd</h2>
</div>
</div>
<div>
<br /></div>
<div>
Etcd responds to HTTP(S) queries both to set and retrieve data. All of the data are organized into a hierarchical key set (which for normal people means that the keys look like files in a tree of directories). The values are arbitrary strings. This makes it very easy to test and play with etcd using ordinary CLI web query tools like <i>curl</i> and <i>wget</i>. The binary releases also include a CLI client called <i>etcdctl</i> which simplifies the interaction, allowing the caller to focus on the logical operation and the result rather than the HTTP/JSON interaction. I'll show both methods where they are instructive, choosing the best one for each example.<br />
<br />
The examples here are adapted from the <a href="https://github.com/coreos/etcd#running-etcd">CoreOS examples</a> on Github. There's also a complete <a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md">protocol document</a>.<br />
<br />
Once the etcd is running I can begin working with it. <br />
<br />
Etcd is a <i>hierarchical key=value store</i>. This means that each piece of stored data has a <i>key</i> which uniquely identifies it within the database. The key is <i>hierarchical</i> in that the key is composed of a set of elements that form a <i>path</i> from a fixed known starting point for the database known as the <i>root</i>. Any given element in the database can either be a branch (directory) or a leaf (value). Directories contain other keys and are used to create the hierarchy of data.<br />
<i><br /></i>
This is all formal gobbledy-gook for "it looks just like a filesystem". In fact a number of the operations that etcdctl offers are exact analogs of filesystem commands: mkdir, rmdir, ls, rm.<br />
<br />
The first operation is to look at the contents of the root of the database. Expect this to be boring because there's nothing there yet.<br />
<br />
<br />
<pre class="brush: bash ; title: 'list the root of an empty database' ; highlight: 1">./etcdctl ls /
</pre>
<br />
<br />
See? There's nothing there. Boring.<br />
<br />
It looks a little different when you pull it using <span style="font-family: Courier New, Courier, monospace;">curl.</span><br />
<br />
<pre class="brush: bash ; title: 'GET the root of the DB tree with curl' ; highlight: 1">curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
"action": "get",
"node": {
"dir": true,
"key": "/"
}
}
</pre>
<br />
The return payload is JSON. I use the python <i>json.tool</i> module to pretty print it.<br />
<br />
I can see that this is the response to a GET request. The <i>node</i> hash describes the query and result. I asked for the root key (/) and it's an (empty) directory.<br />
<br />
Life will be a little more interesting if there's some data in the database. I'll add a value and I'm going to put it well down in the hierarchy to show how the tree structure works.<br />
<br />
<pre class="brush:bash ; title: 'Set a value using etcdctl' ; highlight: 1">./etcdctl set /foo/bar/gronk "I see you"
I see you
</pre>
<br />
Now when I ask etcdctl for the contents of the root directory I at least get some output:<br />
<br />
<pre>./etcdctl ls /
/foo
</pre>
<br />
But that's much more interesting when I look using <span style="font-family: Courier New, Courier, monospace;">curl</span>.
<br />
<br />
<pre class="brush:bash ; title: 'Get the root directory using curl' ; highlight: 1">curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
"action": "get",
"node": {
"dir": true,
"key": "/",
"nodes": [
{
"createdIndex": 7,
"dir": true,
"key": "/foo",
"modifiedIndex": 7
}
]
}
}
</pre>
<br />
This looks very similar to the previous response with the addition of the <i>nodes</i> array. I can infer that this list contains the set of directories and values that the root contains. In this case it contains one other subdirectory named <span style="font-family: Courier New, Courier, monospace;">/foo</span><span style="font-family: inherit;">.</span><br />
Creating a new value is also more fun using curl:<br />
<br />
<pre class="brush:bash ; title: 'Create a new key/value pair using curl' ; highlight: 1">curl -s http://127.0.0.1:4001/v2/keys/fiddle/faddle -XPUT -d value="popcorn" | python -m json.tool
{
"action": "set",
"node": {
"createdIndex": 8,
"key": "/fiddle/faddle",
"modifiedIndex": 8,
"value": "popcorn"
}
}
</pre>
<br />
The return payload is the REST acknowledgement response to the PUT query. It looks similar to the GET query response, but not identical. The action is (appropriately enough) <i>set</i>. Only a single node is returned, not the node list you get when querying a directory and the value is provided as well. The REST protocol (and the etcdctl command) allow for a number of modifiers for queries. Two I'm going to use a lot are <i>sort</i> and <i>recursive</i>.<br />
<br />
If I want to see the complete set of nodes underneath a directory I can use <span style="font-family: Courier New, Courier, monospace;">etcdctl ls</span><span style="font-family: inherit;"> with the --<i>recursive</i> option:</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush:bash ; title: 'Query a directory tree recursively with etcdctl' ; highlight: 1">./etcdctl ls / --recursive
/foo
/foo/bar
/foo/bar/gronk
/fiddle
/fiddle/faddle
</pre>
<br />
<div>
That's a nice pretty listing. As you can imagine, this gets a bit messier if you use <span style="font-family: Courier New, Courier, monospace;">curl</span> for the query. This is probably the last time I'll use <span style="font-family: Courier New, Courier, monospace;">curl</span> for a query here. </div>
<br />
<pre class="brush:bash ; title: 'Query a directory tree recursively using curl' ; highlight: 1">curl -s http://127.0.0.1:4001/v2/keys/?recursive=true| python -m json.tool
{
"action": "get",
"node": {
"dir": true,
"key": "/",
"nodes": [
{
"createdIndex": 7,
"dir": true,
"key": "/foo",
"modifiedIndex": 7,
"nodes": [
{
"createdIndex": 7,
"dir": true,
"key": "/foo/bar",
"modifiedIndex": 7,
"nodes": [
{
"createdIndex": 7,
"key": "/foo/bar/gronk",
"modifiedIndex": 7,
"value": "I see you"
}
]
}
]
},
{
"createdIndex": 8,
"dir": true,
"key": "/fiddle",
"modifiedIndex": 8,
"nodes": [
{
"createdIndex": 8,
"key": "/fiddle/faddle",
"modifiedIndex": 8,
"value": "popcorn"
}
]
}
]
}
}
</pre>
<br />
<h2>
Clustering Etcd</h2>
<div>
<br />
Etcd is designed to allow database replication and the formation of clusters. When two etcds connect, they use a different port from the normal client access port. An etcd that intends to participate listens on that second port and also connects to a list of peer processes which also are listening.</div>
<div>
<br /></div>
<div>
You can set up peering (replication) using the command line arguments -peer-addr and -peers, or you can set the values in the configuration file /etc/etcd/etcd.conf.</div>
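<br />
As a minimal sketch, here is a two-member cluster on one machine, using the 0.4.x single-dash option names noted earlier and made-up member names.<br />
<br />
<pre class="brush: bash ; title: 'sketch: two-member etcd cluster' ; highlight: [2,4]"># first member (run in one terminal): clients on 4001, peers on 7001
./etcd -name node1 -addr 127.0.0.1:4001 -peer-addr 127.0.0.1:7001 -data-dir node1.etcd
# second member (in another terminal): different ports; -peers points at the first member's peer address
./etcd -name node2 -addr 127.0.0.1:4002 -peer-addr 127.0.0.1:7002 -peers 127.0.0.1:7001 -data-dir node2.etcd
</pre>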
<div>
<br /></div>
<div>
<a href="https://github.com/coreos/etcd/blob/master/Documentation/clustering.md">Complete clustering documentation</a> can be found on Github.</div>
<div>
<br /></div>
<h2>
Etcd and Security</h2>
</div>
<div>
<br /></div>
<div>
Etcd communications can be encrypted using SSL, but there is no authentication or access control. This makes it simple to use, but it makes it critical that you be careful never to place sensitive information like passwords or private keys into Etcd. It also means that you assume when using etcd that there are no malicious actors in the network space which has access. Any process with network access can both read and write any keys and values within the etcd. It is absolutely essential that access to etcd be protected at the network level because there's nothing else restricting access.</div>
<div>
<br />
Instructions for <a href="https://github.com/coreos/etcd/blob/master/Documentation/security.md">enabling SSL</a> to encrypt etcd traffic are also on Github.<br />
<br />
Etcd can be configured to restrict access to queries which use a client certificate but this provides very limited access control. Clients are either allowed full access or denied. There is no concept of a user, or authentication or access control policy once a connection has been allowed.<br />
<br />
<h2>
Additional Capabilities of Etcd</h2>
</div>
<div>
<br />
Don't make the mistake of thinking that Etcd is a simple networked filesystem with an HTTP/REST protocol interface. Etcd has a number of other important capabilities related to its role in configuration and cluster management.</div>
<div>
<br /></div>
<div>
Each directory or leaf node can have a <i><a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md#using-key-ttl">Time To Live</a></i> or TTL value associated with it. The TTL indicates the lifespan of the key/value pair in seconds. When a value is set, if the TTL is also set then that key/value pair will expire when the TTL drops to zero. After that the value will no longer be available.</div>
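<br />
A quick sketch of setting a TTL, first with etcdctl and then with the raw API; the key name here is made up.<br />
<br />
<pre class="brush: bash ; title: 'sketch: keys with a TTL' ; highlight: [2,4]"># this key will expire 30 seconds after it is set
./etcdctl set /ephemeral "temporary" --ttl 30
# the equivalent REST call adds a ttl form value to the PUT
curl -s http://127.0.0.1:4001/v2/keys/ephemeral -XPUT -d value="temporary" -d ttl=30 | python -m json.tool
</pre>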
<div>
<br /></div>
<div>
It is also possible to create <i><a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md#creating-a-hidden-node">hidden nodes</a>.</i> These are nodes that will not appear in directory listings. To access them the query must specify the correct path explicitly. Any node name which begins with an underscore character (_) will be hidden from directory queries.</div>
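<br />
For example (again with made-up keys):<br />
<br />
<pre class="brush: bash ; title: 'sketch: hidden nodes' ; highlight: [1,2,3]">./etcdctl set /_secrets/token "not listed"
./etcdctl ls /                   # the _secrets directory does not show up
./etcdctl get /_secrets/token    # but the explicit path still returns the value
</pre>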
<div>
<br /></div>
<div>
Most importantly it is possible for clients to <a href="https://github.com/coreos/etcd/blob/master/Documentation/api.md#waiting-for-a-change">wait for changes</a> to a key. If I issue a GET query on a key with the <i>wait</i> flag set then the query will block, leaving the query incomplete and the TCP session open. Assuming that the client doesn't time out the query will remain open and unresolved until the etcd detects (and executes) a change request on that key. At that point the waiting query will also complete and return the new value. This can be used as an event management or messaging system to avoid unnecessary polling.<br />
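<br />
A sketch of a waiting query, using the keys created earlier; run the curl in one terminal and the set in another.<br />
<br />
<pre class="brush: bash ; title: 'sketch: waiting for a change' ; highlight: [2,4]"># terminal 1: this request blocks until the key changes
curl -s "http://127.0.0.1:4001/v2/keys/foo/bar/gronk?wait=true" | python -m json.tool
# terminal 2: the write releases the waiting query, which returns the new value
./etcdctl set /foo/bar/gronk "now you don't"
</pre>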
<br /></div>
<h2>
Etcd in Kubernetes</h2>
<div>
<br />
Etcd is used by Kubernetes as both the cluster state database and as the communications mechanism between the app-server and the kubelet processes on the minion hosts. The app-server places values into the etcd in response to requests from the users for things like new pods or services, and it queries values from it to get status on the minions, pods and services.<br />
<br />
The kubelet processes also both query and update the contents of the database. They poll for desired state changes and create new pods and services in response. They also push status information back to the etcd to make it available to client queries.<br />
<br />
The root of the Kubernetes data tree within the etcd database is <span style="font-family: Courier New, Courier, monospace;">/registry</span><span style="font-family: inherit;">. Let's see what's there.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush: bash ; title: 'etcd data tree from kubernetes vagrant cluster' ; highlight: 1">./etcdctl ls /registry --recursive
/registry/services
/registry/services/specs
/registry/services/specs/db
/registry/services/specs/msg
/registry/services/endpoints
/registry/services/endpoints/db
/registry/services/endpoints/msg
/registry/pods
/registry/pods/pulpdb
/registry/pods/pulpmsg
/registry/pods/pulp-beat
/registry/pods/pulp-resource-manager
/registry/hosts
/registry/hosts/10.245.2.3
/registry/hosts/10.245.2.3/kubelet
/registry/hosts/10.245.2.4
/registry/hosts/10.245.2.4/kubelet
/registry/hosts/10.245.2.2
/registry/hosts/10.245.2.2/kubelet
</pre>
</div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">I'm running the Vagrant cluster on Virtualbox with three minions. </span><span style="font-family: inherit;">These are listed under the </span><span style="font-family: Courier New, Courier, monospace;">hosts</span><span style="font-family: inherit;"> subtree. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">I've also defined two services, <i>db</i> and <i>msg</i> which are found under the </span><span style="font-family: Courier New, Courier, monospace;">services</span><span style="font-family: inherit;"> subtree. The service data is divided into two parts. The </span><span style="font-family: Courier New, Courier, monospace;">specs</span><span style="font-family: inherit;"> tree contains the definitions I provided for the two services. The </span><span style="font-family: Courier New, Courier, monospace;">endpoints</span><span style="font-family: inherit;"> subtree contains records which indicate the actual locations of the containers labeled to accept the service connections.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Finally I've defined four pods which make up the service I'm building (which happens to be a <a href="http://www.pulpproject.org/">Pulp</a> service). Each host is listed by its IP address at the moment. Work is on-going to allow the minions to be referred to by their host-name but that requires control of the nameservice which is available inside the containers. Without a universal nameservice for containers, IP addresses are the only way for processes inside a container to find hosts outside.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Some of the values here will look familiar to someone who has created pods and services using the <i>kubecfg</i> client. They are nearly identical to the JSON query and response payloads from the Kubernetes app-server.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">I don't recommend making any changes or additions to the etcd database in a running Kubernetes cluster. I haven't looked deeply enough yet into how the app-server and kubelet interact with etcd and it would be very easy I think to upset them. For now I'm able to query etcd and confirm that my commands have or have not been initiated and compare what I see to what I expect.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h2>
Summary</h2>
<div>
<br />
Etcd is a neat tool for storing and sharing configuration data. It's only useful (so far) in limited cases where there are no malicious or careless users, but it's a very young project. I am speculating that etcd is a temporary component of Kubernetes. It provides the features needed to facilitate the development of the app-server and kubelet, which are the core functions of Kubernetes. Once those are stable, if others feel the need to use a more secure or scalable component, that can be done. The configuration payload can remain and only the communications mechanism will need to be replaced.</div>
<h2>
<br /></h2>
<h2>
References</h2>
<div>
<br /></div>
<div>
<ul>
<li><a href="https://coreos.com/">CoreOS</a> - A Docker hosting environment</li>
<li><a href="https://github.com/coreos/etcd">Etcd</a> - A distributed replicated key/value database with a REST access protocol</li>
<ul>
<li><a href="https://github.com/coreos/etcd/releases">Releases</a></li>
<li><a href="https://github.com/coreos/etcd/tree/master/Documentation">Documentation</a></li>
<ul>
<li><a href="https://github.com/coreos/etcd/tree/master/Documentation/api.md">API</a></li>
<li><a href="https://github.com/coreos/etcd/tree/master/Documentation/clustering.md">Clustering</a></li>
<li><a href="https://github.com/coreos/etcd/tree/master/Documentation/security.md">Security</a> (SSL)</li>
</ul>
</ul>
<li><a href="https://docker.com/">Docker</a> - Host based containerized application hosting</li>
<li><a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a> - Orchestration tools for Docker</li>
<li><a href="http://pulpproject.org/">Pulp</a> - An enterprise class file repository and mirror system</li>
</ul>
</div>
<div>
<br /></div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com3tag:blogger.com,1999:blog-5022186007695457923.post-13907784160609126842014-09-04T13:29:00.000-07:002014-12-16T09:00:42.474-08:00Kubernetes: Simple Containers and ServicesFrom previous posts I now have a <a href="http://cloud-mechanic.blogspot.com/2014/08/docker-simple-service-container-example.html">MongoDB image</a> and another which runs a <a href="http://cloud-mechanic.blogspot.com/2014/09/docker-qpid-message-broker-container.html">QPID AMQP broker</a>. I intend for these to be used by the <a href="http://www.pulpproject.org/">Pulp</a> service components.<br />
<br />
What I'm going to do this time is to create the subsidiary services that I'll need for the Pulp service within a Kubernetes cluster.<br />
<br />
<span style="background-color: #eeeeee; color: #e06666; font-family: Verdana, sans-serif; font-size: large;">UPDATE 12/16/2014: recently the <i>kubecfg</i> command has been deprecated and replaced with <i>kubectl</i>. I've updated this post to reflect the CLI call and output from kubectl.</span><br />
<br />
<h2>
Pre-Launch</h2>
<br />
A Pulp service stores its persistent data in the database. The service components, a <a href="http://www.celeryproject.org/">Celery Beat server</a> and a number of Celery workers, as well as one or more Apache web server daemons, all communicate using the AMQP message broker. They store and retrieve data from the database.<br />
<br />
In a traditional bare-metal or VM based installation all of these services would likely run on the same host. If they are distributed, then the IP addresses and credentials of the support services would have to be configured into the Pulp servers manually or using some form of configuration management. Using containers the components can be isolated, but the task of tracking them and configuring the consumer processes remains.<br />
<br />
Using just Docker, the first impulse of an implementer would be similar, to place all of the containers on the same host. This would simplify the management of the connectivity between the parts, but it also defeats some of the benefit of containerized applications: portability and non-locality. This isn't a failing of Docker. It is the result of conscious decisions to limit the scope of what Docker attempts to do, avoiding feature creep and bloat. And this is where a tool like Kubernetes comes in.<br />
<br />
As mentioned elsewhere, Kubernetes is a service designed to bind together a cluster of container hosts. These can be regular hosts running the etcd and kubelet daemons, or specialized images like Atomic or CoreOS, and they can be private or public services such as Google Cloud.<br />
<br />
For Pulp, I need to place a MongoDB and a QPID container within a Kubernetes cluster and create the infrastructure so that clients can find it and connect to it. For each of these I need to create a Kubernetes Service and a Pod (group of related containers).<br />
<br />
<h2>
Kicking the Tires</h2>
<div>
<br />
It's probably a good thing to explore a little bit before diving in so that I can see what to expect from Kubernetes in general. I also need to verify that I have a working environment before I start trying to bang on it.<br />
<br />
<h3>
Preparation</h3>
</div>
<div>
<br /></div>
<div>
If you're following along, at this point I'm going to assume that you have access to a running Kubernetes cluster. I'm going to be using the Vagrant test cluster as defined in the <a href="https://github.com/GoogleCloudPlatform/kubernetes">github repository</a> and described in the <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md">Vagrant version</a> of the <a href="https://github.com/GoogleCloudPlatform/kubernetes/tree/master/docs/getting-started-guides">Getting Started Guides</a>.</div>
<div>
<br /></div>
<div>
I'm also going to assume that you've built the kubernetes binaries. I'm using the shell wrappers in the cluster sub-directory, especially <span style="font-family: Courier New, Courier, monospace;">cluster/kubectl.sh</span>. If you try that and you haven't built the binaries you'll get a message that looks like this:</div>
<div>
<br /></div>
<pre class="brush: bash">cluster/kubectl.sh
It looks as if you don't have a compiled kubectl binary.
If you are running from a clone of the git repo, please run
'./build/run.sh hack/build-cross.sh'. Note that this requires having
Docker installed.
If you are running from a binary release tarball, something is wrong.
Look at http://kubernetes.io/ for information on how to contact the
development team for help.
</pre>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">If you see that, do as it says. If that fails, you probably haven't installed the <i>golang</i> package.</span></div>
<div>
<br />
<br />
<br />
For convenience I alias the kubectl.sh wrapper so that I don't need the full path.
<br />
<br />
<pre class="brush:bash ; title: 'Alias kubectl'; highlight: 1">alias kubectl=~/kubernetes/cluster/kubectl.sh
</pre>
<br />
Like most CLI commands now, if you invoke it with no arguments it prints usage.
<br />
<br />
<pre class="brush:bash ; title: 'kubectl usage' ; highlight: [1]">kubectl --help 2>1 | more
Usage of kubectl:
Usage:
kubectl [flags]
kubectl [command]
Available Commands:
version Print version of client and server
proxy Run a proxy to the Kubernetes API server
get [(-o|--output=)json|yaml|...] <resource> [<id>] Display one or many resources
describe <resource> <id> Show details of a specific resource
create -f filename Create a resource by filename or stdin
createall [-d directory] [-f filename] Create all resources specified in a directory, filename or stdin
update -f filename Update a resource by filename or stdin
delete ([-f filename] | (<resource> <id>)) Delete a resource by filename, stdin or resource and id
</pre>
<br />
The full <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/cli.md#details">usage output</a> can be found in the<a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/cli.md"> CLI documentation</a> in the <a href="https://github.com/GoogleCloudPlatform/kubernetes/">Kubernetes Github repository</a>.<br />
<br /></div>
kubectl has one oddity that makes a lot of sense once you understand why it's there. The command is meant to produce output which is consumable by machines using UNIX pipes. The output is structured data formatted using JSON or YAML. To avoid strange errors in the parsers, the only output to STDOUT is structured data. This means that all of the human readable output goes to STDERR. This isn't just the error output though. This includes the help output. So if you want to run the help and usage output through a pager app like more(1) or less(1), you have to first redirect STDERR to STDOUT as I did above.
<br />
<br />
<h3>
Exploring the CLI control objects</h3>
<div>
<br /></div>
<h4>
</h4>
<div>
You can see in the usage output the possible operations: <i>get, list, create, delete, update</i>. It also shows the kinds of objects that the API can manage: <i>minions, pods, replicationControllers, services</i>.<br />
<br /></div>
<h4>
</h4>
<h4>
Minions</h4>
<div>
<br />
A <i>minion</i> is a host that can accept containers. It runs an <span style="font-family: Courier New, Courier, monospace;">etcd</span> and a <span style="font-family: Courier New, Courier, monospace;">kubelet</span><span style="font-family: inherit;"> daemon in addition to the Docker daemon.</span><span style="font-family: inherit;"> For our purposes a minion is where containers can go.</span></div>
<br />
I can list the minions in my cluster like this:
<br />
<br />
<div>
<pre class="brush: bash ; title: '' ; highlight: 1">kubectl get minions
NAME LABELS
10.245.2.4 <none>
10.245.2.2 <none>
10.245.2.3 <none>
</pre>
</div>
<br />
<div>
The only valid operations on minions using the REST protocol are the <i>list</i> and <i>get</i> actions. The get response isn't very interesting.</div>
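<br />
If you want to look anyway, a get on a single minion follows the same resource/id pattern used later for services and pods:<br />
<br />
<pre class="brush: bash ; title: 'get a single minion (sketch)' ; highlight: 1">kubectl get --output=json minions 10.245.2.2
</pre>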
<div>
<br /></div>
<div>
Until I add some of the other objects this is the most interesting query. It indicates that there are three minions connected and ready to accept containers.</div>
<div>
<br /></div>
<h4>
Pods</h4>
<div>
<br /></div>
<div>
A <i><a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md">pod</a></i> is the Kubernetes object which describes a set of one or more containers to be run on the same minion. While the point of a cluster is to allow containers to run anywhere within the cluster, there are times when a set of containers must run together on the same host. Perhaps they share some external filesystem or some other resource. See the golang specification for the <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api/v1beta1#Pod">Pod struct</a>.</div>
<div>
<br />
<pre class="brush: bash ; title: 'list pods'; highlight: 1">kubectl get pods
NAME IMAGE(S) HOST LABELS STATUS
</pre>
See? Not very interesting.<br />
<br /></div>
<div>
<h4>
Replication Controllers</h4>
</div>
<div>
<br /></div>
<div>
I'm going to defer talking about<i> replication controllers</i> in detail for now. It's enough to note their existence and purpose.<br />
<br />
Replication controllers are the tool to create HA or load balancing systems. Using a replication controller you can tell Kubernetes to create multiple running containers for a given image. Kubernetes will ensure that if one container fails or stops that a new container will be spawned to replace it.</div>
<div>
<br /></div>
<div>
I can list the replication controllers in the same way as minions or pods, but there's nothing to see yet.</div>
<br />
<h4>
Services</h4>
<br />
<div>
I think the term <i>service</i> is an unfortunate but probably unavoidable terminology overload.<br />
<br />
In Kubernetes, a service defines a TCP or UDP port reservation. It provides a way for applications running in containers to connect to each other without requiring that each one be configured with the end-point IP addresses. This both allows for abstracted configuration and for mobility and load balancing of the providing containers.<br />
<br />
When I define a Kubernetes service, the service providers (the MongoDB and QPID containers) will be labeled to receive traffic and the service consumers (the Pulp components) will be given the access information in the environment so that they can reach the providers. More about that later.<br />
<br />
I can list the services in the same way as I would minions or pods. And it turns out that creating a couple of Kubernetes services is the first step I need to take to prepare the Pulp support service containers.<br />
<br />
<h2>
Creating a Kubernetes Service Object</h2>
</div>
<div>
<br /></div>
<div>
In a cloud cluster one of the most important considerations is being able to find things. The whole point of the cloud is to promote non-locality. I don't care where things are, but I still have to be able to find them somehow.</div>
<div>
<br /></div>
<div>
A Kubernetes <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#Service">Service</a> object is a handle that allows my MongoDB and QPID clients find the servers without them having to know where they <u>really</u> are. It defines a port to listen on and a way for clients to indicate that they want to accept the traffic that comes in. Kubernetes arranges for the traffic to be forwarded to the servers.<br />
<br />
Kubernetes both accepts and produces structured data formats for input and reporting. The two currently supported formats are JSON and YAML. The Service structure is relatively simple but it has elements which are shared by all of the top level data structures. Kubernetes doesn't yet have any tooling to make the creation of an object description easier than hand-crafting a snippet of JSON or YAML. Each of the structures <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api">is documented</a> in the godoc for Kubernetes. For now that's all you get.</div>
<div>
<br />
There are a couple of<a href="https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples"> provided examples</a> and these will have to do for now. The guestbook example demonstrates using ReplicationControllers and a master/slave implementation using Redis. The second shows how to perform a live update of the pods which make up an active service within a Kubernetes cluster. These are actually a bit more advanced than I'm ready for and don't give the detailed break-down of the moving parts that I mean to do.</div>
<div>
<br /></div>
<div>
<script src="https://gist.github.com/markllama/4766f846415d2b819542.js"></script>
This is a complete description of the service. Lines 5-8 define the actual content.<br />
<ul>
<li>Line 2 indicates that this is a Service object.</li>
<li>Line 3 indicates the object schema version.<br />v1beta1 is current<br />(note: my use of the term 'schema' is a loose one)</li>
<li>Line 4 identifies the Service object.<br />This must be unique within the set of services</li>
<li>Line 5 is the TCP port number that will be listening</li>
<li>Line 6 is for testing. It tells the proxy on the minion with that IP to listen for inbound connections.<br />I'll also use the publicIPs value to expose the HTTP and HTTPS services for Pulp</li>
<li>Lines 7-9 set the Selector<br />The selector is used to associate this Service object with containers that will accept the inbound traffic.<br />This will match with one of the label items assigned to the containers.</li>
</ul>
<div>
<br /></div>
<div>
When a new service is created, Kubernetes establishes a listener on an available IP address (one of the minions' addresses). While the service object exists, any new containers will start with a new set of environment variables which provide access information. The value of the <i>selector</i> (converted to upper case) is used as the prefix for these environment variables so that containers can be designed to pick them up and use them for configuration.</div>
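<br />
A hedged way to see this from a shell inside any container started after the services exist: list the variables carrying the db and msg prefixes (the exact names appear in the pod's JSON later on).<br />
<br />
<pre class="brush: bash ; title: 'sketch: service environment variables' ; highlight: 1">env | grep -E '^(DB|MSG)_'
</pre>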
<div>
<br /></div>
<div>
For now I just need to establish the service so that when I create the DB and QPID containers they have something to be bound to.</div>
<div>
<br /></div>
<div>
The QPID service is identical to the MongoDB service, replacing the port (5672) and the selector (msg).</div>
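<br />
A sketch of what that looks like, following the field layout described above; the file name is arbitrary, and publicIPs can be added the same way as for the db service if needed.<br />
<br />
<pre class="brush: bash ; title: 'sketch: the msg (QPID) service object' ; highlight: [1,10]">cat > qpid-service.json <<'EOF'
{
  "kind": "Service",
  "apiVersion": "v1beta1",
  "id": "msg",
  "port": 5672,
  "selector": { "name": "msg" }
}
EOF
kubectl create -f qpid-service.json
</pre>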
<div>
<br /></div>
<h3>
Querying a Service Object</h3>
<div>
<br /></div>
<div>
I've just created a Service object. I wonder what Kubernetes thinks of it? I can list the services as seen above. I can also get the object information using <span style="font-family: Courier New, Courier, monospace;">kubectl.</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<pre class="brush:bash ; title: 'Query a Service Object: Plaintext' ; highlight: 1">kubectl get services db
NAME LABELS SELECTOR IP PORT
db <none> name=db 10.0.41.48 27017
</pre>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">That's nice. I know the important information now. But what does it look like <u>really</u>.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush: bash ; title 'Query a Service Object: JSON' ; highlight: 1">kubectl get --output=json services db
{
"kind": "Service",
"id": "db",
"uid": "c040da3d-8536-11e4-a18b-0800279696e1",
"creationTimestamp": "2014-12-16T15:18:12Z",
"selfLink": "/api/v1beta1/services/db?namespace=default",
"resourceVersion": 13,
"apiVersion": "v1beta1",
"namespace": "default",
"port": 27017,
"protocol": "TCP",
"selector": {
"name": "db"
},
"publicIPs": [
"10.245.2.2"
],
"containerPort": 0,
"portalIP": "10.0.41.48"
}
</pre>
<br />
Clearly Kubernetes has filled out some of the object fields. Note the <span style="font-family: Courier New, Courier, monospace;">--output=json</span> flag for structured data.<br />
<br />
I'll be using this method to query information about the other elements as I go along.<br />
<br /></div>
</div>
<h2>
Describing a Container (Pod) in Kubernetes</h2>
<div>
<br /></div>
<div>
We've seen how to run a container on a Docker host. With Kubernetes we have to create and submit a description of the container with all of the required variables defined.</div>
<div>
<br /></div>
<div>
Kubernetes has an additional abstraction called a <i>pod</i>. While Kubernetes is designed to allow the operator to ignore the location of containers within the cluster, there are times when a set of containers needs to be co-located on the same host. A pod is Kubernetes' way of grouping containers when needed. When starting a single container it will still be referred to as a member of a pod.<br />
<br />
<br />
Here's the description of a pod containing the MongoDB service image I created earlier.<br />
<br />
<script src="https://gist.github.com/markllama/bcb37503e8488119b7eb.js"></script>
<br />
<br />
<br />
This is actually a set of nested structures, maps and arrays.<br />
<br />
<br />
<ul>
<li>Lines 1-21 define a <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#Pod">Pod</a>.</li>
<li>Lines 2-4 are elements of an inline <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/runtime#JSONBase">JSONBase</a> structure</li>
<li>Lines 5-7 are a map (hash) of strings assigned to the Pod struct element named Labels.</li>
<li>Lines 8-20 define a <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#PodState">PodState</a> named DesiredState.<br />The only required element is the <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#ContainerManifest">ContainerManifest</a>, named Manifest in the PodState.</li>
<li>A Podstate has a required Version and ID, though it is not a subclass of JSONBase.<br /> It also has a list of <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#Container">Containers</a> and an optional list of <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#Volume">Volumes</a></li>
<li>Lines 12-18 define the set of containers (only one in this case) that will reside in the pod.<br />A Container has a name and an image path (in this case to the previously defined mongodb image).</li>
<li>Lines 15-17 are a set of <a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes/pkg/api#Port">Port</a> specifications.<br /> These indicate that something inside the container will be listening on these ports.</li>
</ul>
<br />
<br />
You can see how learning the total schema means fishing through each of these structure definitions in the documentation. If you work at it you will get to know them. To be fair they are really meant to be generated and consumed by machines rather than humans. Kubernetes is still the business end of the service. Pretty dashboards will be provided later. The only visibility I really <b>need</b> is for development and diagnostics. There are gaps here too, but finding them is what experiments like this are about.<br />
<br />
<h4>
A note on Names and IDs</h4>
<br />
There are several places where there is a key named "name" or "id". I could give them all the same value, but I'm going to deliberately vary them so I can expose which ones are used for what purpose. Names can be arbitrary strings. I believe that IDs are restricted somewhat (no hyphens).<br />
<br />
<h2>
Creating the first Pod</h2>
<div>
<br /></div>
<div>
Now I can get back to business.</div>
<div>
<br /></div>
<div>
Once I have the Pod definition expressed in JSON I can submit that to <span style="font-family: Courier New, Courier, monospace;">kubectl</span><span style="font-family: inherit;"> for processing.</span></div>
<div>
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush: bash ; title: 'Create a Pod' ; highlight: 1">kubectl create -f pods/mongodb.json
pulpdb
</pre>
</div>
<div>
<br />
<b>TADA!</b> I now have a MongoDB running in Kubernetes.<br />
<br />
<h2>
But how do I know?</h2>
<br /></div>
Now that I actually have a pod, I should be able to query the Kubernetes service about it and get more than an empty answer.<br />
<br />
<pre class="brush: bash ; title: 'Query a Pod: Plaintext' ; highlight: 1">kubectl get pods pulpdb
NAME IMAGE(S) HOST LABELS STATUS
pulpdb markllama/mongodb 10.245.2.3/10.245.2.3 name=db Running
</pre>
<br />
Familiar and Boring. But I can get more from kubectl by asking for the raw JSON return from the query.<br />
<br />
<pre class="brush: jscript ; title: 'Query a Pod: JSON' ; highlight: 1">{
"kind": "Pod",
"id": "pulpdb",
"uid": "4bac8381-8537-11e4-a18b-0800279696e1",
"creationTimestamp": "2014-12-16T15:22:06Z",
"selfLink": "/api/v1beta1/pods/pulpdb?namespace=default",
"resourceVersion": 22,
"apiVersion": "v1beta1",
"namespace": "default",
"labels": {
"name": "db"
},
"desiredState": {
"manifest": {
"version": "v1beta2",
"id": "",
"volumes": [
{
"name": "devlog",
"source": {
"hostDir": {
"path": "/dev/log"
},
...
"pulp-db": {
"state": {
"running": {
"startedAt": "2014-12-16T15:27:04Z"
}
},
"restartCount": 0,
"image": "markllama/mongodb",
"containerID": "docker://8f21d45e49b18b37b98ea7556346095261699bc
3664b52813a533edccee55a63"
}
}
}
}
</pre>
<br />
It's <u>really</u> long, so I'm not going to include all of it inline. Instead I put it <a href="https://gist.github.com/markllama/b7a770b9e0e1e2af938b">into a gist</a>.<br />
<br />
If you fish through it you'll find the same elements I used to create the pod, and lots, lots more. The structure now contains both a <i>desiredState</i> and a <i>currentState</i> sub-structure, with very different contents.<br />
<br />
Now a lot of this is just noise to us, but lines <a href="https://gist.github.com/markllama/b7a770b9e0e1e2af938b#file-kube_pod_get_reply-L59-L72">59-72</a> are of particular interest. These show the effects of the Service object that was created previously. These are the environment variables and network ports declared. These are the values that a client container will use to connect to this service container.<br />
<br />
<h2>
Testing the MongoDB service</h2>
<div>
<br /></div>
<div>
If you've read my <a href="http://cloud-mechanic.blogspot.com/2014/08/docker-simple-service-container-example.html">previous blog post on creating a MongoDB Docker image</a> you'll be familiar with the process I used to verify the basic operation of the service.</div>
<div>
<br /></div>
<div>
In that case I was running the container using Docker on my laptop. I knew exactly where the container was running and I had direct access to the Docker CLI so that I could ask Docker about my new container.</div>
<div>
I'd opened up the MongoDB port and told Docker to bind it to a random port on the host and I could connect directly to that port.</div>
<div>
<br /></div>
<div>
In a Kubernetes cluster there's no way to know a priori where the MongoDB container will end up. You have to ask Kubernetes where it is. Further you don't have direct access to the Docker CLI.<br />
<br />
This is where that <span style="font-family: Courier New, Courier, monospace;">publicIPs</span><span style="font-family: inherit;"> key in the mongodb-service.json file comes in. I set the public IP value of the db service to an external IP address of one of the Kubernetes minions: 10.245.2.2. This causes the proxy on that minion to accept inbound connections and forward them to the db service pods where ever they are.</span></div>
<div>
<br /></div>
<div>
<div>
The minion host is accessible from my desktop so I can test the connectivity directly.<br />
<br /></div>
</div>
<div>
<pre class="brush: bash ; title: 'List databases in MongoDB' ; highlight: 1">echo "show dbs" | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.4/test
local 0.03125GB
bye
</pre>
<br /></div>
<h2>
And now for QPID?</h2>
</div>
<div>
<br /></div>
<div>
As with the Service object, creating and testing the QPID container within Kubernetes requires the same process. Create a JSON file which describes the QPID service and another for the pod. Submit them and test as before.</div>
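<br />
The sequence is the same as for MongoDB; here is a sketch with hypothetical file and pod names, where the broker address depends on the publicIPs chosen for the msg service.<br />
<br />
<pre class="brush: bash ; title: 'sketch: create and test the QPID pod' ; highlight: [1,2,4]">kubectl create -f pods/qpid.json
kubectl get pods pulpmsg
# check the broker through the service's public IP, as with MongoDB
qpid-config queues -b guest@10.245.2.2
</pre>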
<div>
<br /></div>
<h2>
Summary</h2>
<br />
Now I have two running network services inside the Kubernetes cluster. This consists of a Kubernetes Service object and a Kubernetes Pod which is running the image I'd created for each service application.<br />
<br />
I can prove to myself that the application services are running and accessible, though for some of the detailed tests I have to go under the covers of Kubernetes still.<br />
<br />
I have the information I need to craft images for the other Pulp services so that they can consume the database and messenger services.<br />
<br />
<h2>
Next Up</h2>
<div>
<br /></div>
<div>
In the next post I mean to create the first Pulp service image, the Celery Beat server. There are elements that all of the remaining images will have in common, so I'm going to first build a base image and then apply the last layer to differentiate the beat server from the Pulp resource manager and the pulp workers.</div>
<div>
<br /></div>
<h2>
References</h2>
<div>
<br /></div>
<div>
<ul>
<li>Docker<br /><a href="https://docker.com/">https://docker.com/</a></li>
<li>Kubernetes<br /><a href="https://github.com/GoogleCloudPlatform/kubernetes/">https://github.com/GoogleCloudPlatform/kubernetes/</a></li>
<li>Kubernetes Source Code Documentation<br /><a href="https://godoc.org/github.com/GoogleCloudPlatform/kubernetes">https://godoc.org/github.com/GoogleCloudPlatform/kubernetes</a></li>
<li>Pulp<br /><a href="http://www.pulpproject.org/">http://www.pulpproject.org/</a></li>
<li>Celery<br /><a href="http://www.celeryproject.org/">http://www.celeryproject.org/</a></li>
<li>JSON<br /> <a href="http://json.org/">http://json.org/</a></li>
<li>YAML<br /><a href="http://yaml.org/">http://yaml.org/</a></li>
<li>Pretty Printing JSON with Python<br /><a href="http://stackoverflow.com/questions/352098/how-can-i-pretty-print-json">http://stackoverflow.com/questions/352098/how-can-i-pretty-print-json</a></li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com5tag:blogger.com,1999:blog-5022186007695457923.post-71135090615879280752014-09-01T17:30:00.002-07:002014-09-01T17:51:24.690-07:00Docker: A QPID Message Broker ContainerOK I lied. I realized I can't just move on to working with Pulp in Kubernetes without building the other sub-service Pulp needs.<br />
<br />
This one is merely an exposition of the QPID container, and it's actually simpler than the MongoDB container, so this will be a short post. A QPID service is simpler because (so long as you don't care about store-and-forward messages) you don't need persistent storage.<br />
<br />
Like the MongoDB container, I need to define the package set that will be installed on top of the base image. I also need to declare a TCP port for the QPID service. Finally I need to define the primary process that will be started when the container starts. This will be an invocation of the QPID service daemon.<br />
<br />
<h2>
QPID Dockerfile</h2>
<br />
Here's the Dockerfile for QPID on Fedora 20.<br />
<br />
<div>
<script src="https://gist.github.com/markllama/61b1b25cb384ad25689f.js"></script>
</div>
<br />
Let's walk through the Dockerfile directives quickly.<br />
<br />
Line 1: <a href="https://docs.docker.com/reference/builder/#from">FROM</a> - Just as in the MongoDB image, I'm using the stock Fedora 20 image as the base<br />
<br />
<div>
Line 2: <a href="https://docs.docker.com/reference/builder/#maintainer">MAINTAINER</a> - Indicate who to contact with problems (AND THANKS!)</div>
<div>
<br /></div>
Yeah, that's me.<br />
<br />
<div>
Line 7: <a href="https://docs.docker.com/reference/builder/#run">RUN</a> - Install the QPID packages</div>
<div>
<br />
I think there are several QPID servers. I'm using the one written in C++, hence the package names: qpid-cpp-server and qpid-cpp-server-store. </div>
<br />
Line 10: <a href="https://docs.docker.com/reference/builder/#add">ADD</a> - Create a location for the daemon to run. If you specify a file to add but there is no matching file in the build context directory, then Docker will create the target in the container as an empty directory.<br />
<br />
I'm creating /.qpidd for the daemon to run in.<br />
<br />
Line 12: <a href="https://docs.docker.com/reference/builder/#workdir">WORKDIR</a> - Set the location where the initial process will run. Here is where I tell Docker to run the daemon in the directory I created with the previous ADD directive.<br />
<br />
Line 14: <a href="https://docs.docker.com/reference/builder/#expose">EXPOSE</a> - QPID uses port 5672/TCP. This line declares the port so that connections from outside the container can reach it, and so that Docker will bind it to a host port when the container is run with --publish-all.<br />
<br />
<div>
Line 16: <a href="https://docs.docker.com/reference/builder/#entrypoint">ENTRYPOINT</a> - This indicates the binary or script that will be called when the container runs.<br />
<br />
The ENTRYPOINT and CMD directives are used to craft the invocation of the primary process of the container.<br />
<br />
<h3>
Explaining ENTRYPOINT and CMD</h3>
<div>
<br /></div>
<div>
I got some help for this from a Stackoverflow article: <a href="http://stackoverflow.com/questions/21553353/what-is-the-difference-between-cmd-and-entrypoint-in-a-dockerfile">What is the difference between CMD and ENTRYPOINT</a></div>
<div>
<br /></div>
When a Docker container is run, a single process is started inside the container. This process may spawn others, but it remains the anchor process for all of the others.<br />
<br />
The invocation of the container primary process is created by combining the values of the ENTRYPOINT and CMD directives. The ENTRYPOINT, if it is set, becomes the path of the binary to be executed. The value of the CMD directive is used as the arguments to the primary process.<br />
<br />
There are two twists on this.<br />
<br />
If no ENTRYPOINT is provided, then the CMD directive is run using <span style="font-family: Courier New, Courier, monospace;">/bin/sh -c</span><span style="font-family: inherit;">.</span><br />
Also, if the docker run command has any positional arguments following the image name, they will replace the CMD value.<br />
<br />
By setting the ENTRYPOINT to the QPID command, arguments to the daemon can be passed directly on the docker run line.<br />
<br />
If an image has an ENTRYPOINT directive then it can be overridden with the --entrypoint option to docker run.<br />
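<br />
For example, this hypothetical invocation appends a logging option after the image name, so it is handed straight to the daemon (assuming qpidd's --log-enable option):<br />
<br />
<pre class="brush: bash ; title: 'Passing arguments through the ENTRYPOINT' ; highlight: 1"># everything after the image name becomes arguments to /usr/bin/qpidd
docker run -d --name qpid-verbose --publish-all markllama/qpid --log-enable info+
</pre>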
<br /></div>
<h2>
Building the Image</h2>
<div>
<br />
<pre class="brush: bash ; title: 'Build the QPID container' ; highlight: 1">docker build -t markllama/qpid images/qpid
Sending build context to Docker daemon 2.56 kB
Sending build context to Docker daemon
Step 0 : FROM fedora:20
---> 88b42ffd1f7c
Step 1 : MAINTAINER Mark Lamourine &lt;markllama@gmail.com&gt;
---> Using cache
---> 95516239225e
Step 2 : RUN yum install -y qpid-cpp-server qpid-cpp-server-store python-qpid-qmf python-qpid && yum clean all
---> Running in 7fc6b6ed2128
Resolving Dependencies
--> Running transaction check
---> Package python-qpid.noarch 0:0.26-2.fc20 will be installed
--> Processing Dependency: python-qpid-common = 0.26-2.fc20 for package: python-qpid-0.26-2.fc20.noarch
---> Package python-qpid-qmf.x86_64 0:0.26-2.fc20 will be installed
--> Processing Dependency: qpid-qmf(x86-64) = 0.26-2.fc20 for package: python-qpid-qmf-0.26-2.fc20.x86_64
--> Processing Dependency: libqmf2.so.1()(64bit) for package: python-qpid-qmf-0.26-2.fc20.x86_64
---> Package qpid-cpp-server.x86_64 0:0.26-11.fc20 will be installed
--> Processing Dependency: qpid(client)(x86-64) = 0.26 for package: qpid-cpp-server-0.26-11.fc20.x86_64
--> Processing Dependency: qpid-proton-c(x86-64) >= 0.5 for package: qpid-cpp-server-0.26-11.fc20.x86_64
...
python-qpid-common.noarch 0:0.26-2.fc20
qpid-cpp-client.x86_64 0:0.26-11.fc20
qpid-proton-c.x86_64 0:0.7-3.fc20
qpid-qmf.x86_64 0:0.26-2.fc20
Complete!
Cleaning repos: fedora updates
Cleaning up everything
---> d7e61654fb92
Removing intermediate container 7fc6b6ed2128
Step 3 : ADD . /.qpidd
---> 10c44a5719a5
Removing intermediate container a8a37c5986a5
Step 4 : WORKDIR /.qpidd
---> Running in 2833da1629d9
---> 1963a2551db8
Removing intermediate container 2833da1629d9
Step 5 : EXPOSE 5672
---> Running in d0d92a1e58ad
---> 425ba5994308
Removing intermediate container d0d92a1e58ad
Step 6 : ENTRYPOINT ["/usr/bin/qpidd", "-t", "--auth=no"]
---> Running in e678dc1a4b66
---> ae30e626e215
Removing intermediate container e678dc1a4b66
Successfully built ae30e626e215
</pre>
</div>
<h2>
Verifying the Image</h2>
<div>
<br />
With respect to docker, verifying the image is the same as it was for the MongoDB image.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'Start QPID Container' ; highlight: 1">docker run -d --name qpid1 --publish-all markllama/qpid
1b513bee6d8d5d4328059a059f9520c469ff405228b88370b91bb85ef659b708
</pre>
</div>
<br />
<h3>
Process information</h3>
<div>
<br />
<pre class="brush: bash ; title: 'List containers' ; highlight: 1">docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1b513bee6d8d markllama/qpid:latest /usr/sbin/qpidd -t - 7 seconds ago Up 5 seconds 0.0.0.0:49157->5672/tcp qpid1
</pre>
<br /></div>
<h3>
Docker logs</h3>
<div>
<br />
<pre class="brush: bash ; title: 'List Container Log' ; highlight: 1">docker logs qpid1
2014-09-01 23:32:34 [Model] trace Mgmt create memory. id:amqp-broker
2014-09-01 23:32:34 [Broker] info Management enabled
2014-09-01 23:32:34 [Management] info ManagementAgent generated broker ID: dc7d2
473-58e4-4eea-a21b-46105345054e
...
2014-09-01 23:32:34 [Management] debug ManagementAgent added class org.apache.qp
id.broker:queueThresholdExceeded
2014-09-01 23:32:34 [Model] trace Mgmt create system. id:cfaf5a0f-1291-41e5-b0c0
-e5eb07c77c1e
2014-09-01 23:32:34 [Model] trace Mgmt create broker. id:amqp-broker
2014-09-01 23:32:34 [Model] trace Mgmt create vhost. id:org.apache.qpid.broker:b
roker:amqp-broker,/
2014-09-01 23:32:34 [Security] notice SSL plugin not enabled, you must set --ssl
-cert-db to enable it.
2014-09-01 23:32:34 [Broker] info Loaded protocol AMQP 1.0
2014-09-01 23:32:35 [Store] notice Journal "TplStore": Created
2014-09-01 23:32:35 [Store] notice Store module initialized; store-dir=//.qpidd
2014-09-01 23:32:35 [Store] info > Default files per journal: 8
2014-09-01 23:32:35 [Store] info > Default journal file size: 24 (wpgs)
2014-09-01 23:32:35 [Store] info > Default write cache page size: 32 (KiB)
2014-09-01 23:32:35 [Store] info > Default number of write cache pages: 32
2014-09-01 23:32:35 [Store] info > TPL files per journal: 8
2014-09-01 23:32:35 [Store] info > TPL journal file size: 24 (wpgs)
2014-09-01 23:32:35 [Store] info > TPL write cache page size: 4 (KiB)
2014-09-01 23:32:35 [Store] info > TPL number of write cache pages: 64
2014-09-01 23:32:35 [Model] trace Mgmt create exchange. id:
...
2014-09-01 23:32:36 [Model] trace Mgmt create exchange. id:qmf.default.direct
2014-09-01 23:32:36 [Broker] notice SASL disabled: No Authentication Performed
2014-09-01 23:32:36 [Security] info Policy file not specified. ACL Disabled, no
ACL checking being done!
2014-09-01 23:32:36 [Security] trace Initialising SSL plugin
2014-09-01 23:32:36 [Network] info Listening to: 0.0.0.0:5672
2014-09-01 23:32:36 [Network] info Listening to: [::]:5672
2014-09-01 23:32:36 [Network] notice Listening on TCP/TCP6 port 5672
2014-09-01 23:32:36 [Store] info Enabling management instrumentation for the sto
re.
...
2014-09-01 23:32:36 [Model] trace Mgmt create store. id:org.apache.qpid.broker:b
roker:amqp-broker
2014-09-01 23:32:36 [Management] debug Management object (V1) added: org.apache.
qpid.legacystore:store:org.apache.qpid.broker:broker:amqp-broker
2014-09-01 23:32:36 [Broker] notice Broker running
</pre>
<br />
The QPID logs will continue accumulating. With the default debug level it reports a lot of connection information.<br />
<br /></div>
<h3>
Connectivity</h3>
<div>
<br /></div>
<div>
To test connectivity to the QPID services I use the <span style="font-family: Courier New, Courier, monospace;">qpid-config</span> command from the <span style="font-family: Courier New, Courier, monospace;">qpid-utils</span> package on Fedora. Install that package to get the command.<br />
<br />
<br />
<pre class="brush: bash; title: 'Test Connectivity and List Queues' ; highlight: 1">qpid-config queues -b guest@127.0.0.1:49157
Queue Name Attributes
=================================================================
7783123e-9589-4814-8b7b-b976a576c853:0.0 auto-del excl
</pre>
<br />
<br />
This command lists the queues present on the broker. It connects using the guest account and specifies the localhost IPv4 address and the port indicated by the output of the <span style="font-family: Courier New, Courier, monospace;">docker ps</span> or <span style="font-family: Courier New, Courier, monospace;">docker port</span> commands.<br />
<br />
This is a very simple connectivity test. The single queue is the default for an unused AMQP server. Once the Pulp components connect they will create additional queues.<br />
<br /></div>
<div>
<h2>
Running A Shell in an image with an ENTRYPOINT</h2>
</div>
<div>
<br /></div>
<div>
Using an ENTRYPOINT directive has a couple of effects that you want to be aware of.</div>
<div>
<br /></div>
<div>
On the plus side you can add arguments to the entrypoint binary just by adding them after the image name on the invocation.</div>
<div>
<br /></div>
<div>
One gotcha is that you can't just put /bin/sh after the image to get a shell as you otherwise would. It is very common and convenient to examine an image by running it with a shell, overriding the CMD. Docker provides the --entrypoint option to allow overriding when necessary.</div>
<div>
<br />
<pre class="brush: bash ; title: 'Run a container with an ENTRYPOINT via a shell' ; highlight: 1">docker run -it --entrypoint /bin/sh markllama/qpid
sh-4.2# pwd
/.qpidd
sh-4.2# ls
Dockerfile
sh-4.2# exit
exit
</pre>
</div>
<div>
<br />
Now I have images for both of the secondary services that Pulp needs.<br />
<br />
Time to start playing with Kubernetes a bit.<br />
<br />
References:<br />
<br />
<br />
<ul>
<li><a href="http://www.docker.com/">Docker</a></li>
<li><a href="https://qpid.apache.org/">Apache QPID</a></li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-15692928792502875222014-08-30T09:54:00.001-07:002014-08-30T09:54:15.616-07:00Docker: A simple service container example with MongoDBIn my previous post I said I was going to build, over time, a Pulp repository using a set of containerized service components and host it in a Kubernetes cluster.<br />
<br />
<h2>
A complete Pulp service</h2>
<br />
The Pulp service is composed of a number of sub-services:<br />
<br />
<ul>
<li>A MongoDB database</li>
<li>A QPID AMQP message broker</li>
<li>A number of Celery processes</li>
<ul>
<li>1 Celery Beat process</li>
<li>1 Pulp Resource Manager (Celery worker) process</li>
<li>>1 Pulp worker (Celery worker) process</li>
</ul>
<li>>1 Apache HTTPD - serves mirrored content to clients</li>
<li>>1 Crane service - Docker plugin for Pulp</li>
</ul>
<div>
<br /></div>
<div>
This diagram illustrates the components and connectivity of a Pulp service as it will be composed in Kubernetes using Docker containers.<br />
<br /></div>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://2.bp.blogspot.com/-lyklyUQZwEQ/U_56tPRbw2I/AAAAAAAAEy4/jKvcACLcY9Y/s1600/Pulp%2BService%2BStructure%2Bin%2BDocker%2Bwith%2BKubernetes%2B-%2B2.0.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="Diagram of Pulp service component structure" border="0" src="http://2.bp.blogspot.com/-lyklyUQZwEQ/U_56tPRbw2I/AAAAAAAAEy4/jKvcACLcY9Y/s1600/Pulp%2BService%2BStructure%2Bin%2BDocker%2Bwith%2BKubernetes%2B-%2B2.0.png" height="452" title="Pulp Service Component Structure" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Pulp Service Component Structure</td></tr>
</tbody></table>
<br />
<br />
The simplest images will be those for the QPID and MongoDB services. I'm going to show how to create the MongoDB image first.<br />
<br />
There are several things I will not be addressing in this simple example:<br />
<br />
<ol>
<li><u>HA and replication</u><br />In production the MongoDB would be replicated<br />In production the QPID AMQP service would have a mesh of brokers</li>
<li><u>Communications Security</u><br />In production the links from the other components to the MongoDB and the QPID message broker would be encrypted and authenticated.<br /><br />Key management is actually a real problem with Docker at the moment and will require its own set of discussions.</li>
</ol>
<br />
<h2>
A Docker container for MongoDB</h2>
<br />
This post essentially duplicates the instructions for creating a MongoDB image which are <a href="http://docs.docker.com/examples/mongodb/">provided on the Docker documentation site</a>. I'm going to walk through them here for several reasons. First is for completeness and for practice on the basics of creating a simple image. Second, the Docker example uses Ubuntu for the base image. I am going to use Fedora. In later posts I'm going to be doing some work with Yum repos and RPM installation. Finally I'm going to make some notes which are relevant to the suitability of a container for use in a Kubernetes cluster.<br />
<br />
<h4>
Work Environment</h4>
<br />
I'm working on Fedora 20 with the <span style="font-family: Courier New, Courier, monospace;">docker-io</span> package installed and the docker service enabled and running. I've also added my username to the docker group in <span style="font-family: Courier New, Courier, monospace;">/etc/group</span> so I don't need to use sudo to issue docker commands. If your work environment differs you'll probably have to adapt some.<br />
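<br />
If you want to make the same group change, here is a sketch of one way to do it (run as root, substitute your own username, then log out and back in):<br />
<br />
<pre class="brush: bash ; title: 'Add a user to the docker group' ; highlight: 1"># append the user to the docker group rather than editing /etc/group by hand
usermod -aG docker mark
</pre>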
<br />
<h4>
Defining the Container: Dockerfile</h4>
New docker images are defined in a <i>Dockerfile</i>. Capitalization matters in the file name. The Dockerfile must reside in a directory of its own. Any auxiliary files that the Dockerfile may reference will reside in the same directory.<br />
<br />
The <a href="http://docs.docker.com/reference/builder/">syntax for a Dockerfile</a> is documented on the Docker web site.<br />
<br />
This is the Dockerfile for the MongoDB image on Fedora 20:
<br />
<br />
<div>
<script alt="Dockerfile Gist" src="https://gist.github.com/markllama/829690622aacee395836.js"></script>
</div>
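<br />
If the gist doesn't render, here is approximately what it contains, reconstructed from the build transcript later in this post. The JSON comment on lines 4-5 and the exact blank-line layout are omitted.<br />
<br />
<pre class="brush: bash ; title: 'MongoDB Dockerfile (reconstructed)' ; highlight: 1">FROM fedora:20
MAINTAINER Mark Lamourine &lt;markllama@gmail.com&gt;

# Install the server package and clean the YUM cache so it doesn't bloat the layer
RUN yum install -y mongodb-server && yum clean all

# Prepare the storage directory so external storage can be mounted over it
RUN mkdir -p /var/lib/mongodb && touch /var/lib/mongodb/.keep && chown -R mongodb:mongodb /var/lib/mongodb

# A slightly tweaked configuration file
ADD mongodb.conf /etc/mongodb.conf

VOLUME [ "/var/lib/mongodb" ]
EXPOSE 27017
USER mongodb
WORKDIR /var/lib/mongodb

CMD [ "/usr/bin/mongod", "--quiet", "--config", "/etc/mongodb.conf", "run" ]
</pre>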
<br />
That's really all it takes to define a new container image. The first two lines are the only ones that are mandatory for all Dockerfiles. The rest form the description of the new container.<br />
<br />
<a href="https://docs.docker.com/reference/builder/#from">Dockerfile: FROM</a><br />
<br />
Line 1 indicates the base image to begin with. It refers to an <a href="https://registry.hub.docker.com/_/fedora/">existing image</a> on the official public <a href="https://registry.hub.docker.com/">Docker registry</a>. This image is offered and maintained by the Fedora team. I specify the Fedora 20 version. If I had left the version tag off, the Dockerfile would use the latest tagged image available.<br />
<br />
<a href="https://docs.docker.com/reference/builder/#maintainer">Dockerfile: MAINTAINER</a><br />
<br />
Line 2 gives contact information for the maintainer of the image definition.<br />
<br />
Diversion:<br />
<br />
Lines 4 and 5 are an unofficial comment. It's a fragment of JSON which contains some information about how the image is meant to be used.<br />
<br />
<a href="https://docs.docker.com/reference/builder/#run">Dockerfile: RUN</a><br />
<br />
Line 7 is where the real fun begins. The RUN directive indicates that what follows is a command to be executed in the context of the base image. It will make changes or additions which will be captured and used to create a new layer. In fact, every directive from here on out creates a new layer. When the image is run, the layers are composed to form the final contents of the container before executing any commands within the container.<br />
<br />
The shell command which is the value of the RUN directive must be treated by the shell as a single line. If the command is too long to fit in an 80 character line then shell escapes (\&lt;cr&gt;) and conjunctions (';' or '&&' or '||') are used to indicate line continuation just as if you were writing into a shell on the CLI.<br />
<br />
This particular line installs the <span style="font-family: Courier New, Courier, monospace;">mongodb-server</span><span style="font-family: inherit;"> package and then cleans up the YUM cache. This last step is required because any differences in the file tree from the initial state will be included in the next image layer. Cleaning up after YUM keeps the cached RPMs and metadata from bloating the layer and the image.</span><br />
<span style="font-family: inherit;"><br /></span><span style="font-family: inherit;">Line 10 is another RUN statement. This one prepares the directory where the MongoDB storage will reside. Ordinarily this would be created on a host when the MongoDB package is installed, with a little more done during the startup process for the daemon. These steps are here explicitly because I'm going to punch a hole in the container so that I can mount the data storage area from the host. The mount process can overwrite some of the directory settings. Setting them explicitly here ensures that the directory is present and the permissions are correct for mounting the external storage.</span><br />
<br />
<a href="https://docs.docker.com/reference/builder/#add">Dockerfile: ADD</a><br />
<br />
<span style="font-family: inherit;">Line 14 adds a file to the container. In this case it's a slightly tweaked </span><span style="font-family: Courier New, Courier, monospace;">mongodb.conf</span><span style="font-family: inherit;"> file. It adds a couple of switches which the Ubuntu example from the Docker documentation applies using CLI arguments to the </span><span style="font-family: Courier New, Courier, monospace;">docker run</span><span style="font-family: inherit;"> invocation. The ADD directive takes the input file from the directory containing the Dockerfile and will overwrite the destination file inside the container.</span><br />
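<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">For reference, the tweaked file amounts to something like the following. The option names are inferred from the mongod startup log shown later in this post (the --quiet switch comes from the CMD line, not from the file):</span><br />
<br />
<pre class="brush: bash ; title: 'mongodb.conf (approximate)' ; highlight: 1"># data lives where the VOLUME will be mounted
dbpath = /var/lib/mongodb
# switches the Docker documentation example passes on the mongod command line instead
nohttpinterface = true
noprealloc = true
smallfiles = true
</pre>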
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Lines 16-22 don't add new content but rather describe the run-time environment for the contents of the container.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://docs.docker.com/reference/builder/#volume">Dockerfile: VOLUME</a></span><br />
<span style="font-family: inherit;"><br /></span>
Line 16 officially declares that the directory <span style="font-family: Courier New, Courier, monospace;">/var/lib/mongodb</span> will be used as a mountpoint for external storage.<br />
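<br />
When the container is run, host storage can be attached at that mountpoint with the -v option. The host path and container name here are just examples; the directory must exist and be writable by the container's mongodb user:<br />
<br />
<pre class="brush: bash ; title: 'Mounting host storage at the VOLUME (sketch)' ; highlight: 1"># /srv/mongodb is a hypothetical host directory for the database files
docker run -d --name mongodb-ext -v /srv/mongodb:/var/lib/mongodb --publish-all markllama/mongodb
</pre>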
<br />
<a href="https://docs.docker.com/reference/builder/#expose">Dockerfile: EXPOSE</a><br />
<br />
<span style="font-family: inherit;">Line 18 declares that TCP port 27017 will be exposed. This will allow connections from outside the container to reach the MongoDB inside.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://docs.docker.com/reference/builder/#user">Dockerfile: USER</a></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Line 20 declares that the first command executed will be run as the </span><span style="font-family: Courier New, Courier, monospace;">mongodb</span><span style="font-family: inherit;"> user.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://docs.docker.com/reference/builder/#workdir">Dockerfile: WORKDIR</a></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Line 22 declares that the command will run in </span><span style="font-family: Courier New, Courier, monospace;">/var/lib/mongodb</span><span style="font-family: inherit;">, the home directory for the </span><span style="font-family: Courier New, Courier, monospace;">mongodb</span><span style="font-family: inherit;"> user.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><a href="https://docs.docker.com/reference/builder/#cmd">Dockerfile: CMD</a></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The last line of the Dockerfile traditionally describes the default command to be executed when the container starts.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Line 24 uses the CMD directive. The arguments are an array of strings which make up the program to be invoked by default on container start.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h2>
<span style="font-family: inherit;">Building the Docker Image</span></h2>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">With the Dockerfile and the mongodb.conf template in the image directory (in my case, the directory is </span><span style="font-family: Courier New, Courier, monospace;">images/mongodb</span><span style="font-family: inherit;">) I'm ready to build the image. </span><span style="font-family: inherit;">The transcript for the build process is pretty long. This one I include in its entirety so you can see all of the activity that results from the Dockerfile directives.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush: bash ; title: 'build mongodb image' ; highlight: 1">docker build -t markllama/mongodb images/mongodb
Sending build context to Docker daemon 4.096 kB
Sending build context to Docker daemon
Step 0 : FROM fedora:20
Pulling repository fedora
88b42ffd1f7c: Download complete
511136ea3c5a: Download complete
c69cab00d6ef: Download complete
---> 88b42ffd1f7c
Step 1 : MAINTAINER Mark Lamourine &lt;markllama@gmail.com&gt;
---> Running in 38db2e5fffbb
---> fc120ab67c77
Removing intermediate container 38db2e5fffbb
Step 2 : RUN yum install -y mongodb-server && yum clean all
---> Running in 42e55f18d490
Resolving Dependencies
--> Running transaction check
---> Package mongodb-server.x86_64 0:2.4.6-1.fc20 will be installed
--> Processing Dependency: v8 for package: mongodb-server-2.4.6-1.fc20.x86_64
...
Installed:
mongodb-server.x86_64 0:2.4.6-1.fc20
...
Complete!
Cleaning repos: fedora updates
Cleaning up everything
---> 8924655bac6e
Removing intermediate container 42e55f18d490
Step 3 : RUN mkdir -p /var/lib/mongodb && touch /var/lib/mongodb/.keep && chown -R mongodb:mongodb /var/lib/mongodb
---> Running in 88f5f059c3ff
---> f8e4eaed6105
Removing intermediate container 88f5f059c3ff
Step 4 : ADD mongodb.conf /etc/mongodb.conf
---> eb358bbbaf75
Removing intermediate container 090e1e36f7f6
Step 5 : VOLUME [ "/var/lib/mongodb" ]
---> Running in deb3367ff8cd
---> f91654280383
Removing intermediate container deb3367ff8cd
Step 6 : EXPOSE 27017
---> Running in 0c1d97e7aa12
---> 46157892e3fe
Removing intermediate container 0c1d97e7aa12
Step 7 : USER mongodb
---> Running in 70575d2a7504
---> 54dca617b94c
Removing intermediate container 70575d2a7504
Step 8 : WORKDIR /var/lib/mongodb
---> Running in 91759055c498
---> 0214a3fbcafc
Removing intermediate container 91759055c498
Step 9 : CMD [ "/usr/bin/mongod", "--quiet", "--config", "/etc/mongodb.conf", "run"]
---> Running in 6b48f1489a3e
---> 13d97f81beb4
Removing intermediate container 6b48f1489a3e
Successfully built 13d97f81beb4
</pre>
<br />
You can see how each directive in the Dockerfile corresponds to a build step, and you can see the activity that each directive generates.
<br />
<br />
When Docker processes a Dockerfile, what it really does is put the base image in a container, run it, and execute a command in that container based on the first Dockerfile directive. Each directive causes some change to the contents of the container.<br />
<br />
A Docker container is actually composed of a set of file trees that are layered using a read-only union filesystem with a read/write layer on the top. Any changes go into the top layer. When you unmount the underlying layers, what remains in the read/write layer are the changes caused by the first directive. When building a new image the changes for each directive are archived into a tarball and checksummed to produce the new layer and the layer's ID.<br />
<br />
This process is repeated for each directive, accumulating new layers until all of the directives have been processed. The intermediate containers are deleted, the new layer files are saved and tagged. The end result is a new image (a set of new layers).<br />
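<br />
One way to see the result of this layering is the docker history command, which lists an image's layers along with the directive that created each one (output not shown here):<br />
<br />
<pre class="brush: bash ; title: 'Inspect the image layers' ; highlight: 1">docker history markllama/mongodb
</pre>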
<br />
<h2>
Running the Mongo Container</h2>
<div>
<br /></div>
<div>
The simplest test for the new container is to try running it and observing what happens.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'Run the MongoDB container' ; highlight: 1">docker run --name mongodb1 --detach --publish-all markllama/mongodb
a90b275d00d451fde4edd9bc99798a4487815e38c8efbe51bfde505c17d920ab
</pre>
<br />
This invocation indicates that docker should run the image named <span style="font-family: Courier New, Courier, monospace;">markllama/mongodb</span>. When it does, it should detach (run as a daemon) and make all of the network ports exposed by the container available to the host. (that's the --publish-all). It will name the newly created container <span style="font-family: Courier New, Courier, monospace;">mongodb1</span> so that you can distinguish it from other instances of the same image. It also allows you to refer to the container by name rather than needing the ID hash all the time. If you don't provide a name, docker will assign one from some randomly selected words.<br />
<br />
The response is a hash which is the full ID of the new running container. Most of the time you'll be able to get away with a shorter version of the hash (as presented by docker ps; see below) or with the container name.<br />
<br />
<h2>
Examining the Running Container(s)</h2>
</div>
<div>
<br /></div>
<div>
So the container is running. There's a MongoDB waiting for a connection. Or is there? How can I tell, and how can I figure out how to connect to it?</div>
<div>
<br /></div>
<div>
Docker offers a number of commands to view various aspects of the running containers.<br />
<br />
<h3>
<span style="font-family: inherit;">Listing the Running Containers.</span></h3>
<div>
<span style="font-family: inherit;"><br /></span></div>
To list the running containers use <span style="font-family: Courier New, Courier, monospace;">docker ps</span><span style="font-family: inherit;">. </span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<pre>docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a90b275d00d4 markllama/mongodb:latest /usr/bin/mongod --qu 5 mins ago Up 5 min 0.0.0.0:49155->27017/tcp mongodb1
</pre>
<span style="font-family: inherit;"><br /></span></div>
<div>
This line will likely wrap unless you have a wide screen.<br />
<br />
In this case there is only one running container. Each line is a summary report on a single container. The important elements for now are the name, the ID and the ports summary. This last tells me that I should be able to connect from the host to the container's MongoDB using localhost:49155, which is forwarded to the container's exposed port 27017.<br />
<br />
<h3>
What did it do on startup?</h3>
<div>
<br /></div>
<div>
A running container has one special process which is sort of like the init process on a host. That's the process indicated by the CMD or ENTRYPOINT directive in the Dockerfile.</div>
<div>
<br /></div>
<div>
When the container starts, the STDOUT of the initial process is connected to the docker service. I can retrieve the output by requesting the logs.</div>
<div>
<br />
For Docker commands which apply to single containers the final argument is either the ID or name of a container. Since I named the mongodb container I can use the name to access it.
<br />
<br />
<pre class="brush: bash ; title: 'docker logs for mongodb' ; highlight: 1">docker logs mongodb1
Thu Aug 28 20:38:08.496 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/var/lib/mongodb 64-bit host=a90b275d00d4
Thu Aug 28 20:38:08.498 [initandlisten] db version v2.4.6
Thu Aug 28 20:38:08.498 [initandlisten] git version: nogitversion
Thu Aug 28 20:38:08.498 [initandlisten] build info: Linux buildvm-12.phx2.fedoraproject.org 3.10.9-200.fc19.x86_64 #1 SMP Wed Aug 21 19:27:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Thu Aug 28 20:38:08.498 [initandlisten] allocator: tcmalloc
Thu Aug 28 20:38:08.498 [initandlisten] options: { command: [ "run" ], config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", nohttpinterface: "true", noprealloc: "true", quiet: true, smallfiles: "true" }
Thu Aug 28 20:38:08.532 [initandlisten] journal dir=/var/lib/mongodb/journal
Thu Aug 28 20:38:08.532 [initandlisten] recover : no journal files present, no recovery needed
Thu Aug 28 20:38:10.325 [initandlisten] preallocateIsFaster=true 26.96
Thu Aug 28 20:38:12.149 [initandlisten] preallocateIsFaster=true 27.5
Thu Aug 28 20:38:14.977 [initandlisten] preallocateIsFaster=true 27.58
Thu Aug 28 20:38:14.977 [initandlisten] preallocateIsFaster check took 6.444 secs
Thu Aug 28 20:38:14.977 [initandlisten] preallocating a journal file /var/lib/mongodb/journal/prealloc.0
Thu Aug 28 20:38:16.165 [initandlisten] preallocating a journal file /var/lib/mongodb/journal/prealloc.1
Thu Aug 28 20:38:17.306 [initandlisten] preallocating a journal file /var/lib/mongodb/journal/prealloc.2
Thu Aug 28 20:38:18.603 [FileAllocator] allocating new datafile /var/lib/mongodb/local.ns, filling with zeroes...
Thu Aug 28 20:38:18.603 [FileAllocator] creating directory /var/lib/mongodb/_tmp
Thu Aug 28 20:38:18.629 [FileAllocator] done allocating datafile /var/lib/mongodb/local.ns, size: 16MB, took 0.008 secs
Thu Aug 28 20:38:18.629 [FileAllocator] allocating new datafile /var/lib/mongodb/local.0, filling with zeroes...
Thu Aug 28 20:38:18.637 [FileAllocator] done allocating datafile /var/lib/mongodb/local.0, size: 16MB, took 0.007 secs
Thu Aug 28 20:38:18.640 [initandlisten] waiting for connections on port 27017
</pre>
<br />
This is just what I'd expect for a running mongod.</div>
<h3>
Just the Port Information please?</h3>
<br />
If I know the name of the container or its ID I can request the port information explicitly. This is useful when the output must be parsed, perhaps by a program that will create another container needing to connect to the database.<br />
<br /></div>
<div>
<pre class="brush:bash ; title: 'query port information' ; highlight: 1">docker port mongodb1 27017
0.0.0.0:49155
</pre>
<br /></div>
<div>
<h3>
But is it <i>working?</i></h3>
</div>
<div>
<i><br /></i></div>
<div>
Docker thinks there's something running. I have enough information now to try connecting to the database itself from the host.</div>
<div>
<br /></div>
<div>
The ports information indicates that the container port 27017 is forwarded to the host "all interfaces" port 49155. If the host firewall allows connections in on that port the database could be used (or attacked) from outside.</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'check mongodb connectivity' ; highlight: 1">echo "show dbs" | mongo localhost:49155
MongoDB shell version: 2.4.6
connecting to: localhost:49155/test
local 0.03125GB
bye
</pre>
<br />
<h2>
What next?</h2>
<div>
<br /></div>
<div>
At this point I have verified that I have a running MongoDB accessible from the host (or outside if I allow).</div>
<div>
<br /></div>
<div>
There's lots more that you can do and query about the containers using the <span style="font-family: Courier New, Courier, monospace;">docker</span><span style="font-family: inherit;"> CLI command, but there's no need to detail it all here. You can learn more from the <a href="https://docs.docker.com/reference/commandline/cli/">Docker documentation web site</a></span></div>
<div>
<br /></div>
<div>
Before I start on the Pulp service proper I also need a QPID service container. This is very similar to the MongoDB container so I won't go into detail.<br />
<br />
Since the point of the exercise is to run Pulp in Docker with Kubernetes, the next step will be to run the MongoDB and QPID containers using Kubernetes.</div>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-74704206966964922702014-08-27T16:47:00.000-07:002014-09-02T10:02:08.198-07:00Intro to Containerized Applications: Docker and Kubernetes<h2>
Application Virtualization</h2>
In a world where a Hot New Thing In Tech is manufactured by marketing departments on demand for every annual trade show, there's something that is stirring up interest all by itself (though it has its own share of marketing help). The idea of <i>application containers</i> in general and <i>Docker</i> specifically has become a big deal in the software development industry in the last year.<br />
<br />
I'm generally pretty skeptical of the hype that surrounds emerging tech and concepts (see <a href="https://en.wikipedia.org/wiki/DevOps">DevOps)</a> but I think Docker has the potential to be "disruptive" in the business sense of causing people to have to re-think how they do things in light of new (not yet well understood) possibilities.<br />
<br />
In the next few posts (which I hope to have in rapid succession now) I plan to go into some more detail about how to create Docker containers which are suitable for composition into a working service within a Kubernetes cluster. The application I'm going to use is <a href="http://www.pulpproject.org/">Pulp</a>, a software repository mirror with life-cycle management capabilities. It's not really the ideal candidate because of some of the TBD work remaining in Docker and Kubernetes, but it is a fairly simple service that uses a database, a messaging service and shared storage. Each of these brings out capabilities and challenges intrinsic in building containerized services.<br />
<h2>
TL;DR.</h2>
<div>
Let me say at the outset that this is a long post, more philosophical than technical. I'm going to get to the guts of these tools in all their gooey glory, but I want to set myself some context before I start. If you want to get right to tearing open the toys, you can go straight to the sites for Docker and Kubernetes:</div>
<div>
<br /></div>
<div>
<ul>
<li>Docker - Containerized applications<br /><a href="http://www.docker.com/">http://www.docker.com</a></li>
<li>Kubernetes - Clustering container hosts<br /><a href="https://github.com/GoogleCloudPlatform/kubernetes">https://github.com/GoogleCloudPlatform/kubernetes</a></li>
</ul>
<div>
<br /></div>
</div>
<h2>
The Obligatory History Lesson</h2>
<br />
For 15 years, since the introduction of VMware Workstation in 1999 [1], the primary mover of cloud computing has been the <i>virtual machine.</i> Once the idea was out there a number of other hardware virtualization methods were created: Xen and KVM for Linux, and Microsoft Hyper-V on Windows. Some of this software virtualization of hardware caused problems on the real hardware, so in 2006 both Intel and AMD introduced processors with special features to improve the performance and behavior of virtual machines running on their real machines. [2]<br />
<br />
All of these technologies have similar characteristics. They also have similar benefits and gotchas.<br />
<br />
The computer which runs all of the virtual machines (henceforth: VMs) is known as the <i>host</i>. Each of the VM instances is known as a <i>guest</i>. The guests each use one or more (generally very large) files in the host disk space which contain the entire filesystem of the guest. While each guest is running it typically consumes a single (again, very large) process on the host. Various methods are used to grant the guest VMs access to the public network, both for traffic out of and into the VM. The VM process simulates an entire computer so that for most reasonable purposes it looks and behaves as if it's a real computer.<br />
<br />
This is very different from what has become known as <i>multi-tenant</i> computing. This is the traditional model in which each computer has accounts and users can log into their account and share (and compete for) the disk space and CPU resources. They also often have access to the shared security information. The root account is special and it's a truism among sysadmins that if you can log into a computer you can gain root access if you try hard enough.<br />
<br />
Sysadmins have to work very hard in multi-tenant computing environments to prevent both malicious and accidental conflicts between their users' processes and resource use. If, instead of an account on the host, you give each user a whole VM, the VM provides a nice (?) clean (?) boundary (?) between each user and the sensitive host OS. <br />
<br />
Because VMs are just programs, it is also possible to automate the creation and management of user machines. This is what has made possible modern commercial cloud services. Without virtualization, on-demand public cloud computing would be unworkable.<br />
<br />
There <u>are</u> a number of down-sides to using VMs to manage user computing. Because each VM is a separate computer, each one must have an OS installed and then applications installed and configured. This can be mitigated somewhat by creating and using disk images. This is the equivalent of the ancient practice of creating a "gold disk" and cloning it to create new machines. Still each VM must be treated as a complete OS requiring all of the monitoring and maintenance by a qualified system administrator that a bare-metal host needs. It also contains the entire filesystem of a bare-metal server and requires comparable memory from its host.<br />
<br />
<h2>
Docker</h2>
<div>
<br />
For the buzzword savvy Docker is a software containerization mechanism. Explaining what that means takes a bit of doing. It also totally misses the point, because the enabling technology is totally unimportant. What matters is what it allows us to do. But first, for the tech weenies among you....<br />
<br />
<h3>
Docker Tech: Cgroups, Namespaces and Containers</h3>
<div>
<br /></div>
Ordinary Linux processes have a largely unobstructed view of the resources available from the operating system. They can view the entire file system (subject to user and group access control). They have access to memory and to the network interfaces. They also have access to at least some information about the other processes running on the host.<br />
<br />
Docker takes advantage of <i><a href="http://en.wikipedia.org/wiki/Cgroups">cgroups</a></i> and <i><a href="https://en.wikipedia.org/wiki/Linux_namespaces">kernel namespaces</a></i> to manipulate the view that a process has of its surroundings. A <i>container</i> is a view of the filesystem and operating system which is a carefully crafted subset of what an ordinary process would see. Processes in a container can be made almost totally unaware of the other processes running on the host. The container presents a limited file system tree which can entirely replace what the process would see if it were not in a container. In some ways this is like a traditional <i><a href="https://en.wikipedia.org/wiki/Chroot">chroot</a></i> environment but the depth of the control is much more profound.<br />
<br />
So far, this does look a lot like <a href="https://en.wikipedia.org/wiki/Solaris_Containers">Solaris Containers</a>[3], but that's just the tech, there's more.<br />
<br />
<h3>
The Docker Ecosystem</h3>
<div>
<br /></div>
<div>
The really significant contribution of Docker is the way in which containers and their contents are defined and then distributed.</div>
<div>
<br /></div>
<div>
It would take a Sysadmin Superman to manually create the content and environmental settings to duplicate what Docker does with a few CLI commands. I know some people who could do it, but frankly it probably wouldn't be worth the time spent even for them. Even I don't really want to get that far into the mechanics (though I could be convinced if there's interest). What you can do with it though is pretty impressive.</div>
<div>
<br /></div>
<div>
<b>Note</b>: Other people describe better than I could how to install Docker and prepare it for use. <a href="https://docs.docker.com/installation/#installation">Go there, do that, come back</a>.</div>
<div>
<br /></div>
<div>
<b>Hint</b>: on Fedora 20+ you can add your user to the "docker" line in <span style="font-family: Courier New, Courier, monospace;">/etc/group</span> and avoid a lot of calls to <span style="font-family: Courier New, Courier, monospace;">sudo</span> when running the <span style="font-family: Courier New, Courier, monospace;">docker</span> command.</div>
<div>
<br /></div>
<div>
To run a Docker container you just need to know the name of the image and any arguments you want to pass to the process inside. The simplest images to run are the <span style="font-family: inherit;"><i>ubuntu</i></span> and <i>fedora</i> images:</div>
<div>
<br />
<br /></div>
<div>
<pre class="brush: bash; title: 'docker Hello World'; highlight: 1">docker run fedora /bin/echo "Hello World"
Unable to find image 'fedora' locally
Pulling repository fedora
88b42ffd1f7c: Download complete
511136ea3c5a: Download complete
c69cab00d6ef: Download complete
Hello World
</pre>
</div>
<br />
Now honestly, short of a Java app that's probably the heaviest weight "Hello World" you've ever done. What happened was, your local docker system looked for a container image named "fedora" and didn't find one. So it went to the official Docker registry at docker.io and looked for one there. It found it, downloaded it and then started the container and ran the shell command inside, returning the STDOUT to your console.</div>
<br />
Now look at those three lines following the "<span style="font-family: Courier New, Courier, monospace;">Pulling repository</span>" output from the <span style="font-family: Courier New, Courier, monospace;">docker run</span> command.<br />
<br />
A docker "image" is a fiction. Nearly all images are composed of a number of <i>layers</i>. The base layer or <i>base image</i> usually provides the minimal OS filesystem content, libraries and such. Then layers are added for application packages or configuration information. Each layer is stored as a tarball with the contents and a little bit of metadata which indicates, among other things, the list of layers below it. Each layer is given an ID based on a hash of the tarball so that each can be uniquely identified. When an "image" is stored on the Docker registry, it is given a name and possibly a label so that it can be retrieved on demand.<br />
<br />
In this case Docker downloaded three image layers and then composed them to make the fedora image and then ran the container and executed <span style="font-family: Courier New, Courier, monospace;">/bin/echo</span> inside it.<br />
<br />
You can view the containers that are or have been run on your system with <span style="font-family: Courier New, Courier, monospace;">docker ps</span>.<br />
<br />
<br />
<pre class="brush: bash ; title: 'docker ps example'; highlight: 1">docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
612bc60ede7a fedora:20 /bin/echo 'hello wor 7 minutes ago Exited (0) 7 minutes ago naughty_pike
</pre>
<br />
<br />
Your output will very likely be wrapped around unless you have a very wide terminal screen open. The <span style="font-family: Courier New, Courier, monospace;">-l</span> switch tells docker only to print information about the last container created.<br />
<div>
<br /></div>
You can also run a shell inside the container so you can poke around. The <span style="font-family: Courier New, Courier, monospace;">-it</span><span style="font-family: inherit;"> switches indicate that the container will be run interactively and that it should be terminated when the primary process exits.</span><br />
<br />
<pre class="brush: bash ; title: 'simple docker run with shell' ; highlight: 1">docker run -it fedora /bin/sh
sh-4.2# ls
bin etc lib lost+found mnt proc run srv tmp var
dev home lib64 media opt root sbin sys usr
sh-4.2# ps -ef
PID TTY TIME CMD
1 ? 00:00:00 sh
8 ? 00:00:00 ps
sh-4.2# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/docker-8:4-2758071-97e6230110ded813bff36c0a9a397d74d89af18718ea897712a43312f8a56805 10190136 429260 9220204 5% \
tmpfs 24725556 0 24725556 0% /dev
shm 65536 0 65536 0% /dev/shm
/dev/sda4 132492664 21656752 104082428 18% /etc/hosts
tmpfs 24725556 0 24725556 0% /proc/kcore
sh-4.2# exit
</pre>
<br />
<br />
That's three simple commands inside the container. The file system at / seems to be fairly ordinary for a complete (though minimal) operating system. It shows that there appear to be only two processes running in the container, though, and the mounted filesystems are a much smaller set than you would expect.<br />
<br />
Now that you have this <i>base image</i>, you can use it to create new images by adding layers of your own. You can also register with docker.io so that you can push the resulting images back out and make them available for others to use.<br />
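<br />
The publishing workflow is short. This sketch assumes you've already registered an account on docker.io, and markllama/myapp is a placeholder image name:<br />
<br />
<pre class="brush: bash ; title: 'Build and share an image (sketch)' ; highlight: 1">docker build -t markllama/myapp images/myapp   # add your own layers on top of a base image
docker login                                   # authenticate to the public registry
docker push markllama/myapp                    # make the image available for others to pull
</pre>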
<br />
These are the two aspects of Docker that make it truly significant.<br />
<br />
<h3>
From Software Packaging to Application Packaging</h3>
<div>
<br /></div>
Another historic diversion. Think about this: how do we get software?<br />
<br />
<h4>
Tarballs to RPMs (and Debs)</h4>
<br />
Back in the old days we used to pass around software using FTP and tarballs. We built it ourselves with a compiler. <span style="font-family: Courier New, Courier, monospace;">compress</span><span style="font-family: inherit;">, </span><span style="font-family: Courier New, Courier, monospace;">gzip</span><span style="font-family: inherit;">, </span><span style="font-family: Courier New, Courier, monospace;">configure</span><span style="font-family: inherit;"> and </span><span style="font-family: Courier New, Courier, monospace;">make</span><span style="font-family: inherit;"> made it lots faster but not easier. At least for me, Solaris introduced <a href="https://en.wikipedia.org/wiki/Package_(package_management_system)">software packages</a>, bundles of pre-compiled software which included dependency information so that you could just ask for LaTeX and you'd get all of the stuff you needed for it to work without having to either rebuild it or chase down all the broken loose ends.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: inherit;">Now, many people have problems with<a href="https://en.wikipedia.org/wiki/Package_management_system"> package management systems</a>. Some people have favorites or pets, but I can tell you from first hand experience, I don't care which one I have, but I don't want not to have one. (yes, I hear you <a href="http://www.gentoo.org/">Gentoo</a>, no thanks)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">For a long time software binary packages were the only way to deliver software to an OS. You still had to install and configure the OS. If you could craft the perfect configuration and you had the right disks you could clone your working OS onto a new disk and have a perfect copy. Then you had to tweak the host and network configurations, but that was much less trouble than a complete re-install.</span><br />
<br />
<h4>
<span style="font-family: Arial, Helvetica, sans-serif;">Automated OS Installation</span></h4>
<br />
<span style="font-family: inherit;">Network boot mechanisms like <a href="https://en.wikipedia.org/wiki/Preboot_Execution_Environment">PXE</a> and software installation tools,<a href="http://docs.oracle.com/cd/E19253-01/817-5506/"> Jumpstart</a>, <a href="https://fedoraproject.org/wiki/Anaconda/Kickstart">Kickstart/Anaconda</a>, <a href="https://www.suse.com/documentation/sles11/singlehtml/book_autoyast/book_autoyast.html">AutoYAST</a> and others made the Golden Image go away. They let you define the system configuration and then would automate the installation and configuration process for you*. You no longer had to worry about cloning and you didn't have to do a bunch of </span>archaeology<span style="font-family: inherit;"> on your golden disk when it was out of date and you needed to make a new one. All of your choices were encapsulated in your OS config files. You could read them, tweak them and run it again.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">* yes, I didn't mention <a href="https://en.wikipedia.org/wiki/Software_configuration_management">Configuration Management</a>, but that's really an extension of the boot/install process in this case, not a fundamentally different thing.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">In either case though, if you wanted to run two applications on the same host, the possibility existed that they would collide or interfere with each other in some way. Each application also presented a potential security risk to the others. If you crack the host using one app you could fairly surely gain access to everything else on the host. Even inadvertent interactions could cause problems that would be difficult to diagnose and harder to mitigate.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<h4>
<span style="font-family: inherit;">Virtual Disks and the Rebirth of the Clones</span></h4>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">With the advent of virtual machines, the clone was back, but now it was called a <i>disk image.</i> You could just copy the disk image to a host and boot it in a VM. If you want more you make copies and tweak them after boot time.</span><br />
<br />
<span style="font-family: inherit;">So now we had two different delivery mechanisms: Packages for software to be installed (either on bare metal or in a VM) and disk images for completed installations to be run in a VM. That is: unconfigured application software or fully configured operating systems.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">You can isolate applications on one host by placing them into different VMs. But this means you have to configure not one, but three operating systems (the host plus one VM for each service) to build an application that requires two services. That's three ways to get reliability and security wrong. Three distinct moving parts that require a qualified sysadmin to manage them and at least two things which the Developer/Operators will need to access to make the services work.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Docker offers something new. It offers the possibility of distributing just the application. *</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit; font-size: x-small;">* Yeah, there's a bit more overhead than that, but nowhere near a complete VM, and layers can be shared.</span><br />
<br />
<h4>
Containerization: Application Level Software Delivery</h4>
<br />
Docker offers the possibility of delivering software in units somewhere between the binary package and the disk image. Docker containers have isolation characteristics similar to apps running in VMs without the overhead of a complete running kernel in memory and without all of the auxiliary services that a complete OS requires.<br />
<br />
Docker also offers the capability for developers of reasonable skill to create and customize the application images and then to compose them into complex services which can then be run on a single host, or distributed across many.<br />
<br />
The docker registry presents a well-known central location for developers to push their images and name them so that consumers can find them, download them and use them without additional interaction. Because the application has been tested in the container, the developer can be sure that she's identified all of the configuration information that might need to be passed in and out. She can explicitly document that, removing many opportunities for misconfiguration or adverse interactions between services on the same host.<br />
<br />
It's the dawning of a new day.<br />
<br />
If only it were that easy.<br />
<br />
<h3>
Here There Be Dragons</h3>
<div>
<br /></div>
<div>
When a new day dawns on an unfamiliar landscape it slowly reveals a new vista to the eye. If you're in a high place you might see far off, but nearer things could be hidden under the canopy of trees or behind a fold in the land, so that when you actually step down and begin exploring you encounter surprises.</div>
<div>
<br /></div>
<div>
Whenever a new technology appears people tend to try to use it the same way they're used to using their older tools. It generally takes a while to figure out the best way to use a new tool and to come to terms with its differentness. There's often a lot of exploring and a fair number of false starts and retraced steps before the real best uses settle out.</div>
<div>
<br /></div>
<div>
Docker does have some youthful shortcomings.</div>
<div>
<br /></div>
<div>
Docker is marvelously good at pulling images from a specific repository (known to the world as the Docker.io Registry) and running them on a specific host to which you are logged on. It's also good at pushing new images to the docker registry. These are both very localized point-to-point transactions.</div>
<div>
<br /></div>
<div>
<div>
Docker has no awareness of anything other than the host it is running on and the docker registry. It's not aware of other docker hosts nearby. It's not aware of alternate registries. It's not even aware of the resources in containers on the same host that other containers might want to share.</div>
</div>
<div>
<br /></div>
<div>
The only way to manage a specific container is to log onto its host and run the docker command to examine and manipulate it.</div>
<div>
<br /></div>
<div>
The first thing anyone wants to do when they create a container of any kind is to punch holes in it. What good is a container where you can't reach the contents? Sometimes people want to see in. Other times people want to insert things that weren't there in the first place. And they want to connect pipes between the containers, and from the containers to the outside world.</div>
<div>
<br /></div>
<div>
Docker does have ways of exposing specific network ports from a container to the host or to the host's external network interfaces.</div>
<div>
<br /></div>
<div>
It can import a part of the host filesystem into a container. It also has ways to share storage between two containers on the same host. What it doesn't have is a way to identify and use storage which can be shared between hosts. If you want to have a cluster of docker hosts where the containers can share storage, this is a problem.</div>
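<div>
<br /></div>
<div>
To make those single-host capabilities concrete, here is roughly what they look like on the command line; the image names, paths and container name are placeholders:</div>
<div>
<br />
<pre class="brush: bash ; title: 'Single-host plumbing Docker does provide (sketch)' ; highlight: 1"># expose a container port on a specific host port
docker run -d -p 8080:80 example/webapp
# import part of the host filesystem into a container
docker run -d -v /srv/data:/var/lib/data example/dbapp
# share the volumes of a running container (here named dbapp1) with another on the same host
docker run -d --volumes-from dbapp1 example/backup
</pre>
</div>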
<div>
<br /></div>
<div>
It also doesn't have a means to get secret information from... well anywhere... safely from its hidey hole into the container. Since it's trivial for anyone to push an image to the public registry, it's really important not to put secret information into any image, even one that's going to be pushed to a private registry.</div>
<div>
<br /></div>
<div>
As noted, Docker does what it does really well. The developers have been very careful not to try to overreach, and I agree with most of their decisions. The issues I listed above are not flaws in Docker, they are mostly tasks that are outside Docker's scope. This keeps the Docker development effort focused on the problems they are trying to solve so they can solve them well.</div>
<div>
<br /></div>
<div>
<span style="font-family: inherit;">To use Docker on anything but a small scale you need something else. Something that is aware of clusters of container hosts, the resources available to each host and how to bind those resources to new containers regardless of which host ends up holding the container. Something that is capable of describing complex multi-container applications which can be spread across the hosts in a cluster and yet be properly and securely connected. </span></div>
<br />
Read on.
<br />
<br />
<h2>
Kubernetes</h2>
<div>
<br /></div>
<div>
Who might want to run vast numbers of containerized applications spread over multiple enormous host clusters without regard to network topology or physical geography? Who else? Google.<br />
<br />
Kubernetes is Google's response to the problem of managing Docker containers on a scale larger than a couple of manually configured hosts. It, like Docker, is a young project and there are an awful lot of TBDs, but there's a working core and a lot of active development. Google and the other partners that have joined the Kubernetes effort have very strong motivation to make this work.<br />
<br />
Kubernetes is made up of two service processes that run on each Docker host (in addition to the <span style="font-family: Courier New, Courier, monospace;">dockerd</span><span style="font-family: inherit;">). The </span><span style="font-family: Courier New, Courier, monospace;">etcd</span><span style="font-family: inherit;"> binds the hosts into a cluster and distributes the configuration information. The </span><span style="font-family: Courier New, Courier, monospace;">kubelet</span><span style="font-family: inherit;"> daemon is the active agent on each container host which responds to requests to create, monitor and destroy containers. In Kubernetes parlance, a container host is known as a <i>minion<span style="font-family: inherit;">.</span></i></span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The </span><span style="font-family: Courier New, Courier, monospace;"><a href="https://coreos.com/blog/distributed-configuration-with-etcd/">etcd</a></span><span style="font-family: inherit;"> service is taken from <a href="https://coreos.com/">CoreOS </a>which is an attempt at application level software packaging and system management that predates Docker. CoreOS seems to be adopting Docker as its container format. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">There is one other service process, the Kubernetes </span><span style="font-family: Courier New, Courier, monospace;">app-service</span><span style="font-family: inherit;"> which acts as the head node for the cluster. The app-service accepts commands from users and forwards them to the minions as needed. Any host running the Kubernetes </span><span style="font-family: Courier New, Courier, monospace;">app-server</span><span style="font-family: inherit;"> process is known as a <i>master</i>.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Clients communicate with the masters using the </span><span style="font-family: Courier New, Courier, monospace;">kubecfg</span><span style="font-family: inherit;"> command.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">A little more terminology is in order.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">As noted, container hosts are known as <i>minions</i>. Sometimes several containers must be run on the same minion so that they can share local resources. Kubernetes introduces the concept of a <i>pod</i> of containers to represent a set of containers that must run on the same host. You can't access individual containers within a pod at the moment (there are lots more caveats like this. It is a REALLY young project). </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Installing Kubernetes is a bit more intense than Docker. Both Docker and Kubernetes are written in Go. Docker is mature enough that it is available as binary packages for both RPM and DEB packaged Linux distributions. (see your local package manager for <i>docker-io</i> and it's dependencies.)</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The simplest way to get Kubernetes right now is to run it in VirtualBox VMs managed by Vagrant. I recommend the <a href="https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md">Kubernetes Getting Started Guide for Vagrant</a> . There's a bit of assembly required.</span><br />
<span style="font-family: inherit;"><br /></span>
Hint: once it's built, I create an alias to the <span style="font-family: Courier New, Courier, monospace;">cluster/kubecfg.sh</span><span style="font-family: inherit;"> so I don't have to put it in my path or type it out every time.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">I'm not going to show very much about Kubernetes yet. It doesn't really make sense to run any interactive containers like the "Hello World" or the Fedora 20 shell using Kubernetes. It's really for running persistent services. I'll get into it deeply in a coming post. For now I'll just walk through the Vagrant startup and simple queries of the test cluster.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<pre class="brush: bash ; title: 'Starting a Kubernetes cluster with Vagrant'; highlight: 1">$ vagrant up
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'minion-1' up with 'virtualbox' provider...
Bringing machine 'minion-2' up with 'virtualbox' provider...
Bringing machine 'minion-3' up with 'virtualbox' provider...
==> master: Importing base box 'fedora20'...
...
master:
==> master: Summary
==> master: -------------
==> master: Succeeded: 44
==> master: Failed: 0
==> master: -------------
==> master: Total: 44
==> master:
==> minion-1: Importing base box 'fedora20'...
Progress: 90%
...
==> minion-3: Complete!
==> minion-3: * INFO: Running install_fedora_stable_post()
==> minion-3: disabled
==> minion-3: ln -s '/usr/lib/systemd/system/salt-minion.service' '/etc/systemd/system/multi-user.target.wants/salt-minion.service'
==> minion-3: INFO: Running install_fedora_check_services()
==> minion-3: INFO: Running install_fedora_restart_daemons()
==> minion-3: * INFO: Salt installed!
</pre>
<br />
At this point there are only three interesting commands. They show the set of minions in the cluster, the running pods and the services that are defined. The last two aren't very interesting because there aren't any pods or services.<br />
<br />
<pre class="brush: bash ; title: 'list minions' ; highlight: 1">$ cluster/kubecfg.sh list minions
minions
----------
10.245.2.2
10.245.2.3
10.245.2.4
$ cluster/kubecfg.sh list pods
Name Image(s) Host Labels
---------- ---------- ---------- ----------
$ cluster/kubecfg.sh list services
Name Labels Selector Port
---------- ---------- ---------- ----------
</pre>
</div>
<br />
We know about minions and pods. In Kubernetes a <i>service</i> is actually a port proxy for a TCP port. This allows Kubernetes to place service containers arbitrarily while still allowing other containers to connect to them by a well-known IP address and port. Containers that should accept traffic for that port are picked out by the service's <i>selector</i> value. The service will then forward traffic to those containers.<br />
<br />
Right now Kubernetes accepts requests and prints reports in structured data formats, JSON or YAML. To create a new pod or service, you describe the new object using one of these data formats and then submit the description with a "create" command.<br />
<br />
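To make that concrete, here's a rough sketch of what a minimal service description and the corresponding create call looked like at the time I wrote this. The id, port and selector values are invented for illustration, and both the JSON fields and the exact kubecfg arguments were changing from release to release, so treat this as a shape rather than a recipe.<br />
<br />
<pre class="brush: bash ; title: 'sketch: defining and creating a service'">$ cat webservice.json
{
  "id": "webservice",
  "port": 8000,
  "selector": { "name": "web" }
}
$ cluster/kubecfg.sh -c webservice.json create /services
</pre>
<br />
Pods are created the same way: write a JSON description of the pod and its containers, then <span style="font-family: Courier New, Courier, monospace;">create /pods</span> with it.<br />
<br />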
<h2>
Summary</h2>
<div>
I think software containers in general and Docker in particular have a very significant future. I'm not a big bandwagon person but I think this one is going to matter.</div>
<div>
<br /></div>
<div>
Docker's going to need some more work itself and it's going to need a lot of infrastructure around it to make it suitable for the enterprise and for public cloud use. Kubernetes is one piece that will make using Docker on a large scale possible.</div>
<br />
<br />
See you soon.<br />
<h2>
References</h2>
<br />
<ul>
<li>[1] VMWare - <a href="https://en.wikipedia.org/wiki/Vmware#History">https://en.wikipedia.org/wiki/Vmware#History</a></li>
<li>[2] X86 Hardware Virtualization - <a href="https://en.wikipedia.org/wiki/X86_virtualization">https://en.wikipedia.org/wiki/X86_virtualization</a></li>
<li>[3] Solaris Containers - <a href="https://en.wikipedia.org/wiki/Solaris_Containers">https://en.wikipedia.org/wiki/Solaris_Containers</a></li>
</ul>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com8tag:blogger.com,1999:blog-5022186007695457923.post-55500121485357902502014-08-18T08:20:00.000-07:002014-08-18T08:23:38.512-07:00Hey! I'm Back (and the Cloud is Bigger than Ever)After a few months trying to do Businessy things that I don't think I'm very good at, it looks like I could be back in the software dev/sysadmin space again for a while.<br />
<br />
You'll notice a name change on the blog: It's no longer just about <a href="http://openshift.redhat.com/">OpenShift</a>. Red Hat is getting into a number of related and extremely innovative and promising projects and trying to make them work together. I'm working to assist on a number of these projects where an extra hand is needed and I get to learn all kinds of cool stuff in the process.<br />
<br />
The projects all revolve around one form of "virtualization" or another and all of the efforts are on taking these tools and using them to create enterprise class services.<br />
<br />
<h2>
OpenStack</h2>
<div>
<br /></div>
<a href="http://www.openstack.org/">OpenStack</a> is essentially Amazon Web Services(r) for on-premise use. To put it another way, it's an attempt to mechanize all of the functions of all of the groups in a typical enterprise IT department: networking, data center host management, OS and application provisioning, storage management, database services, user management and policies and more.<br />
<br />
Merely replacing all of the people in an organization that do these things would be boring (and counterproductive). What OpenStack really offers is the ability to push control of the resources closer to the real user, offering self-service access to things which used to require coordination between experts and representatives from a number of different groups with the expected long lead times. The ops people can focus on making sure there are sufficient resources to work, and the users, the developers and the applications admins can just take what they need (subject to policy) to do their work.<br />
<br />
Now that's nice for the end user. They get a snazzy dashboard and near-instant response to requests. But the life of the sysadmin hasn't really changed, just the parts they run. The sysadmin still has to create, monitor and support multiple complex services on real hardware. She also can't easily delegate the parts to the old traditional silos. The sysadmin can't be just concerned with hardware and OS and NIC configuration. The whole network fabric (storage too) all has to be understood by <b>everyone</b> on the design, deployment and operations team(s). Message to sysadmins: Don't worry one bit about job security, so long as you keep learning like crazy.<br />
<br />
<h2>
Docker</h2>
<div>
<br /></div>
<a href="http://www.docker.com/">Docker</a> (and more generally "containerization") is the current hot growth topic.<br />
<br />
Many people are now familiar with Virtual Machines. A virtual machine is a process running on a host machine which simulates another (possibly totally different) computer. The virtual machine software simulates a whole computer right down to mimicking hardware responses. From inside the virtual machine it looks like you have a complete real computer at your disposal.<br />
<br />
The downside is that VMs require the installation and management of a complete operating system within the virtual machine. VMs allow isolation but have a lot of heft to them. The host machine has to be powerful enough to contain whole other computers (sometimes many of them) while still doing its own job.<br />
<br />
Docker uses some newish ideas to offer a middle ground between traditional multi-tenant computing, where a number of unrelated (and possibly conflicting) services run as peers on a single computer and the total isolation (and duplication) that VMs require.<br />
<br />
The enabling technology is known as <a href="https://en.wikipedia.org/wiki/Cgroups"><i>cgroups</i></a> and specifically <i><a href="https://en.wikipedia.org/wiki/Cgroups#Namespace_isolation">kernel namespaces</a></i>. The names are unimportant really. What namespaces do is to allow the host operating system to provide each process with a distinct carefully tuned view of the parts of the host that the process needs to do its job. The view is called a <i>container</i> and any processes which run in the container can interact with each other as normal. However they are entirely unaware of any other processes running on the host. In a sense containers act as blinders, protecting processes running on the same host from each other by preventing them from even seeing each other.<br />
<br />
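A quick way to see these blinders in action (a sketch; fedora:20 is the stock Fedora 20 image and the output is abridged): a shell started inside a container believes it is process number 1 and can't see any of the host's other processes.<br />
<br />
<pre class="brush: bash ; title: 'sketch: process isolation inside a container'">host$ docker run --rm -i -t fedora:20 /bin/bash
bash-4.2# echo $$
1
bash-4.2# exit
</pre>
<br />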
Docker is a container service which standardizes and manages the development, creation and deployment of the containers and their contents in a clear and unified way. It provides a means to create a single-purpose container for, say, a database service and then allows the developer to publish that container so that others can find it, pull it and run it unchanged.<br />
<br />
<h2>
Kubernetes</h2>
<div>
<br /></div>
<div>
While Docker itself is cool, it really focuses on the environment on a single host and on individual images and containers. <a href="https://github.com/GoogleCloudPlatform/kubernetes">Kubernetes</a> is a project initiated at Google but adopted by a number of other software and service vendors. Kubernetes aims to provide a way for application developers to define and then deploy complex applications composed of a number of Docker containers and potentially spread over a number of container hosts.<br />
<br />
I think Kubernetes (or something like it) is going to have a really strong influence on the acceptance and use of containerized applications. It's likely to be the face most application operations teams see on the apps they deploy. It's going to matter for both the Dev and Ops elements because it's going to be critical to the design and deployment of complex applications.<br />
<br />
As a sysadmin this is where my strongest interest is. Docker and Atomic are parts, Kubernetes is the glue.<br />
<br /></div>
<h2>
Project Atomic</h2>
<div>
<br /></div>
<div>
And where do you put all those fancy complex applications you've created using Docker and defined using Kubernetes? <a href="http://www.projectatomic.io/">Project Atomic</a> is a Red Hat project to create a hosting environment specifically for containerized applications. </div>
<div>
<br /></div>
<div>
Rather than running (I mean: installing, configuring and maintaining) a general purpose computer running the Docker daemon and a Kubernetes agent and all of the other attendant internals, Project Atomic will provide a host definition tuned for use as a container host. A general purpose OS installation often has a number of service components which aren't necessary and may even pose a hazard to the container services. Project Atomic is building an OS image designed to do one thing: Run containers.</div>
<div>
<br /></div>
<div>
Atomic is itself a stripped-down build of a general-purpose OS. It can run on bare metal, or on OpenStack or even on public cloud services like AWS or Rackspace or Google Cloud.<br />
<br />
<br />
<h2>
Go(lang)</h2>
<div>
It's been a long time since I worked in a system level language. Go (or <i><a href="http://golang.org/">golang</a></i> to distinguish it from the venerable Chinese strategy board game) is a new environment created at Google by Robert Griesemer along with two luminaries of early Unix, Rob Pike and Ken Thompson. It aims to address some of the shortcomings of C in the age of distributed and concurrent programming, neither of which really existed when C was created.</div>
<div>
<br /></div>
<div>
Docker and several other significant new applications are written in Go and it's catching on with system level developers. I quickly bumped up on my scripting language habits when I started getting into Go and I was reminded of why system languages are still important. It's refreshing to know I can still think at that level.</div>
<div>
<br /></div>
<div>
I think Go is going to spread quickly in the next few years and I'm going to learn to work with it along with the common scripting environments.</div>
<div>
<br /></div>
<h2>
Look Up: There's more than one kind of cloud.</h2>
<div>
<br /></div>
<div>
In the past I've been focused on one product and one aspect of Cloud Computing. Make no mistake, Cloud Computing is still in its infancy and we're still learning what kind of thing it wants to grow up into. The range of enterprise deployment models is getting bigger. Applications can be delivered as traditional software, as VM images for personal or enterprise use (<a href="https://www.virtualbox.org/">VirtualBox</a> and <a href="http://www.vagrantup.com/">Vagrant </a>to OpenStack to AWS) and now as containers which sit somewhere in between. Each has its own best uses and we're still exploring the boundaries.</div>
<div>
<br /></div>
<div>
So now I'm going to branch out too and look at each of these and look at all of them. My focus is still going to be what's going on inside, the place where you can stick your hand in and lose fingers. Lots of other people are talking about the glossy paint job and the snazzy electronic dashboard. I'll leave that to them.</div>
<div>
<br /></div>
<div>
Tut Tut... it looks like rain....(but I like the rain)</div>
<div>
<br /></div>
<h2>
References</h2>
</div>
<div>
<br /></div>
<div>
<ul>
<li>OpenShift - "Platform as a Service" - Developer/App Ops environment<br /><a href="https://www.openshift.com/">https://www.openshift.com/</a></li>
<li>OpenStack - Automated Self-Service "Everything your IT Department Does"<br /><a href="http://www.openstack.org/">http://www.openstack.org/</a></li>
<li>Docker - Linux Application and Service containers - "intermediate virtualization"?<br /><a href="https://www.docker.com/">https://www.docker.com/</a></li>
<li>Project Atomic - A minimal tuned Linux image for running containerized applications<br /><a href="http://www.projectatomic.io/">http://www.projectatomic.io/</a></li>
<li>Kubernetes - Deployment orchestration for containerized applications<br /><a href="https://github.com/GoogleCloudPlatform/kubernetes">https://github.com/GoogleCloudPlatform/kubernetes</a></li>
<li>The Foreman - OS deployment (and much more!) service<br /><a href="http://theforeman.org/">http://theforeman.org/</a></li>
<ul>
<li>Pulp - Enterprise software content mirroring<br /><a href="http://www.pulpproject.org/">http://www.pulpproject.org/</a></li>
<li>Katello - Enterprise OS management<br /><a href="http://www.katello.org/">http://www.katello.org/</a></li>
</ul>
<li>Go(Lang) - A modern system-level programming language<br /><a href="http://golang.org/">http://golang.org/</a></li>
<li>Vagrant - managing a complex virtual development environment<br /><a href="http://www.vagrantup.com/">http://www.vagrantup.com/</a></li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com1tag:blogger.com,1999:blog-5022186007695457923.post-75806244082789120972014-01-17T13:54:00.002-08:002014-01-17T13:54:42.156-08:00Hanging up my creeper and closing up shop.I dunno if I have any fans of these blog posts, but if I do, I want to let you know that there aren't likely to be any more here.
I've recently been moved into another group and I won't be working directly with OpenShift anymore.
Never fear, the rest of the team is working like crazy as they always have to bring you great stuff and there are some real cool things coming down the line.
Feel free to ask questions about what's here. I can certainly answer those until they become stale because the software has moved on.
It's been fun.
- Markmarkllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com4tag:blogger.com,1999:blog-5022186007695457923.post-88342787515108918482013-12-13T07:03:00.000-08:002013-12-14T06:17:01.668-08:00OpenShift Service Development: Building a Build BoxI found this week that I needed to have a build box so that I could repeatedly run the dev/build/install/test cycle. I've messed around with it on and off since I started working on OpenShift but I looked back and realized that I've never written a procedure for creating the build box. So here it is.<br />
<br />
The build process takes source code from a git repository and transforms it into packages. Finally it places the packages into an install repository so that they will be available to the target hosts via yum. The yum repository is published by a small web server. It doesn't need to be fancy as it's just flat files.<br />
<br />
The instructions here are for Fedora 18 or 19. There are some special considerations for RHEL6 or CentOS6. These are detailed <a href="#rhel6">in a section at the bottom of this post.</a> There are notes inline for when the process is different for RHEL.<br />
<br />
There are also some considerations for <a href="#ec2">creating a build box in AWS EC2</a> (or any managed hosting service).<br />
<br />
<h3>
Install the build/publish software</h3>
<div>
<br /></div>
On a minimal install of Fedora, install the base packages needed for the build service. Git to retrieve the source code, tito to build the RPMs, and thttpd to serve the YUM repository to the install targets.<br />
<br />
<pre class="brush: bash ; title: 'install build/publishing tools'">sudo yum install git tito thttpd firewalld
</pre>
<br />
(On RHEL6, enable <a href="https://fedoraproject.org/wiki/EPEL">EPEL repository</a> and skip firewalld)<br />
<br />
<h3>
Create the YUM repository root directory</h3>
<div>
<br /></div>
<div>
Next, create a location for the YUM repository. Place it in a space where thttpd will find it and make it writable by the build user (assumed to be the current user)<br />
<br />
<br /></div>
<pre class="brush: bash ; title: 'create yum repository space for publishing'">sudo mkdir /var/www/thttpd/tito
sudo chown $(id --name --user):$(id --name --group) /var/www/thttpd/tito
</pre>
<br />
<h3>
Enable Web Services</h3>
<div>
<br /></div>
<div>
I have to publish the packages to the install hosts once they're built. I need a web server and on Fedora, I need the firewall daemon running and configured to allow HTTP communications.</div>
<br />
<pre class="brush: bash ; title: 'enable and start the web server'">sudo systemctl enable thttpd
sudo systemctl start thttpd
sudo systemctl enable firewalld
sudo systemctl start firewalld
# Open the port for now
sudo firewall-cmd --zone public --add-service http
# Make the change persistent across reboots
sudo firewall-cmd --zone public --add-service http --permanent
</pre>
<br />
<h3>
Configure Tito Output Location</h3>
<div>
<br /></div>
Tito places the build results and RPMs in /tmp/tito by default. I can set the target location using the titorc file.<br />
<br />
<br />
<pre class="brush: bash ; title: 'set the target location for the tito output'">echo "RPMBUILD_BASEDIR=/var/www/thttpd/tito/" > $HOME/.titorc
</pre>
<br />
<h3>
Retrieve the Source Code Repository</h3>
<div>
<br /></div>
Now that the publication and build services are prepared, it's time to actually get the software source code.<br />
<br />
<pre class="brush: bash ; title: 'clone the OpenShift Origin git repo'">git clone https://github.com/openshift/origin-server.git
</pre>
<br />
If you are doing development, substitute your own fork and branch. If you are doing your editing on the build box (not really recommended, but slightly time saving) you can also use the git: (ssh) protocol and add your github user SSH key so that you can both pull and push changes.<br />
<br />
<h3>
Install Package Build Requirements</h3>
<div>
<br /></div>
Before you can build packages, you must also install any build requirements for the packages. The Bourne shell code snippet below will walk the source code tree, find each package root and install all of the build requirements it finds using yum-builddep.<br />
<br />
This triggers off the presence of a .spec file in the root of a package tree. It's critical as a package developer to note all build requirements in the .spec file.<br />
<br />
<br />
<pre class="brush: bash ; title: 'install all packages required for builds'">for SPECPATH in $(find origin-server -name \*.spec)
do
PKGDIR=$(dirname $SPECPATH )
SPECFILE=$(basename $SPECPATH)
(cd $PKGDIR ; sudo yum-builddep -y $SPECFILE )
done
</pre>
<br />
<h3>
Build All Packages</h3>
Now that all the build requirements are installed, it's time to build the software.<br />
<br />
The Bourne shell snippet below will walk the entire source tree and locate the root of each package tree and run tito to build the package. If you are building test packages, uncomment the TEST assignment line (and on RHEL6, the SCL line).<br />
<br />
<pre class="brush: bash; title: 'build all packages'"># TEST=--test
# SCL=--scl=ruby193 # for RHEL6
for SPECPATH in $(find origin-server -name \*.spec)
do
PKGDIR=$(dirname $SPECPATH )
SPECFILE=$(basename $SPECPATH)
(cd $PKGDIR ; tito build --rpm $TEST $SCL)
done
createrepo /var/www/thttpd/tito
</pre>
<br />
This snippet walks the source tree and runs <u>tito</u> to build each package. The last line rebuilds the YUM repository metadata from the packages present.<br />
<br />
You can build a single package by moving to the root of the package tree and running <u>tito</u> manually. You'll also have to re-run <u>createrepo</u> each time you update a package. If you've rebuilt a package but yum claims there's no update available, check that you've re-run <u>createrepo</u> first.<br />
<br />
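For example, to rebuild just one package and refresh the repository metadata (the directory name here is only an example; use whichever directory holds the .spec file you care about):<br />
<br />
<pre class="brush: bash ; title: 'rebuild a single package'">cd origin-server/controller
tito build --rpm
createrepo /var/www/thttpd/tito
</pre>
<br />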
Which reminds me, if you're using yum for frequent updates (more than once a day), you'll also have to clear the metadata on the client machine so that it sees the updated packages<br />
<br />
<pre class="brush:bash ; title : 'clean YUM metadata on client after package rebuilds'">
client# sudo yum clean metadata
</pre>
<br />
<h3>
Building Test Packages</h3>
<div>
<br /></div>
<div>
Tito builds from what is committed in git, not from the files sitting in your working tree. That is, it ignores files in the workspace which have not been committed. It also requires at least one initial tito tag to operate.</div>
<div>
<br /></div>
<div>
Tito builds test packages by creating a temporary commit and tag. This allows it to create a package with a unique name for each test build. Each time you make a change and rebuild, a serial number is auto-incremented so that yum will see the new package as an 'update' and accept it in preference to any currently installed version.</div>
<div>
<br /></div>
<h3>
Configuring a yum repo on the install host</h3>
<div>
<br /></div>
<div>
You can supersede the stock Fedora or RHEL OpenShift package repositories by placing a new repo file in /etc/yum.repos.d</div>
<div>
<br /></div>
<div>
<pre class="brush:bash ; title 'example test/build yum repo file'; highlight: 1">/etc/yum.repos.d/openshift_buildtest.repo
[openshift_buildtest]
name=OpenShift Build/Test repository
baseurl=http://build.example.com/tito/
enabled=1
gpgcheck=0
</pre>
<br />
<h3>
<a href="http://www.blogger.com/blogger.g?blogID=5022186007695457923" id="rhel6">Considerations for RHEL6</a></h3>
<div>
<br /></div>
<div>
<br /></div>
<div>
There are two significant differences between Fedora and RHEL6 when creating a build box.</div>
<div>
<br />
<h4>
Firewall and Services on RHEL6</h4>
<br /></div>
<div>
On RHEL6 <u>systemd</u> and <u>firewalld</u> are not available. Use <u>iptables</u> and <u>lokkit</u> instead of <u>firewalld</u> and <u>firewall-cmd</u> to open the TCP port for HTTP. Use <u>service</u> and <u>chkconfig</u> instead of <u>systemctl</u> to control services.<br />
<br />
<br />
<pre class="brush: bash ; title 'services and firewall for RHEL6'"># Open Firewall for HTTP
sudo lokkit --service=http
# start and enable thttpd
chkconfig thttpd on
service thttpd start
</pre>
</div>
<div>
<br /></div>
<div>
<h4>
RHEL6, OpenShift, Ruby/Rails versions and Software Collections</h4>
<br />
RHEL6 is .. special. OpenShift is written in Ruby 1.9.3 and Rails 3. These didn't exist or weren't stable when RHEL6 was created. Different Ruby versions don't play nicely on a single system (there have been at least 3 attempts I can find to get Ruby 1.8 and 1.9 to co-exist like Python 2 and 3. All have thrown up their hands in frustration). Given that, heroic measures were required to get them to run on RHEL6. Those heroic measures are called <i><a href="https://fedorahosted.org/SoftwareCollections/">Software Collections</a></i>, better known as <i>"SCL"</i>.</div>
<div>
<br /></div>
<div>
What SCL does is provide a means to repackage software and run it in a special environment that isolates it from the rest of the system. The SCL team has re-packaged over 500 packages to run in the ruby193 environment on RHEL6. These are all needed to run OpenShift on RHEL6. They're also needed to build OpenShift for RHEL6.</div>
<div>
<br /></div>
<div>
Fortunately, the SCL and OpenShift teams have kindly <a href="http://mirror.openshift.com/pub/openshift-origin/nightly/">provided a YUM repository for them</a>. When you add the OpenShift dependencies repository to your YUM repo configurations all of the build dependencies will resolve. They've also added a switch to <u>tito</u> so that it will run your builds inside the SCL environment.</div>
<div>
<pre class="brush: bash ; title 'YUM Repository file for OpenShift Dependencies on RHEL6' ; highlight: 1">/etc/yum.repos.d/openshift-dependencies.repo
[openshift-dependencies]
name=OpenShift Dependencies
baseurl=http://mirror.openshift.com/pub/openshift-origin/nightly/rhel-6/dependencies/$basearch/
enabled=1
gpgcheck=0
</pre>
</div>
<div>
<br />
There are other things in those dependencies repositories as well: a large number of updated packages which OpenShift needs but which have not yet appeared upstream. On Fedora you'll still need the dependencies YUM repository to create the runtime hosts, but you don't need it to build the OpenShift packages.<br />
<br /></div>
</div>
<h3>
<a href="http://www.blogger.com/blogger.g?blogID=5022186007695457923" id="ec2">Considerations for EC2 hosting</a></h3>
<div>
<br /></div>
<div>
If you're placing your build box in AWS EC2 there are a couple of additional things to consider:</div>
<div>
<br /></div>
<div>
<ol>
<li>EC2 security_policy must allow HTTP (port 80/TCP)<br />Your build instance must be created with a security policy which allows port 80/TCP for your install hosts.</li>
<li>Internal Hostname<br />EC2 hosts have an internal and external hostname. Both names are dynamic (unless you assign ElasticIP). If your install hosts are also on EC2 you can use the internal hostname and IP address for the security_policy.</li>
<li>External Hostname<br />If your install boxes are not hosted in EC2 then you must allow all hosts on port 80 TCP and note the EC2 public hostname so that the install hosts can access the build host web server.</li>
</ol>
<div>
<br /></div>
</div>
<h3>
OpenShift Source Code Repositories</h3>
<div>
<br /></div>
<div>
<ul>
<li>origin-server -<br />http://github.com/openshift/origin-server</li>
<li>rhc -<br />http://github.com/openshift/rhc</li>
<li>origin-dependencies (SRPM repository)<br />http://mirror.openshift.com/pub/openshift-origin/nightly/fedora-latest/dependencies/SRPMS/</li>
</ul>
<div>
<br /></div>
</div>
<h3>
OpenShift Dependencies RPM Repositories</h3>
<div>
<br /></div>
<div>
<ul>
<li>Fedora -<br />http://mirror.openshift.com/pub/openshift-origin/nightly/fedora-latest/dependencies/$basearch</li>
<li>RHEL6 -<br />http://mirror.openshift.com/pub/openshift-origin/nightly/rhel-6/dependencies/$basearch</li>
</ul>
<div>
<br /></div>
</div>
<h3>
References</h3>
<div>
<ul>
<li>git - http://git-scm.com/</li>
<li>github - https://github.com</li>
<li>thttpd - http://www.acme.com/software/thttpd/</li>
<li>firewalld - https://fedoraproject.org/wiki/FirewallD</li>
<li>tito - http://linux.die.net/man/8/tito</li>
<li>Software Collections (SCL) - https://fedorahosted.org/SoftwareCollections/</li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com4tag:blogger.com,1999:blog-5022186007695457923.post-86727286930798374662013-10-11T12:30:00.002-07:002013-11-11T05:20:54.505-08:00Diversion: Kerberos (FreeIPA) in AWS EC2One of the things many people are asking for in OpenShift is alternate ways of authenticating SSH and git interactions with the application gears. Since I'm doing my development work in EC2, I thought that was surely the right place to try it out. Well as usual, it didn't work out quite as simply as I'd planned.<br />
<br />
This post isn't about OpenShift directly. It addresses what I found when I tried to implement FreeIPA in EC2 so that I could develop code to allow Kerberos authentication in OpenShift.<br />
<br />
<h3>
Kerberos in way too few words</h3>
<div>
<br /></div>
<div>
<a href="https://en.wikipedia.org/wiki/Kerberos_(protocol)">Kerberos</a> is an authentication protocol and service defined originally at MIT as part of <a href="https://en.wikipedia.org/wiki/Project_Athena">Project Athena</a> (along with things like the <a href="https://en.wikipedia.org/wiki/X_Window_System">X Windows System</a> and Zephyr, a predecessor to modern IM services). It is meant to provide authenticated on unencrypted and even untrusted networks. Perfect right? Well Kerberos has some quirks.</div>
<div>
<br /></div>
<div>
First, different people can run their own Kerberos services. To avoid conflicts, each service is given an identifier string known as a <i>realm</i>. By convention the realm string is the same as the enterprise DNS domain name. That is, if a company has DNS domain <code>example.com</code> then the Kerberos realm would be <code>EXAMPLE.COM</code>. Unlike DNS domain names, Kerberos realms are case sensitive.</div>
<div>
<br /></div>
<div>
Each participating host must be registered with the Kerberos server and each user must be added to the user list on the server as well. Hosts and users (and any other manageable entity) are identified with a <i>principal</i>. This is basically a name which is unique for each resource, err user, umm host.... thing. The important thing is that the host is identified by a string which is derived from its hostname.</div>
<div>
<br /></div>
<div>
Now wars have been fought over whether a hostname should be the <a href="https://en.wikipedia.org/wiki/Fully_Qualified_Domain_Name">Fully Qualified Domain Name</a> (FQDN) or just the host portion. For Kerberos there is only one answer: <b>FQDN</b>.</div>
<div>
<br /></div>
<div>
The <i>host principal</i> for a given host is composed of the hostname and the realm. When a client tries to log in, it needs to know the correct principal to request from the server. This is why the hostname must be the FQDN. When the user attempts to log in he must provide both his own principal and the host principal for the destination. The only way to know the destination's host principal is if it is related to the hostname <b>as viewed from the client host</b>.</div>
<div>
<br /></div>
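<div>
As an illustration (the names here are made up), a user principal and the matching host principal in that realm would look like this:</div>
<div>
<br /></div>
<pre class="brush: plain ; title: 'example principals'">user principal: alice@EXAMPLE.COM
host principal: host/host1.example.com@EXAMPLE.COM
</pre>
<div>
<br /></div>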
<div>
This is where life gets interesting in AWS EC2.</div>
<div>
<br /></div>
<div>
You see, AWS uses <a href="https://tools.ietf.org/rfc/rfc1918.txt">RFC 1918</a> and something like <a href="https://tools.ietf.org/rfc/rfc1631.txt">Network Address Translation</a> to create a <a href="https://en.wikipedia.org/wiki/Private_network">private network</a> for the virtual machines which make up the EC2 service. AWS also uses an internal DNS service to identify each virtual machine. This means that from the view of a host inside the private network, the destination host has a different IP address <b>and a different hostname</b> than when viewed from outside the private network. The upshot is that, to use Kerberos with EC2 I need some way to make sure that the user can determine a valid host principal to request regardless of where the user is located.</div>
<div>
<br />
<h3>
A Word about IPA and AWS</h3>
</div>
<div>
<br /></div>
<div>
IPA (and FreeIPA) is not a single service. It's a collection of services configured so that they work in concert to provide secure user and host access over untrusted networks. Kerberos is only one of the services, though it is probably the core one. LDAP, NTP and DNS are all support services which make the operation of Kerberos work. IPA wraps these services in such a way so that mere mortals don't necessarily need to know how the bindings work merely to get the service running.</div>
<div>
<br /></div>
<div>
In this post I'm dealing almost entirely with the Kerberos service within IPA and I'll refer to that component by name. Where I mention FreeIPA it will be in reference to the specific tools that FreeIPA provides to set up and manage the conglomerate service.</div>
<div>
<br /></div>
<div>
AWS (Amazon Web Services) is also a suite of services. The core of that is the EC2 virtual host service. Again, all of the AWS services generally work together, but I'm only dealing with EC2 instances in this post so I'll refer to EC2 specifically unless I'm referring to the full suite.</div>
<div>
<br />
<b>UPDATE: 2013-11-07 - AWS TOS do not permit open DNS recursion.</b><br />
<b><br /></b>
One other thing to be aware of when running IPA in AWS. Amazon terms of service do not allow users to create open recursive DNS services within AWS on the grounds that they can be abused.<br />
<br />
When setting up your AWS security policies and the <u>named</u> service on your IPA hosts, be sure to disable recursion and/or limit access to appropriate IP ranges for your DNS clients or you'll get a polite nastygram from Amazon.</div>
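<div>
For a BIND <u>named</u> server that boils down to a couple of lines in <code>/etc/named.conf</code>. This is only a sketch: the ACL name and address range are examples that need to match your actual client networks, and the FreeIPA-generated options already in the file should be left alone.</div>
<div>
<br /></div>
<pre class="brush: plain ; title: 'sketch: limiting recursion in named.conf'">acl "dnsclients" { 10.0.0.0/16; localhost; };

options {
        // ... existing FreeIPA-generated options ...
        allow-recursion { dnsclients; };
        allow-query-cache { dnsclients; };
};
</pre>
<div>
<br /></div>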
<h3>
Kerberos, Linux and SSH</h3>
<div>
<br /></div>
<div>
I want to use Kerberos with SSH so that I can avoid using SSH authorized_keys when pushing git updates to my applications on OpenShift (mostly ignoring EC2 for now). To do that I need several things set up:</div>
<div>
<br /></div>
<div>
<ul>
<li>A Kerberos (FreeIPA) server - IPA installed, configured</li>
<li>A set of users configured into the FreeIPA service</li>
<li>A target host (OpenShift node)</li>
</ul>
<div>
For SSH the most important things are that the Kerberos and LDAP configurations are set up properly. This includes configuring <i>sssd</i> and the <code>/etc/nsswitch.conf</code> settings. Luckily the FreeIPA <code>ipa-client-install</code> script (with the right inputs) will do all of that for me. I think there are ways to get it to tell me precisely what changes it's making but I haven't learned how yet. I do know that I can find the results in <code>/var/log/ipaclient-install.log</code>.</div>
</div>
<div>
<br /></div>
<div>
The other thing I need to do is to make sure that the SSH client and server both will at least try to use the GSSAPI protocol for managing the authentication process. On the server this means making sure that GSSAPIAuthentication is enabled.</div>
<div>
<br /></div>
<div>
On the client side, I may need to specify that I want to use the <i>gssapi-with-mic</i> authentication method. I may also need to specify the host principal to use to access the destination (as distinct from the hostname from the client's vantage point). More on these later.<br />
<br /></div>
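<div>
As a sketch, the relevant settings look something like this (the domain is just an example, and the exact option set you need may differ):</div>
<div>
<br /></div>
<pre class="brush: plain ; title: 'sketch: GSSAPI settings for sshd and the ssh client'"># on the server, in /etc/ssh/sshd_config
GSSAPIAuthentication yes

# on the client, in ~/.ssh/config
Host *.example.com
    GSSAPIAuthentication yes
    PreferredAuthentications gssapi-with-mic,publickey
</pre>
<div>
<br /></div>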
<h3>
EC2, cloud-init and resisting dynamic naming</h3>
<div>
<br /></div>
<div>
The network interface numbering and naming in EC2 are dynamic by design, both on the internal and external interfaces. EC2 does offer "elastic IP" which is really "static IP" for an instance and since I own a DNS zone I can assign a name to the address. Unfortunately this only offers control of the external IP address assigned to an instance. I have to find ways to manage the internal naming myself.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-kkeRCPHOETY/UlbKzx1O5cI/AAAAAAAAB8E/t3aaKzK8bVo/s1600/kerberos_nat.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="380" src="http://2.bp.blogspot.com/-kkeRCPHOETY/UlbKzx1O5cI/AAAAAAAAB8E/t3aaKzK8bVo/s640/kerberos_nat.png" width="640" /></a></div>
<br /></div>
<br />
When a host registers with a Kerberos service it generally uses its own hostname as the identifier for the host principal. If this is the same as the DNS name associated with one or more of its IP addresses, this is just by convention. That is, Kerberos doesn't maintain the mapping. So if a host changes its hostname but no changes are made to the Kerberos database, the host can no longer identify itself by its principal. Also, if the name by which it is known from the outside changes (because the IP address and/or DNS name changed) then clients will no longer know what principal to use to request an access ticket.<br />
<div>
<br /></div>
<div>
There are two factors here: Making sure the host knows its own name, and making sure that users coming from remote hosts can determine the (a?) valid principal (based on the hostname) to request a ticket for.</div>
<div>
<br /></div>
<h3>
Maintaining Host Identity</h3>
<div>
<br /></div>
<div>
For Kerberos, the hostname is the anchor for a host principal. If the hostname changes on a registered host, it will no longer be able to properly communicate with the Kerberos server and clients. Luckily the Fedora and RHEL images in EC2 use <i>cloud-init</i> to initialize potentially dynamic information on startup.</div>
<div>
<br /></div>
<div>
<a href="https://launchpad.net/cloud-init">Cloud-init</a> is software which, when installed on a host, can take input from the cloud environment and customize the host to integrate it into the environment. It can do things like.. oh, say, set the IP address of network interfaces and hostnames, install SSH host keys, set device mount points and the like. It will also allow me to tell it <i>not</i> to update the hostname on each reboot.</div>
<div>
<br /></div>
<div>
The main configuration for cloud-init is <code>/etc/cloud/cloud.cfg</code>. I just need to add a line containing '<code>preserve_hostname: 1</code>' and set the hostname I want in <code>/etc/hostname</code>. From then on, restarts or reboots will keep the hostname I set. Given that value I have my anchor for registering the host with the kerberos server and maintaining the host/principal mapping.</div>
<div>
<br /></div>
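<div>
Concretely, the two changes are small (the hostname here is just the example name used in the rest of this post):</div>
<div>
<br /></div>
<pre class="brush: bash ; title: 'pin the hostname across reboots'">host1$ echo "preserve_hostname: 1" | sudo tee -a /etc/cloud/cloud.cfg
host1$ echo "host1.example.com" | sudo tee /etc/hostname
host1$ sudo hostname host1.example.com
</pre>
<div>
<br /></div>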
<div>
The host now always knows its own name: part one solved.</div>
<div>
<br /></div>
<h3>
The view from Inside/Outside</h3>
<div>
<br /></div>
<div>
You do learn something every day. In talking with some of the FreeIPA developer folks I learned something I hadn't known about how the Kerberos protocol works. Here's the important bit.</div>
<div>
<br /></div>
<div>
When a client wants to gain access to some resource, it sends a message to the Kerberos server saying "I am this principal and I want access to that one over there, ok?" The Kerberos server sends back a signed/encrypted <i>ticket</i> with both names (principals) wrapped inside it. The client then sends the ticket in an authentication request to the destination host, who verifies "yep, that's me, and I can see that that's you, let me check are you allowed?" and if the answer is "yes" the client request is granted.</div>
<div>
<br /></div>
<div>
What this means is that the client must know the name (principal) of the destination resource <u>before</u> attempting to connect to the resource. It must know a name that both the kerberos server and the resource host itself will recognize. When everyone uses DNS FQDNs to identify hosts <u>and</u> they have the same view of DNS, this works nicely. Accessing private network resources from a public network creates some issues.</div>
<div>
<br /></div>
<div>
Most tools, SSH included, assume that they can compose a host principal from the hostname given by the user. So if a client was using realm <code>EXAMPLE.COM</code> and tried to reach a remote host with FQDN 'destination.example.com' the principal would be <code>host/destination.example.com@EXAMPLE.COM</code>. But since the EC2 hosts have (not one but two) random hostnames assigned when they boot, it's impossible to know from the hostname alone what the principal of the destination is.</div>
<div>
<br /></div>
<div>
If I happen to know the mapping (i.e., what principal is associated with the destination host) then SSH allows me to specify that with <code>-oGSSAPIServerIdentity=&lt;principal&gt;</code> on the CLI or in a Host entry in my <code>.ssh/config</code> file. From the illustration above, to properly authenticate with the Kerberos Host I could do this:</div>
<div>
<br /></div>
<div>
<pre class="brush: bash">ssh -oPreferredAuthentications=gssapi-with-mic -oGSSAPIServerIdentity=host1.example.com random2.external
</pre>
</div>
<div>
<br /></div>
<div>
(this also assumes that my local username and the remote one are the same and that I've got a ticket-granting-ticket for the <code>EXAMPLE.COM</code> realm using <i>kinit</i>.)</div>
<div>
<br /></div>
<div>
What this says is to log into a host whose name (from this view) is <code>random2.external</code>, and whose principal is <code>host1.example.com</code>. With that the local client can send a query to the Kerberos server and get the right ticket back to hand to the destination host. It can say "yep, that's me and yep you're you, and yep you're allowed".</div>
<div>
<br /></div>
<h3>
The Many Faces of Kerberos</h3>
<div>
<br /></div>
<div>
It's totally a coincidence that Cerberus is the 3-headed dog that guards the landing in Hades on the river Styx and that I'm going to add two "faces" to my Kerberos clients. Totally.</div>
<div>
<br /></div>
<div>
I <u>think</u> that in the discussion above I've been careful to make it clear that a Kerberos principal is an <i>identifier</i>. That is, it is a handle which is used to refer to an object in the Kerberos database which corresponds to an object in reality. I have nicknames. Hosts in Kerberos can have them too, and this is going to solve my identity problem with random dynamic names and IP addresses.</div>
<div>
<br /></div>
<div>
I've managed to give each host a fixed hostname, thanks to cloud-init. Once I know the dynamic names both public and internal I should be able to inform the Kerberos server of both of the aliases.</div>
<div>
<br /></div>
<div>
If this works, here's what will happen when I try to log in from a host either inside or outside the private network: my SSH client will form a principal from the (DNS) name I offer. My client will send that to the Kerberos server and request an access ticket to the remote host using the alias principal. <b>And the Kerberos server will know which host that means.</b> It will create an access ticket which will grant me access to the destination host, which will examine it and, on finding everything in order, will allow my SSH connection.</div>
<div>
<br /></div>
<div>
It turns out that FreeIPA doesn't yet have a nice Web or CLI user interface to add principals to a registered host record, but the Kerberos database is stored in an LDAP server on the Kerberos master host. For now I (or a friend actually) can craft an LDAP update which will add the principals I need to the host record. This is assumed to be run on the FreeIPA (Kerberos) server itself, since it talks to the LDAP service on localhost.</div>
<div>
<br /></div>
<div>
<pre class="brush: plain; title='add-alias-principal.ldif'; highlight: 1">kerberos# ldapmodify -h localhost -x -D "cn=Directory Manager" -W @lt;@lt;EOF
dn: fqdn=host1.example.com,cn=computers,cn=accounts,dc=example,dc=com
changetype: modify
add: krbprincipalname
krbprincipalname: host/random2.external@EXAMPLE.COM
krbprincipalname: host/random2.internal@EXAMPLE.COM
EOF</pre>
</div>
<div>
<br />
The invocation above will request the Directory Manager password for the FreeIPA LDAP service. I'm sure there's a way to do it with Kerberos/GSSAPI, but I haven't got it yet.<br />
<br />
What that change does is add two Kerberos principal names to the host entry for <code>host1.example.com</code>. The principal names match what an SSH client would construct using the DNS name (internal or external) to reach the target host. Now when the Kerberos server gets a ticket request from clients either inside or outside the private network, the principal in the ticket request will be associated with a known host.<br />
<br />
<h3>
The Devil's in the Dynamics</h3>
<div>
<br /></div>
<div>
This is all fine so long as <code>host1.example.com</code> doesn't reboot. When it does, AWS will assign it a new internal and external IP address and new DNS names. It would be really nice if the host, when it boots, could inform the Kerberos service what its new internal and external principal names are.<br />
<br />
I don't currently know how to do this, but I suspect that I could add a module to cloud-init to do the job. The client is already configured to use the LDAP service on the Kerberos (FreeIPA) server. Once the server knows that all three principals refer to the same host life should be good.<br />
<br />
Now to learn some cloud-init finagling and enough Kerberos so that I can have the host update itself on reboot.</div>
<br />
<br />
<h3>
What does this mean for OpenShift?</h3>
<div>
<br /></div>
<div>
If you want to run an OpenShift service in AWS and you want to offer Kerberos authentication for SSH/git to the application gears, you'll have to do a little LDAP tweaking of the Kerberos principals associated with each host so that the Kerberos service will know which host you mean regardless of your view of the destination host.<br />
<br />
The first round of Kerberos integration code is going into OpenShift Origin as I write this (the pull request is submitted and getting commentary). By the next release it should be possible to manage developer access to gears with Kerberos and FreeIPA. Additional use cases will be added over time.<br />
<br /></div>
<h3>
Summary</h3>
<div>
<br /></div>
<div>
<ul>
<li>Cloud services like AWS and corporate networks often rely on private network spaces and Network Address Translation to manage dynamic hosts.</li>
<li>Cloud Init usually updates the hostname on each boot but this can be suppressed.</li>
<li>For a client trying to reach a host for SSH this poses a problem because the view of the destination from the client differs based on where the client sits in relation to the network boundary.</li>
<li>Kerberos can assign multiple principals to a single host, which allows authentication to work.</li>
</ul>
</div>
</div>
<div>
<br />
<h2>
References</h2>
<ul>
<li><a href="http://www.freeipa.org/">FreeIPA</a> - A component based single-sign-on service </li>
<li><a href="http://web.mit.edu/kerberos/">Kerberos</a> - The authentication component of FreeIPA and MIT Project Athena</li>
<li><a href="https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface">GSSAPI</a> - A standardized generic authentication and access control protocol</li>
<li><a href="https://en.wikipedia.org/wiki/Project_Athena">Project Athena</a> - 1980s MIT/DEC/IBM project to design network services and protocols</li>
<li><a href="https://tools.ietf.org/rfc/rfc1918.txt">RFC 1918</a> - Private non-routable IP address space reservations</li>
<li><a href="https://en.wikipedia.org/wiki/Network_address_translation">Network Address Translation</a> - Private network boundary system</li>
<li><a href="https://aws.amazon.com/articles/1346">AWS Elastic IP</a> - AWS static IP addresses for dynamic hosts</li>
<li><a href="https://launchpad.net/cloud-init">Cloud Init</a> - A service for customizing host configuration on reboot</li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-77301932538450247512013-09-27T11:30:00.003-07:002013-09-27T11:30:45.636-07:00Broker-Node interaction and visibility - Debugging "missing" cartridges on a node.In the previous post I set up the end-point messaging for OpenShift. <code>(Broker -> Messaging -> Node)</code>. I showed a simple use of the MCollective mco command and where the MCollective log files are. The last step was to send an echo message to the OpenShift agent on an OpenShift node and get the response back.<br />
<br />
Now I have my OpenShift broker and node set up (I think) but something's not right and I have to figure out what.<br />
<br />
<b>DISCLAIMER: </b>this post isn't a "how to"; it's a mostly-stream-of-consciousness log of my attempt to answer a question and understand what's going on underneath. It's messy. It may cast light on some of the moving parts. It may also lead me to a confrontation with <a href="http://www.sacred-texts.com/neu/mphg/mphg.htm#Scene 24">The Old Man From Scene 24</a> and we all know <a href="http://www.sacred-texts.com/neu/mphg/mphg.htm#Scene 35">how that ends</a>. You have been warned.<br />
<br />
In the paragraphs below I include a number of CLI invocations and their responses. I include a prompt at the beginning of each one to indicate where (on which host) the CLI command is running.<br />
<br />
<ul>
<li>broker$ - the command is running on my OpenShift broker host</li>
<li>node$ - the command is running on my OpenShift node host</li>
<li>dev$ - the command is running on my laptop</li>
</ul>
<div>
<br /></div>
<div>
I've also got a copy of the <a href="https://github.com/openshift/origin-server">origin-server source code</a> checked out from the repository on Github.</div>
<br />
I've got my rhc client already configured for my test user (cleverly named 'testuser') and my broker (using the libra_server variable). See ~/.openshift/express.conf if needed.<br />
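For reference, a minimal <code>express.conf</code> along those lines looks something like this (the broker hostname is just an example):<br />
<br />
<pre class="brush: plain ; title: 'sketch: ~/.openshift/express.conf'">libra_server=broker.example.com
default_rhlogin=testuser
</pre>
<br />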
<h3>
What's going on here?</h3>
I started trying to access the Broker with the <code>rhc</code> CLI command to create a user, register a namespace and then create an application. I'd like to create a python app and I've installed the <code>openshift-origin-cartridge-python</code> package to provide that app framework. But when I try to create my app I'm told that Python is not available:<br />
<br />
<br />
<pre class="brush:bash ; title: 'attempt to create python application' ; highlight: 1">client$ rhc create-app testapp1 python
Short Name Full name
========== =========
There are no cartridges that match 'python'.
</pre>
<br />
So I figure I'll ask what cartridges ARE available:
<br />
<br />
<pre class="brush:bash ; title: 'list available cartridges' ; highlight: 1">client$ rhc cartridges
Note: Web cartridges can only be added to new applications.
</pre>
<br />
Now, when I look for cartridge packages on the node I get a different answer:
<br />
<br />
<pre class="brush:bash ; title: 'attempt to create python application' ; highlight: 1">node$ rpm -qa | grep cartridge
openshift-origin-cartridge-abstract-1.5.9-1.fc19.noarch
openshift-origin-cartridge-cron-1.15.2-1.git.0.aa68436.fc19.noarch
openshift-origin-cartridge-php-1.15.2-1.git.0.090a445.fc19.noarch
openshift-origin-cartridge-python-1.15.1-1.git.0.0eb3e95.fc19.noarch
</pre>
<br />
Somehow, when the broker is asking the node to list its cartridges, the node isn't answering correctly. Why?
<br />
<br />
I'm going to see if I can observe the broker making the query to list the nodes and then see if I can determine where the node is (or isn't) getting its answer.<br />
<br />
<h2>
Refresher: MCollective RPC and OpenShift</h2>
<div>
<br />
MCollective is really an RPC (Remote Procedure Call) mechanism. It defines the interface for a set of functions to be called on the remote machine. The client submits a function call which is sent to the server. The server executes the function on behalf of the client and then returns the result.</div>
<div>
<br /></div>
<div>
The OpenShift client adds one more level of indirection and I want to get that out of the way. I can look at the logs on the broker and node to see what activity was caused when the <code>rhc</code> command issued the cartridge list query.</div>
<div>
<br /></div>
<div>
The broker writes its logs into several files in <code>/var/log/openshift/broker</code>. You can see the REST queries arrive and resolve in the Rails log file <code>/var/log/openshift/broker/production.log</code>.</div>
<div>
<br /></div>
<div>
<pre class="brush:bash ; title: 'attempt to create python application' ; highlight: 1">broker$ sudo tail /var/log/openshift/broker/production.log
...
2013-09-26 17:54:06.445 [INFO ] Started GET "/broker/rest/api" for 127.0.0.1 at 2013-09-26 17:54:06 +0000 (pid:16730)
2013-09-26 17:54:06.447 [INFO ] Processing by ApiController#show as JSON (pid:16730)
2013-09-26 17:54:06.453 [INFO ] Completed 200 OK in 6ms (Views: 3.6ms) (pid:16730)
2013-09-26 17:54:06.469 [INFO ] Started GET "/broker/rest/api" for 127.0.0.1 at 2013-09-26 17:54:06 +0000 (pid:16730)
2013-09-26 17:54:06.470 [INFO ] Processing by ApiController#show as JSON (pid:16730)
2013-09-26 17:54:06.476 [INFO ] Completed 200 OK in 6ms (Views: 3.8ms) (pid:16730)
2013-09-26 17:54:06.504 [INFO ] Started GET "/broker/rest/cartridges" for 127.0.0.1 at 2013-09-26 17:54:06 +0000 (pid:16730)
2013-09-26 17:54:06.507 [INFO ] Processing by CartridgesController#index as JSON (pid:16730)
2013-09-26 17:54:06.509 [INFO ] Completed 200 OK in 1ms (Views: 0.4ms) (pid:16730)
</pre>
<br /></div>
<div>
From that I can see that my <code>rhc</code> calls are arriving and apparently the response is being returned OK.
</div>
<div>
<br />
The default settings for the MCollective client (on the OpenShift broker) don't go to a log file. I can check the OpenShift node though to see what's happened there and if it has received a query for the list of installed cartridges.
<br />
<br /></div>
<div>
<pre class="brush: bash ; title: 'mcollective action log'; highlight: 1">node$ sudo grep cartridge /var/log/mcollective.log | tail -3
I, [2013-09-26T17:10:23.696825 #9827] INFO -- : openshift.rb:1217:in `cartridge_repository_action' action: cartridge_repository_action, agent=openshift, data={:action=>"list", :process_results=>true}
I, [2013-09-26T17:29:10.768487 #9827] INFO -- : openshift.rb:1217:in `cartridge_repository_action' action: cartridge_repository_action, agent=openshift, data={:action=>"list", :process_results=>true}
I, [2013-09-26T17:29:24.957806 #9827] INFO -- : openshift.rb:1217:in `cartridge_repository_action' action: cartridge_repository_action, agent=openshift, data={:action=>"list", :process_results=>true}
</pre>
</div>
<div>
<br />
This too looks like the message was received, processed properly, and a response returned.
<br />
<br />
<h3>
Hand Crafting An mco Message</h3>
<br />
Here's where that MCollective RPC interface definition comes in. I can look at that to see how to generate the cartridge list query using mco so that I can observe both ends and track down what's happening.<br />
<br />
There are really two things to look for here:<br />
<br />
<ol>
<li>What message is sent (and how do I duplicate it)?</li>
<li>What action does the agent take when it receives the message?</li>
</ol>
<div>
<br />
For part one, MCollective defines the RPC interfaces in a file with a .ddl extension. Looking for one of those in the origin-server Github repository finds me this: <a href="https://github.com/openshift/origin-server/blob/master/msg-common/agent/openshift.ddl">origin-server/msg-common/agent/openshift.ddl</a> </div>
<div>
<br /></div>
<div>
Of particular interest are <a href="https://github.com/openshift/origin-server/blob/master/msg-common/agent/openshift.ddl#L390-L397">lines 390-397</a>. These define the cartridge_repository action and the set of operations it can perform: install, list, erase<br />
<br /></div>
<div class="gistLoad" data-id="GistID" id="6718972">
<a href="https://gist.github.com/markllama/6718972">Loading gist</a> </div>
<div>
<br />
Taking that, I can craft an mco rpc message to duplicate what the broker is doing when it queries the nodes:<br />
<br />
<pre class="brush: bash ; title: 'mco query for cartridges on a node'; highlight: 1">mco rpc openshift cartridge_repository action=list
Discovering hosts using the mc method for 2 second(s) .... 1
* [ ==========================================================> ] 1 / 1
ec2-54-211-74-85.compute-1.amazonaws.com
output:
Finished processing 1 / 1 hosts in 32.15 ms
</pre>
<br />
Yep, it still says "none". When I go back and look at the logs, it shows the same query I was looking at, so I think I got that right.<br />
<br />
<h3>
But What Does It DO?</h3>
</div>
<div>
<br />
Now that I can send the query message, I need to find out what happens on the other end to generate the response. My search begins in the node messaging plugin for MCollective, particularly in the agent module code (in <a href="https://github.com/openshift/origin-server/blob/master/plugins/msg-node/mcollective/src/openshift.rb">plugins/msg-node/mcollective/src/openshift.rb</a>). This defines a function <a href="https://github.com/openshift/origin-server/blob/master/plugins/msg-node/mcollective/src/openshift.rb#L1216-L1244">cartridge_repository_action</a> which... doesn't actually do the work, but points me to the next piece of code which actually does implement the function.</div>
<div>
<br /></div>
<div>
It appears that the OpenShift node implements a class <code>::OpenShift::Runtime::CartridgeRepository</code> which is a factory for an object that actually produces the answer. A quick look in the source repository shows me the file that defines the <code>CartridgeRepository</code> class.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'search for the definition of CartridgeRepository'; highlight: [2,6]">dev$ cd ~/origin-server
dev$ find . -name \*.rb | xargs grep 'class CartridgeRepository'
./node/test/functional/cartridge_repository_func_test.rb: class CartridgeRepositoryFunctionalTest < NodeTestCase
./node/test/functional/cartridge_repository_web_func_test.rb:class CartridgeRepositoryWebFunctionalTest < OpenShift::NodeTestCase
./node/test/unit/cartridge_repository_test.rb:class CartridgeRepositoryTest < OpenShift::NodeTestCase
./node/lib/openshift-origin-node/model/cartridge_repository.rb: class CartridgeRepository
</pre>
<br /></div>
<div>
<br />
So, on the node, when a query is received for the list of cartridges that is present, the MCollective agent for OpenShift creates one of the <code>CartridgeRepository</code> objects and then asks <i>it</i> for the list.<br />
<br />
A quick look at the <a href="https://github.com/openshift/origin-server/blob/master/node/lib/openshift-origin-node/model/cartridge_repository.rb#L86">cartridge_repository.rb</a> file on Github is enlightening. First, the file has 60 lines of excellent commentary before the code starts. Line 86 indicates that the <code>CartridgeRepository</code> object will look for cartridges in <code>/var/lib/openshift/.cartridge_repository</code> (while noting that this location should be configurable in the <code>/etc/openshift/node.conf</code> someday). And <a href="https://github.com/openshift/origin-server/blob/master/node/lib/openshift-origin-node/model/cartridge_repository.rb#L170-L188">lines 170-189</a> define the <i>install</i> method which seems to populate the cartridge_repository from some directory which is provided as an argument.
<br />
<br />
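Before chasing the code any further, a quick check on the node is consistent with the empty answers I keep getting back; listing the repository location named in the source turns up nothing on my host:<br />
<br />
<pre class="brush: bash ; title: 'the node cartridge repository is empty (so far)' ; highlight: 1">node$ sudo ls -A /var/lib/openshift/.cartridge_repository 2>/dev/null
node$
</pre>
<br />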
But when does <code>CartridgeRepository.install</code> get invoked? Well, since <code>CartridgeRepository</code> is a factory and a Singleton (which provides the <i>instance()</i> method for initialization) I can look for where it's instantiated:<br />
<br />
<pre class="brush: bash ; title: 'Instantiate CartridgeRepository'; highlight: [1,4]">dev$ find . -type f | xargs grep -l OpenShift::Runtime::CartridgeRepository.instance | grep -v /test/
./plugins/msg-node/mcollective/src/openshift.rb
./node-util/bin/oo-admin-cartridge
./node/lib/openshift-origin-node/model/upgrade.rb
</pre>
<br />
Note that I remove all of the files in the test directories using <code>grep -v /test/</code>. What remains are the working files which actually instantiate a CartridgeRepository object. If I also check for a call to the <i>install()</i> method, the list is reduced to one file:<br />
<br />
<pre class="brush: bash ; title: 'files which install plugin metadata' ; highlight 1">find . -type f | xargs grep OpenShift::Runtime::CartridgeRepository.instance | grep -v /test/ | grep install
./plugins/msg-node/mcollective/src/openshift.rb: ::OpenShift::Runtime::CartridgeRepository.instance.install(path)
</pre>
<br /></div>
So, it looks like the node messaging module is what populates the OpenShift cartridge repository. When I looked earlier though, it didn't seem to have done that. Messaging is running, I've installed the cartridge RPMs, and I can successfully query for (what turns out to be) an empty database of cartridge information.<br />
<br />
Finally! When I look at plugins/msg-node/mcollective/src/openshift.rb <a href="https://github.com/openshift/origin-server/blob/master/plugins/msg-node/mcollective/src/openshift.rb#L26-L45">lines 26-45</a> I find what I'm looking for. CartridgeRepository.install is called when the MCollective openshift agent is loaded. That is: when the MCollective service starts.<br />
<br />
It turns out that I'd started and begun testing the MCollective service <u>before</u> installing any of the OpenShift cartridge packages. Restarting MCollective populates the .cartridge_repository directory and now my mco rpc queries indicate the cartridges I've installed.<br />
<br />
<h3>
Verifying the Change</h3>
<div>
So, I think, based on the code I've found, that when I restart the mcollective daemon on my OpenShift node, it will look in /usr/libexec/openshift/cartridges and it will use the contents to populate /var/lib/openshift/.cartridge_repository (not sure why that's hidden, but..).</div>
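<div>
<br />
The restart itself is nothing special (on this Fedora host the service command just redirects to systemd):<br />
<br />
<pre class="brush: bash ; title: 'restart mcollective to reload the OpenShift agent' ; highlight: 1">node$ sudo service mcollective restart
</pre>
</div>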
<div>
<br />
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">node$ ls /var/lib/openshift/.cartridge_repository
redhat-cron redhat-php redhat-python
</pre>
<br /></div>
<div>
<b>DING!</b><br />
<b><br /></b>
Now when I query with mco, I should see those. And I do:<br />
<br />
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">broker$ mco rpc openshift cartridge_repository action=list
Discovering hosts using the mc method for 2 second(s) .... 1
* [ ==========================================================> ] 1 / 1
ec2-54-211-74-85.compute-1.amazonaws.com
output: (redhat, php, 5.5, 0.0.5)
(redhat, python, 2.7, 0.0.5)
(redhat, python, 3.3, 0.0.5)
(redhat, cron, 1.4, 0.0.6)
Finished processing 1 / 1 hosts in 29.41 ms
</pre>
<br /></div>
I suspect that the OpenShift broker also caches these values, so I might have to restart the openshift-broker service on the broker host as well. Then I can use rhc in my development environment to see what cartridges I can use to create an application.<br />
<br />
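Just in case, I bounce the broker service first (the service name here matches the package name on my install; adjust if yours differs):<br />
<br />
<pre class="brush: bash ; title: 'restart the broker to clear any cached cartridge list' ; highlight: 1">broker$ sudo service openshift-broker restart
</pre>
<br />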
<pre class="brush: bash ; title: 'as an app developer, list available cartridges'; highlight: 1">dev$ rhc cartridges
php-5.5 PHP 5.5 web
python-2.7 Python 2.7 web
python-3.3 Python 3.3 web
cron-1.4 Cron 1.4 addon
Note: Web cartridges can only be added to new applications.
</pre>
<br />
And when I try to create a new application:<br />
<br />
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">$ rhc app-create testapp1 python-2.7
Application Options
-------------------
Namespace: testns1
Cartridges: python-2.7
Gear Size: default
Scaling: no
Creating application 'testapp1' ... done</pre>
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">
</pre>
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">
</pre>
<pre class="brush: bash ; title: 'cartridges in node repository'; highlight: 1">Waiting for your DNS name to be available ...
</pre>
<br />
Well, that's better than before!<br />
<h3>
What I learned:</h3>
</div>
<div>
<ul>
<li>rhc is configured using ~/.openshift/express.conf</li>
<li>look at the logs</li>
<ul>
<li>OpenShift broker: /var/log/openshift/broker/production.log</li>
<li>mcollective server: /var/log/mcollective.log</li>
</ul>
<li>the mcollective client mco can be used to simulate broker activity</li>
<ul>
<li>mco plugin doc - list all plugins available</li>
<li>mco plugin doc openshift - list OpenShift RPC actions and parameters</li>
<li>mco rpc openshift cartridge_repository action=list<br />query all nodes for their cartridge repository contents</li>
</ul>
<li>source code is useful</li>
<ul>
<li>OpenShift source repository: https://github.com/openshift/origin-server</li>
<li>judicious use of find and grep can narrow problem searches</li>
</ul>
<li>cartridge RPMs are installed in /usr/libexec/openshift/cartridges</li>
<li>cartridges are "installed" in /var/lib/openshift/.cartridge_repository</li>
<li>adding cartridges to a node requires a restart of the mcollective service</li>
</ul>
<div>
<br /></div>
</div>
<script src="https://raw.github.com/moski/gist-Blogger/master/public/gistLoader.js" type="text/javascript"></script>markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-24764449632914428142013-09-23T08:58:00.001-07:002013-11-25T06:48:53.072-08:00OpenShift Support Services: Messaging Part 2 (MCollective)About a year ago I did a series of posts on verifying the plugin operations for OpenShift Origin support services. I showed how to check the datastore (mongodb) and DNS updates and how to set up an ActiveMQ message broker , but I when I got to actually sending and receiving messages I got stuck.<br />
<br />
The <a href="http://cloud-mechanic.blogspot.com/2012/11/openshift-back-end-services-data-store.html">Datastore</a> and <a href="http://cloud-mechanic.blogspot.com/2012/11/openshift-back-end-services-dns.html">DNS</a> services use a single point-to-point connection between the broker and the update server. The messaging services use an<a href="http://cloud-mechanic.blogspot.com/2012/11/openshift-back-end-services-messaging.html"> intermediate message broker </a>(ActiveMQ, not to be confused with the OpenShift broker). This means that I need to configure and check not just one points, but three:<br />
<br />
<ul>
<li>Mcollective client to (message) broker (on OpenShift broker)</li>
<li>Mcollective server to (message) broker (on OpenShift node)</li>
<li>End to End</li>
</ul>
<div>
<br /></div>
<div>
I'm using the ActiveMQ message broker to carry MCollective RPC messages. The message broker is interchangeable. MCollective can be carried over any one of several messaging protocols. I'm using the generic Stomp connector for now, though MCollective is deprecating it in favor of the dedicated 'activemq' and 'rabbitmq' connectors. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-z9YrkLuXrFo/UkBfPxKTiXI/AAAAAAAAB6w/vJiubxaTZvY/s1600/openshift_messaging.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img alt="OpenShift Messaging Components" border="0" height="186" src="http://3.bp.blogspot.com/-z9YrkLuXrFo/UkBfPxKTiXI/AAAAAAAAB6w/vJiubxaTZvY/s640/openshift_messaging.png" title="OpenShift Messaging Components" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">OpenShift Messaging Components</td></tr>
</tbody></table>
<br /></div>
<div>
<br /></div>
<div>
In a previous post I set up an <a href="http://cloud-mechanic.blogspot.com/2012/11/openshift-back-end-services-messaging.html">ActiveMQ message broker</a> to be used for communication between the OpenShift broker and nodes. In this one I'm going to connect the OpenShift components to the messaging service, verify both connections and then verify that I can send messages end-to-end.<br />
<br />
Hold on for the ride, it's a long one (even for me)</div>
<div>
<br />
<b>Mea Culpa</b>: I'm referring to what MCollective does as "messaging" but that's not strictly true. ActiveMQ, RabbitMQ, QPID are message broker services. MCollective uses those, but actually, MCollective is an RPC (Remote Procedure Call) system. Proper messaging is capable of much more than MCollective requires, but to avoid a lot of verbal knitting I'm being lazy and calling MCollective "messaging".<br />
<br />
<h2>
The Plan</h2>
<div>
Since this is a longer process than any of my previous posts, I'm going to give a little road-map up front so you know you're not getting lost on the way. Here are the landmarks between here and a working OpenShift messaging system:</div>
<div>
<ol>
<li>Ingredients: Gather configuration information for messaging setup.</li>
<li>Mcollective Client -<br />Establish communications between the Mcollective client and the ActiveMQ server<br />(OpenShift broker host to message broker host)</li>
<li>MCollective Server -<br />Establish communications between the MCollective server and the ActiveMQ server<br />(OpenShift node host to message broker host)</li>
<li>MCollective End-To-End -<br />Verify MCollective communication from client to server</li>
<li>OpenShift Messaging and Agent -<br />Install OpenShift messaging interface definition and agent packages on both OpenShift broker and node</li>
</ol>
<h2>
Ingredients</h2>
</div>
<table border="2">
<tbody>
<tr><th>Variable</th><th>Value</th></tr>
<tr>
<td>ActiveMQ Server</td><td>msg1.infra.example.com</td>
</tr>
<tr>
<td colspan="2" style="text-align: center;">Message Bus</td>
</tr>
<tr><td>topic username</td><td>mcollective</td></tr>
<tr><td>topic password</td><td>marionette</td></tr>
<tr><td>admin password</td><td>msgadminsecret</td></tr>
<tr>
<td colspan="2" style="text-align: center;">Message End</td>
</tr>
<tr><td>password</td><td>mcsecret</td></tr>
</tbody></table>
</div>
<div>
<ul>
<li>A running ActiveMQ service</li>
<li>A host to be the MCollective client (and after that an OpenShift broker)</li>
<li>A host to run the MCollective service (and after that an OpenShift node)</li>
</ul>
<div>
On the Mcollective client host, install these RPMs</div>
<div>
<ul>
<li>mcollective-client</li>
<li>rubygem-openshift-origin-msg-broker-mcollective</li>
</ul>
On the MCollective server (OpenShift node) host, install these RPMs</div>
<div>
<ul>
<li>mcollective</li>
<li>openshift-origin-msg-node-mcollective</li>
</ul>
<div>
<br /></div>
</div>
</div>
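<div>
On each host that works out to a single yum transaction, something like this (assuming the OpenShift Origin package repositories are already configured):<br />
<br />
<pre class="brush: bash ; title: 'install the messaging packages' ; highlight: [2,4]"># on the OpenShift broker host (the MCollective client)
sudo yum install mcollective-client rubygem-openshift-origin-msg-broker-mcollective
# on the OpenShift node host (the MCollective server)
sudo yum install mcollective openshift-origin-msg-node-mcollective
</pre>
</div>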
<h2>
Secrets and more Secrets</h2>
<div>
<br /></div>
<div>
As with all secure network services, messaging requires authentication. Messaging has a twist though. You need two sets of authentication information, because, underneath, you're actually using two services. When you send a message to an end-point, the end point has to be assured that you are someone who is allowed to send messages. It's like putting a secret code or signature on a letter so that the recipient can be sure it isn't forged.</div>
<div>
<br /></div>
<div>
Now imagine a special private mail system. Before the mail carrier will accept a letter, you have to give them the secret handshake so that they know you're allowed to send letters. On the delivery end, the mail carrier requires not just a signature but a password before handing over the letter.</div>
<div>
<br /></div>
<div>
That's how authentication works for messaging systems.</div>
<div>
<br /></div>
<div>
When I set up the ActiveMQ service I didn't create a separate user for writing to the queue (sending a letter) and for reading (receiving) but I probably should have. As it is, getting a message from the OpenShift broker to an OpenShift node through MCollective and ActiveMQ requires two passwords and one username.</div>
<div>
<br /></div>
<div>
<ul>
<li>mcollective endpoint secret</li>
<li>ActiveMQ username</li>
<li>ActiveMQ password</li>
</ul>
<div>
<br />
The ActiveMQ values will have to match those I set on the ActiveMQ message broker in the previous post. The MCollective end point secret is only placed in the MCollective configuration files. You'll see those soon.</div>
<div>
<br /></div>
<h2>
MCollective Client (OpenShift Broker)</h2>
</div>
<div>
<br /></div>
<div>
The OpenShift broker service sends messages to the OpenShift nodes. All of the messages (currently) originate at the broker. This means that the nodes need to have a process running which connects to the message broker and registers to receive MCollective messages. </div>
<div>
<br />
<h3>
Client configuration: client.cfg</h3>
<br /></div>
<div>
The MCollective client is (predictably) configured using the <code>/etc/mcollective/client.cfg</code> file. For the purpose of connecting to the message broker, only the connector plugin values are interesting, and for end-to-end communications I need the securityprovider plugin as well. The values related to logging are useful for debugging too.<br />
<br />
<br />
<div>
<pre class="brush: bash; title: '/etc/mcollective/client.cfg' ; highlight: [10,14,16,17]"># Basic stuff
topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/libexec/mcollective
loglevel = log # just for testing, normally 'info'
# Plugins
securityprovider = psk
plugin.psk = mcsecret
# Middleware
connector = stomp
plugin.stomp.host = msg1.infra.example.com
plugin.stomp.port = 61613
plugin.stomp.user = mcollective
plugin.stomp.password = marionette
</pre>
</div>
<div>
<br />
<b>NOTE:</b> if you're running on RHEL 6 or CentOS 6 instead of Fedora, you're going to be using the SCL version of Ruby and hence MCollective. The file is then at the SCL location:<br />
<br />
<code>/opt/rh/ruby193/root/etc/mcollective/client.cfg</code><br />
<br />
Now I can test connections to the ActiveMQ message broker, though without any servers connected, it won't be very exciting (I hope).<br />
<br />
<h3>
Testing client connections</h3>
</div>
<div>
<br /></div>
<div>
MCollective provides a command line tool for sending messages: <code>mco</code>. mco is capable of several other 'meta' operations as well. The one I'm interested in first is 'mco ping'. With mco ping I can verify the connection to the ActiveMQ service (via the Stomp protocol).</div>
<div>
<br /></div>
<div>
The default configuration file is owned by root and is not readable by ordinary users. This is because it contains plain-text passwords (There are ways to avoid this, but that's for another time). This means I have to either run mco commands as root, or create a config file that is readable. I'm going to use sudo to run my commands as root.</div>
<div>
<br /></div>
<div>
The mco ping command connects to the messaging service and asks all available MCollective servers to respond. Since I haven't connected any yet, I won't get any answers, but I can at least see that I'm able to connect to the message broker and send queries. If all goes well I should get a nice message saying "no one answered".</div>
<div>
<br /></div>
<div>
<br />
<pre class="brush: bash ; title : 'mco ping - successful, no servers' ; highlight : 1">sudo mco ping
---- ping statistics ----
No responses received
</pre>
</div>
<div>
<br />
If that's what you got, feel free to skip down to the <a href="#mcserver">MCollective Server</a> section.<br />
<br />
<h3>
Debugging client-side configuration errors</h3>
<br /></div>
<div>
There are a couple of obvious possible errors:<br />
<ol>
<li>Incorrect broker host</li>
<li>broker service not answering</li>
<li>Incorrect messaging username/password</li>
</ol>
</div>
<div>
The first two will appear the same to the MCollective client. Check the simple stuff first. If I'm sure that the host is correct then I'll have to diagnose the problem on the other end (and write another blog post). Here's how that looks:<br />
<br />
<pre class="brush: bash ; title: 'Server Connection Failure' ; highlight: [1,5]">sudo mco ping
connect to localhost failed: Connection refused - connect(2) will retry(#0) in 5
connect to localhost failed: Connection refused - connect(2) will retry(#1) in 5
connect to localhost failed: Connection refused - connect(2) will retry(#2) in 5
^C
The ping application failed to run, use -v for full error details: Could not connect to Stomp Server:
</pre>
<br />
Note the message <u>Could not connect to the Stomp Server</u>.<br />
<br />
If you get this message, check these on the OpenShift broker host:<br />
<br />
<ol>
<li>The plugin.stomp.host value is correct</li>
<li>The plugin.stomp.port value is correct</li>
<li>The host value resolves to an IP address in DNS</li>
<li>The ActiveMQ host can be reached from the OpenShift Broker host (by ping or SSH)</li>
<li>You can connect to the Stomp port on the ActiveMQ broker host<br />telnet msg1.infra.example.com 61613 (yes, telnet is a useful tool) </li>
</ol>
<br />
If all of these are correct, then look on the ActiveMQ message broker host:<br />
<br />
<ol>
<li>The ActiveMQ service is running</li>
<li>The Stomp transport TCP ports match the plugin.stomp.port value</li>
<li>The host firewall is allowing inbound connections on the Stomp port (some quick checks follow this list)</li>
</ol>
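A couple of one-liners on the ActiveMQ host cover most of that. The exact service name and firewall tooling will vary, so treat these as a sketch:<br />
<br />
<pre class="brush: bash ; title: 'quick checks on the ActiveMQ host' ; highlight: [2,4,6]"># is the ActiveMQ service running?
sudo service activemq status
# is anything listening on the Stomp port?
sudo ss -tlnp | grep 61613
# is the port allowed through the firewall? (iptables view)
sudo iptables -L -n | grep 61613
</pre>
<br />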
<br />
The third possibility indicates an information or configuration mismatch between the MCollective client configuration and the ActiveMQ server. That will look like this:<br />
<br />
<pre class="brush: bash ; title: 'Bad Authentication' ; highlight: 1">sudo mco ping
transmit to msg1.infra.example.com failed: Broken pipe
connection.receive returning EOF as nil - resetting connection.
connect to localhost failed: Broken pipe will retry(#0) in 5
The ping application failed to run, use -v for full error details: Stomp::Error::NoCurrentConnection
</pre>
<br />
You can get even more gory details by changing the client.cfg to set the log level to <i>debug</i> and send the log output to the console:<br />
<br />
<pre class="brush: bash ; title: 'enable debug output to the console in client.cfg'">...
loglevel = debug # instead of 'log' or 'info'
logger_type = console # instead of 'file', or 'syslog' or unset (no logging)
...
</pre>
<br />
I'll spare you what that looks like here.<br />
<br /></div>
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=5022186007695457923" id="mcserver">
MCollective Server (OpenShift Node)</a></h2>
<div>
<br />
The mcollective server is a process that connects to a message broker, subscribes to (registers to receive messages from) one or more topics and then listens for incoming messages. When it accepts a message, the mcollective server passes it to a plugin module for execution and then returns any response. All OpenShift node hosts run an MCollective server which connects to one or more of the ActiveMQ message brokers.<br />
<br />
<h3>
Configure the MCollective service daemon: server.cfg </h3>
<br />
I bet you have already guessed that the MCollective server configuration file is <code>/etc/mcollective/server.cfg</code>.</div>
<div>
<br />
<pre class="brush: bash ; title: '/etc/mcollective/server.cfg'; highlight: [6,7,8,12,13,21,23,24]"># Basic stuff
topicprefix = /topic/
main_collective = mcollective
collectives = mcollective
libdir = /usr/libexec/mcollective
logfile = /var/log/mcollective.log
loglevel = debug # just for setup, normally 'info'
daemonize = 1
classesfile = /var/lib/puppet/state/classes.txt
# Plugins
securityprovider = psk
plugin.psk = mcsecret
# Registration
registerinterval = 300
registration = Meta
# Middleware
connector = stomp
plugin.stomp.host = msg1.infra.example.com
plugin.stomp.port = 61613
plugin.stomp.user = mcollective
plugin.stomp.password = marionette
# NRPE
plugin.nrpe.conf_dir = /etc/nrpe.d
# Facts
factsource = yaml
plugin.yaml = /etc/mcollective/facts.yaml
</pre>
</div>
<div>
<br />
<b>NOTE:</b> again the mcollective config files will be in <code>/opt/rh/ruby193/root/etc/mcollective/</code> if you are running on RHEL or CentOS.<br />
<br />
The server configuration looks pretty similar to the client.cfg. The <i>securityprovider</i> plugin must have the same values, because that's how the server knows that it can accept a message from the clients. The plugin.stomp.* values are the same as well, allowing the MCollective server to connect to the ActiveMQ service on the message broker host. It's really a good idea for the <i>logfile</i> value to be set so that you can observe the incoming messages and their responses. The <i>loglevel</i> is set to <u>debug</u> to start so that I can see all the details of the connection process. Finally the <i>daemonize</i> value is set to 1 so that the mcollectived will run as a service.<br />
<br />
The mcollectived will complain if the YAML file does not exist or if the Meta registration plugin is not installed and selected. Comment those out for now. They're out of scope for this post.<br />
<br />
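Commented out, those two sections of my server.cfg end up looking like this:<br />
<br />
<pre class="brush: bash ; title: 'server.cfg - registration and facts disabled for now' ; highlight: 1"># Registration
#registerinterval = 300
#registration = Meta
...
# Facts
#factsource = yaml
#plugin.yaml = /etc/mcollective/facts.yaml
</pre>
<br />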
<h3>
Running the MCollective service</h3>
<div>
<br /></div>
When you're satisfied with the configuration, start the <i>mcollective</i> service and verify that it is running:<br />
<br />
<br />
<pre class="brush: bash ; title: 'start MCollective service' ; highlight: [1,3]">sudo service mcollective start
Redirecting to /bin/systemctl start mcollective.service
ps -ef | grep mcollective
root 13897 1 5 19:37 ? 00:00:00 /usr/bin/ruby-mri /usr/sbin/mcollectived --config=/etc/mcollective/server.cfg --pidfile=/var/run/mcollective.pid
</pre>
</div>
<div>
<br />
You should be able to confirm the connection to the ActiveMQ server in the log.<br />
<br />
<pre class="brush: bash ; title 'mcollective service startup log' ; highlight: [1,5]">sudo tail /var/log/mcollective.log
I, [2013-09-19T19:53:21.317197 #16544] INFO -- : mcollectived:31:in `&lt;main&gt;' The Marionette Collective 2.2.3 started logging at info level
I, [2013-09-19T19:53:21.349798 #16551] INFO -- : stomp.rb:124:in `initialize' MCollective 2.2.x will be the last to fully support the 'stomp' connector, please migrate to the 'activemq' or 'rabbitmq' connector
I, [2013-09-19T19:53:21.357215 #16551] INFO -- : stomp.rb:82:in `on_connecting' Connection attempt 0 to stomp://mcollective@msg1.infra.example.com:61613
I, [2013-09-19T19:53:21.418225 #16551] INFO -- : stomp.rb:87:in `on_connected' Conncted to stomp://mcollective@msg1.infra.example.com:61613
...
</pre>
<br />
If you see that, you can skip down again to the next section, <a href="#endtoend">MCollective End-to-End</a>
<br />
<br />
<h3>
Debugging MCollective Server Connection Errors</h3>
<br />
Again the two most likely problems are that the host or the stomp plugin are mis-configured.<br />
<br />
<br />
<pre class="brush: bash ; title: 'mcollective server connection failure' ; hightlight=[2]">sudo tail /var/log/mcollective.log
I, [2013-09-19T20:05:50.943144 #18600] INFO -- : stomp.rb:82:in `on_connecting' Connection attempt 1 to stomp://mcollective@msg1.infra.example.com:61613
I, [2013-09-19T20:05:50.944172 #18600] INFO -- : stomp.rb:97:in `on_connectfail' Connection to stomp://mcollective@msg1.infra.example.com:61613 failed on attempt 1
I, [2013-09-19T20:05:51.264456 #18600] INFO -- : stomp.rb:82:in `on_connecting' Connection attempt 2 to stomp://mcollective@msg1.infra.example.com:61613
...
</pre>
<br />
If I see this, I need to check the same things I would have for the client connection. On the MCollective server host:<br />
<br />
<ul>
<li>plugin.stomp.host is correct</li>
<li>plugin.stomp.port matches Stomp transport TCP port on the ActiveMQ service</li>
<li>Hostname resolves to an IP address</li>
<li>ActiveMQ host can be reached from the MCollective client host (ping or SSH)</li>
</ul>
<div>
<br /></div>
<div>
On the ActiveMQ message broker:</div>
<div>
<br /></div>
<div>
<ul>
<li>ActiveMQ service is running</li>
<li>Any firewall rules allow inbound connections to the Stomp TCP port</li>
</ul>
<div>
The other likely error is username/password mismatch. If you see this in your mcollective logs, check the ActiveMQ user configuration and compare it to your mcollective server plugin.stomp.user and plugin.stomp.password values.</div>
</div>
<div>
<br />
<pre class="brush: bash ; title: 'Stomp Authentication Errors' ; highlight: 1">...
I, [2013-09-19T20:15:13.655366 #20240] INFO -- : stomp.rb:82:in `on_connecting'
Connection attempt 0 to stomp://mcollective@msg1.infra.example.com:61613
I, [2013-09-19T20:15:13.700844 #20240] INFO -- : stomp.rb:87:in `on_connected'
Conncted to stomp://mcollective@msg1.infra.example.com:61613
E, [2013-09-19T20:15:13.729497 #20240] ERROR -- : stomp.rb:102:in `on_miscerr' U
nexpected error on connection stomp://mcollective@msg1.infra.example.com:61613: es_trans: transmit
to msg1.infra.example.com failed: Broken pipe
...
</pre>
</div>
<div>
<br /></div>
</div>
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=5022186007695457923" id="endtoend">MCollective End-to-End</a></h2>
<div>
Now that I have both the MCollective client and server configured to connect to the ActiveMQ message broker I can confirm the connection end to end. Remember that 'mco ping' command I used earlier? When there are connected servers, they should answer the ping request.<br />
<br />
<pre class="brush: bash ; title: 'successful mco ping' ; highlight: [1,2]"> sudo mco ping
node1.infra.example.com time=138.60 ms
---- ping statistics ----
1 replies max: 138.60 min: 138.60 avg: 138.60
</pre>
<br /></div>
<h2>
OpenShift Node 'plugin' agent</h2>
<div>
Now I'm sure that both MCollective and ActiveMQ are working end-to-end between the OpenShift broker and node. But there's no "OpenShift" in there yet. I'm going to add that now.<br />
<br />
There are three packages that specifically deal with MCollective and interaction with OpenShift:<br />
<br />
<ul>
<li>openshift-origin-msg-common.noarch (misnamed, specifically mcollective)</li>
<li>rubygem-openshift-origin-msg-broker-mcollective</li>
<li>openshift-origin-msg-node-mcollective.noarch</li>
</ul>
<div>
<br /></div>
</div>
<div>
The first package defines the messaging protocol for OpenShift. It includes interface specifications for all of the messages, their arguments and expected outputs. This is used on both the MCollective client and server side to produce and validate the OpenShift messages. The broker package defines the interface that the OpenShift broker (a Rails application) uses to generate messages to the nodes and process the returns. The node package defines how the node will respond when it receives each message.<br />
<br />
<div>
The OpenShift node also requires several plugins that, while not required for messaging per se, will cause the OpenShift agent to fail if they are not present (a sample yum invocation follows the list):</div>
<ul>
<li>rubygem-openshift-origin-frontend-nodejs-websocket</li>
<li>rubygem-openshift-origin-frontend-apache-mod-rewrite</li>
<li>rubygem-openshift-origin-container-selinux</li>
</ul>
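<div>
On the node that works out to something like this (package names as listed above; the Origin repositories are assumed to be configured already):<br />
<br />
<pre class="brush: bash ; title: 'install the OpenShift agent packages and plugin dependencies' ; highlight: 1">sudo yum install openshift-origin-msg-common openshift-origin-msg-node-mcollective \
    rubygem-openshift-origin-frontend-nodejs-websocket \
    rubygem-openshift-origin-frontend-apache-mod-rewrite \
    rubygem-openshift-origin-container-selinux
</pre>
</div>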
When these packages are installed on the OpenShift broker and node, mco will have a new set of messages available. MCollective calls added sets of messages... (OVERLOAD!) 'plugins'. So, to see the available message plugins, use mco plugin doc. To see the messages in the openshift plugin, use mco plugin doc openshift.<br />
<br />
<h4>
Mcollective client: mco</h4>
</div>
<div>
<br /></div>
<div>
I've used mco previously just to send a ping message from a client to the servers. This just collects a list of the MCollective servers listening. The mco command can also send complete messages to remote agents. Now I need to learn how to determine what agents and messages are available and how to send them a message. Specifically, the OpenShift agent has an echo message which simply returns a string which was sent in the message. Now that all of the required OpenShift messaging components are installed, I should be able to tickle the OpenShift agent on the node from the broker. This is what it looks like when it works properly:<br />
<br />
<pre class="brush: bash ; title: 'OpenShift echo message via mco'; highlight: 1">sudo mco rpc openshift echo msg=foo
Discovering hosts using the mc method for 2 second(s) .... 1
* [ ========================================================> ] 1 / 1
node1.infra.example.com
Message: foo
Time: nil
Finished processing 1 / 1 hosts in 25.49 ms
</pre>
<br />
As you might expect, this has more than its fair share of interesting failure modes. The most likely thing you'll see from the mco command is this:<br />
<br />
<pre class="brush: bash ; title: 'mco: no nodes' ; highlight: 1">sudo mco rpc openshift echo msg=foo
Discovering hosts using the mc method for 2 second(s) .... 0
No request sent, we did not discover any nodes.
</pre>
<br />
This isn't very informative, but it does at least indicate that the message was sent and nothing answered. Now I have to look at the MCollective server logs to see what happened. After setting the loglevel to 'debug' in <code>/etc/mcollective/server.cfg</code>, restarting the mcollective service and re-trying the mco rpc command, I can find this in the log file:<br />
<br />
<br />
<pre class="brush: bash ; title: 'mcollective.log - failed to load OpenShift agent' ; highlight: [1,4,5]">sudo grep openshift /var/log/mcollective.log
D, [2013-09-20T14:18:05.864489 #31618] DEBUG -- : agents.rb:104:in `block in findagentfile' Found openshift at /usr/libexec/mcollective/mcollective/agent/openshift.rb
D, [2013-09-20T14:18:05.864637 #31618] DEBUG -- : pluginmanager.rb:167:in `loadclass' Loading MCollective::Agent::Openshift from mcollective/agent/openshift.rb
E, [2013-09-20T14:18:06.360415 #31618] ERROR -- : pluginmanager.rb:171:in `rescue in loadclass' Failed to load MCollective::Agent::Openshift: error loading openshift-origin-container-selinux: cannot load such file -- openshift-origin-container-selinux
E, [2013-09-20T14:18:06.360633 #31618] ERROR -- : agents.rb:71:in `rescue in loadagent' Loading agent openshift failed: error loading openshift-origin-container-selinux: cannot load such file -- openshift-origin-container-selinux
D, [2013-09-20T14:18:13.741055 #31618] DEBUG -- : base.rb:120:in `block (2 levels) in validate_filter?' Failing based on agent openshift
D, [2013-09-20T14:18:13.741175 #31618] DEBUG -- : base.rb:120:in `block (2 levels) in validate_filter?' Failing based on agent openshift
</pre>
<br />
It turns out that the reason those three additional packages are required is that they provide facts to MCollective. Facter is a tool which gathers a raft of information about a system and makes it quickly available to MCollective. The rubygem-openshift-origin-node package adds some Facter code, but those facts will fail to load if the additional packages aren't present. If you do the "install everything" route these dependencies resolve automatically, but if you install and test things piecemeal as I am, they show up as missing requirements.<br />
<br />
After I add those packages I can send an echo message and get a successful reply. If you can discover the MCollective servers from the client with mco ping, but can't get a response to an mco rpc openshift echo message, then the most likely problem is that the OpenShift node packages are missing or misconfigured. Check the logs and address what you find.<br />
<br />
<h2>
Finally! (sort of)</h2>
<div>
At this point, I'm confident that the Stomp and MCollective services are working and that the OpenShift agent is installed on the node and will at least respond to the echo message. I was going to also include testing through the Rails console, but this has gone on long enough. That's next.</div>
<br /></div>
<h2>
References</h2>
<div>
<ul>
<li><a href="http://activemq.apache.org/">ActiveMQ</a> - Message Broker</li>
<li><a href="http://www.rabbitmq.com/">RabbitMQ</a> - Message Broker</li>
<li><a href="https://qpid.apache.org/">QPID</a> - Message Broker</li>
<li><a href="http://puppetlabs.com/mcollective">MCollective</a> - RPC</li>
<ul>
<li>MCollective <a href="http://docs.puppetlabs.com/mcollective/configure/client.html">Client Configuration</a></li>
<li>MCollective <a href="http://docs.puppetlabs.com/mcollective/configure/server.html">Server Configuration</a></li>
</ul>
<li><a href="http://stomp.github.io/">Stomp</a> - messaging protocol</li>
<li><a href="http://www.amqp.org/">AMQP</a> (Advanced Message Queue Protocol)</li>
<li><a href="http://activemq.apache.org/openwire.html">OpenWire</a> - messaging protocol</li>
</ul>
</div>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com1tag:blogger.com,1999:blog-5022186007695457923.post-10301839027635527192013-07-25T12:07:00.001-07:002013-07-26T06:09:31.115-07:00Installing OpenShift using Puppet, Part 1: Divide and ConquerIt's been quite a while since I posted last. I got stuck on three things<br />
<br />
<ol>
<li>I didn't (don't?) know Puppet</li>
<li>The layers of service and configuration were (are?) muddy.</li>
<li>There are several competing significant installation use cases to be considered.</li>
</ol>
<div>
It would be very Agile to just leap in and start coding things until I got a set of boxes that worked. But it would also likely lead to something which was difficult to adapt to new uses because it didn't respect the working boundaries between different layers and compartments which make up the OpenShift service.</div>
<div>
<br /></div>
<div>
So I learned Puppet, and started coding some top down samples and some bottom up samples, while at the same time writing philosophical tracts trying to justify the direction(s) I was going.</div>
<div>
<br /></div>
<div>
I'm not nearly done (having thrown out several attempts and restarted each time) but I think I've reached a point where I can express clearly *how* I want to go about developing a CMS reference implementation for OpenShift installation and configuration.</div>
<div>
<br /></div>
<div>
OK, you're not going to get away without some philosophy. Rather a lot actually this time.</div>
<div>
<br /></div>
<h2>
Where do Configuration Management Services (CMS) fit...</h2>
<div>
Up until now I've concentrated on reaching a point where I can start installing OpenShift. And I'm finally there. No. Wait. I'm at the point where I can start installing the <i>parts that make up</i> OpenShift. After that I have to configure each of the parts to run in their own way and <i>then</i> I have to configure the settings that OpenShift cares about.</div>
<div>
<br /></div>
<div>
See what happened there? It's layers.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-zp7FAdCHu78/UfFgLIOLhtI/AAAAAAAAB3U/iVHExM87628/s1600/os_control_layers_white_bg.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="http://4.bp.blogspot.com/-zp7FAdCHu78/UfFgLIOLhtI/AAAAAAAAB3U/iVHExM87628/s640/os_control_layers_white_bg.png" width="548" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Host and Service Configuration Management Layers</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<div>
See where the CMS fits in? Between the running OS and all those configured hosts/services. That's where I am now.</div>
<div>
<br /></div>
<div>
Look at the top layer. Those vertical slices are individual hosts or services that have to be created. Only the ones in the middle are OpenShift. The others are operations support (for a running service) or development and testing stuff which isn't really OpenShift but is needed to create OpenShift.</div>
<div>
<br /></div>
<h2>
... and what do they need to do.</h2>
<div>
<br /></div>
<div>
I need to show you another complicated looking picture:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-7IlHiBkTgWI/UfFrkPzGg_I/AAAAAAAAB3k/SMseqKsRebo/s1600/openshift_origin_puppet_modules.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="632" src="http://4.bp.blogspot.com/-7IlHiBkTgWI/UfFrkPzGg_I/AAAAAAAAB3k/SMseqKsRebo/s640/openshift_origin_puppet_modules.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Draft OpenShift CMS Module Layout</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
As you can see, I need to learn <u>Inkscape</u> more, because <u>Dia</u> graphics just don't look as cool.<br />
<br /></div>
<div>
I'm a fan of big complicated looking graphics to help describe big complicated concepts. This is a very rough incomplete draft of a module breakdown for installing OpenShift using a CM system (Puppet, by name, though this should be applicable to any modular CM system). The three columns in the diagram represent different class uses.</div>
<div>
<br /></div>
<div>
The first column contains classes that are just used to hold information that will be used to instantiate other classes on the target hosts. None of these classes will be instantiated directly on any host. The second column shows an OpenShift Broker and an OpenShift Node. Each includes a class which describes the function of that host within the OpenShift service. Each also includes any support services which run on the same host. The third column contains the definitions of the hosts which run support services. They include a module for the support service itself, and then one which applies the OpenShift customizations to the service.<br />
<br />
OpenShift uses plugin modules for several support services. In the diagram, the plugins for each support service are grouped together. Only one would be instantiated for a given OpenShift installation. Which one is selected is controlled by a parameter of the Master Configuration class <i>::openshift</i>.</div>
<div>
<br /></div>
<div>
There is one lonely class at the bottom of the middle column: <i>::openshift::host</i>. This is currently a catch-all class which provides a single point of control for configuring common host settings such as SSH and firewall rules, the presence (or absence) of special YUM repositories and the like. It will be instantiated on every host which participates in the OpenShift service (for now) but can be customized using class parameters. This class could be broken up or other features added depending on how (in)coherent it becomes.<br />
<br />
<h2>
I showed you that diagram to show you this one.</h2>
<div>
<br /></div>
<div>
Now if you look back to the top diagram, in the top row there are a bunch of vertical items that are peers of a sort. Each blob represents a component service of OpenShift or a supporting service or task. In a fully distributed service configuration each one would represent an individual host.</div>
<div>
<br /></div>
<div>
Keep that in mind as you look at the middle and right side of the second diagram. Those (UML/Puppet) nodes there map to the blobs at the top of the first diagram. They show the internal structure of those blobs when installing OpenShift and support components. Each one contains at least one module which <i>installs</i> a support service or component and which doesn't have the word <u>openshift</u> in it. Each one also contains (at least) one OpenShift customization class. This latter uses the information classes from the first column to customize the software on the node and integrate it with the OpenShift service.</div>
<div>
<br /></div>
<div>
This is the key point:</div>
<div>
<br /></div>
<div>
There are layers here too.</div>
<div>
<b><br /></b></div>
<div>
The configuration management tools should be designed so that you can plug them together in a way that gets you the service you want to have, building up from the base to the completed service. But: <b>you should also be able to understand how the service is put together by looking at the configuration files.</b></div>
<div>
<b><br /></b></div>
<div>
By creating each (Puppet) node from the (Puppet) parts that define what a host does, you can see what the host does by looking at the Puppet node definition. Knowledge is maintained both ways.</div>
<div>
<br /></div>
<h2>
Outside-In Development</h2>
</div>
<div>
<br /></div>
<div>
Since I'm still learning specific CMS implementations (Puppet now, and Ansible soon) and trying to understand how best to express a configuration for OpenShift using these CMS, I'm working from the top a lot. At the same time, I'm trying to actually implement (or steal implementations of) modules to do things like set up the YUM repositories and install the packages. I like this kind of Outside-In development model because (if I'm careful not to thrash too much) it helps me keep both perspectives in mind and hopefully meet in the middle.</div>
<div>
<br /></div>
<div>
In the next installment I'll try putting some meat on the bones of this skeleton: Actually creating the empty class definitions in their hierarchical structure and then creating a set of node definitions that import and use the classes to at least pretend to install an OpenShift service. Hopefully it won't take me another couple of months.</div>
<div>
<br /></div>
<h2>
References</h2>
<h3>
CMS Software</h3>
<div>
<ul>
<li><a href="http://www.puppetlabs.com/">Puppet</a></li>
<li><a href="http://forge.puppetlabs.com/">PuppetForge</a> - puppet modules</li>
<li><a href="https://github.com/ansible/ansible">Ansible</a></li>
</ul>
</div>
<div>
<h3>
Drawing Software</h3>
</div>
<div>
<ul>
<li><a href="http://inkscape.org/">Inkscape</a></li>
<li><a href="https://wiki.gnome.org/Dia">Dia</a></li>
</ul>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com1tag:blogger.com,1999:blog-5022186007695457923.post-58752354564328018962013-06-05T13:30:00.000-07:002013-06-06T07:02:52.122-07:00OpenShift on AWS EC2, Part 5 - Preparing Configuration Management (with Puppet)I'm 5 posts into this and still haven't gotten to any OpenShift yet, except for doling out the instances and defining the ports and securitygroups for network communication. I did say "from the ground up" though, so if you've been here from the beginning, you knew what you were getting into.<br />
<br />
In this post I'm going to build and run the tasks needed to turn an EC2 base instance with almost nothing installed into a Puppet master, or a Puppet client. There are a number of little details that need managing to get puppet to communicate and to make it as easy as possible to manage updates.<br />
<br />
First a short recap for people just joining and so I can get my bearings.<br />
<br />
<h2>
Previously, our heros...</h2>
<div>
<br /></div>
<div>
<a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-1-from-wheels.html">In the first post</a> I introduced a set of tools I'd worked up for myself to help me understand and then automate the interactions with AWS.<br />
<br />
In <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-2-being-seen.html">the second one</a> I registered a DNS domain and delegated it to the AWS Route53 DNS service.<br />
<br />
In <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-3-getting-in.html">the third</a> I figured out what hosts (or classes of hosts) I'd need to run for an OpenShift service. Then I defined a set of network filter rules (using the AWS EC2 securitygroup feature) to make sure that my hosts and my customers could interact.<br />
<br />
Finally <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-4-first.html">in the previous post</a> I selected an AMI to use as the base for my hosts, allocated a static IP address, added DNS A record, and started an instance for the puppet master and broker hosts. The remaining three (data1, message1, and node1) were left as an exercise for the reader.</div>
<div>
<br /></div>
<div>
So now I have five AWS EC2 instances running. I can reach them via SSH. The default account <i>ec2-user</i> has sudo ALL permissions. The instances are completely unconfigured.<br />
<br />
The next few sections are a bunch of exposition and theory. They explain some of what I'm doing and why, but don't contain a lot of <u>doing</u>. If you get bored, scan ahead<a href="http://cloud-mechanic.blogspot.com/2013/06/openshift-on-aws-ec2-part-5-preparing.html#goodstuff"> to the real stuff closer to the bottom.</a><br />
<br />
<h2>
The End of EC2</h2>
</div>
<div>
<br /></div>
<div>
With the completion of the 4th post, we're done with EC2. All of the interactions from here on occur over SSH. The only remaining interactions with Amazon will be with Route53. The broker will be configured to update the app.example.org zone when applications are added or removed.</div>
<div>
<br /></div>
<div>
You could reach this point with any other host provisioning platform, AWS cloudformation, libvirt, virtualbox, Hyper-V, VMWare, or bare metal, it doesn't matter. Each of those will have its own provisioning details but if you can get to networked hosts with stable public domain names you can pick up here and go on, ignoring everything but the first post.</div>
<div>
<br /></div>
<div>
The first post is still needed for the process I'm defining because the origin-setup tools written with Thor aren't just used for EC2 manipulation. If that's all they were for I would have used one of the existing EC2 CLI packages.</div>
<div>
<br /></div>
<div>
<h2>
Configuration Management: An Operations Religion</h2>
</div>
<div>
<br />
I mean this with every coloring and shade of meaning it can have, complete with schisms and dogma and redemption and truth.</div>
<div>
<br /></div>
<div>
Some small shop system administrators think that configuration management isn't for them, it isn't needed. I differ with that opinion. Configuration management systems have two complementary goals. Only one of them is managing large numbers of systems. The important goal is managing even one <b>repeatably</b>. This is the Primary Dogma of System Administration. If you can't do it 1000 times, you can't do it at all.</div>
<div>
<br /></div>
<div>
The service I'm outlining only requires four hosts (the puppet master will be 5). I could do it on one. That's how most demos until now have done it. I could describe to you how to manually install and tweak each of the components in an OpenShift system, but it's very unlikely that anyone would ever be able to reproduce what I described exactly. (I speak from direct experience here, following that kind of description in natural language is hard and writing it is harder.) Using a CMS it is possible to expose what needs to be configured specially and what can be defaulted, and to allow (if it's done well) for flexibility and customization.</div>
<div>
<br /></div>
<div>
The religion comes in when you try to decide <u>which one.</u></div>
<div>
<u><br /></u></div>
<div>
I'm going to go with sheep and expedience and choose Puppet. Other than that I'm not going to explain why.</div>
<div>
<br /></div>
<h2>
Brief Principals of Puppet</h2>
<div>
<br /></div>
<div>
Puppet is one of the currently popular configuration management systems. It is widely available and has a large knowledgeable user base. (that's why).</div>
<div>
<br /></div>
<h3>
The Master/Agent deployment</h3>
<div>
<br /></div>
<div>
The standard installation of puppet contains a <i>puppet master</i> and one or more puppet clients running the <i>puppet agent</i> service. The configuration information is stored on the puppet master host. The agent processes periodically poll the master for updates to their configuration. When an agent detects a change in the configuration spec the change is applied to the host.</div>
<div>
<br /></div>
<div>
The puppet master scheme has some known scaling issues, but for this scenario it will suit just fine. If the OpenShift service grows beyond what the master/agent model can handle, then there are other ways of managing and distributing the configuration, but they are beyond the scope of this demonstration.</div>
<div>
<br /></div>
<h3>
The Site Model Paradigm</h3>
<div>
<br /></div>
<div>
That's the only time you'll see me use that word. I promise.</div>
<div>
<br /></div>
<div>
The puppet configuration is really a description of every component, facet and variable you care about in the configuration of your hosts. It is a <i>model</i> in the sense that it represents the components and their relationships. The model can be compared to reality to find differences. Procedures can be defined to resolve the differences and bring the model and reality into agreement.</div>
<div>
<br /></div>
<div>
There are some things to be aware of. The model is, at any moment, static. It represents the current ideal configuration. The agents are responsible for polling for changes to the model and for generating the comparisons as well as applying any changes to the host. It is certain that when a change is made to the model, there will be a window of time when the site does not match. Usually it doesn't matter, but sometimes changes have to be coordinated. Later I may add MCollective to the configuration to address this. MCollective is Puppet's messaging/remote procedure call service and it allows for more timing control than the standard Puppet agent pull model.</div>
<div>
<br /></div>
<div>
Also, the model is only aware of what you tell it to be aware of. Anything that you don't specify is.... undetermined. Now specifying <b>everything</b> will bury you and your servers under the weight of just trying to stay in sync. It's important to determine what you really care about and what you don't. It's also important to look carefully at what you're leaving out to be sure that it's safe.</div>
<div>
<br /></div>
<div>
<h2>
Preparing the Puppet Master and Clients</h2>
</div>
<div>
<br /></div>
<div>
As usual, there's something you have to do before you can do the thing you really want to do. While puppet can manage pretty much anything about a system after it is set up, it can't set itself up from nothing.</div>
<div>
<br /></div>
<div>
<ul>
<li>The puppet master must have a well known public hostname (DNS). Check.</li>
<li>Each participating client must have a well known public hostname (DNS): Check</li>
<li>The master and clients must each know their own hostname (for identification to the master). Err.</li>
<li>The master and clients must have time sync. Ummm</li>
<li>The master and clients must have the puppet (master/client) software installed. Not Check.</li>
<li>The master must have any additional required modules installed.</li>
<li>The master must have a private certificate authority (CA) so that it can sign client credentials. Not yet</li>
<li>The clients must generate and submit a client certificate for the master to sign. Nope.</li>
<li>The master must have a copy of the site configuration files to generate the configuration model. No.</li>
</ul>
</div>
<div>
<br />
The first four points are generic host setup, and the first two are complete. Installing the puppet software <u>should</u> be simple, but I may need to check and/or tweak the package repositories to get the version I want. The last four are pure puppet configuration and the last one is the goal line.<br />
<br />
<h3>
Hostname</h3>
<div>
<br /></div>
<div>
Puppet uses the <i>hostname</i> value set on each host to identify the host. Each host should have its hostname set to the FQDN associated with the IP address on which it expects incoming connections.</div>
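<div>
<br /></div>
<div>
For reference, doing this by hand on the puppet master would look roughly like the sketch below: a rough manual equivalent of what the <code>remote:hostname</code> task does, assuming a Fedora-style host (RHEL6 keeps the name in <code>/etc/sysconfig/network</code> instead).</div>
<div>
<br />
<pre class="brush:bash ; title: 'set the hostname to the FQDN (manual sketch)' ; highlight: 1"># as root on the instance
echo "puppet.infra.example.org" > /etc/hostname   # persists across reboots
hostname puppet.infra.example.org                 # takes effect immediately
hostname --fqdn                                   # verify; should print the FQDN
</pre>
</div>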
<div>
<br /></div>
<h3>
Time Sync on Virtual Machines</h3>
<div>
<br /></div>
Time sync needs a little space here. On an ordinary bare-metal host I'd say "install an ntpd on every host". NTP daemons are lightweight and more reliable and stable than something like a cron job to re-sync. Virtual machines are special, though.<br />
<br />
On a properly configured virtual machine, the system time comes from the VM host. As the guest, you must assume that the host is doing the right thing. The guest VM has a simulated real-time clock (RTC) which is a pass-through either of the host clock or the underlying hardware RTC. In either case, the guest is not allowed to adjust the underlying clock.<br />
<br />
Typically a service like ntpd gets time information from outside and not only slews the system (OS) clock but it compares that to the RTC and tries to compensate for drift between the RTC and the "real" time. In the default case it will even adjust the RTC to keep it in line with the system clock and "real" time.<br />
<br />
As a guest, it's impolite to go around adjusting your host's clocks.<br />
<br />
So a virtual machine system like an IaaS is one of the few places I'd advise against installing a time server. If your VMs aren't in sync, call your provider and ask them why their hardware clocks are off. If they can't give you a good answer, find a new IaaS provider.<br />
<br />
<h3>
Time Zones and The Cloud</h3>
<div>
<br /></div>
I'm going to throw one more timey-wimey thing in here. I set the system timezone on every server host to UTC. If I ever have to compare logs on servers from different regions of the world (this is the <u>cloud</u>, remember?) I don't have to convert time zones. User accounts can always set their timezone to localtime using the TZ environment variable. The tasks offer an option so that you can override the timezone.<br />
<br />
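A rough manual equivalent, assuming a Fedora/RHEL style host (this is roughly what the <code>remote:timezone</code> task boils down to):<br />
<br />
<pre class="brush:bash ; title: 'set the system timezone to UTC (manual sketch)' ; highlight: 1"># as root: point /etc/localtime at the UTC zone definition
ln -sf /usr/share/zoneinfo/UTC /etc/localtime

# an individual user can still view local time without touching the system setting
TZ=America/New_York date
</pre>
<br />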
<h3 id="goodstuff">
Host Preparation vs. Software Configuration</h3>
<div>
<br /></div>
<div>
It would be fairly easy to write a single task that completes all of the bullet points listed above, but something bothers me about that idea. The first four are generic host tasks. The last four are distinctly puppet-configuration related. Installing the software packages sits on the edge of both. The system tasks are required on every host. Only the puppet master will get the puppet master service software and configuration. The puppet clients will get different software and a different configuration process.</div>
<div>
<br /></div>
<div>
I'm going to take advantage of the design of Thor to create three separate tasks to accomplish the job:</div>
<div>
<br /></div>
<div>
<ul>
<li><code>origin:prepare</code> - do the common hosty tasks</li>
<li><code>origin:puppetmaster</code> - prepare and then install and configure a master</li>
<li><code>origin:puppetclient</code> - prepare, and then install and register a client</li>
</ul>
<div>
<br />
So the <code>origin:prepare</code> task needs to set the hostname on the box to match the FQDN. I prefer also to enable the local firewall service and open a port for SSH to minimize the risk of unexpected exposure. This is also where I'd put a task to add a software repository for the puppet packages if needed.</div>
</div>
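<div>
<br /></div>
<div>
The firewall part of that, done by hand, would be something like the sketch below. I'm assuming the <i>lokkit</i> tool from the <i>system-config-firewall-base</i> package (which the tasks install); plain iptables rules would do just as well.</div>
<div>
<br />
<pre class="brush:bash ; title: 'prepare-step firewall sketch' ; highlight: 1"># enable the host firewall and allow only SSH to begin with
lokkit --enabled --service=ssh

# this is also where a puppet package repository would be added,
# if the stock repos don't carry the puppet version you want
</pre>
</div>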
<div>
<br /></div>
Each of the <code>origin:puppetmaster</code> and <code>origin:puppetclient</code> tasks will invoke the <code>origin:prepare</code> task first.<br />
<br />
<h2>
File Revision Control</h2>
<div>
<br /></div>
<div>
Since Configuration Management is all about control and repeatability, it also makes sense to place the configuration files themselves under revision control. For this example I'm going to place the site configuration in a GitHub repository. Changes can be made in a remote work space and pushed to the repository. Then they can be pulled down to the puppet master and the service notified to re-read the configurations. They can also be reverted as needed.</div>
<div>
<br /></div>
<div>
When the Puppet site configuration is created on the puppet master, it will be cloned from the git repo on github.</div>
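<div>
<br /></div>
<div>
The day-to-day workflow, sketched with plain git commands (the repository URL is the one used later in this post; substitute your own):</div>
<div>
<br />
<pre class="brush:bash ; title: 'site configuration change workflow (sketch)' ; highlight: 1"># in a working copy anywhere
git clone https://github.com/markllama/origin-puppet
cd origin-puppet
# ... edit manifests/ and modules/ ...
git commit -a -m "describe the change"
git push origin master

# then on the puppet master, in the cloned configuration directory
git pull
# notify or restart the puppet master service if your setup caches configurations
</pre>
</div>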
<div>
<br /></div>
<h2>
Initialize the Puppet Master</h2>
<div>
The puppet master process runs as a service on the puppet server. It listens for polling queries from puppet agents on remote machines. The puppet master service must read the site configurations to build the models that will define each host. The puppet service runs as a non-root user and group, each named "puppet". The default location for puppet configuration files is in /etc/puppet. This area is only writable by the root user. Other service files reside in /var/lib/puppet. This area is writable by the puppet user and group. Further, SELinux limits access by the puppet user to files outside these spaces.</div>
<div>
<br /></div>
<div>
On RHEL6, the EC2 login user is still root. The user and group settings aren't really needed there, but they are still consistent.</div>
<div>
<br /></div>
<div>
The way I choose to manage this is:</div>
<div>
<ol>
<li>Add the ec2-user to the puppet group</li>
<li>Place the site configuration in <code>/var/lib/puppet/site</code></li>
<li>Update the puppet configuration file (<code>/etc/puppet/puppet.conf</code>) to reflect the change</li>
<li>Clone the configuration repo into the local configuration directory</li>
<li>Symlink the configuration repo root into the ec2-user home directory.</li>
</ol>
</div>
<div>
This way the ec2-user has permission and access to update the site configuration.</div>
<div>
<br /></div>
<div>
Puppet uses x509 server and client certificates. The puppet master needs a server certificate and needs to self-sign it before it can sign client certificates or accept connections from clients.</div>
<div>
<br /></div>
<div>
Once the server certificate is generated and signed, I also need to enable and start the puppet master service. Finally, I need to add a firewall rule allowing inbound connections on the puppet master port, 8140/TCP.</div>
<br />
So the process of initializing the puppet master is this:<br />
<br />
<ul>
<li>install the puppet master software</li>
<li>modify the puppet config file to reflect the new site configuration file location</li>
<li>install additional puppet modules</li>
<li>generate server certificate and sign it</li>
<li>add ec2-user to puppet group (or root user on RHEL6)</li>
<li>create site configuration directory and set owner, group, permissions</li>
<li>clone the git repository into the configuration directory</li>
<li>start and enable the puppet master service</li>
</ul>
<div>
<br /></div>
</div>
<div>
<h3>
Installing Packages</h3>
<div>
<br /></div>
<div>
Since I'm using Thor, the package installation process is a Thor task. Each sub-task will only run once within the invocation of its parent. The <code>origin:puppetmaster</code> task calls the <code>origin:prepare</code> task and provides a set of packages needed for a puppet master in addition to any installed as part of the standard preparation (firewall management and augeas). For the puppet master, these additional packages are the <i>puppet-server</i> and <i>git</i> packages. Dependencies are resolved by YUM.</div>
<br />
<h3>
Adding user to Puppet group</h3>
<div>
<br /></div>
<div>
The puppet service is controlled by the root user, but runs as a role user and group both called <i>puppet.</i> I would like the login user to be able to manage the puppet site configuration files, but not to log in either as the root or puppet user. I'll add the <i>ec2-user</i> user to the puppet group, and set the group write permissions so that this user can manage the site configuration.</div>
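<div>
<br /></div>
<div>
The command itself is one line (the new group membership takes effect at the user's next login):</div>
<div>
<br />
<pre class="brush:bash ; title: 'add ec2-user to the puppet group' ; highlight: 1">usermod -a -G puppet ec2-user
id ec2-user    # verify the membership
</pre>
</div>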
<div>
<br /></div>
<h3>
Creating the Site Configuration Space</h3>
<div>
<br /></div>
<div>
As noted above, the ec2-user account will be used to manage the puppet site configuration files. The files must be writable by the ec2-user (through the puppet group) but they must also be readable by the puppet user and service. In addition, since these are service configurations rather than (local) host configuration files, I'd prefer that they not reside in /etc.</div>
<div>
<br /></div>
<div>
SELinux policy restricts the location of files which the puppet service processes can read. One of those locations is in <code>/var/lib/puppet</code>. Rather than update the policy, it seems easier to place the site configuration data within <code>/var/lib/puppet.</code></div>
<div>
<br /></div>
<div>
I create a new directory <code>/var/lib/puppet/site</code> and set the owner, group and permissions so that the puppet user and group can read and write the files. I also set the setgid bit on the directory so that new files will inherit the group. This way the ec2-user will have the needed access, and SELinux will not prevent the puppet master service from reading the files. In a later step I'll use git to clone the site configuration files into place.<br />
<br /></div>
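<div>
One way to do that by hand, assuming the puppet user and group described above:</div>
<div>
<br />
<pre class="brush:bash ; title: 'create the site configuration directory (sketch)' ; highlight: 1">mkdir /var/lib/puppet/site
chown puppet:puppet /var/lib/puppet/site
chmod 2775 /var/lib/puppet/site    # setgid: new files inherit the puppet group
ls -Zd /var/lib/puppet/site        # confirm the SELinux context is one puppet can read
</pre>
</div>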
<h3>
Install Service Configuration File (setting variables)</h3>
<br />
Moving the location of the site configuration files from the default (<code>/etc/puppet/manifests</code>) and adding a location for user defined modules requires updating the default configuration file. Currently I make three alterations to the default file:<br />
<br />
<br />
<ul>
<li>set the puppet master hostname as needed</li>
<li>set the location of the site configuration (manifests)</li>
<li>add a location to the modulepath</li>
</ul>
<div>
I use a template file, push a copy to the master and use <i>sed</i> to replace the values before copying the updated file into place.</div>
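<div>
<br /></div>
<div>
Something along these lines, where <code>puppet.conf.template</code> and the <code>@TOKEN@</code> placeholders are names I've made up for illustration; the real settings end up in <code>/etc/puppet/puppet.conf</code>:</div>
<div>
<br />
<pre class="brush:bash ; title: 'fill in the puppet.conf template (sketch)' ; highlight: 1">sed -e 's/@MASTER@/puppet.infra.example.org/' \
    -e 's|@MANIFESTDIR@|/var/lib/puppet/site/manifests|' \
    -e 's|@MODULEPATH@|/var/lib/puppet/site/modules:/etc/puppet/modules|' \
    puppet.conf.template > puppet.conf
sudo cp puppet.conf /etc/puppet/puppet.conf
</pre>
</div>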
<br />
<h3>
Installing Standard Modules</h3>
<div>
<br /></div>
<div>
Puppet provides a set of standard modules for managing common aspects of clients. These are installed from PuppetLabs' module site (the Puppet Forge) with the <code>puppet module install</code> command, before the master process is started.</div>
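<div>
<br /></div>
<div>
For example, the <i>ntp</i> module that the uber-task below pulls in:</div>
<div>
<br />
<pre class="brush:bash ; title: 'install a Forge module' ; highlight: 1">puppet module install puppetlabs-ntp
puppet module list    # confirm where it landed
</pre>
</div>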
<br />
<h3>
Unpacking Site Configuration (From git)</h3>
<br />
I already have a task for cloning a git repository on a remote host. Unpack the site configurations into the directory prepared previously. The git repo must have two directories at the top: <i>manifests</i> and <i>modules.</i> These will contain the site configuration and any custom modules needed for OpenShift. These locations are configured into the puppet master configuration above.<br />
<br />
<h3>
Adding Firewall Rules</h3>
<br />
The puppet master service listens on port 8140/TCP. I need to add an allow rule so that inbound connections to the puppet master will succeed.<br />
<br />
Just to be safe I also add an explicit rule to allow SSH (22/TCP) before restarting the firewall service.<br />
<br />
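Done by hand, with the same lokkit assumption as before (iptables rules would work just as well):<br />
<br />
<pre class="brush:bash ; title: 'open the puppet master port (sketch)' ; highlight: 1">lokkit --enabled --service=ssh --port=8140:tcp
</pre>
<br />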
These match the securitygroup rule definitions from the third post. Some people would question the need for running a host-based firewall when EC2 already provides network filtering. I would refer anyone who asks to read up on <a href="https://en.wikipedia.org/wiki/Defence_in_depth">Defense in Depth</a>.<br />
<br />
<h3>
Filtering the Puppet logs into a separate file</h3>
<div>
<br /></div>
<div>
It is much easier to observe the operation of the service if the logs are in a separate file. I add an entry to the <code>/etc/rsyslog.d/</code> directory and restart the rsyslog daemon to place puppet master logs in <code>/var/log/puppet-master.log</code></div>
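<div>
<br /></div>
<div>
Roughly, a drop-in file like the one below. The program name to match can vary with the puppet packaging, so check what actually appears in <code>/var/log/messages</code> before trusting it; older rsyslog versions use <code>& ~</code> instead of <code>& stop</code> to discard the matched messages.</div>
<div>
<br />
<pre class="brush:bash ; title: 'send puppet master logs to their own file (sketch)' ; highlight: 1">cat > /etc/rsyslog.d/puppetmaster.conf <<'EOF'
:programname, isequal, "puppet-master"    /var/log/puppet-master.log
& stop
EOF
systemctl restart rsyslog    # or: service rsyslog restart
</pre>
</div>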
<br />
<h3>
Enabling and Starting the Puppet Master Service</h3>
<div>
<br /></div>
<div>
Finally, when all of the puppet master host customization is complete, I can enable and start the puppet master service.</div>
<div>
<br /></div>
<h3>
What all that looks like</h3>
<div>
<br />
That's a whole long list and I created a whole set of Thor tasks to manage the steps. Then I created an uber-task to execute it all. It starts with the result of <code>origin:baseinstance</code> (run with the securitygroups default and puppetmaster). It results in a running puppet master waiting for clients to connect.</div>
<div>
<br />
<pre class="brush:bash ; title: 'initialize puppet master' ; highlight: 1">thor origin:puppetmaster puppet.infra.example.org --siterepo https://github.com/markllama/origin-puppet
origin:puppetmaster puppet.infra.example.org
task: remote:available puppet.infra.example.org
task: origin:prepare puppet.infra.example.org
task: remote:distribution puppet.infra.example.org
fedora 18
task: remote:arch puppet.infra.example.org
x86_64
task: remote:timezone puppet.infra.example.org UTC
task: remote:hostname puppet.infra.example.org
task: remote:yum:install puppet.infra.example.org puppet-server git system-config-firewall-base augeas
task: puppet:master:join_group puppet.infra.example.org
task: remote:git:clone puppet.infra.example.org https://github.com/markllama/origin-puppet
task: puppet:master:configure puppet.infra.example.org
task: puppet:master:enable_logging puppet.infra.example.org
task: puppet:module:install puppet.infra.example.org puppetlabs-ntp
task: remote:firewall:stop puppet.infra.example.org
task: remote:firewall:service puppet.infra.example.org ssh
task: remote:firewall:port puppet.infra.example.org 8140
task: remote:firewall:start puppet.infra.example.org
task: remote:service:start puppet.infra.example.org puppetmaster
task: remote:service:enable puppet.infra.example.org puppetmaster
</pre>
</div>
<br />
You can check that the puppet master has created and signed its own CA certificate by listing the puppet certificates like this:<br />
<br />
<pre class="brush:bash ; title: 'list all puppet certificates' ; highlight: 1">thor puppet:cert list puppet.infra.example.org --all
task puppet:cert:list puppet.infra.example.org
+ puppet.infra.example.org BD:27:A5:3B:AE:F5:1D:05:7E:8F:E7:E9:CA:BA:32:4B
</pre>
<br />
This indicates that there is now a single certificate associated with the puppet master. This certificate will be used to sign the client certificates as they are submitted.<br />
<br />
<h2>
Initializing a Puppet Client</h2>
</div>
<div>
<br />
The first part of creating a puppet client host is the same as for the master (almost). It involves installing some basic puppet packages (puppet, facter, augeas), setting the hostname and time zone and the rest of the hosty stuff. Then we get to the puppet client registration.<br />
<br />
The puppet agent runs on the controlled client hosts. It polls the puppet master periodically checking for updates to the configuration model for the host.</div>
<div>
<br /></div>
<div>
When the puppet agent starts the first time it generates an x509 client certificate and sends a signing request to the puppet master.</div>
<div>
<br /></div>
<div>
When the puppet master receives a signing request from an agent for the first time, it places the certificate in a list of those waiting to be signed. The user can then sign and accept each new client certificate, and the initial identification process is complete. From then on the puppet agent polls using its client certificate for identification, and the signature provides authentication.</div>
<div>
<br /></div>
<div>
The process, then, for installing and initializing the puppet client is this (a rough manual sketch follows the two lists below):</div>
<div>
<br /></div>
<div>
On the client:</div>
<div>
<ul>
<li>install the puppet agent package</li>
<li>configure the puppet master hostname into the configuration file</li>
<li>enable the puppet agent service</li>
<li>start the puppet agent service</li>
</ul>
<div>
Then on the puppet master:</div>
</div>
<div>
<ul>
<li>wait for the client unsigned certificate to arrive</li>
<li>sign the new client certificate</li>
</ul>
</div>
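<div>
<br /></div>
<div>
Here is that sequence as a rough manual sketch, with Fedora-style package and service names assumed and the <code>puppet cert</code> subcommands as they exist in Puppet 3.x:</div>
<div>
<br />
<pre class="brush:bash ; title: 'puppet client registration (manual sketch)' ; highlight: 1"># on the client
yum install -y puppet facter
# add "server = puppet.infra.example.org" to the [agent] (or [main]) section
# of /etc/puppet/puppet.conf, then enable and start the agent
systemctl enable puppet
systemctl start puppet

# on the puppet master, once the signing request arrives
puppet cert list                            # pending (unsigned) requests
puppet cert sign broker.infra.example.org   # accept this client
</pre>
</div>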
<div>
This is what it looks like for the broker host:<br />
<br />
<br />
<pre 1="" add="" agent="" broker="" class="brush:bash ; title: " highlight:="" host="" puppet="" the="" to="">thor origin:puppetclient broker.infra.example.org puppet.infra.example.org
origin:puppetclient broker.infra.example.org, puppet.infra.example.org
task: remote:available broker.infra.example.org
task: origin:prepare broker.infra.example.org
task: remote:distribution broker.infra.example.org
fedora 18
task: remote:arch broker.infra.example.org
x86_64
task: remote:timezone broker.infra.example.org UTC
task: remote:hostname broker.infra.example.org
task: remote:yum:install broker.infra.example.org puppet facter system-config-firewall-base augeas
task: puppet:agent:set_server broker.infra.example.org puppet.infra.example.org
task: puppet:agent:enable_logging broker.infra.example.org
task: remote:service:enable broker.infra.example.org puppet
task: remote:service:start broker.infra.example.org puppet
task: puppet:cert:sign puppet.infra.example.org broker.infra.example.org
</pre>
<br />
At this point the client can request its own configuration model and the master will confirm the identity of the client and return the requested information.<br />
<br />
<pre class="brush:bash ; title: 'list all certificates' ; highlight: 1">thor puppet:cert:list puppet.infra.example.org --all
task puppet:cert:list puppet.infra.example.org
+ broker.infra.example.org 09:97:22:B9:A9:16:AE:B1:32:93:EC:3A:6D:7A:CF:67
+ puppet.infra.example.org 70:B8:E0:C0:F8:5B:48:67:4E:92:91:D2:0D:E4:2B:F4
</pre>
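<br />
To see a round trip immediately, rather than waiting for the next scheduled poll, you can run the agent once by hand on the client:<br />
<br />
<pre class="brush:bash ; title: 'force an immediate agent run' ; highlight: 1">puppet agent --test --server puppet.infra.example.org
</pre>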
<br />
Repeat the <code>origin:puppetclient</code> step for the data1, message1 and node1 instances you created last time. You did create them, right? Check the certs as each one registers.<br />
<br />
The next step is to actually build a model for the client to request by creating a site manifest and a set of node descriptions.<br />
<br />
That means: we finally get to do some OpenShift.</div>
<div>
<ul>
</ul>
</div>
<div>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com4tag:blogger.com,1999:blog-5022186007695457923.post-82999244800289923342013-05-30T11:04:00.003-07:002013-05-30T12:55:28.050-07:00OpenShift on AWS EC2, Part 4 - The First MachineThere's enough infrastructure in place now that I should be able to create the first instance for my OpenShift service. I'm going to be managing the configuration with a Puppet master, so that will be the first instance I create.<br />
<br />
The puppet master must have a public name and a fixed IP address. I need to be able to reach it via SSH, and the puppet agents need to be able to find it by name (oversimplification, go with me on this).<br />
<br />
With Route53 and EC2 configured, I can request a static (elastic) IP and associate it with a hostname in my domain. I can also associate it with a new instance after the instance is launched. I can specify the network filtering rules so I can access the host over the network.<br />
<br />
I actually have a task that does all this in one go, but I'm going to walk through the steps once so it's not magic.<br />
<br />
<b>NOTE</b>: if you haven't pulled down the <a href="https://github.com/markllama/origin-setup">origin-setup</a> tools from github and you want to follow along, you should go back to <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-1-from-wheels.html">the first post in this series</a> and do so.<br />
<br />
This is not the only way to accomplish the goals set here. You can use the AWS <a href="https://console.aws.amazon.com/">web console</a>, <a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html">CloudFormation</a> or even tools like the <a href="https://github.com/mlinderm/vagrant-aws">control plugins for Vagrant</a>.<br />
<br />
<h2>
Instances and Images in EC2</h2>
<div>
<br /></div>
<div>
First, a little terminology. EC2 has a number of terms to disambiguate... things.</div>
<div>
<br /></div>
<div>
An <i><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ComponentsAMIs.html">image</a></i> is a static piece of storage which contains an OS. It is the "gold copy" that we used to make when we still cloned hard disks to copy systems. An image cannot run an OS. It's storage. An image does have some metadata though. It has an associated machine architecture. It has instructions for how it is to be mounted when it is used to create an instance.</div>
<div>
<br /></div>
<div>
(Actually, this is a lie: an image <u>is</u> the metadata; the storage is really in a <i>snapshot</i> with a <i>volume</i>, but that's too much detail and not really important right now.)</div>
<div>
<br /></div>
<div>
An <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Instances.html">instance</a> is a runnable copy of an image. It has a copy of the disk, but it also has the ability to start and stop. It is assigned an IP address when it starts. A copy of your RSA security key is installed when it starts so that you can log in.</div>
<div>
<br /></div>
<div>
When you want a new machine, you create an instance from an image. You select an image which uses the architecture and contains the OS that you want. You give the instance a name, a comment, and its security groups. There are other things you can specify as well, but they don't come into play here.</div>
<div>
<br /></div>
<h2>
Finding the Right Image</h2>
<div>
<br /></div>
<div>
People like their three letter abbreviations. On the web interface you'll see the term "AMI", which, I think, stands for "Amazon Machine Image", otherwise known as "an image" in this context. While the image IDs all begin with <i>ami-</i>, I'm going to continue to refer to them as "images".</div>
<div>
<br /></div>
<div>
For OpenShift I want to start with either a Fedora or a RHEL (or CentOS) image. I can't think of a reason anymore not to use a 64 bit OS and VM, so I'll specify that. You can easily find official RHEL images on the AWS web console or using the AWS Marketplace. You can find CentOS in the Marketplace. There are "official" Fedora images there too, though they're not publicized.</div>
<div>
<br /></div>
<div>
What I do is use the web interface to find a recommended image and then make a note of the owner ID of the image. From then on I can use the owner ID to find images using the CLI tools. It doesn't look like you can look up an owner's information from their owner ID.</div>
<div>
<br /></div>
<div>
New instances (running machines) are created from images. Conversely new images can be created from an instance. People can create and register and publish their images, so there can be lots of things that look like they're "official" which may have been altered. It takes a little sleuthing to find the images that come from the source you want. </div>
<div>
<br /></div>
<div>
Using the AWS console, I narrowed the Fedora x86_64 images down to this:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://2.bp.blogspot.com/-UfqEp8DQAQo/UaZWUmvqmqI/AAAAAAAAB1M/bXJzf2x7-EU/s1600/ec2_image_search_fedora.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="152" src="http://2.bp.blogspot.com/-UfqEp8DQAQo/UaZWUmvqmqI/AAAAAAAAB1M/bXJzf2x7-EU/s640/ec2_image_search_fedora.png" width="640" /></a></div>
<div>
<br /></div>
<div>
I made a note of the owner ID, and the pattern for the names and I can search for them on the CLI like this:</div>
<div>
<br /></div>
<div>
<pre class="brush:bash ; title: 'list Fedora images' ; highlight : 1">thor ec2:image list --owner 125523088429 --name 'Fedora*' --arch x86_64
ami-2509664c 125523088429 x86_64 Fedora-x86_64-17-1-sda
ami-6f3b5006 125523088429 x86_64 Fedora-x86_64-19-Beta-20130523-sda
ami-b71078de 125523088429 x86_64 Fedora-x86_64-18-20130521-sda
</pre>
<br /></div>
Note that the <i>--name</i> search allows for globbing using the asterisk (*) character.<br />
<br />
<h2>
Launching the First Instance</h2>
<div>
<br /></div>
<div>
I think I have enough information now to fire up the first instance for my OpenShift service. The first one will be the puppet master, as that will control the configuration of the rest.</div>
<div>
<br /></div>
What I know:<br />
<br />
<ul>
<li>hostname - puppet.infra.example.org</li>
<li>base image - ami-b71078de</li>
<li>securitygroup(s) - default, allow SSH</li>
<li>SSH key pair name</li>
</ul>
<div>
<br /></div>
<div>
Later, I will also need a static (ElasticIP) address and a DNS A record. Both of those can be set after the instance is running.<br />
<br />
There is one last thing to decide. When you create an EC2 instance, you must specify the instance <i>type</i> which is a kind of sizing for the machine resources. AWS has a table of EC2 instance types that you can use to help you size your instances to your needs. Since I'm only building a demo, I'm going to use the <u>t1.micro</u> type. This has 7GB instance storage a single virtual core and enough memory for this purpose. The CPU usage is also free (Storage and unused elastic IPs still cost).<br />
<br />
<ul>
<li>size: t1.micro</li>
</ul>
<div>
<br />
So, here we go, with the CLI tools:</div>
<div>
<br /></div>
<pre class="brush:bash ; title : 'creating the first instance' ; highlight: 1">thor ec2:instance create --name puppet --type t1.micro --image ami-b71078de --key <mykeyname> --securitygroup default
task: ec2:instance:create --image ami-b71078de --name puppet
id = i-d8c912bb
</pre>
</div>
<div>
<br /></div>
<div>
That's actually pretty.... anti-climactic. I've got a convention that each task echoes the required arguments back as it is invoked. That way, as the tasks are composed into bigger tasks, you can see what's going on inside while it runs.<br />
<br />
All this one seemed to do was return an instance id. To see what's going on, I can request the status of the instance:<br />
<br />
<pre class="brush:bash ; title : 'instance status' ; highlight: 1">thor ec2:instance status --id i-d8c912bb
pending
</pre>
<br />
Since I'm impatient, I do that a few more times and after about 30 seconds it changes to this:
<br />
<br />
<pre class="brush:bash ; title : 'instance status: running' ; highlight: 1">thor ec2:instance status --id i-d8c912bb
running
</pre>
</div>
<div>
<br /></div>
<div>
I want to log in, but so far all I know is the instance ID. I can ask for the hostname.<br />
<br />
<pre class="brush:bash ; title : 'get instance external hostname'; highlight: 1">thor ec2:instance hostname --id i-d8c912bb
ec2-23-22-234-113.compute-1.amazonaws.com
</pre>
<br />
And with that I should be able to log in via SSH using my private key:
<br />
<br />
<pre class="brush:bash ; title : 'log in the first time'; highlight: 1">ssh -i ~/.ssh/<mykeyfile;>.pem ec2-user@ec2-23-22-234-113.compute-1.amazonaws.com
The authenticity of host 'ec2-23-22-234-113.compute-1.amazonaws.com (23.22.234.113)' can't be established.
RSA key fingerprint is 64:ec:6d:7d:af:ae:9a:70:78:0d:02:28:f1:c3:45:50.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-23-22-234-113.compute-1.amazonaws.com,23.22.234.113' (RSA) to the list of known hosts.
</pre>
</div>
<br />
It looks like I generated and saved my key pair right, and specified it correctly when creating the instance.<br />
<br />
The Fedora instances don't use the <i>root</i> user as the primary remote login. Instead, there's an <i>ec2-user</i> account which has <u>sudo ALL:ALL</u> permissions. That is, the ec2-user account can use <i>sudo</i> without providing a password. This really just gives you a little separation and forces you to <b>think</b> before you take some action as root.<br />
<br />
<h2>
Getting a Static IP Address</h2>
<div>
<br />
Now I have a host running and I can get into it, but the hostname is some long abstract string in the EC2 <i>amazonaws.com</i> domain. I want MY name on it. I also want to be able to reboot the host and have it get the same IP address and name. Well, it's not quite that simple.</div>
<div>
<br /></div>
<div>
Amazon EC2 has a curious and wonderful feature. Each running instance actually has two IP addresses associated with it. One is the internal IP address (the one configured in eth0). But that's in an RFC 1918 private network space. You can't route it. You can't reach it. You could even have a duplicate inside your corporate or home network.</div>
<div>
<br /></div>
<div>
The second address is an external IP address and this is the one you can see and can route to. Amazon works some router table magic at the network border to establish the connection between the internal and external addresses. What this means is that EC2 can change your external IP address without doing a thing to the host behind it. This is where Elastic IP addresses come in.</div>
<div>
<br /></div>
<div>
As with all of these things, you can do it from the web interface, but since I'm trying to automate things, I've made a set of tasks to manipulate the elastic IPs. I'm lazy and there's no other kind of IP in EC2 that I can change, so the tasks are in the <i>ec2:ip</i> namespace.</div>
<div>
<br /></div>
<div>
Creating a new IP is pretty much what you'd expect. You're not allowed to specify anything about it so it's as simple as can be:</div>
<div>
<br />
<pre class="brush:bash ; title : 'reserve an elastic IP' ; highlight: 1">thor ec2:ip create
task: ec2:ip:create
184.72.228.220
</pre>
<br />
Once again, not very exciting. Since each IP must be unique, the address itself serves as an ID. An address isn't very useful until it's associated with a running instance. The <i>ipaddress</i> task can retrieve the IP address of an instance. It can also set the external IP address (the address must be an allocated Elastic IP).<br />
<br /></div>
<div>
<pre class="brush:bash ; title: 'assign the IP adddress to an instance' ; highlight: 1">thor ec2:instance ipaddress 184.72.228.220 --id i-d8c912bb
task: ec2:instance:ipaddress 184.72.228.220
</pre>
<br /></div>
<div>
You can get the status and more information about an instance. You can also request the status using the instance name rather than the ID. For objects which have an ID and a name, you can query using either one, but you must specify it with an argument. For objects like the IP address which do not have a name, the ID is the first argument of any query.
<br />
<br />
<pre class="brush:bash ; title: 'getting instance status by name' ; highlight: 1">thor ec2:instance info --name puppet --verbose
EC2 Instance: i-d8c912bb (puppet)
DNS Name: ec2-184-72-228-220.compute-1.amazonaws.com
IP Address: 184.72.228.220
Status: running
Image: ami-b71078de
Platform:
Private IP: 10.212.234.234
Private Hostname: ip-10-212-234-234.ec2.internal
</pre>
<br />
<h2>
And now for something completely different: Route53 and DNS</h2>
</div>
<div>
I now have a running host with the operating system and architecture I want. It has a fixed address. But it has a really funny domain name.</div>
<div>
<br /></div>
<div>
When I created my Route53 zones, I split them in two. <i>infra.example.org</i> will contain my service hosts. <i>app.example.org</i> will contain the application CNAME records. The broker will only have permission to change the application zone. It won't be able to damage the infrastructure either through a compromise or a bug.</div>
<div>
<br /></div>
<div>
I'm going to call the puppet master puppet.infra.example.org. It will have the IP address I was granted above.</div>
<div>
<br /></div>
<div>
All of the previous tasks were in the <code>ec2:</code> namespace. Route53 is actually a different service within AWS, so it gets its own namespace.</div>
<div>
<br /></div>
<div>
An IP address record has four components:</div>
<div>
<br /></div>
<div>
<ul>
<li>type</li>
<li>name</li>
<li>value</li>
<li>ttl (time to live, in seconds)</li>
</ul>
<div>
<br /></div>
<div>
All of the infrastructure records will be A (address) records. The TTL has a regular default and there's no reason generally to override it. The value of an A record is an IP address.</div>
</div>
<div>
<br /></div>
<div>
The name in an A record is a Fully Qualified Domain Name (FQDN). It has both the domain suffix and the hostname and any sub-domain parts. To save some trouble parsing, the <code>route53:record:create</code> task expects the zone first, and the host part next as a separate argument. The last two arguments are the type and value.</div>
<div>
<br /></div>
<div>
<pre class="brush:bash; title: 'create the puppet A record'; highlight: 1">thor route53:record create infra.example.org puppet a 184.72.228.220
task: route53:record:create infra.example.org puppet a 184.72.228.220
</pre>
<br /></div>
<div>
Also pretty anti-climactic. This time though there will be an external effect.<br />
<br />
First, I can list the contents of the <i>infra.example.org</i> zone from Route53. Then I can also query the A record from DNS, though this may take some time to be available.<br />
<br />
<pre class="brush:bash ; title: 'view an A record from Route53' ; highlight: 1">thor route53:record:get infra.example.org puppet A
task: route53:record:get infra.example.org puppet A
puppet.infra.example.org. A
184.72.228.220
</pre>
<br />
And the same when viewed with <code>host</code>:
<br />
<br />
<pre>host puppet.infra.example.org
puppet.infra.example.org has address 184.72.228.220
</pre>
<br />
The SOA records for AWS Route53 have a TTL of 900 seconds (15 minutes). When you add or remove a record from a zone, you also cause an update to the SOA record serial number. Between you and Amazon there are almost certainly one or more caching nameservers and they will only refresh their cache when the SOA TTL expires. So you could experience a delay of up to 15 minutes from the time that you create a new record in a zone and when it resolves. I'm hoping this doesn't hold true for individual records, because it's going to cause problems for OpenShift.<br />
<br />
You can check the TTL of the SOA record by requesting the record directly using dig:<br />
<br /></div>
<div>
<br />
<pre class="brush:bash ; title : 'view zone SOA (and TTL)' ; highlight: 1">dig infra.example.org soa
; <<>> DiG 9.9.2-rl.028.23-P2-RedHat-9.9.2-10.P2.fc18 <<>> infra.example.org soa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60006
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;infra.example.org. IN SOA
;; ANSWER SECTION:
infra.example.org. 900 IN SOA ns-1450.awsdns-53.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
;; Query time: 222 msec
;; SERVER: 172.30.42.65#53(172.30.42.65)
;; WHEN: Wed May 29 18:46:46 2013
;; MSG SIZE rcvd: 130
</pre>
</div>
<br />
The '900' on the first line of the answer section is the record TTL.<br />
<br />
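One way to check a freshly created record without waiting out an intermediate cache is to ask one of the zone's authoritative nameservers directly (the server name here is the one from the SOA answer above):<br />
<br />
<pre class="brush:bash ; title: 'query the authoritative nameserver directly' ; highlight: 1">dig @ns-1450.awsdns-53.org puppet.infra.example.org a +short
</pre>
<br />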
<h2>
Wrapping it all up.</h2>
<div>
<br /></div>
<div>
The beauty of Thor is that you can take each of the tasks defined above and compose them into more complex tasks. You can invoke each task individually from the command line or you can invoke the composed task and observe the process.</div>
<div>
<br /></div>
<div>
Because this task uses several others from both EC2 and Route53, I put it under a different namespace. All of the specific composed tasks will go in the <code>origin:</code> namespace.</div>
<div>
<br /></div>
<div>
The composed task is called <code>origin:baseinstance.</code> At the top I know the fully qualified domain name of the host, the image and securitygroups that I want to use to create the instance. Since I already have the puppet master this one will be the broker.</div>
<div>
<br /></div>
<div>
<ul>
<li>hostname: broker.infra.example.org</li>
<li>image: ami-b71078de</li>
<li>instance type: t1.micro</li>
<li>securitygroups: default, broker</li>
<li>key pair name: <mykeypair></li>
</ul>
</div>
<div>
<br />
<pre class="brush:bash ; title : 'create host, IP, DNS all in one go' ; highlight: 1">thor origin:baseinstance broker --hostname broker.infra.example.org --image ami-b71078de --type t1.micro --keypair <mykeypair> --securitygroup default broker
task: origin:baseinstance broker
task: ec2:ip:create
184.73.182.10
task: route53:zone:contains broker.infra.example.org
Z1PLM62Y00LCIN infra.example.org.
task: route53:record:create infra.example.org. broker A 184.73.182.10
- image id: ami-b71078de
task: ec2:instance:create ami-b71078de broker
id = i-19b1f576
task: remote:available ec2-54-226-116-229.compute-1.amazonaws.com
task: ec2:ip:associate 184.73.182.10 i-19b1f576
</pre>
<br /></div>
<div>
This process takes about two minutes. If you add <code>--verbose</code> you can see more of what is happening. There is a delay waiting for the A record creation to sync so that you don't accidentally create negative cache records which can slow propagation. Also you can see the <code>remote:available</code> task which polls a host for SSH login access. This allows time for the instance to be created, start running and reach multi-user network state.<br />
<br />
<pre class="brush:bash ; title : 'login with fully-qualified domain name' ; highlight: 1">ssh ec2-user@broker.infra.example.org
The authenticity of host 'broker.infra.example.org (184.73.182.10)' can't be established.
RSA key fingerprint is 8f:db:46:25:bf:19:2e:47:f5:f4:4a:23:a5:98:e3:5c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'broker.infra.example.org,184.73.182.10' (RSA) to the list of known hosts.
Last login: Thu May 30 11:37:08 2013 from 66.187.233.206
</pre>
<br />
I will duplicate this process for the data and message servers, and for one node to begin.<br />
My tier of AWS only allows 5 Elastic IP addresses, so I'm at my limit. For a real production setup, only the broker, nodes and possibly the puppet master require fixed IP addresses and public DNS. The datastore and message servers could use dynamic addresses, but then they will require some tweaking on restart. I'm sure Amazon will give you more IP addresses for money, but I haven't looked into it.<br />
<br />
<h2>
Summary</h2>
</div>
<div>
<br /></div>
<div>
There's a lot packed into this post:</div>
<div>
<ul>
<li>Select an image to use as a base</li>
<li>Manage IP addresses</li>
<li>Bind IP addresses to running instances</li>
<li>Create a running instance.</li>
</ul>
<div>
All of this can be done with the AWS console. The ec2 and route53 tasks just make it a little easier, and the origin:baseinstance task wraps it all up so that creating new bare hosts is a single step.</div>
</div>
<div>
<br /></div>
<div>
In the next post I'll establish the puppet master service on the puppet server and install a puppet agent on each of the other infrastructure hosts. From then on, all of the service management will happen in Puppet and we can let EC2 fade into the background.</div>
<div>
<br /></div>
<h2>
References</h2>
<div>
<ul>
<li><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html">EC2 Documentation</a></li>
<ul>
<li><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html">Image (AMI)</a></li>
<li><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Instances.html">Instance</a></li>
</ul>
<li><a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/HowDoesRoute53Work.html">Route53 Documentation</a></li>
<ul>
<li><a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/AboutHostedZones.html">Zone</a></li>
<li><a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/AboutRRS.html">Record</a></li>
</ul>
<li><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstances.html">Access via SSH</a></li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com7tag:blogger.com,1999:blog-5022186007695457923.post-52373475706776499602013-05-28T09:21:00.001-07:002013-05-28T09:21:57.294-07:00OpenShift on AWS EC2, Part 3: Getting In and Out (securitygroups)In the previous two posts, I talked about tools to manage AWS EC2 with a CLI toolset, and preparing AWS Route53 so that the OpenShift broker will be able to publish new applications. There is one more facet of EC2 that needs to be addressed before trying to start the instances which will host the OpenShift service components.<br />
<br />
AWS EC2 provides (enforces?) network port filtering. The filter rule sets are called <i><a href="https://console.aws.amazon.com/ec2/home?region=us-east-1#s=SecurityGroups">securitygroups</a></i>. AWS also offers two forms of EC2, "classic", and "VPC" (virtual private cloud). Managing securitygroups for classic and VPC are a little different. I'm going to present securitygroups in EC2-Classic. If you're going to use EC2-VPC, you'll need to read the Amazon documentation and adapt your processes to the VPC behaviors. Also note that securitygroups have a scope. They can be applied only in the region in which they are defined.<br />
<br />
In EC2-Classic you must associate all of the securitygroups with a new instance when you launch it (create it from an image). You cannot change the set of securitygroups associated with an instance later. You can change the rulesets in the securitygroups, and the new rules will be applied immediately to all of the members of the securitygroup.<br />
<br />
Amazon provides a default securitygroup which basically restricts all network traffic to the members (but not *between* members). To make OpenShift work we will need a set of security groups which allow communications between the OpenShift Broker and the back-end services, and between the broker and nodes (through some form of messaging). We will also need to allow external access to the OpenShift broker (for control) and to the nodes (for user access to the applications).<br />
<br />
The creation of the securitygroups probably does not need to be automated. The securitygroups will be created and the rulesets defined only once for a given OpenShift service. The web interface is probably appropriate for this.<br />
<br />
Since we'll be creating the instances with the CLI, it will be necessary to be able to list, examine, and apply the securitygroups to new instances there as well.<br />
<br />
<b>NOTE: These are not the security settings you are looking for.</b><br />
<b><br /></b>
The securitygroups and rulesets shown here are designed to demonstrate the securitygroup features and the user interface used to manage them. They are not designed with an eye to the best possible function and security for your service. You must look at your service design and requirements to create the best group and rulesets for your service.<br />
<br />
Most people focus on the inbound (ingress) filtering rules. I'm going to go with that. I won't be defining any outbound (egress) rule sets.<br />
<br />
I expect to need a different group for each type of host:<br />
<br />
<ul>
<li>OpenShift broker</li>
<li>OpenShift node</li>
<li>datastore</li>
<li>message broker</li>
<li>puppetmaster</li>
</ul>
<div>
<br /></div>
<div>
In addition I'm going to manage the service hosts with Puppet using a puppetmaster host. Each of the service hosts will be a puppet client. I don't think the puppet agent needs any special rules so I only have one additional securitygroup.</div>
<div>
<br /></div>
<div>
If I also planned to use an external authentication service on the broker, I would need a securitygroup for that. I could also extend this set to include build and test servers for development of OpenShift itself.</div>
<div>
<br /></div>
<h2>
Defining Securitygroups</h2>
<div>
Each of the groups below has only a single rule. To be rigorous I could add the SSH (22/TCP) rule to the node securitygroup. It is actually required for the operation of the node, not just for administrative remote access.<br />
<br />
<table border="2">
<tbody>
<tr>
<th>securitygroup</th>
<th>service</th>
<th>port/proto</th>
<th>source</th>
<th>comments</th>
</tr>
<tr>
<td>default</td>
<td>SSH</td>
<td>22/TCP</td>
<td>OpenShift Ops</td>
<td>remote access and control</td>
</tr>
<tr>
<td>puppetmaster</td>
<td>puppetmaster</td>
<td>8140/TCP</td>
<td>all managed hosts</td>
<td>configuration management</td>
</tr>
<tr>
<td>datastore</td>
<td>mongodb</td>
<td>27017/TCP</td>
<td>OpenShift Broker Hosts</td>
<td>NoSQL DB</td>
</tr>
<tr>
<td>messagebroker</td>
<td>activemq/stomp</td>
<td>61613/TCP</td>
<td>OpenShift broker and node hosts</td>
<td>carries MCollective</td>
</tr>
<tr>
<td>broker</td>
<td>httpd (apache2)</td>
<td>80/TCP, 443/TCP</td>
<td>OpenShift Ops and Users (unrestricted)</td>
<td>Ruby on Rails and Passenger</td>
</tr>
<tr>
<td rowspan="3">node</td>
<td>httpd (apache2)</td>
<td>80/TCP, 443/TCP</td>
<td>OpenShift Application Users (unrestricted)</td>
<td>HTTP routing</td>
</tr>
<tr>
<td>Web Sockets</td>
<td>8000/TCP, 8443/TCP</td>
<td>OpenShift App user</td>
<td>web sockets</td>
</tr>
<tr>
<td>SSH</td>
<td>22/TCP</td>
<td>OpenShift App Users (unrestricted)</td>
<td>shell and app control</td>
</tr>
</tbody></table>
</div>
<br />
<br />
Populating each securitygroup is a two step process. First create the empty security group. Then add the rules to the group. At that point, the group is ready to be applied to new instances.<br />
<br />
<h2>
Creating a Securitygroup</h2>
<div>
Each security group starts with a name and an optional description string. The restrictions on the names differ between EC2-Classic and EC2-VPC securitygroups. See the Amazon documentation for the differences. Simple upper/lower case strings with no white space are allowed in both. The descriptions are more freeform.</div>
<div>
<br /></div>
<div>
You can add new securitygroups on the <a href="https://console.aws.amazon.com/ec2/home?region=us-east-1#s=SecurityGroups">AWS EC2 console page</a>. Select the "Security Groups" tab on the left side and click "Create Security Group". Fill in the name and description fields, make sure that the VPC selector indicates "No VPC" and click "Yes, Create".</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-sT_84XYaUCw/UaQAt5oPZ8I/AAAAAAAAB0A/uOvfbx8lzXQ/s1600/ec2_new_security_group.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="264" src="http://3.bp.blogspot.com/-sT_84XYaUCw/UaQAt5oPZ8I/AAAAAAAAB0A/uOvfbx8lzXQ/s640/ec2_new_security_group.png" width="640" /></a></div>
<div>
<br /></div>
<h2>
Adding Rulesets</h2>
<div>
Securitygroup rulesets are one of the more complex elements in EC2. When using the web interface, Amazon provides a set of pre-defined rules for things like HTTP and SSH and common database connections. You should use them when they're appropriate. The web interface also allows you to create custom rulesets. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-iQ8GhSfz8ug/UaTCSo03wwI/AAAAAAAAB0Q/UsnaqvgzqzA/s1600/ec2_securitygroup_new_rule.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="http://3.bp.blogspot.com/-iQ8GhSfz8ug/UaTCSo03wwI/AAAAAAAAB0Q/UsnaqvgzqzA/s640/ec2_securitygroup_new_rule.png" width="640" /></a></div>
<br />
There are several things to note about this display. The default group has three mandatory rules (blue and white bars in the lower right). These allow all of the members of the group unrestricted access to each other.<br />
<br />
I'm adding the SSH rule which allows inbound port 22 connections. I'm leaving the source as the default 0.0.0.0/0. This is the IPv4 notation for "everything", so there will be no restrictions on the source of inbound SSH connections. If you want to restrict SSH access so that connections come only from your corporate network, you can set the exit address space for your company there.<br />
<br />
Since the members of the default group have unrestricted access to each other and since I'm going to apply the default group to all of my instances, it turns out that I only need special rules for access to hosts from the outside. I need to add the SSH rule above, and I need to allow web access to the broker and node hosts. I am going to create these as distinct groups because I can't change the assigned groups for an instance after it is launched. I'd like the ability to restrict access to the broker later.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-4N5Sy-RrOlI/UaTVjN_PDVI/AAAAAAAAB0g/_ocfCXSm6U4/s1600/ec2_securitygroup_done.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="286" src="http://1.bp.blogspot.com/-4N5Sy-RrOlI/UaTVjN_PDVI/AAAAAAAAB0g/_ocfCXSm6U4/s640/ec2_securitygroup_done.png" width="640" /></a></div>
<br />
<br />
If I were to apply rigorous security to this setup, I would avoid using the default group. Instead I would create a distinct group for each service component. Then I would add rulesets which allow only the required communications. This would decrease the risk that a compromise of one host would grant access to the rest of the service hosts.<br />
<br />
Since it's a one-time task, I created both of my securitygroups and rulesets using the web interface. I have written Thor tasks to create and populate securitygroups:<br />
<br />
<pre class="brush: bash ; title: 'securitygroup tasks'; highlight: 1 "> thor help ec2:securitygroup
Tasks:
thor ec2:securitygroup:create NAME # create a new se...
thor ec2:securitygroup:delete # delete the secu...
thor ec2:securitygroup:help [TASK] # Describe availa...
thor ec2:securitygroup:info # retrieve and re...
thor ec2:securitygroup:list # list the availa...
thor ec2:securitygroup:rule:add PROTOCOL PORTS [SOURCES] # add a permissio...
thor ec2:securitygroup:rules # list the rules ...
Options:
[--verbose]
</pre>
<br />
The list of tasks is incomplete, as I have not needed to change or delete rulesets. If I find that I need those tasks, I'll add them.<br />
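<br />
In practice the two-step process from above maps onto the <code>create</code> and <code>rule:add</code> tasks. I haven't shown their exact argument formats here, so ask Thor before relying on them:<br />
<br />
<pre class="brush:bash ; title: 'securitygroup task usage' ; highlight: 1">thor ec2:securitygroup:list              # what groups exist in this region
thor help ec2:securitygroup:rule:add     # exact PROTOCOL/PORTS/SOURCES format
</pre>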
<br />
<h2>
Next Up</h2>
This is everything that must be done before beginning to create running instances for my OpenShift service. In the next post I'll select a base image to use for my host instances and begin creating running machines.<br />
<br /></div>
<h2>
References</h2>
<br />
<ul>
<li> <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html">AWS Network Security (securitygroups)</a> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html</li>
</ul>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-52113568045898913112013-05-26T18:44:00.000-07:002013-05-26T18:55:16.461-07:00OpenShift on AWS EC2, Part 2: Being Seen (DNS)OpenShift is, at least in part, a publication system. Developers create applications and OpenShift tells the world about them. This means that the very first thing you need to think about when you're considering creating an OpenShift service is "what do I call it?"<br />
<br />
I actually created two zones when setting up the DNS for OpenShift. The servers reside in one zone, and the user applications in another. The broker service will be making updates to the application zone. It doesn't seem like a good idea to have the server hostnames in the same zone where a bug or intrusion could alter or delete them. Something like this will do.<br />
<br />
<ul>
<li><code>infra.example.org</code> - contains the server hostnames</li>
<li><code>app.example.org</code> - contains the application records</li>
</ul>
<div>
<br /></div>
<h2>
Picking a Domain Name (and a Registrar)</h2>
<br />
In most cases your choice is going to be constrained by what domains you own or have access to. You may need (as I did) to purchase a domain from a domain registrar. Or you will have to have your corporate IT department delegate a domain for you (whether they run it or you do).<br />
<br />
When you register or delegate a domain your domain registrar will request a list of name servers which will be serving the content of your domain. Route53 won't tell you the nameservers until you tell them what domain they'll be serving for you. That means that creating a domain, if you don't have one, is a 3 step exchange:<br />
<br />
<ol>
<li>Request domain from a registrar</li>
<li>Tell Route53 to serve the domain for you</li>
<li>Tell your registrar which Route53 nameservers will be providing your domain</li>
</ol>
<div>
<br /></div>
<div>
These steps will happen so rarely that I haven't bothered to script them. I just use the web interface for each step.<br />
<br />
NOTE: there are technical differences between a <u>zone</u> and a <u>domain</u> but I'm going to treat them as synonyms for this process. When you're registering, it's called a domain. When you're going to change the contents it's called a zone.</div>
<div>
<br /></div>
<div>
Each registrar will have a different means for you to set your domain's nameserver records. You'll have to look them up yourself. If you're getting a domain delegated from your corporate IT department you'll have to give them the list of Route53 nameservers so that they can install the "<i>glue records"</i> into their service.<br />
<br />
So, pick your Registrar, search for an available domain, request, register, and pay. Then head over to the <a href="https://console.aws.amazon.com/route53/home">AWS Route53 console</a>.<br />
<br />
<h2>
Adding a zone to Route53</h2>
<div>
<br /></div>
<div>
On the web interface, click "Create Hosted Zone" in the top tool bar. You'll see this dialog on the right side.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-QQSw4iFd0tE/UaJRpn0qIEI/AAAAAAAABzk/iPpQbDjG2c4/s1600/route53_create_zone.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="268" src="http://3.bp.blogspot.com/-QQSw4iFd0tE/UaJRpn0qIEI/AAAAAAAABzk/iPpQbDjG2c4/s640/route53_create_zone.png" width="640" /></a></div>
<div>
<br /></div>
<div>
Fill in the values for your new domain and a comment, if you wish. Then click "Create Hosted Zone" at the bottom of the dialog and Route53 will create your zone and assign a set of nameservers.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-qRJyHvXJOEU/UaJbiVu6tmI/AAAAAAAABzw/Uzf_N74WOHw/s1600/route53_new_zone.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="262" src="http://1.bp.blogspot.com/-qRJyHvXJOEU/UaJbiVu6tmI/AAAAAAAABzw/Uzf_N74WOHw/s640/route53_new_zone.png" width="640" /></a></div>
<br />
Make a note of the "Delegation Set". This is the set of nameservers which you need to provide to your domain registrar. The registrar will provide some place to enter the nameserver list and then they will add the glue records to the top-level domain.<br />
<br />
Make a note as well of the "Hosted Zone ID". That's what you will use to select the zone to update when you send requests to AWS Route53.<br />
<br />
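A quick way to confirm that the registrar's delegation has taken effect is to query the zone's NS records from the shell. This is only a sketch using the placeholder zone from this post; your Route53 nameservers will differ:<br />
<br />
<pre class="brush: bash; title: 'check delegation'; highlight: 1">host -t ns app.example.org
app.example.org name server ns-131.awsdns-16.com.
app.example.org name server ns-860.awsdns-43.net.
app.example.org name server ns-2023.awsdns-60.co.uk.
app.example.org name server ns-1076.awsdns-06.org.
</pre>
<br />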
When the domain registrar completes adding the Route53 nameservers it's time to come back to the thor CLI tools installed in <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-1-from-wheels.html">part one</a>.<br />
<br />
<h2>
Viewing the Route53 DNS information</h2>
<div>
<br /></div>
<div>
You certainly can view the DNS information on the Route53 console. If you've set up the AWS CLI tools indicated <a href="http://cloud-mechanic.blogspot.com/2013/05/openshift-on-aws-ec2-part-1-from-wheels.html">in the previous post</a> you can also view them on the CLI. <br />
<br />
<b>NOTE: </b>If you haven't followed the previous post, you should do so before you continue here.<br />
<br />
First list the zones you have registered.<br />
<br /></div>
<div>
<pre class="brush: bash; title: 'AWS zone information': highlight 1">thor route53:zone:list
task: route53:zone:list
id: <YOURZONEID> name: app.example.org. records: 3
</pre>
<br /></div>
<div>
Now you can list the records in the zone (indicating the zone by name)<br />
<br /></div>
<div>
<pre class="brush: bash; title: 'AWS zone initial records': highlight: 1">thor route53:record:list app.example.org
task: route53:record:list app.example.org
looking for zone id <YOURZONEID>
app.example.org. NS
ns-131.awsdns-16.com.
ns-860.awsdns-43.net.
ns-2023.awsdns-60.co.uk.
ns-1076.awsdns-06.org.
app.example.org. SOA
ns-131.awsdns-16.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
</pre>
</div>
<div>
<br /></div>
<h2>
Adding a DNS Record</h2>
<div>
<br /></div>
<div>
The real goal in all of this is that the OpenShift broker must be able to add and remove records for applications. OpenShift uses the <i>aws-sdk</i> rubygem. The Thor tasks also use that gem. You can call them from the command line or use them to compose more complex operations.</div>
<div>
<br /></div>
<div>
OpenShift currently uses CNAME records to publish applications rather than A records. This is largely to allow for rapid re-naming or re-numbering of nodes within AWS. The use of CNAME records (which are aliases to another FQDN, or <i>fully qualified domain name</i>) means that the node which hosts the applications can be renamed or renumbered without the need to update every DNS record for every application. If bulk updates of DNS were not expensive, I believe that OpenShift could use A records, though that would require significant recoding.</div>
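<div>
<br /></div>
<div>
As a sketch of why this matters (with hypothetical names, using the thor tasks from this series): several application records can alias the same node, so moving or renumbering that node means touching only the node's own record, not every application record.</div>
<div>
<pre class="brush: bash; title: 'several apps aliasing one node (hypothetical)'">thor route53:record:create app.example.org myapp-demo CNAME node1.infra.example.org
thor route53:record:create app.example.org otherapp-demo CNAME node1.infra.example.org
# If node1.infra.example.org later points at a different instance, only that one
# record has to change; myapp-demo and otherapp-demo keep resolving correctly.
</pre>
</div>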
<div>
<br /></div>
<div>
To verify your DNS domain has been properly configured, add a CNAME record. Thor provides a standard <i>help</i> option for each task.</div>
<div>
<br /></div>
<div>
<br />
<pre class="brush: bash ; title: 'help for route53:record:create'; highlight: 1">thor help route53:record:create
Usage:
  thor route53:record:create ZONE NAME TYPE VALUE
Options:
  [--ttl=N]
                  # Default: 300
  [--verbose]
  [--wait]
create a new resource record
</pre>
<br />
From this you can craft a command. This example includes the <i>--wait</i> and <i>--verbose</i> options so that you can observe the process. Without the <i>--wait</i> option, the task will complete and return, but there will be a propagation delay before the name will resolve. With the <i>--wait</i> option, the task polls the Route53 service until it reports that the DNS services have synched.<br />
<br /></div>
<div>
<pre class="brush: bash; title 'add a CNAME record'; highlight: 1">thor route53:record create app.example.org test1 CNAME test2.app.example.org --verbose --wait
task: route53:record:create app.example.org test1 CNAME test2.infra.example.org
update record = {:comment=>"add CNAME record test1.app.example.org", :changes=>[{:action=>"CREATE", :resource_record_set=>{:name=>"test1.app.example.org", :type=>"CNAME", :ttl=>300, :resource_records=>[{:value=>"test2.infra.example.org"}]}}]}
response = {:change_info=>{:id=>"/change/C2VQAFRSE6OXMY", :status=>"PENDING", :submitted_at=>2013-05-27 00:24:19 UTC, :comment=>"add CNAME record test1.app.example.org"}}
1) change id: /change/C2VQAFRSE6OXMY, status: UNKNOWN - sleeping 5
2) change id: /change/C2VQAFRSE6OXMY, status: PENDING - sleeping 5
3) change id: /change/C2VQAFRSE6OXMY, status: PENDING - sleeping 5
4) change id: /change/C2VQAFRSE6OXMY, status: PENDING - sleeping 5
5) change id: /change/C2VQAFRSE6OXMY, status: PENDING - sleeping 5
</pre>
</div>
<div>
<br /></div>
<div>
<br />
When this command completes the new record should resolve:<br />
<br />
<pre class="brush: bash ; title: 'check publication of new CNAME record'; highlight: 1">host -t cname test1.app.example.org
test1.app.example.org is an alias for test2.infra.example.org.
</pre>
Also, now if you list the zone records with <code>thor route53:record:list app.example.org</code> you'll see the new CNAME record.
<br />
<br />
<h2>
Deleting a DNS record</h2>
</div>
<div>
<br /></div>
<div>
Deleting a DNS record is nearly identical to adding one. To ensure that you are deleting the correct record, the delete task requires the same complete inputs as the create task.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title 'delete a DNS CNAME record'; highlight: 1">thor route53:record delete app.example.org test1 CNAME test2.infra.example.org --verbose --wait
task: route53:record:delete app.example.org CNAME test1
update record = {:comment=>"delete CNAME record test1.app.example.org", :changes=>[{:action=>"DELETE", :resource_record_set=>{:name=>"test1.app.example.org", :type=>"CNAME", :ttl=>300, :resource_records=>[{:value=>"test2.infra.example.org"}]}}]}
response = {:change_info=>{:id=>"/change/C3ORAEV7FTLPBJ", :status=>"PENDING", :submitted_at=>2013-05-27 00:58:25 UTC, :comment=>"delete CNAME record test1.app.example.org"}}
1) change id: /change/C3ORAEV7FTLPBJ, status: UNKNOWN - sleeping 5
2) change id: /change/C3ORAEV7FTLPBJ, status: PENDING - sleeping 5
3) change id: /change/C3ORAEV7FTLPBJ, status: PENDING - sleeping 5
4) change id: /change/C3ORAEV7FTLPBJ, status: PENDING - sleeping 5
5) change id: /change/C3ORAEV7FTLPBJ, status: PENDING - sleeping 5
</pre>
</div>
<div>
<br />
Again, with the <i>--verbose</i> and <i>--wait</i> options, the task will not complete until the DNS change has propagated. When it completes, the name will no longer resolve.
</div>
<div>
<br />
<pre class="brush:bash ; title: 'check that CNAME record is deleted' ; highlight: 1">host -t cname test1.app.example.org
Host test1.app.example.org not found: 3(NXDOMAIN)
</pre>
</div>
<br />
<h2>
Summary</h2>
<div>
Now that we've registered a domain, and arranged to have it served by Route53, we can add and remove names. When we configure OpenShift, it will be able to publish new application records.</div>
<div>
<br /></div>
<h2>
Next Time</h2>
<div>
We still have to create hosts to run the OpenShift service. On AWS that means creating <i>instances</i>, virtual machines in Amazon's cloud. Amazon applies some fairly restrictive network-level packet filtering. They use a feature called a <i>securitygroup</i> to define the filtering rules. In the next post, I'll discuss how to create and manage new securitygroups, and which groups we'll need to allow OpenShift to operate.</div>
<div>
<br /></div>
<h2>
Resources</h2>
</div>
<div>
<ul>
<li>AWS Route53 DNS management console - <a href="https://console.aws.amazon.com/route53/home">https://console.aws.amazon.com/route53/home</a></li>
<li>ICANN list of Accredited Registrars - <a href="http://www.icann.org/registrar-reports/accredited-list.html">http://www.icann.org/registrar-reports/accredited-list.html</a></li>
<li>DNS Zones - <a href="https://en.wikipedia.org/wiki/DNS_zone">https://en.wikipedia.org/wiki/DNS_zone</a></li>
</ul>
<div>
<br /></div>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com0tag:blogger.com,1999:blog-5022186007695457923.post-64528934129848134672013-05-23T14:10:00.001-07:002013-05-23T14:13:17.340-07:00OpenShift on AWS EC2, Part 1: From the wheels upSomeone asked me recently how to build an <a href="https://github.com/openshift/origin-server">OpenShift Origin</a> service on<a href="https://aws.amazon.com/"> Amazon Web Services</a> EC2. My first thought was "easy, we do this all the time". I started going through what exists for our own testing, development and deployment. It clearly works, it's clearly the place to start, right? Just fire up a few instances, tweak the existing puppet configs and <i>zoom!</i> right?<br />
<br />
Then I started trying to figure out how to describe it and adapt it to general use, and I found myself adding more and more caveats, limitations and internal assumptions. It's grown organically to do what is needed, but what I have available isn't really designed for general use. Some of it I couldn't understand just from reading and observing (since I'm a hands-on, break-it-to-understand-it kind of guy). Time to start taking it apart so I can put it back together. When I can do that and it starts up when I turn the key, then I can claim to understand it.<br />
<br />
So I decided to go back to the fundamentals not of OpenShift, but of AWS EC2 itself.<br />
<br />
<h2>
Defining a Goal: Machines Ready To Eat</h2>
<br />
An OpenShift service consists of a number of component services. Ideally each component would have multiple instances for availability and scaling, but that's not required for initial setup. Only the OpenShift broker, console and nodes need to be exposed to the users.<br />
<br />
The host configuration is complex enough that even for a small service it is best to use a Configuration Management System (CMS) to configure and manage the system, but the CMS can't start work until the hosts exist and have network communications. The CMS itself must be installed and configured. Once the hosts exist and are bound together then the CMS can do the rest of the work and a clean boundary of control and access is established. This will later allow the bottom layer (establishing hosts and installing/configuring the CMS) to be replaced without affecting the actual service installation above.<br />
<br />
So the goal here is: create and connect hosts with a CMS installed using EC2. That's the base on which the OpenShift service will be built. If you run each of the component services on its own host using external DNS and authentication services, OpenShift requires a minimum of four hosts:<br />
<br />
<ul>
<li>OpenShift Broker</li>
<li>Data Store (mongodb)</li>
<li>Message Broker (activemq)</li>
<li>OpenShift Node</li>
</ul>
<div>
<br /></div>
<div>
Each of these can (theoretically, at least) be duplicated to provide high availability, but for now I'll start with one of each. The goal of this series of posts is to create the hosts on which these services will be installed. We won't come back to OpenShift itself until that's done.<br />
<br />
<h2>
AWS EC2: Getting the lay of the land</h2>
</div>
<div>
<br /></div>
<div>
If you're not familiar with AWS EC2, go check out <a href="https://aws.amazon.com/">https://aws.amazon.com/</a>. EC2 is the part of AWS which provides "virtual" hosts (for a fee, of course). There are free-to-try levels, but you are required to give a credit card to sign up, and you're very likely to start incurring charges for storage even if you stick to the "free" tier. Read, be informed, decide for yourself.</div>
<div>
<br />
<h3>
AWS without the "W"</h3>
<br /></div>
<div>
AWS presents a modern single-page web interface for all interactions, but I'm interested in command line or scripted interaction. Amazon does provide a REST protocol and has implemented libraries for a wide number of scripting languages. I'm using the <i><a href="https://aws.amazon.com/sdkforruby/">rubygem-aws-sdk</a></i> library (which is, surprisingly enough, written in Ruby) because I also want to use another Ruby tool called <i><a href="https://github.com/wycats/thor/wiki">Thor</a></i>. </div>
<div>
<br />
<h3>
Tasks and the Command Line Interface</h3>
<br /></div>
<div>
Thor is a ruby library which helps create really nice command line "tasks". The beauty of Thor is that you can use it both to define individual tasks and to compose those tasks into more complex task sequences. This allows you to test each step as a distinct CLI operation and also to debug only the step that fails when one inevitably does.</div>
<div>
<br /></div>
<div>
I'm going to use Thor and the aws-sdk to create a CLI interface to the AWS low level operations, and then compose them to create higher level tasks which, in the end, will leave me with a set of hosts ready to receive an OpenShift service.</div>
<div>
<br /></div>
<div>
I'm not going to try to create a comprehensive CLI interface to AWS. I'm only going to create the steps that I need to get this job done. A number of the steps will encapsulate operations which may seem trivial, but this will allow for better consistency and visibility of the operations. A primary goal is to have as little magic as possible. At the same time, I want to avoid overwhelming the user (me) with unnecessary detail when things are working as planned.</div>
<div>
<br /></div>
<div>
I'm not going to make you sit through the entire development process (which isn't complete). Instead I mean to show the tools that I've developed and use them to cleanly define the base on which an OpenShift service would sit.<br />
<br />
<div>
<h2>
</h2>
<h2>
AWS Setup</h2>
<br />
To work with AWS, you must have an established account. To use the REST API you need to have generated a set of access keys. To log into your EC2 instances you need to have generated an SSH key pair, placed the private key where your SSH client can find it (usually in $HOME/.ssh), and configured your SSH client to use that key when logging into EC2 instances (in $HOME/.ssh/config).<br />
<br />
<br />
<ul>
<li>AWS Access Keys</li>
<li>AWS SSH Key Pairs</li>
<li>SSH client configuration</li>
</ul>
<br />
<div>
You can learn about and generate both sets of keys on the AWS <a href="https://portal.aws.amazon.com/gp/aws/securityCredentials#access_credentials">Security Credentials</a> page<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://3.bp.blogspot.com/-YsOxVVCmAB4/UZ6DSYxZ1AI/AAAAAAAABzU/QwpIVh15GUk/s1600/aws_access_credentials.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="353" src="http://3.bp.blogspot.com/-YsOxVVCmAB4/UZ6DSYxZ1AI/AAAAAAAABzU/QwpIVh15GUk/s640/aws_access_credentials.png" width="640" /></a></div>
<br /></div>
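<div>
<br /></div>
<div>
Once you have downloaded the private key for your key pair, the SSH client configuration piece might look something like the sketch below. It's only an illustration; the key file name is a placeholder for whatever you created in the console:</div>
<div>
<pre class="brush: bash; title: '${HOME}/.ssh/config (example)'"># Use the EC2 key pair and default user for Amazon-hosted instances
Host *.amazonaws.com
    User ec2-user
    IdentityFile ~/.ssh/my-ec2-keypair.pem
    IdentitiesOnly yes
</pre>
</div>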
<div>
<br /></div>
</div>
<br />
<h2>
Origin-Setup (really EC2 and SSH tools)</h2>
</div>
<div>
<br /></div>
<div>
The tool set is currently called <i>origin-setup</i> and it resides in a repository on Github. The name is a misnomer; there's not actually any OpenShift in most of it.</div>
<div>
<br /></div>
<div>
<ul>
<li>Github repo URL: <a href="https://github.com/markllama/origin-setup">https://github.com/markllama/origin-setup</a></li>
</ul>
<div>
<br /></div>
</div>
<div>
<h3>
Requirements</h3>
<br /></div>
<div>
The tasks are written in Ruby using the Thor library. They also require several other rubygems. All of them are available on Fedora 18 as RPMs.</div>
<div>
<br /></div>
<div>
<ul>
<li>ruby</li>
<li>rubygems</li>
<li>rubygem-thor</li>
<li>rubygem-aws-sdk</li>
<li>rubygem-parseconfig</li>
<li>rubygem-net-ssh</li>
<li>rubygem-net-scp</li>
</ul>
<div>
<br />
<h3>
Getting (and setting) the Bits</h3>
<br /></div>
<div>
Thor can be used to create stand-alone CLI commands, but I have not done that yet for these tasks. To use them you need to <code>cd</code> into the origin-setup directory and call thor directly. You will also need to set the RUBYLIB path to find a small helper library which manages the AWS authentication.<br />
<br />
<pre class="brush: bash">git clone https://github.com/markllama/origin-setup
cd origin-setup
export RUBYLIB=`pwd`/lib
thor list --all
</pre>
<br />
<h3>
AWS Again: configuring the toolset</h3>
<div>
<br />
The final step is to give the origin-setup toolset the information needed to communicate with the AWS REST interface.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash; title: '${HOME}/.awscred'">AWSAccessKeyId=YOURKEYIDHERE
AWSSecretKey=YOURSECRETKEYHERE
AWSKeyPairName=YOURKEYPAIRNAMEHERE
RemoteUser=ec2-user
AWSEC2Type=t1.micro
</pre>
</div>
<br />
This file contains what is essentially the passwords to your AWS account. You should set the permissions on this file so that only you can read it and protect the contents as you would your credit card.<br />
<br />
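Locking the file down is a one-liner (assuming it lives at <code>${HOME}/.awscred</code> as shown above):<br />
<br />
<pre class="brush: bash; title: 'protect the AWS credentials file'; highlight: 1">chmod 600 ${HOME}/.awscred
</pre>
<br />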
The <i>RemoteUser</i> is the default user for SSH logins (F18+). For RHEL6 it would be <u>root</u>. The <i>AWSEC2Type</i> value defines the default instance "type" used when you create a new instance. The <i>t1.micro</i> instance type is small and it is in the free tier. You will need to choose a larger type for real use.<br />
<br />
<h3>
Turn the Key</h3>
<div>
You should be able to use the <i>thor</i> command to explore the list of available tasks. Thor allows the creation of <i>namespaces</i> to contain related tasks. Most of the important tasks to begin with are in the <i>ec2</i> namespace.<br />
<br />
You can see the available tasks with the <i>thor list</i> command:<br />
<br />
<pre class="brush: bash ; title: 'ec2 task namespace'; highlight: 1">thor list ec2 --all
ec2
---
thor ec2:image:create # Create a new imag...
thor ec2:image:delete # Delete an existin...
thor ec2:image:find TAGNAME # find the id of im...
thor ec2:image:info # retrieve informat...
thor ec2:image:list # list the availabl...
thor ec2:image:tag --tag=TAG # set or retrieve i...
thor ec2:instance:create --image=IMAGE --name=NAME # create a new EC2 ...
thor ec2:instance:delete # delete an EC2 ins...
thor ec2:instance:hostname # print the hostnam...
thor ec2:instance:info # get information a...
thor ec2:instance:ipaddress [IPADDR] # set or get the ex...
thor ec2:instance:list # list the set of r...
thor ec2:instance:private_hostname # print the interna...
thor ec2:instance:private_ipaddress # print the interna...
thor ec2:instance:rename --newname=NEWNAME # rename an EC2 ins...
thor ec2:instance:start # start an existing...
thor ec2:instance:status # get status of an ...
thor ec2:instance:stop # stop a running EC...
thor ec2:instance:tag --tag=TAG # set or retrieve i...
thor ec2:instance:wait # wait until an ins...
thor ec2:ip:associate IPADDR INSTANCE # associate and Ela...
thor ec2:ip:create # create a new elas...
thor ec2:ip:delete IPADDR # delete an elastic IP
thor ec2:ip:list # list the defined ...
thor ec2:securitygroup:create NAME # create a new secu...
thor ec2:securitygroup:delete # delete the securi...
thor ec2:securitygroup:info # retrieve and repo...
thor ec2:securitygroup:list # list the availabl...
thor ec2:securitygroup:rule:add PROTOCOL PORTS [SOURCES] # add a permission ...
thor ec2:snapshot:delete SNAPSHOT # delete the snapshot
thor ec2:snapshot:list # list the availabl...
thor ec2:volume:delete VOLUME # delete the volume
thor ec2:volume:list # list the availabl...
</pre>
<br />
<br />
It's time to see if you can talk to EC2. This first query requests a list of images produced by the Fedora hosted team:<br />
<br />
<pre class="brush: bash ; title: 'test AWS connectivity: list Fedora images'; highlight: 1">thor ec2:image list --name \*Fedora\* --owner 125523088429
ami-2509664c Fedora-x86_64-17-1-sda
ami-4b0b6422 Fedora-i386-17-1-sda
ami-6f640c06 Fedora-i386-18-20130521-sda
ami-b71078de Fedora-x86_64-18-20130521-sda
ami-d13758b8 Fedora-18-ec2-20130105-x86_64-sda
ami-dd3758b4 Fedora-18-ec2-20130105-i386-sda
ami-ed375884 Fedora-17-ec2-20120515-i386-sda
ami-fd375894 Fedora-17-ec2-20120515-x86_64-sda
</pre>
<div>
<br />
If instead you get a really long messy ruby error, then check the permissions and contents of your <i>~/.awscred</i> file.<br />
<br />
It's probably a good idea, before experimenting too much here, to get familiar with EC2 and Route53 using the web console a bit.<br />
<br />
Next post I'll establish the DNS zone in Route53 and show how to manage DNS records to prepare for my OpenShift service.<br />
<br /></div>
</div>
<h2>
References</h2>
</div>
</div>
<div>
<br /></div>
<div>
<ul>
<li><a href="https://console.aws.amazon.com/ec2/v2/home">AWS EC2 Console</a> - managing remote virtual machines</li>
<li><a href="https://console.aws.amazon.com/route53/home">AWS Route53 (DNS) Console</a> - managing DNS</li>
<li><a href="https://aws.amazon.com/sdkforruby/">rubygem-aws-sdk</a> - an implimentation of the AWS REST protocol in Ruby</li>
<li>SSH publickey - secure login without passwords</li>
<li><a href="https://github.com/wycats/thor/wiki">Thor</a> - A ruby gem to build command line interface "tasks"</li>
<li><a href="https://puppetlabs.com/puppet/puppet-open-source/">Puppet</a> - A popular Configuration Management System</li>
<li><a href="http://git-scm.com/">Git</a> - a popular Source Code Management system</li>
<li><a href="https://github.com/">Github</a> - a site for keeping Git repositories</li>
</ul>
</div>
<div>
<ul>
<li><a href="https://github.com/markllama/origin-setup">origin-setup</a> - a set of Thor tasks for managing AWS EC2 and Route53<br />With a goal of automating the creation of an OpenShift Origin service in EC2</li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com6tag:blogger.com,1999:blog-5022186007695457923.post-90269949143521502142013-04-03T11:16:00.000-07:002013-04-03T13:45:21.537-07:00OpenShift Process Tools (for humans)<h2>
<span style="font-size: x-large;">Making Sausage: It ain't for the faint of stomach</span></h2>
Otto von Bismarck is known (among other things) for his <a href="http://www.brainyquote.com/quotes/quotes/o/ottovonbis161318.html">observations on law and sausage</a>. The observation could apply to software as well, but there are lots of us with the cast-iron stomachs needed to produce some really wonderful stuff. If you're undaunted, read on.<br />
<br />
I give a lot of attention to the parts running under an OpenShift service. I want people to be able to run their own service, to understand, configure, tune and diagnose problems with it. But I <b>REALLY</b> want people to understand that, if there's something they want it to do that it doesn't do yet, they don't have to wait for Red Hat to do it for them.<br />
<br />
I have a bit of a different position from most people with respect to OpenShift development. It came about by serendipity, but it suits me well and the management seem happy with it for now. I'm not actually in the product development hierarchy. I work on things mostly from the outside. I work to experience installing and configuring OpenShift as someone from the community, and I comment and report using the same channels the community members have. (I talk about that some in a <a href="http://bitmason.blogspot.com/2013/03/podcast-working-with-openshift-with.html">blog interview</a> with <a href="http://bitmason.blogspot.com/">Gordon Haff</a>)<br />
<br />
Red Hat has a vested interest in the success of OpenShift, but it defines that interest in terms that go beyond market penetration and adoption rates. Most of the developers currently work at Red Hat, but they are going out of their way to allow people to see not just the inner workings of the code, but of the process, and to form a community of contributors who help steer and shape what OpenShift becomes.<br />
<br />
Building a community is itself a process and there are learning steps so it's not all there yet, but you're invited already.<br />
<br />
<h2>
<span style="font-size: x-large;">Where's the Beef?</span></h2>
<div>
<br /></div>
<div>
When I started at Red Hat and began coding, one of the first things I asked was "where do I put my code". Coming from a background at proprietary software companies I was asking "where's the internal code repository?". I got a quizzical look for a second and then a reply: "Umm. <a href="http://github.com/">Github</a>?" like I'd just asked "Where's the bathroom" while standing at the mirror washing my hands. (Some people use bitbucket or sourceforge or a number of others)</div>
<div>
<br /></div>
<div>
<a href="http://github.com/openshift/origin-server">OpenShift has been on Github</a> for quite a while. It was one of the first big moves to bring OpenShift to the community (the fact that it made life easier for our folks was a warmly anticipated beneficial side effect). The hottest newest stuff is out there. It's not always pretty. If you look at the wiki and blog posts (especially from me) you'll find a number that are out of date because they contained hacks around warts and things have been changed or fixed (it's a good idea to check on the #openshift-dev channel on <a href="http://freenode.net/">freenode</a> if you're ever wondering about something). It's embarrassing sometimes to find something's gone stale, but the alternative is waiting or hiding things, and we're not doing that. (There are people working on the Official Documentation, but that's for Official Releases. We're talking bleeding edge here)</div>
<div>
<br /></div>
<div>
All of the developers (and by this, I include <i>you</i>) work by forking the repository, making changes on a branch, and then submitting a pull request to bring those changes back into the master repository. The pull requests get review and commentary and then get run through automated tests (discussed next), and when they're ready they are merged.</div>
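<div>
<br /></div>
<div>
If you've never worked this way, the mechanics are roughly as follows. This is only a sketch; substitute your own Github account for YOURUSER and pick whatever branch name fits your change:</div>
<div>
<pre class="brush: bash; title: 'fork-and-pull-request workflow (sketch)'"># work against your own fork of openshift/origin-server
git clone git@github.com:YOURUSER/origin-server.git
cd origin-server
git remote add upstream git://github.com/openshift/origin-server.git

# make the change on its own branch
git checkout -b my-fix
# ... edit, test, commit ...
git push origin my-fix
# then open a pull request against openshift/origin-server on Github
</pre>
</div>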
<div>
<br /></div>
<h2>
<span style="font-size: x-large;">Have it your way.</span></h2>
<div>
<br />
The newest move is the switch to using <a href="https://trello.com/">Trello</a> for planning. OpenShift has always done development using an <a href="https://en.wikipedia.org/wiki/Agile_software_development">Agile process</a> based on the <a href="http://www.scrumalliance.org/learn_about_scrum">Scrum framework</a>. The project <a href="https://trello.com/openshift">is now hosted on Trello</a> and you're invited to look and contribute.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-mfuZbSnXsM8/UVxtAbj9ZTI/AAAAAAAABsk/KHw4SdvtvXo/s1600/Trello_Broker_Board.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="371" src="http://1.bp.blogspot.com/-mfuZbSnXsM8/UVxtAbj9ZTI/AAAAAAAABsk/KHw4SdvtvXo/s640/Trello_Broker_Board.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The OpenShift Origin Broker Scrum Board</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<div>
Trello is free in the same way that Github is. To contribute, get an account and join the OpenShift organization. It's a web-based scrum task board system. New features are added to the board as "cards". The cards are used to define and track the tasks needed to complete the feature. They're moved along the board from inception to completion as the tasks are defined, filled in, assigned (or assumed) and completed. All of the planning and work happen in plain sight.</div>
<div>
<br /></div>
<div>
If you're new to Agile or Scrum you'll want to take some time to look up what they are and how they work. Scrum is known as a "framework" for a reason. It's a set of priorities and guidelines, not rules. Each team has its own conventions and etiquette. You'll get the best response from people if you observe a bit and dip your toe in slowly.</div>
<div>
<br /></div>
<div>
Check out Trello's <a href="http://help.trello.com/">Help site</a> and <a href="https://trello.com/tour">take the tour</a> for some idea of what Trello itself is and does, and how it works. Take a look too at <a href="https://trello.com/c/Fw7i4PIX">the first card</a> in the Broker board. It describes how the OpenShift team is expected to use Trello. Take a look at the <a href="https://trello.com/c/VQWeDg74">story template card</a>. It shows the skeleton of the questions a card should ask (and answer).</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-9NZJtjeFrhs/UVxtH-bW5gI/AAAAAAAABss/4NtNHhjQBmU/s1600/Trello_Card_View.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="372" src="http://1.bp.blogspot.com/-9NZJtjeFrhs/UVxtH-bW5gI/AAAAAAAABss/4NtNHhjQBmU/s640/Trello_Card_View.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A card, still capturing requirements.</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<div>
The Trello site is a place for <b>contributors</b>. It's not a forum or a question-and-answer session. Those are better served on the OpenShift <a href="https://www.openshift.com/forums/openshift">fora</a>, on IRC or on the <a href="https://lists.openshift.redhat.com/openshiftmm/listinfo">mailing lists</a>. Reasonable suggestions are encouraged. See <a href="https://trello.com/c/fSWvzkPM">this one</a> requesting feedback on an update to the PHP cartridge as an example.</div>
<div>
<br />
<h2>
<span style="font-size: x-large;">Welcome to the kitchen</span></h2>
</div>
<div>
<br /></div>
<div>
Well, it's hot in here. The stainless steel doors with the windows in them so the servers don't crash into each other are flapping on their springs behind you. The knife rack and the prep counter are in front of you. You know your way around an industrial fridge? All of us bus dishes now and then. You're a bit overdressed, but get to work; we've got some hungry folks out there waiting. Check that card over your head and start cooking.</div>
<div>
<br /></div>
<h2>
<span style="font-size: x-large;">Resources</span></h2>
<div>
<ul>
<li>Git - Software Revision Control System - <a href="http://git-scm.com/">http://git-scm.com/</a></li>
<li>Github - an online service for Git - <a href="https://github.com/">https://github.com</a></li>
<li>Agile Software Development - https://en.wikipedia.org/wiki/Agile_software_development</li>
<li>The Agile Manifesto - <a href="http://www.agilemanifesto.org/">http://www.agilemanifesto.org/</a></li>
<li><a href="https://en.wikipedia.org/wiki/Scrum_(development)">The Scrum Framework</a> - <a href="https://en.wikipedia.org/wiki/Scrum_(development)">https://en.wikipedia.org/wiki/Scrum_(development)</a></li>
<li>Trello - online Scrum board service - <a href="https://www.trello.com/">https://www.trello.com</a></li>
<li>OpenShift community fora - <a href="https://www.openshift.com/forums/openshift">https://www.openshift.com/forums/openshift</a></li>
<li>OpenShift developer mailing list - <a href="https://lists.openshift.redhat.com/openshiftmm/listinfo/dev">https://lists.openshift.redhat.com/openshiftmm/listinfo/dev</a></li>
</ul>
</div>
<div>
<br /></div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com1tag:blogger.com,1999:blog-5022186007695457923.post-4287447977272146952013-03-22T13:31:00.000-07:002013-03-28T08:21:29.685-07:00Installing (but not configuring) the broker service by handI'm working through a totally(?) manual installation of the OpenShift Origin service on Fedora 18. The last post on this topic was about building the RPMs on your own Yum repository. This time I'm going to install the broker service and make a few tweaks that are still required.<br />
<br />
One seriously major thing to note is that <b>I don't recommend actually doing this.</b> I'm doing it to shed some light on some of the things still going on in the development process and to highlight the ways in which you can get some visibility into the installation and monitoring of the service.<br />
<br />
If you're interested in building and running your own development environment or service for real, I suggest starting by reading through Krishna Raman's article on <a href="http://www.krishnaraman.net/installing-openshift-origin-using-vagrant-and-puppet/" target="_blank">creating a development environment using Vagrant and Puppet</a> and the <a href="https://github.com/openshift/puppet-openshift_origin" target="_blank">puppet script sources</a> themselves to see what's involved. Finally there's a <a href="http://openshift.github.com/origin/file.install_origin_manually.html">comprehensive document</a> that describes the procedure with fewer warts.<br />
<br />
<br />
<h2>
<a href="http://www.blogger.com/blogger.g?blogID=5022186007695457923" id="ingredients">Ingredients</a></h2>
As usual, I start with a clean minimal install of Fedora 18. In addition this time I also have a yum repository filled with a bleeding-edge build from source <a href="http://cloud-mechanic.blogspot.com/2013/03/the-bleeding-edge-building-openshift.html" target="_blank">as I described previously</a>. Finally I have a prepared MongoDB server waiting for a connection.<br />
<br />
I'm replacing my real URLs and access information with dummies for demonstration purposes.<br />
<br />
<br />
<ul>
<li>Yum repo URL<br /><code>http://myrepo.example.com/origin-server</code></li>
</ul>
<ul>
<li>MONGO_HOST_PORT="mydbhost.example.com:27017"</li>
<li>MONGO_USER="openshift"</li>
<li>MONGO_PASSWORD="dontuseme"</li>
<li>MONGO_DB="openshift"</li>
</ul>
<div>
<br /></div>
<h2>
Preparation</h2>
<div>
Since I'm building my own packages from source and placing them in a Yum repository, I need to add that repo to the standard set. I'll add a new file to /etc/yum.repos.d referring to my yum server.</div>
<div>
<br /></div>
<div>
Even if you're building from your own sources, there are still some packages you need to get that aren't in either the stock Fedora repositories or in the OpenShift sources. These are generally packages with patches that are in the process of moving upstream or are in the acceptance process for Fedora. Right now a set is maintained by the OpenShift build engineers. I need to add the repo file for that too:</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: '/etc/yum.repos.d/origin-server.repo'">[origin-server]
name=OpenShift Origin Server
baseurl=http://myrepo.example.com/origin-server
enabled=1
gpgcheck=0
</pre>
</div>
<div>
<pre class="brush: bash ; title: '/etc/yum.repos.d/origin-extras.repo'">[origin-extras]
name=Custom packages for OpenShift Origin Server
baseurl=https://mirror.openshift.com/pub/openshift-origin/fedora-18/x86_64/
enabled=1
gpgcheck=0
</pre>
</div>
<div>
At this point you can install the <code>openshift-origin-broker</code> package.
</div>
<div>
<pre class="brush: bash ; title: 'install the broker package (and dependencies)'">yum install openshift-origin-broker
...
urw-fonts.noarch 0:2.4-14.fc18
v8.x86_64 1:3.13.7.5-1.fc18
xorg-x11-font-utils.x86_64 1:7.5-10.fc18
Complete!
</pre>
<br /></div>
<div>
<br /></div>
<div>
There are a set of Rubygems that are not yet packaged as RPMs. I need to install these as gems for now.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'install non-RPM gems'; highlight: 1">gem install mongoid
Fetching: i18n-0.6.1.gem (100%)
Fetching: moped-1.4.4.gem (100%)
Fetching: origin-1.0.11.gem (100%)
Fetching: mongoid-3.1.2.gem (100%)
Successfully installed i18n-0.6.1
Successfully installed moped-1.4.4
Successfully installed origin-1.0.11
Successfully installed mongoid-3.1.2
3 gems installed
Installing ri documentation for moped-1.4.4...
Building YARD (yri) index for moped-1.4.4...
Installing ri documentation for origin-1.0.11...
Building YARD (yri) index for origin-1.0.11...
Installing ri documentation for mongoid-3.1.2...
Building YARD (yri) index for mongoid-3.1.2...
Installing RDoc documentation for moped-1.4.4...
Installing RDoc documentation for origin-1.0.11...
Installing RDoc documentation for mongoid-3.1.2...
</pre>
</div>
<div>
There are a number of gem version restrictions in the broker Gemfile which are not met by the current rubygem RPMs. I have to remove the version restrictions so that the broker application will use what <u>is </u>available. This risks breaking things due to interface changes, but will at least allow the broker application to start.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'remove version restrictions for RPM gems'; highlight: 1">sed -i -f - <<EOF /var/www/openshift/broker/Gemfile
/parseconfig/s/,.*//
/minitest/s/,.*//
/rest-client/s/,.*//
/mocha/s/,.*//
/rake/s/,.*//
EOF
</pre>
</div>
<div>
<br />
<br />
For some reason, even with the <code>--without</code> clause for <code>:test</code> and <code>:development</code>, bundle still wants the <code>mocha</code> rubygem. This should not be required for production, but right now you need to install it so that the Rails application will start.<br />
<br />
<pre class="brush: bash ; title: 'install rubygem mocha and dependencies'; highlight: 1">
yum install rubygem-mocha
...
Installed:
rubygem-mocha.noarch 0:0.12.1-1.fc18
Dependency Installed:
rubygem-metaclass.noarch 0:0.0.1-6.fc18
</pre>
<br />
</div>
<h2>
Verifying The Dependencies</h2>
<div>
Now that all of the software dependencies have been installed (mostly by RPM requirements through Yum, and finally through gem requirements and some version tweaking of the Gemfile) I can check that all of them resolve when I start the application. Rails will call bundler when the application starts, so I'll call it explicitly beforehand. I'm only interested in the production environment, so I'll explicitly exclude development and test.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title: 'check gems with bundle' ; highlight: 1">cd /var/www/openshift/broker
bundle install --local --without development test
Using rake (0.9.6)
Using bigdecimal (1.1.0)
....
Using systemu (2.5.2)
Using xml-simple (1.1.2)
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.
</pre>
</div>
<div>
<br /></div>
<div>
If I try to start the rails console now, though, I'll be sad. It won't connect to the database.</div>
<h2>
Configure MongoDB access/authentication</h2>
<div>
The OpenShift broker is (right now) tightly coupled to MongoDB. Recently it switched to using the rubygem-mongoid ODM module (which is a definite plus if you have to work on the code).</div>
<div>
<br /></div>
<div>
The last thing I need to do before I can fire up the Rails console with the broker application is to set the database connectivity parameters. One side effect of using an ODM is that it establishes a connection to the database the moment the application starts.</div>
<div>
<br /></div>
<div>
<b>NOTE:</b> when this is done I will <u>not</u> have a complete working broker server. I still need to configure the other external services: auth, dns and messaging.</div>
<div>
<br /></div>
<div>
Set the values listed in the <a href="#ingredients">Ingredients</a> into <code>/etc/openshift/broker.conf</code>.</div>
<div>
<br /></div>
<div>
<pre class="brush: bash ; title :'broker database configuration' ; highlight: 1">/etc/openshift/broker.conf
...
# Eg: MONGO_HOST_PORT="<host1:port1>,<host2:port2>..."
MONGO_HOST_PORT="mydbhost.example.com:27017"
MONGO_USER="openshift"
MONGO_PASSWORD="dontuseme"
MONGO_DB="openshift"
MONGO_SSL="false"
</pre>
</div>
<div>
...</div>
<div>
<br /></div>
<div>
Now I <i>can</i> try starting the rails console. It should connect to MongoDB and offer an irb prompt:</div>
<div>
<br /></div>
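<div>
Something along these lines should do it (a sketch, assuming the broker application lives under <code>/var/www/openshift/broker</code> as installed above):</div>
<div>
<pre class="brush: bash; title: 'start the broker rails console'; highlight: 1">cd /var/www/openshift/broker
RAILS_ENV=production bundle exec rails console
</pre>
</div>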
<div>
<br /></div>
<div>
To verify the database connectivity, take a look at <a href="http://cloud-mechanic.blogspot.com/2013/03/verifying-mongodb-datastore-with-rails.html" target="_blank">this recent blog post</a>.</div>
<div>
<br /></div>
<div>
Next up is configuring each plugin, one by one.<br />
<br />
<h2>
Gist Scripts</h2>
<div>
I'm trying something new. Rather than including code snippets inline, I'm going to post them as Github Gist entries.</div>
<div>
<br /></div>
<div>
<ul>
<li>Add Yum Repos - <a href="https://gist.github.com/markllama/5222366">oo-add-repo.sh</a></li>
<li>Fix broker requirements - <a href="https://gist.github.com/markllama/5222431">oo-broker-fix-requirements.sh</a></li>
</ul>
</div>
<h2>
References</h2>
</div>
<div>
<ul>
<li><a href="https://docs.fedoraproject.org/en-US/Fedora/16/html/System_Administrators_Guide/sec-Configuring_Yum_and_Yum_Repositories.html" target="_blank">yum repository configuration</a></li>
<li><a href="http://rubygems.org/" target="_blank">rubygems</a></li>
<li><a href="https://docs.fedoraproject.org/en-US/Fedora/16/html/System_Administrators_Guide/sec-Configuring_Yum_and_Yum_Repositories.html" target="_blank">gem</a></li>
<li><a href="http://gembundler.com/" target="_blank">bundler</a></li>
<li><a href="http://mongodb.org/" target="_blank">mongodb</a></li>
<li><a href="http://mongoid.org/" target="_blank">mongoid</a></li>
</ul>
</div>
markllamahttp://www.blogger.com/profile/14193184544557876514noreply@blogger.com2