In the last post I spent a bunch of effort creating a base image for a set of Pulp service components. Then I only implemented one, the Celery beat server. In this (hopefully much shorter) post I'll create a second image from that base. This one is going to be the Pulp Resource Manager service.
A couple of recap pieces to start.
The Pulp service is made up of several independent processes that communicate using AMQP messaging (through a QPID message bus) and through access to a shared MongoDB database. The QPID and MongoDB services are entirely independent of the Pulp service processes and communicate only over TCP/IP. There are also a couple of processes that are tightly coupled, both requiring access to shared data; those will come later. What's left are the Pulp Resource Manager process and the Pulp Admin REST service.
I'm going to take these in two separate posts to make them a bit more digestible than the last one was.
Extending the Base - Again
As with the Pulp Beat service, the Resource Manager process is a singleton: each Pulp service has exactly one. (Discussions of HA and SPOF will be held for later.) The Resource Manager process communicates with the other components solely through the QPID message broker and MongoDB, over TCP. There is no need for persistent storage.
In fact, the only difference between the Beat service and the Resource Manager is the invocation of the Celery service. This means that the only differences between the Docker specifications are the name and two sections of the run.sh file.
The Dockerfile is in fact identical in content to that for the Pulp Beat container.
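I won't copy it out of the last post, but as a minimal sketch, assuming the derived image simply layers its own run.sh onto the markllama/pulp-base image built there, it would look something like this:

# Sketch of the derived Dockerfile. Assumes the markllama/pulp-base image
# from the previous post and a run.sh entry point; details may differ.
FROM markllama/pulp-base

# The derived image only needs to supply its own startup script
ADD run.sh /run.sh

CMD ["/run.sh"]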
Now to the run.sh script.
The first difference in the run.sh is simple. The Beat service is used to initialize the database. The Resource Manager doesn't have to do that.
The second is also pretty simple: the exec line at the end starts the Celery service using the resource_manager entry point instead of the beat service.
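You can see the result in the container log later in this post. Pulled out of that trace, the end of run.sh amounts to something like this; the command itself is copied from the trace, but substituting PULP_SERVER_NAME into the node name is my assumption:

start_resource_manager() {
    # Run a single Celery worker, as the apache user, listening only on the
    # resource_manager queue (command taken from the startup trace below).
    # Using ${PULP_SERVER_NAME} here is an assumption; the trace shows the
    # already-expanded value pulp.example.com.
    exec runuser apache -s /bin/bash -c \
        "/usr/bin/celery worker -c 1 -n resource_manager@${PULP_SERVER_NAME} \
         --events --app=pulp.server.async.app --umask=18 --loglevel=INFO \
         -Q resource_manager --logfile=/var/log/pulp/resource_manager.log"
}

start_resource_manager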
I do have one other note to myself. It appears that the wait_for_database() function will be needed in every derivative of the pulp-base image. I should probably refactor that but I'm not going to do it yet.
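For reference, here's roughly what that shared function does, reconstructed from the variable names and test script that show up in the startup trace below. The exact loop body is my guess, not the upstream code:

wait_for_database() {
    # Poll MongoDB until it answers or we run out of tries
    # (variable names and test script taken from the startup trace below)
    DB_TEST_TRIES=12
    DB_TEST_POLLRATE=5
    TRY=0
    while [ $TRY -lt $DB_TEST_TRIES ] ; do
        if /test_db_available.py ; then
            break
        fi
        sleep $DB_TEST_POLLRATE
        TRY=$(($TRY + 1))
    done

    # Give up if the database never answered
    if [ $TRY -ge $DB_TEST_TRIES ] ; then
        echo "Unable to reach MongoDB at ${DB_SERVICE_HOST}:${DB_SERVICE_PORT}"
        exit 1
    fi
}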
One Image or Many?
So, if I hadn't been using shell functions, this really would come down to a two-line difference between the two images. Does it really make sense to create two images? It is possible to pass a mode argument to the container on startup. Wouldn't that be simpler?
It actually might be. It is possible to use the same image and pass an argument. The example from which mine are derived used that method.
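For illustration, a single-image run.sh could dispatch on its first argument, something like this. The function names and the image name below are hypothetical, not the ones I actually built:

# Hypothetical single-image dispatch at the end of run.sh
case "$1" in
    beat)              start_beat ;;
    resource_manager)  start_resource_manager ;;
    *)  echo "usage: run.sh {beat|resource_manager}" >&2
        exit 1 ;;
esac

The container would then be started with something like docker run -d markllama/pulp-celery resource_manager, selecting the role at start time (again, markllama/pulp-celery is a made-up name for this example).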
I have three reasons for using separate images. One is for teaching and the other two are development choices. Since one of my goals is to show how to create custom base images and then use derived images for customization, I took this opportunity to do just that.
The deeper reasons have to do with human nature and the software development life cycle.
People expect to be able to compose services by grabbing images off the shelf and plugging them together. Adding modal switches to the images means that they are no longer strongly differentiated by function. You can't just say "Oh, I need 5 functional parts, let me check the bins". You have to know more about each image than just how it connects to the others; you have to know that this particular image can take on more than one role within the service. I'd like to avoid that if I can. Creating images with so little difference between them feels like inefficiency, but only from the standpoint of the person producing the images. To the consumer it maintains the usage paradigm, and breaks in the paradigm can lead to mistakes or confusion.
The other reason to use distinct images has to do with what I expect and hope will be a change in the habits of software developers.
Developers of complex services currently feel a tension, when they are creating and packaging their software, between putting all of the code, binaries, and configuration templates into a single package and splitting them out by function. The usual answer is to create a new package only when the function is strongly different, because that makes the software simpler to install and configure once. On traditional systems, where all of the process components would be running on the same host, there was no good reason to separate the code for distinct processes based on their function. There are clear cases where the separation does happen in host software packaging, notably in client and server software, which will obviously run on different hosts. Other cases, though, are not so clear cut.
The case of the Pulp service is in a gray area. Much of the code is common to all four Celery based components (beat, resource manager, worker and admin REST service). It is likely possible to refactor the unique code into separate packages for the components, though the value is questionable at this point.
I want to create distinct images because it's not very expensive, and it allows for easy refactoring should the Pulp packaging ever be decomposed to match the actual service components. Any changes would happen when the new images are built, but the consumer would not need to see any change. This is a consideration to keep in mind whenever I create a new service with different components from the same service RPM.
Running and Verifying the Resource Manager Image
The Pulp Resource Manager process makes the same connections that the Pulp Beat process does. It's a little harder to detect the Resource Manager's access to the database, since its startup doesn't make radical changes the way the DB initialization does, but I'm going to see if I can find some indication that it is running. The QPID connection will be much easier to detect: the Resource Manager creates its own set of queues, which will be easy to see.
The resource manager requires the database service and an initialized database. Testing this part will start where the previous post left off, with running QPID and MongoDB and with the Pulp Beat service active.
NOTE: there's currently (20140929) a bug in Kubernetes where, during the period between waiting for the image to download and when it actually starts, kubecfg list pods will indicate that the pods have terminated. If you see this, give it another minute for the pods to actually start and transfer to the running state.
Testing in Docker
All I need to do using Docker directly is to verify that the container will start and run. The visibility in Kubernetes still isn't up to general dev and debugging.
docker run -d --name pulp-resource-manager \
    -v /dev/log:/dev/log \
    -e PULP_SERVER_NAME=pulp.example.com \
    -e SERVICE_HOST=10.245.2.2 \
    markllama/pulp-resource-manager
0e8cbc4606cf8894f8be515709c8cd6a23f37b3a58fd84fecf0d8fca46c64eed

docker ps
CONTAINER ID  IMAGE                                    COMMAND    CREATED        STATUS        PORTS  NAMES
0e8cbc4606cf  markllama/pulp-resource-manager:latest   "/run.sh"  9 minutes ago  Up 9 minutes         pulp-resource-manager
Once it's running I can check the logs to verify that everything has started as needed and that the primary process has been executed at the end.
docker logs pulp-resource-manager
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ start_resource_manager
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery worker -c 1 -n resource_manager@pulp.example.com --events --app=pulp.server.async.app --umask=18 --loglevel=INFO -Q resource_manager --logfile=/var/log/pulp/resource_manager.log'
If it fails to start, especially with "file not found" or "no access" errors, check the /dev/log volume mount and the SERVICE_HOST value.
I also want to check that the QPID queues have been created.
qpid-config queues -b guest@10.245.2.4
Queue Name                                       Attributes
======================================================================
04f58686-35a6-49ca-b98e-376371cfaaf7:1.0         auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:1.0         auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:2.0         auto-del excl
0c72a9c9-e1bf-4515-ba4b-0d0f86e9d30a:1.0         auto-del excl
celeryev.ed1a92fd-7ad0-4ab1-935f-6bc6a215f7d3    auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e70d72aa-7b9a-4083-a88a-f9cc3c568e5c:0.0         auto-del excl
e7e53097-ae06-47ca-87d7-808f7042d173:1.0         auto-del excl
resource_manager                                 --durable --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.celery.pidbox  auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.dq             --durable auto-del --argument passive=False --argument exclusive=False --argument arguments=None
Line 8 looks like the Celery Beat service queue and lines 11, 12, and 13 are clearly associated with the resource manager. So far, so good.
Testing in Kubernetes
I had to reset the database between starts to test the Pulp Beat container. This image doesn't change the database structure, so I don't need to reset. I can just create a new pod definition and try it out.
Again, the differences from the Pulp Beat pod definition are pretty trivial.
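I haven't pasted the pod file here, but the desiredState section of the kubecfg JSON output further down shows everything that's in it. Reconstructed from that output (so treat it as a sketch rather than the exact file), pods/pulp-resource-manager.json looks roughly like this:

{
    "id": "pulp-resource-manager",
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "pulp-resource-manager",
            "containers": [
                {
                    "name": "pulp-resource-manager",
                    "image": "markllama/pulp-resource-manager",
                    "env": [
                        {
                            "name": "PULP_SERVER_NAME",
                            "value": "pulp.example.com"
                        }
                    ],
                    "volumeMounts": [
                        {
                            "name": "devlog",
                            "mountPath": "/dev/log"
                        }
                    ]
                }
            ],
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "hostDir": {
                            "path": "/dev/log"
                        }
                    }
                }
            ]
        }
    },
    "labels": {
        "name": "pulp-resource-manager"
    }
}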
So here's what it looks like when I start the pod:
kubecfg -c pods/pulp-resource-manager.json create pods
I0930 00:00:24.581712 16159 request.go:292] Waiting for completion of /operations/14
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   /                       name=pulp-resource-manager   Waiting

kubecfg list pods
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulpdb                  markllama/mongodb                 10.245.2.2/10.245.2.2   name=db                      Running
pulpmsg                 markllama/qpid                    10.245.2.2/10.245.2.2   name=msg                     Running
pulp-beat               markllama/pulp-beat               10.245.2.4/10.245.2.4   name=pulp-beat               Terminated
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Terminated

kubecfg get pods/pulp-resource-manager
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Running

There are two things of note here. Line 13 shows the pulp-resource-manager pod as Terminated. Remember the bug note from above: the pod isn't actually terminated, it's in the gap between the pause container downloading the image and the new container starting to run.
On line 15 I requested the information for that pod by name using the get command, rather than listing them all. This time it shows Running, as it should.
When you use get, all you get by default is a one-line summary. If you want details you have to consume them as JSON, but they're complete. In fact they use the same schema as the JSON used to create the pods in the first place (with a bit more detail filled in). While this can be hard for humans to swallow, it makes it AWESOME to write programs and scripts to process the output. Every command should offer some form of structured data output. Meanwhile, I wish Kubernetes would offer a --verbose option with nicely formatted plain text. It will come (or I'll write it if I get frustrated enough).
Get ready... Here it comes.
kubecfg --json get pods/pulp-resource-manager | python -m json.tool
{
    "apiVersion": "v1beta1",
    "creationTimestamp": "2014-09-30T00:00:24Z",
    "currentState": {
        "host": "10.245.2.4",
        "hostIP": "10.245.2.4",
        "info": {
            "net": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            },
            "pulp-resource-manager": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            }
        },
        "manifest": {
            "containers": null,
            "id": "",
            "restartPolicy": {},
            "version": "",
            "volumes": null
        },
        "podIP": "10.244.3.4",
        "status": "Running"
    },
    "desiredState": {
        "host": "10.245.2.4",
        "manifest": {
            "containers": [
                {
                    "env": [
                        {
                            "key": "PULP_SERVER_NAME",
                            "name": "PULP_SERVER_NAME",
                            "value": "pulp.example.com"
                        }
                    ],
                    "image": "markllama/pulp-resource-manager",
                    "name": "pulp-resource-manager",
                    "volumeMounts": [
                        {
                            "mountPath": "/dev/log",
                            "name": "devlog",
                            "path": "/dev/log"
                        }
                    ]
                }
            ],
            "id": "pulp-resource-manager",
            "restartPolicy": {
                "always": {}
            },
            "uuid": "c73a89c0-4834-11e4-aba7-0800279696e1",
            "version": "v1beta1",
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "emptyDir": null,
                        "hostDir": {
                            "path": "/dev/log"
                        }
                    }
                }
            ]
        },
        "status": "Running"
    },
    "id": "pulp-resource-manager",
    "kind": "Pod",
    "labels": {
        "name": "pulp-resource-manager"
    },
    "resourceVersion": 20,
    "selfLink": "/api/v1beta1/pods/pulp-resource-manager"
}
So there you go.
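As a quick example of that scriptability, a one-liner like this (the key names come straight from the output above) pulls out just the pod's current status, printing Running for the pod shown here:

kubecfg --json get pods/pulp-resource-manager | \
    python -c 'import json,sys; print(json.load(sys.stdin)["currentState"]["status"])'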
I won't repeat the QPID queue check here because if everything's going well it looks the same.
Summary
As designed there isn't really much to say. The only real changes were to remove the DB setup and change the exec line to start the resource manager process. That's the idea of cookie cutters.
The next one won't be as simple. It uses the Pulp software package, but it doesn't run a Celery service. Instead it runs an Apache daemon and a WSGI web service to offer the Pulp Admin REST protocol. It connects to the database and the messaging service. It also needs SSL and a pair of external public TCP connections.