In the last post I spent a bunch of effort creating a base image for a set of Pulp service components. Then I only implemented one, the Celery beat server. In this (hopefully much shorter) post I'll create a second image from that base. This one is going to be the Pulp Resource Manager service.
A couple of recap pieces to start.
The Pulp service is made up of several independent processes that communicate using AMQP messaging (through a QPID message bus) and through access to a shared MongoDB database. The QPID and MongoDB services are entirely independent of the Pulp service processes and communicate only over TCP/IP. There are also a couple of processes that are tightly coupled, both requiring access to shared data; those will come later. What's left are the Pulp Resource Manager process and the Pulp Admin REST service.
I'm going to take these in two separate posts to make them a bit more digestible than the last one was.
Extending the Base - Again
As with the Pulp Beat service, the Resource Manager process is a singleton: each Pulp service has exactly one. (Discussions of HA and SPOF will be held for later.) The Resource Manager process communicates with the other components solely through the QPID message broker and MongoDB, over TCP. There is no need for persistent storage.
In fact, the only difference between the Beat service and the Resource Manager is the invocation of the Celery service. This means that the only differences between the Docker specifications are the name and two sections of the run.sh file.
The Dockerfile is in fact identical in content to that for the Pulp Beat container.
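I won't copy it out of the last post, but as a minimal sketch, assuming the derived image simply layers its own run.sh onto the markllama/pulp-base image built there, it would look something like this:

# Sketch of the derived Dockerfile. Assumes the markllama/pulp-base image
# from the previous post and a run.sh entry point; details may differ.
FROM markllama/pulp-base

# The derived image only needs to supply its own startup script
ADD run.sh /run.sh

CMD ["/run.sh"]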
Now to the run.sh script.
The first difference in the run.sh is simple. The Beat service is used to initialize the database. The Resource Manager doesn't have to do that.
The second is also pretty simple: the exec line at the end starts the Celery service using the resource_manager entry point instead of the beat service.
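You can see the result in the container log later in this post. Pulled out of that trace, the end of run.sh amounts to something like this; the command itself is copied from the trace, but substituting PULP_SERVER_NAME into the node name is my assumption:

start_resource_manager() {
    # Run a single Celery worker, as the apache user, listening only on the
    # resource_manager queue (command taken from the startup trace below).
    # Using ${PULP_SERVER_NAME} here is an assumption; the trace shows the
    # already-expanded value pulp.example.com.
    exec runuser apache -s /bin/bash -c \
        "/usr/bin/celery worker -c 1 -n resource_manager@${PULP_SERVER_NAME} \
         --events --app=pulp.server.async.app --umask=18 --loglevel=INFO \
         -Q resource_manager --logfile=/var/log/pulp/resource_manager.log"
}

start_resource_manager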
I do have one other note to myself. It appears that the wait_for_database() function will be needed in every derivative of the pulp-base image. I should probably refactor that but I'm not going to do it yet.
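For reference, here's roughly what that shared function does, reconstructed from the variable names and test script that show up in the startup trace below. The exact loop body is my guess, not the upstream code:

wait_for_database() {
    # Poll MongoDB until it answers or we run out of tries
    # (variable names and test script taken from the startup trace below)
    DB_TEST_TRIES=12
    DB_TEST_POLLRATE=5
    TRY=0
    while [ $TRY -lt $DB_TEST_TRIES ] ; do
        if /test_db_available.py ; then
            break
        fi
        sleep $DB_TEST_POLLRATE
        TRY=$(($TRY + 1))
    done

    # Give up if the database never answered
    if [ $TRY -ge $DB_TEST_TRIES ] ; then
        echo "Unable to reach MongoDB at ${DB_SERVICE_HOST}:${DB_SERVICE_PORT}"
        exit 1
    fi
}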
One Image or Many?
So, if I hadn't been using shell functions, this really would come down to a two-line difference between the two images. Does it really make sense to create two images? It is possible to pass a mode argument to the container on startup. Wouldn't that be simpler?
It actually might be. It is possible to use the same image and pass an argument. The example from which mine are derived used that method.
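For illustration, a single-image run.sh could dispatch on its first argument, something like this. The function names and the image name below are hypothetical, not the ones I actually built:

# Hypothetical single-image dispatch at the end of run.sh
case "$1" in
    beat)              start_beat ;;
    resource_manager)  start_resource_manager ;;
    *)  echo "usage: run.sh {beat|resource_manager}" >&2
        exit 1 ;;
esac

The container would then be started with something like docker run -d markllama/pulp-celery resource_manager, selecting the role at start time (again, markllama/pulp-celery is a made-up name for this example).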
I have three reasons for using separate images. One is for teaching and the other two are development choices. Since one of my goals is to show how to create custom base images and then use derived images for customization, I took this opportunity to do just that.
The deeper reasons have to do with human nature and the software development life cycle.
People expect to be able to compose services by grabbing images off the shelf and plugging them together. Adding modal switches to the images means that they are no longer strongly differentiated by function. You can't just say "Oh, I need 5 functional parts, let me check the bins". You have to know more about each image than just how it connects to the others; you have to know that this particular image can take on more than one role within the service. I'd like to avoid that if I can. Creating images with so little difference between them feels like inefficiency, but only from the standpoint of the person producing the images. To the consumer it maintains the usage paradigm, and breaks in the paradigm can lead to mistakes or confusion.
The other reason to use distinct images has to do with what I expect and hope will be a change in the habits of software developers.
Developers of complex services currently feel a tension, when they are creating and packaging their software, between putting all of the code, binaries, and configuration templates into a single package and splitting them out by function. The usual answer is to create a new package only when the function is strongly different, because that makes the software simpler to install and configure once. On traditional systems, where all of the process components would be running on the same host, there was no good reason to separate the code for distinct processes based on their function. There are clear cases where the separation does happen in host software packaging, notably in client and server software, which will obviously run on different hosts. Other cases, though, are not so clear cut.
The case of the Pulp service is in a gray area. Much of the code is common to all four Celery based components (beat, resource manager, worker and admin REST service). It is likely possible to refactor the unique code into separate packages for the components, though the value is questionable at this point.
I want to create distinct images because it's not very expensive, and it allows for easy refactoring should the Pulp packaging ever be decomposed to match the actual service components. Any changes would happen when the new images are built, but the consumer would not need to see any change. This is a consideration to keep in mind whenever I create a new service with different components from the same service RPM.
Running and Verifying the Resource Manager Image
The Pulp Resource Manager process makes the same connections that the Pulp Beat process does. It's a little harder to detect the Resource Manager's access to the database, since its startup doesn't make radical changes the way the DB initialization does, but I'm going to see if I can find some indication that it is running. The QPID connection will be much easier to detect: the Resource Manager creates its own set of queues, which will be easy to see.
The resource manager requires the database service and an initialized database. Testing this part will start where the previous post left off, with running QPID and MongoDB and with the Pulp Beat service active.
NOTE: there's currently (20140929) a bug in Kubernetes where, during the period between waiting for the image to download and when it actually starts, kubecfg list pods will indicate that the pods have terminated. If you see this, give it another minute for the pods to actually start and transfer to the running state.
Testing in Docker
All I need to do using Docker directly is to verify that the container will start and run. The visibility in Kubernetes still isn't up to general dev and debugging.
docker run -d --name pulp-resource-manager \
    -v /dev/log:/dev/log \
    -e PULP_SERVER_NAME=pulp.example.com \
    -e SERVICE_HOST=10.245.2.2 \
    markllama/pulp-resource-manager
0e8cbc4606cf8894f8be515709c8cd6a23f37b3a58fd84fecf0d8fca46c64eed

docker ps
CONTAINER ID  IMAGE                                    COMMAND    CREATED        STATUS        PORTS  NAMES
0e8cbc4606cf  markllama/pulp-resource-manager:latest   "/run.sh"  9 minutes ago  Up 9 minutes         pulp-resource-manager
Once it's running I can check the logs to verify that everything has started as needed and that the primary process has been executed at the end.
docker logs pulp-resource-manager
+ '[' '!' -x /configure_pulp_server.sh ']'
+ . /configure_pulp_server.sh
++ set -x
++ PULP_SERVER_CONF=/etc/pulp/server.conf
++ export PULP_SERVER_CONF
++ PULP_SERVER_NAME=pulp.example.com
++ export PULP_SERVER_NAME
++ SERVICE_HOST=10.245.2.2
++ export SERVICE_HOST
++ DB_SERVICE_HOST=10.245.2.2
++ DB_SERVICE_PORT=27017
++ export DB_SERVICE_HOST DB_SERVICE_PORT
++ MSG_SERVICE_HOST=10.245.2.2
++ MSG_SERVICE_PORT=5672
++ MSG_SERVICE_USER=guest
++ export MSG_SERVICE_HOST MSG_SERVICE_PORT MSG_SERVICE_NAME
++ check_config_target
++ '[' '!' -f /etc/pulp/server.conf ']'
++ configure_server_name
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''server'\'']/server_name' pulp.example.com
Saved 1 file(s)
++ configure_database
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''database'\'']/seeds' 10.245.2.2:27017
Saved 1 file(s)
++ configure_messaging
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''messaging'\'']/url' tcp://10.245.2.2:5672
Saved 1 file(s)
++ augtool -s set '/files/etc/pulp/server.conf/target[. = '\''tasks'\'']/broker_url' qpid://guest@10.245.2.2:5672
Saved 1 file(s)
+ '[' '!' -x /test_db_available.py ']'
+ wait_for_database
+ DB_TEST_TRIES=12
+ DB_TEST_POLLRATE=5
+ TRY=0
+ '[' 0 -lt 12 ']'
+ /test_db_available.py
Testing connection to MongoDB on 10.245.2.2, 27017
+ '[' 0 -ge 12 ']'
+ start_resource_manager
+ exec runuser apache -s /bin/bash -c '/usr/bin/celery worker -c 1 -n resource_manager@pulp.example.com --events --app=pulp.server.async.app --umask=18 --loglevel=INFO -Q resource_manager --logfile=/var/log/pulp/resource_manager.log'
If it fails to start, especially with "file not found" or "no access" errors, check the /dev/log volume mount and the SERVICE_HOST value.
I also want to check that the QPID queues have been created.
qpid-config queues -b guest@10.245.2.4
Queue Name                                       Attributes
======================================================================
04f58686-35a6-49ca-b98e-376371cfaaf7:1.0         auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:1.0         auto-del excl
06fa019e-a419-46af-a555-a820dd86e66b:2.0         auto-del excl
0c72a9c9-e1bf-4515-ba4b-0d0f86e9d30a:1.0         auto-del excl
celeryev.ed1a92fd-7ad0-4ab1-935f-6bc6a215f7d3    auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments={}
e70d72aa-7b9a-4083-a88a-f9cc3c568e5c:0.0         auto-del excl
e7e53097-ae06-47ca-87d7-808f7042d173:1.0         auto-del excl
resource_manager                                 --durable --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.celery.pidbox  auto-del --limit-policy=ring --argument passive=False --argument exclusive=False --argument arguments=None
resource_manager@pulp.example.com.dq             --durable auto-del --argument passive=False --argument exclusive=False --argument arguments=None
Line 8 looks like the Celery Beat service queue and lines 11, 12, and 13 are clearly associated with the resource manager. So far, so good.
Testing in Kubernetes
I had to reset the database between starts to test the Pulp Beat container. This image doesn't change the database structure, so I don't need to reset. I can just create a new pod definition and try it out.
Again, the differences from the Pulp Beat pod definition are pretty trivial.
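I haven't pasted the pod file here, but the desiredState section of the kubecfg JSON output further down shows everything that's in it. Reconstructed from that output (so treat it as a sketch rather than the exact file), pods/pulp-resource-manager.json looks roughly like this:

{
    "id": "pulp-resource-manager",
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "pulp-resource-manager",
            "containers": [
                {
                    "name": "pulp-resource-manager",
                    "image": "markllama/pulp-resource-manager",
                    "env": [
                        {
                            "name": "PULP_SERVER_NAME",
                            "value": "pulp.example.com"
                        }
                    ],
                    "volumeMounts": [
                        {
                            "name": "devlog",
                            "mountPath": "/dev/log"
                        }
                    ]
                }
            ],
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "hostDir": {
                            "path": "/dev/log"
                        }
                    }
                }
            ]
        }
    },
    "labels": {
        "name": "pulp-resource-manager"
    }
}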
So here's what it looks like when I start the pod:
kubecfg -c pods/pulp-resource-manager.json create pods
I0930 00:00:24.581712 16159 request.go:292] Waiting for completion of /operations/14
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   /                       name=pulp-resource-manager   Waiting

kubecfg list pods
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulpdb                  markllama/mongodb                 10.245.2.2/10.245.2.2   name=db                      Running
pulpmsg                 markllama/qpid                    10.245.2.2/10.245.2.2   name=msg                     Running
pulp-beat               markllama/pulp-beat               10.245.2.4/10.245.2.4   name=pulp-beat               Terminated
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Terminated

kubecfg get pods/pulp-resource-manager
ID                      Image(s)                          Host                    Labels                       Status
----------              ----------                        ----------              ----------                   ----------
pulp-resource-manager   markllama/pulp-resource-manager   10.245.2.4/10.245.2.4   name=pulp-resource-manager   Running

There are two things of note here. Line 13 shows the pulp-resource-manager pod as Terminated. Remember the bug note from above: the pod isn't actually terminated, it's in the gap between the pause container downloading the image and the new container starting to run.
On line 15 I requested the information for that pod by name using the get command, rather than listing them all. This time it shows Running, as it should.
When you use get, all you get by default is a one-line summary. If you want details you have to consume them as JSON, but they're complete. In fact they use the same schema as the JSON used to create the pods in the first place (with a bit more detail filled in). While this can be hard for humans to swallow, it makes it AWESOME to write programs and scripts to process the output. Every command should offer some form of structured data output. Meanwhile, I wish Kubernetes would offer a --verbose option with nicely formatted plain text. It will come (or I'll write it if I get frustrated enough).
Get ready... Here it comes.
kubecfg --json get pods/pulp-resource-manager | python -m json.tool
{
    "apiVersion": "v1beta1",
    "creationTimestamp": "2014-09-30T00:00:24Z",
    "currentState": {
        "host": "10.245.2.4",
        "hostIP": "10.245.2.4",
        "info": {
            "net": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            },
            "pulp-resource-manager": {
                "detailInfo": {
                    "Args": null,
                    "Config": null,
                    "Created": "0001-01-01T00:00:00Z",
                    "Driver": "",
                    "HostConfig": null,
                    "HostnamePath": "",
                    "HostsPath": "",
                    "ID": "",
                    "Image": "",
                    "Name": "",
                    "NetworkSettings": null,
                    "Path": "",
                    "ResolvConfPath": "",
                    "State": {
                        "ExitCode": 0,
                        "FinishedAt": "0001-01-01T00:00:00Z",
                        "Paused": false,
                        "Pid": 0,
                        "Running": false,
                        "StartedAt": "0001-01-01T00:00:00Z"
                    },
                    "SysInitPath": "",
                    "Volumes": null,
                    "VolumesRW": null
                },
                "restartCount": 0,
                "state": {
                    "running": {}
                }
            }
        },
        "manifest": {
            "containers": null,
            "id": "",
            "restartPolicy": {},
            "version": "",
            "volumes": null
        },
        "podIP": "10.244.3.4",
        "status": "Running"
    },
    "desiredState": {
        "host": "10.245.2.4",
        "manifest": {
            "containers": [
                {
                    "env": [
                        {
                            "key": "PULP_SERVER_NAME",
                            "name": "PULP_SERVER_NAME",
                            "value": "pulp.example.com"
                        }
                    ],
                    "image": "markllama/pulp-resource-manager",
                    "name": "pulp-resource-manager",
                    "volumeMounts": [
                        {
                            "mountPath": "/dev/log",
                            "name": "devlog",
                            "path": "/dev/log"
                        }
                    ]
                }
            ],
            "id": "pulp-resource-manager",
            "restartPolicy": {
                "always": {}
            },
            "uuid": "c73a89c0-4834-11e4-aba7-0800279696e1",
            "version": "v1beta1",
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "emptyDir": null,
                        "hostDir": {
                            "path": "/dev/log"
                        }
                    }
                }
            ]
        },
        "status": "Running"
    },
    "id": "pulp-resource-manager",
    "kind": "Pod",
    "labels": {
        "name": "pulp-resource-manager"
    },
    "resourceVersion": 20,
    "selfLink": "/api/v1beta1/pods/pulp-resource-manager"
}
So there you go.
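As a quick example of that scriptability, a one-liner like this (the key names come straight from the output above) pulls out just the pod's current status, printing Running for the pod shown here:

kubecfg --json get pods/pulp-resource-manager | \
    python -c 'import json,sys; print(json.load(sys.stdin)["currentState"]["status"])'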
I won't repeat the QPID queue check here because if everything's going well it looks the same.
Summary
As designed there isn't really much to say. The only real changes were to remove the DB setup and change the exec line to start the resource manager process. That's the idea of cookie cutters.
The next one won't be as simple. It uses the Pulp software package, but it doesn't run a Celery service. Instead it runs an Apache daemon and a WSGI web service to offer the Pulp Admin REST protocol. It connects to the database and the messaging service. It also needs SSL and a pair of external public TCP connections.