What I'm going to do this time is to create the subsidiary services that I'll need for the Pulp service within a Kubernetes cluster.
UPDATE 12/16/2014: The kubecfg command has recently been deprecated and replaced with kubectl. I've updated this post to reflect the CLI calls and output from kubectl.
Pre-Launch
A Pulp service stores its persistent data in a MongoDB database. The service components, a Celery Beat server and a number of Celery workers, as well as one or more Apache web server daemons, all communicate using an AMQP message broker, and they store and retrieve data from the database.
In a traditional bare-metal or VM-based installation all of these services would likely run on the same host. If they are distributed, then the IP addresses and credentials of the supporting services have to be configured into the Pulp servers manually or with some form of configuration management. Using containers the components can be isolated, but the task of tracking them and configuring the consumer processes remains.
Using just Docker, the first impulse of an implementer would be similar: place all of the containers on the same host. This would simplify the management of the connectivity between the parts, but it also defeats some of the benefits of containerized applications: portability and non-locality. This isn't a failing of Docker. It is the result of conscious decisions to limit the scope of what Docker attempts to do, avoiding feature creep and bloat. And this is where a tool like Kubernetes comes in.
As mentioned elsewhere, Kubernetes is a service designed to bind together a cluster of container hosts. These can be regular hosts running the etcd and kubelet daemons, or they can be specialized images like Atomic or CoreOS, and they can run on private hardware or on public services such as Google Cloud.
For Pulp, I need to place a MongoDB container and a QPID container within a Kubernetes cluster and create the infrastructure so that clients can find them and connect to them. For each of these I need to create a Kubernetes Service and a Pod (a group of related containers).
Kicking the Tires
It's probably a good thing to explore a little bit before diving in so that I can see what to expect from Kubernetes in general. I also need to verify that I have a working environment before I start trying to bang on it.
Preparation
If you're following along, at this point I'm going to assume that you have access to a running Kubernetes cluster. I'm going to be using the Vagrant test cluster as defined in the GitHub repository and described in the Vagrant version of the Getting Started Guides.
I'm also going to assume that you've built the Kubernetes binaries. I'm using the shell wrappers in the cluster sub-directory, especially cluster/kubectl.sh. If you try that and you haven't built the binaries, you'll get a message that looks like this:
cluster/kubectl.sh
It looks as if you don't have a compiled kubectl binary. If you are running from a clone of the git repo, please run './build/run.sh hack/build-cross.sh'. Note that this requires having Docker installed. If you are running from a binary release tarball, something is wrong. Look at http://kubernetes.io/ for information on how to contact the development team for help.
If you see that, do as it says. If that fails, you probably haven't installed the golang package.
For convenience I alias the kubectl.sh wrapper so that I don't need the full path.
alias kubectl=~/kubernetes/cluster/kubectl.sh
Like most CLI commands these days, it prints a usage message if you invoke it with no arguments or with --help.
kubectl --help 2>&1 | more
Usage of kubectl:
Usage:
  kubectl [flags]
  kubectl [command]

Available Commands:
  version                                              Print version of client and server
  proxy                                                Run a proxy to the Kubernetes API server
  get [(-o|--output=)json|yaml|...] <resource> [<id>]  Display one or many resources
  describe <resource> <id>                             Show details of a specific resource
  create -f filename                                   Create a resource by filename or stdin
  createall [-d directory] [-f filename]               Create all resources specified in a directory, filename or stdin
  update -f filename                                   Update a resource by filename or stdin
  delete ([-f filename] | (<resource> <id>))           Delete a resource by filename, stdin or resource and id
The full usage output can be found in the CLI documentation in the Kubernetes Github repository.
Exploring the CLI control objects
You can see in the usage output the possible operations: get, describe, create, createall, update and delete. The output also shows the objects that the API can manage: minions, pods, replicationControllers and services.
Minions
A minion is a host that can accept containers. It runs an etcd and a kubelet daemon in addition to the Docker daemon. For our purposes a minion is where containers can go.
I can list the minions in my cluster like this:
kubectl get minions
NAME                LABELS
10.245.2.4          <none>
10.245.2.2          <none>
10.245.2.3          <none>
The only valid operations on minions using the REST protocol are the list and get actions. The get response isn't very interesting.
Until I add some of the other objects this is the most interesting query. It indicates that there are three minions connected and ready to accept containers.
Pods
A pod is the Kubernetes object which describes a set of one or more containers to be run on the same minion. While the point of a cluster is to allow containers to run anywhere within the cluster, there are times when a set of containers must run together on the same host. Perhaps they share some external filesystem or some other resource. See the golang specification for the Pod struct.
kubectl get pods
NAME                IMAGE(S)            HOST                LABELS              STATUS

See? Not very interesting.
Replication Controllers
I'm going to defer talking about replication controllers in detail for now. It's enough to note their existence and purpose.
Replication controllers are the tool to create HA or load balancing systems. Using a replication controller you can tell Kubernetes to create multiple running containers for a given image. Kubernetes will ensure that if one container fails or stops that a new container will be spawned to replace it.
I can list the replication controllers in the same way as minions or pods, but there's nothing to see yet.
Services
I think the term service is an unfortunate but probably unavoidable terminology overload.
In Kubernetes, a service defines a TCP or UDP port reservation. It provides a way for applications running in containers to connect to each other without requiring that each one be configured with the end-point IP addresses. This both allows for abstracted configuration and for mobility and load balancing of the providing containers.
When I define a Kubernetes service, the service providers (the MongoDB and QPID containers) will be labeled to receive traffic and the service consumers (the Pulp components) will be given the access information in the environment so that they can reach the providers. More about that later.
I can list the services in the same way as I would minions or pods. And it turns out that creating a couple of Kubernetes services is the first step I need to take to prepare the Pulp support service containers.
Creating a Kubernetes Service Object
In a cloud cluster one of the most important considerations is being able to find things. The whole point of the cloud is to promote non-locality. I don't care where things are, but I still have to be able to find them somehow.
A Kubernetes Service object is a handle that allows my MongoDB and QPID clients to find the servers without having to know where they really are. It defines a port to listen on and a way for the providing containers to indicate that they will accept the traffic that comes in. Kubernetes arranges for the traffic to be forwarded to the servers.
Kubernetes both accepts and produces structured data formats for input and reporting. The two currently supported formats are JSON and YAML. The Service structure is relatively simple but it has elements which are shared by all of the top level data structures. Kubernetes doesn't yet have any tooling to make the creation of an object description easier than hand-crafting a snippet of JSON or YAML. Each of the structures is documented in the godoc for Kubernetes. For now that's all you get.
There are a couple of provided examples and these will have to do for now. The guestbook example demonstrates ReplicationControllers and a master/slave implementation using Redis. The second shows how to perform a live update of the pods which make up an active service within a Kubernetes cluster. These are actually a bit more advanced than I'm ready for and don't give the detailed breakdown of the moving parts that I mean to do.
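Here's a minimal sketch of what mongodb-service.json contains. Treat the exact layout as approximate; the line notes below assume this arrangement, and the field values match the query output shown later in this post.

{
  "kind": "Service",
  "apiVersion": "v1beta1",
  "id": "db",
  "port": 27017,
  "publicIPs": ["10.245.2.2"],
  "selector": {
    "name": "db"
  }
}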
This is a complete description of the service. Lines 5-8 define the actual content.
- Line 2 indicates that this is a Service object.
- Line 3 indicates the object schema version. v1beta1 is current. (Note: my use of the term 'schema' is a loose one.)
- Line 4 identifies the Service object. This must be unique within the set of services.
- Line 5 is the TCP port number that the service will listen on.
- Line 6 is for testing. It tells the proxy on the minion with that IP to listen for inbound connections. I'll also use the publicIPs value to expose the HTTP and HTTPS services for Pulp.
- Lines 7-9 set the Selector. The selector is used to associate this Service object with containers that will accept the inbound traffic. This will match with one of the label items assigned to the containers.
When a new service is created, Kubernetes establishes a listener on an available IP address (one of the minions' addresses). While the service object exists, any new containers will start with a set of environment variables which provide access information. The value of the selector (converted to upper case) is used as the prefix for these environment variables so that containers can be designed to pick them up and use them for configuration.
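For example, a container created while the db service exists should see variables along these lines. The exact names and set of variables depend on the Kubernetes version, so treat these as illustrative rather than definitive:

DB_SERVICE_HOST=10.0.41.48
DB_SERVICE_PORT=27017
DB_PORT=tcp://10.0.41.48:27017
DB_PORT_27017_TCP_ADDR=10.0.41.48
DB_PORT_27017_TCP_PORT=27017
DB_PORT_27017_TCP_PROTO=tcp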
For now I just need to establish the service so that when I create the DB and QPID containers they have something to be bound to.
The QPID service is identical to the MongoDB service, replacing the port (5672) and the selector (msg).
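A sketch of that file, assuming the same layout as the db service and that the service id is also msg:

{
  "kind": "Service",
  "apiVersion": "v1beta1",
  "id": "msg",
  "port": 5672,
  "selector": {
    "name": "msg"
  }
}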
Querying a Service Object
I've just created a Service object. I wonder what Kubernetes thinks of it? I can list the services as seen above. I can also get the object information using kubectl.
kubectl get services db
NAME                LABELS              SELECTOR            IP                  PORT
db                                      name=db             10.0.41.48          27017
That's nice. I know the important information now. But what does it look like, really?
kubectl get --output=json services db
{
    "kind": "Service",
    "id": "db",
    "uid": "c040da3d-8536-11e4-a18b-0800279696e1",
    "creationTimestamp": "2014-12-16T15:18:12Z",
    "selfLink": "/api/v1beta1/services/db?namespace=default",
    "resourceVersion": 13,
    "apiVersion": "v1beta1",
    "namespace": "default",
    "port": 27017,
    "protocol": "TCP",
    "selector": {
        "name": "db"
    },
    "publicIPs": [
        "10.245.2.2"
    ],
    "containerPort": 0,
    "portalIP": "10.0.41.48"
}
Clearly Kubernetes has filled out some of the object fields. Note the --output=json flag for structured data.
I'll be using this method to query information about the other elements as I go along.
Describing a Container (Pod) in Kubernetes
We've seen how to run a container on a Docker host. With Kubernetes we have to create and submit a description of the container with all of the required variables defined.
Kubernetes has an additional abstraction called a pod. While Kubernetes is designed to allow the operator to ignore the location of containers within the cluster, there are times when a set of containers needs to be co-located on the same host. A pod is Kubernetes' way of grouping containers when needed. When starting a single container it will still be referred to as a member of a pod.
Here's the description of a pod containing the MongoDB service image I created earlier.
This is actually a set of nested structures, maps and arrays.
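Here's a minimal sketch of pods/mongodb.json. Again, treat the layout as approximate; the line notes below assume this arrangement, and the manifest id and version are my guesses.

{
  "id": "pulpdb",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "labels": {
    "name": "db"
  },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "pulpdb",
      "containers": [{
        "name": "pulp-db",
        "image": "markllama/mongodb",
        "ports": [{
          "containerPort": 27017
        }]
      }]
    }
  }
}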
- Lines 1-21 define a Pod.
- Lines 2-4 are elements of an inline JSONBase structure.
- Lines 5-7 are a map (hash) of strings assigned to the Pod struct element named Labels.
- Lines 8-20 define a PodState named DesiredState. The only required element is the ContainerManifest, named Manifest in the PodState.
- A PodState has a required Version and ID, though it is not a subclass of JSONBase. It also has a list of Containers and an optional list of Volumes.
- Lines 12-18 define the set of containers (only one in this case) that will reside in the pod. A Container has a name and an image path (in this case to the previously defined mongodb image).
- Lines 15-17 are a set of Port specifications. These indicate that something inside the container will be listening on these ports.
You can see how learning the total schema means fishing through each of these structure definitions in the documentation. If you work at it you will get to know them. To be fair they are really meant to be generated and consumed by machines rather than humans. Kubernetes is still the business end of the service. Pretty dashboards will be provided later. The only visibility I really need is for development and diagnostics. There are gaps here too, but finding them is what experiments like this are about.
A note on Names and IDs
There are several places where there is a key named "name" or "id". I could give them all the same value, but I'm going to deliberately vary them so I can expose which ones are used for what purpose. Names can be arbitrary strings. I believe that IDs are restricted somewhat (no hyphens).
Creating the first Pod
Now I can get back to business.
Once I have the Pod definition expressed in JSON I can submit that to kubectl for processing.
kubectl create -f pods/mongodb.json
pulpdb
TADA! I now have a MongoDB running in Kubernetes.
But how do I know? Now that I actually have a pod, I should be able to query Kubernetes about it and get more than an empty answer.
kubectl get pods pulpdb
NAME                IMAGE(S)                HOST                        LABELS              STATUS
pulpdb              markllama/mongodb       10.245.2.3/10.245.2.3       name=db             Running
Familiar and Boring. But I can get more from kubectl by asking for the raw JSON return from the query.
{ "kind": "Pod", "id": "pulpdb", "uid": "4bac8381-8537-11e4-a18b-0800279696e1", "creationTimestamp": "2014-12-16T15:22:06Z", "selfLink": "/api/v1beta1/pods/pulpdb?namespace=default", "resourceVersion": 22, "apiVersion": "v1beta1", "namespace": "default", "labels": { "name": "db" }, "desiredState": { "manifest": { "version": "v1beta2", "id": "", "volumes": [ { "name": "devlog", "source": { "hostDir": { "path": "/dev/log" }, ... "pulp-db": { "state": { "running": { "startedAt": "2014-12-16T15:27:04Z" } }, "restartCount": 0, "image": "markllama/mongodb", "containerID": "docker://8f21d45e49b18b37b98ea7556346095261699bc 3664b52813a533edccee55a63" } } } }
It's really long, so I've only included an excerpt inline. I put the full output into a gist.
If you fish through it you'll find the same elements I used to create the pod, and lots, lots more. The structure now contains both a desiredState and a currentState sub-structure, with very different contents.
Now a lot of this is just noise to us, but lines 59-72 are of particular interest. These show the effects of the Service object that was created previously. These are the environment variables and network ports declared. These are the values that a client container will use to connect to this service container.
Testing the MongoDB service
If you've read my previous blog post on creating a MongoDB Docker image you'll be familiar with the process I used to verify the basic operation of the service.
In that case I was running the container using Docker on my laptop. I knew exactly where the container was running and I had direct access to the Docker CLI so that I could ask Docker about my new container.
I'd opened up the MongoDB port and told Docker to bind it to a random port on the host and I could connect directly to that port.
In a Kubernetes cluster there's no way to know a priori where the MongoDB container will end up. You have to ask Kubernetes where it is. Further you don't have direct access to the Docker CLI.
This is where that publicIPs key in the mongodb-service.json file comes in. I set the public IP value of the db service to an external IP address of one of the Kubernetes minions: 10.245.2.2. This causes the proxy on that minion to accept inbound connections and forward them to the db service pods wherever they are.
The minion host is accessible from my desktop so I can test the connectivity directly.
echo "show dbs" | mongo 10.245.2.2 MongoDB shell version: 2.4.6 connecting to: 10.245.2.4/test local 0.03125GB bye
And now for QPID?
As with the MongoDB service, creating and testing the QPID messaging service within Kubernetes follows the same process: create a JSON file which describes the QPID service and another for the pod, then submit them and test as before.
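A sketch of the QPID pod description, following the same pattern as pods/mongodb.json. The file name, the pod id pulpmsg, and the image name markllama/qpid are my assumptions here:

{
  "id": "pulpmsg",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "labels": {
    "name": "msg"
  },
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "pulpmsg",
      "containers": [{
        "name": "pulp-msg",
        "image": "markllama/qpid",
        "ports": [{
          "containerPort": 5672
        }]
      }]
    }
  }
}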
Summary
Now I have two running network services inside the Kubernetes cluster. Each consists of a Kubernetes Service object and a Kubernetes Pod running the image I'd created for that service application.
I can prove to myself that the application services are running and accessible, though for some of the detailed tests I still have to go under the covers of Kubernetes.
I have the information I need to craft images for the other Pulp services so that they can consume the database and messenger services.
Next Up
In the next post I mean to create the first Pulp service image, the Celery Beat server. There are elements that all of the remaining images will have in common, so I'm going to first build a base image and then apply the last layer to differentiate the beat server from the Pulp resource manager and the pulp workers.
References
- Docker: https://docker.com/
- Kubernetes: https://github.com/GoogleCloudPlatform/kubernetes/
- Kubernetes Source Code Documentation: https://godoc.org/github.com/GoogleCloudPlatform/kubernetes
- Pulp: http://www.pulpproject.org/
- Celery: http://www.celeryproject.org/
- JSON: http://json.org/
- YAML: http://yaml.org/
- Pretty Printing JSON with Python: http://stackoverflow.com/questions/352098/how-can-i-pretty-print-json