Thursday, September 4, 2014

Kubernetes: Simple Containers and Services

From previous posts I now have a MongoDB image and another which runs a QPID AMQP broker.  I intend for these to be used by the Pulp service components.

What I'm going to do this time is to create the subsidiary services that I'll need for the Pulp service within a Kubernetes cluster.

UPDATE 12/16/2014: recently the kubecfg command has been deprecated and replaced with kubectl. I've updated this post to reflect the CLI call and output from kubectl.

Pre-Launch


A Pulp service stores its persistent data in the database. The service components, a Celery Beat server and a number of Celery workers, as well as one or more Apache web server daemons, all communicate using the AMQP message broker. They store and retrieve data from the database.

In a traditional bare-metal or VM based installation all of these services would likely run on the same host. If they are distributed, then the IP addresses and credentials of the supporting services have to be configured into the Pulp servers manually or with some form of configuration management. Using containers the components can be isolated, but the task of tracking them and configuring the consumer processes remains.

Using just Docker, an implementer's first impulse would be similar: place all of the containers on the same host. This would simplify the management of the connectivity between the parts, but it also defeats some of the benefits of containerized applications: portability and non-locality. This isn't a failing of Docker. It is the result of conscious decisions to limit the scope of what Docker attempts to do, avoiding feature creep and bloat. And this is where a tool like Kubernetes comes in.

As mentioned elsewhere, Kubernetes is a service designed to bind together a cluster of container hosts. These can be regular hosts running the etcd and kubelet daemons, or specialized images like Atomic or CoreOS, and they can be private machines or public services such as Google Cloud.

For Pulp, I need to place a MongoDB and a QPID container within a Kubernetes cluster and create the infrastructure so that clients can find them and connect to them. For each of these I need to create a Kubernetes Service and a Pod (a group of related containers).

Kicking the Tires


It's probably a good thing to explore a little bit before diving in so that I can see what to expect from Kubernetes in general.  I also need to verify that I have a working environment before I start trying to bang on it.

Preparation


If you're following along, at this point I'm going to assume that you have access to a running Kubernetes cluster.  I'm going to be using the Vagrant test cluster as defined in the github repository and described in the Vagrant version of the Getting Started Guides.

I'm also going to assume that you've built the kubernetes binaries.  I'm using the shell wrappers in the cluster sub-directory, especially cluster/kubectl.sh.   If you try that and you haven't built the binaries you'll get a message that looks like this:

cluster/kubectl.sh 
It looks as if you don't have a compiled kubectl binary.

If you are running from a clone of the git repo, please run
'./build/run.sh hack/build-cross.sh'. Note that this requires having
Docker installed.

If you are running from a binary release tarball, something is wrong. 
Look at http://kubernetes.io/ for information on how to contact the 
development team for help.

If you see that, do as it says. If that fails, you probably haven't installed the golang package.



For convenience I alias the kubectl.sh wrapper so that I don't need the full path.

alias kubectl=~/kubernetes/cluster/kubectl.sh

Like most CLI commands now, if you invoke it with no arguments it prints usage information.

kubectl --help 2>&1 | more
Usage of kubectl:

Usage: 
  kubectl [flags]
  kubectl [command]

Available Commands: 
  version                                             Print version of client and server
  proxy                                               Run a proxy to the Kubernetes API server
  get [(-o|--output=)json|yaml|...] <resource> [<id>] Display one or many resources
  describe <resource> <id>                            Show details of a specific resource
  create -f filename                                  Create a resource by filename or stdin
  createall [-d directory] [-f filename]              Create all resources specified in a directory, filename or stdin
  update -f filename                                  Update a resource by filename or stdin
  delete ([-f filename] | (<resource> <id>))          Delete a resource by filename, stdin or resource and id

The full usage output can be found in the CLI documentation in the Kubernetes Github repository.

kubectl has one oddity that makes a lot of sense once you understand why it's there. The command is meant to produce output which is consumable by machines using UNIX pipes. The output is structured data formatted using JSON or YAML. To avoid strange errors in the parsers, the only output to STDOUT is structured data. This means that all of the human readable output goes to STDERR. This isn't just the error output though. This includes the help output. So if you want to run the help and usage output through a pager app like more(1) or less(1), you have to first redirect STDERR to STDOUT as I did above.

Exploring the CLI control objects


These commands map onto the REST API's possible operations: get, list, create, delete, and update. The objects that the API can manage are minions, pods, replicationControllers, and services.

Minions


A minion is a host that can accept containers.  It runs an etcd and a kubelet daemon in addition to the Docker daemon. For our purposes a minion is where containers can go.

I can list the minions in my cluster like this:

kubectl get minions
NAME                LABELS
10.245.2.4          <none>
10.245.2.2          <none>
10.245.2.3          <none>

The only valid operations on minions using the REST protocol are list and get.  The get response isn't very interesting.

Until I add some of the other objects this is the most interesting query.  It indicates that there are three minions connected and ready to accept containers.

Pods


A pod is the Kubernetes object which describes a set of one or more containers to be run on the same minion.  While the point of a cluster is to allow containers to run anywhere within the cluster, there are times when a set of containers must run together on the same host. Perhaps they share some external filesystem or some other resource.  See the golang specification for the Pod struct.

kubectl get pods
NAME                IMAGE(S)            HOST                    LABELS              STATUS

See? Not very interesting.

Replication Controllers


I'm going to defer talking about replication controllers in detail for now.  It's enough to note their existence and purpose.

Replication controllers are the tool for creating HA or load balanced systems. Using a replication controller you can tell Kubernetes to create multiple running containers for a given image.  Kubernetes will ensure that if one container fails or stops, a new container is spawned to replace it.

I can list the replication controllers in the same way as minions or pods, but there's nothing to see yet.
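For the record, that query looks like the others (a sketch; with no controllers defined it returns nothing but the column headers):

kubectl get replicationControllers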

Services


I think the term service is an unfortunate but probably unavoidable terminology overload.

In Kubernetes, a service defines a TCP or UDP port reservation.  It provides a way for applications running in containers to connect to each other without requiring that each one be configured with the end-point IP addresses. This both allows for abstracted configuration and for mobility and load balancing of the providing containers.

When I define a Kubernetes service, the service providers (the MongoDB and QPID containers) will be labeled to receive traffic and the service consumers (the Pulp components) will be given the access information in the environment so that they can reach the providers. More about that later.

I can list the services in the same way as I would minions or pods. And it turns out that creating a couple of Kubernetes services is the first step I need to take to prepare the Pulp support service containers.

Creating a Kubernetes Service Object


In a cloud cluster one of the most important considerations is being able to find things.  The whole point of the cloud is to promote non-locality.  I don't care where things are, but I still have to be able to find them somehow.

A Kubernetes Service object is a handle that allows my MongoDB and QPID clients to find the servers without having to know where they really are. It defines a port to listen on and a way for the serving containers to indicate that they will accept the traffic that comes in. Kubernetes arranges for the traffic to be forwarded to the servers.

Kubernetes both accepts and produces structured data formats for input and reporting.  The two currently supported formats are JSON and YAML.  The Service structure is relatively simple, but it has elements which are shared by all of the top level data structures. Kubernetes doesn't yet have any tooling to make the creation of an object description easier than hand-crafting a snippet of JSON or YAML.  Each of the structures is documented in the godoc for Kubernetes. For now that's all you get.

There are a couple of provided examples and these will have to do for now. The guestbook example demonstrates ReplicationControllers and a master/slave implementation using Redis.  The second shows how to perform a live update of the pods which make up an active service within a Kubernetes cluster. These are actually a bit more advanced than I'm ready for and don't give the detailed break-down of the moving parts that I mean to do.

Below is a description of the db service. Lines 5-8 define the actual content.
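This is a sketch of mongodb-service.json; the id, port, publicIPs and selector values match what kubectl reports for the running service later in this post.

{
    "kind": "Service",
    "apiVersion": "v1beta1",
    "id": "db",
    "port": 27017,
    "publicIPs": ["10.245.2.2"],
    "selector": {
        "name": "db"
    }
}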
  • Line 2 indicates that this is a Service object.
  • Line 3 indicates the object schema version.
    v1beta1 is current
    (note: my use of the term 'schema' is a loose one)
  • Line 4 identifies the Service object.
    This must be unique within the set of services
  • Line 5 is the TCP port number that will be listening
  • Line 6 is for testing.  It tells the proxy on the minion with that IP to listen for inbound connections.
    I'll also use the publicIPs value to expose the HTTP and HTTPS services for Pulp
  • Lines 7-9 set the Selector
    The selector is used to associate this Service object with containers that will accept the inbound traffic.
    This will match with one of the label items assigned to the containers.

When a new service is created, Kubernetes establishes a listener on an available IP address (one of the minions' addresses).  While the service object exists, any new container will start with a set of environment variables which provide access information.  The service id (converted to upper case) is used as the prefix for these environment variables, so containers can be designed to pick them up and use them for configuration.
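For the db service the injected variables look roughly like this (a sketch only; the exact set and names depend on the Kubernetes version, and the address is the portal IP that kubectl reports later in this post):

DB_SERVICE_HOST=10.0.41.48
DB_SERVICE_PORT=27017
DB_PORT=tcp://10.0.41.48:27017
DB_PORT_27017_TCP_ADDR=10.0.41.48
DB_PORT_27017_TCP_PORT=27017
DB_PORT_27017_TCP_PROTO=tcp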

For now I just need to establish the service so that when I create the DB and QPID containers they have something to be bound to.
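Creating it is a single kubectl call (a sketch, assuming the JSON above is saved as mongodb-service.json; adjust the path to wherever you keep your service files):

kubectl create -f mongodb-service.json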

The QPID service is identical to the MongoDB service, replacing the port (5672) and the selector (msg).
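A sketch of that service file, under the same assumptions (I use msg as the service id here, to match the selector; a publicIPs entry is only needed if the broker should be reachable from outside the cluster):

{
    "kind": "Service",
    "apiVersion": "v1beta1",
    "id": "msg",
    "port": 5672,
    "selector": {
        "name": "msg"
    }
}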

Querying a Service Object


I've just created a Service object. I wonder what Kubernetes thinks of it? I can list the services as seen above. I can also get the object information using kubectl.

kubectl get services db
NAME                LABELS              SELECTOR            IP                  PORT
db                                name=db             10.0.41.48          27017


That's nice. I know the important information now.  But what does it look like, really?


kubectl get --output=json services db
{
    "kind": "Service",
    "id": "db",
    "uid": "c040da3d-8536-11e4-a18b-0800279696e1",
    "creationTimestamp": "2014-12-16T15:18:12Z",
    "selfLink": "/api/v1beta1/services/db?namespace=default",
    "resourceVersion": 13,
    "apiVersion": "v1beta1",
    "namespace": "default",
    "port": 27017,
    "protocol": "TCP",
    "selector": {
        "name": "db"
    },
    "publicIPs": [
        "10.245.2.2"
    ],
    "containerPort": 0,
    "portalIP": "10.0.41.48"
}


Clearly Kubernetes has filled out some of the object fields.  Note the --output=json flag for structured data.

I'll be using this method to query information about the other elements as I go along.

Describing a Container (Pod) in Kubernetes


We've seen how to run a container on a Docker host.  With Kubernetes we have to create and submit a description of the container with all of the required variables defined.

Kubernetes has an additional abstraction called a pod.  While Kubernetes is designed to allow the operator to ignore the location of containers within the cluster, there are times when a set of containers needs to be co-located on the same host.  A pod is Kubernetes' way of grouping containers when needed.  When starting a single container it will still be referred to as a member of a pod.


Here's the description of a pod containing the MongoDB service image I created earlier.
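What follows is a sketch of that pod file. The pod id pulpdb, the name=db label, the pulp-db container name and the markllama/mongodb image all match the kubectl output further down; the file I actually used also declared a /dev/log host volume, which I've trimmed from this sketch.

{
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "id": "pulpdb",
    "labels": {
        "name": "db"
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "pulpdb",
            "containers": [{
                "name": "pulp-db",
                "image": "markllama/mongodb",
                "ports": [{
                    "containerPort": 27017
                }]
            }]
        }
    }
}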




This is actually a set of nested structures, maps and arrays.


  • Lines 1-21 define a Pod.
  • Lines 2-4 are elements of an inline JSONBase structure
  • Lines 5-7 are a map (hash) of strings assigned to the Pod struct element named Labels.
  • Lines 8-20 define a PodState named DesiredState.
    The only required element is the ContainerManifest, named Manifest in the PodState.
  • A PodState has a required Version and ID, though it is not a subclass of JSONBase.
      It also has a list of Containers and an optional list of Volumes
  • Lines 12-18 define the set of containers (only one in this case) that will reside in the pod.
    A Container has a name and an image path (in this case to the previously defined mongodb image).
  • Lines 15-17 are a set of Port specifications.
      These indicate that something inside the container will be listening on these ports.


You can see how learning the total schema means fishing through each of these structure definitions in the documentation.  If you work at it you will get to know them.  To be fair they are really meant to be generated and consumed by machines rather than humans.  Kubernetes is still the business end of the service. Pretty dashboards will be provided later.  The only visibility I really need is for development and diagnostics. There are gaps here too, but finding them is what experiments like this are about.

A note on Names and IDs


There are several places where there is a key named "name" or "id". I could give them all the same value, but I'm going to deliberately vary them so I can expose which ones are used for what purpose. Names can be arbitrary strings. I believe that IDs are restricted somewhat (no hyphens).

Creating the first Pod


Now I can get back to business.

Once I have the Pod definition expressed in JSON I can submit that to kubectl for processing.


kubectl create -f pods/mongodb.json 
pulpdb


TADA! I now have a MongoDB running in Kubernetes.

But how do I know?


Now that I actually have a pod, I should be able to query the Kubernetes service about it and get more than an empty answer.

kubectl get pods pulpdb
NAME                IMAGE(S)            HOST                    LABELS              STATUS
pulpdb              markllama/mongodb   10.245.2.3/10.245.2.3   name=db             Running


Familiar and boring. But I can get more from kubectl by asking for the raw JSON returned from the query.
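This is the same --output=json query used for the service above, pointed at the pod; the python -m json.tool pipe just pretty-prints the result and isn't strictly necessary.

kubectl get --output=json pods pulpdb | python -m json.tool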

{
    "kind": "Pod",
    "id": "pulpdb",
    "uid": "4bac8381-8537-11e4-a18b-0800279696e1",
    "creationTimestamp": "2014-12-16T15:22:06Z",
    "selfLink": "/api/v1beta1/pods/pulpdb?namespace=default",
    "resourceVersion": 22,
    "apiVersion": "v1beta1",
    "namespace": "default",
    "labels": {
        "name": "db"
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta2",
            "id": "",
            "volumes": [
                {
                    "name": "devlog",
                    "source": {
                        "hostDir": {
                            "path": "/dev/log"
                        },
...
            "pulp-db": {
                "state": {
                    "running": {
                        "startedAt": "2014-12-16T15:27:04Z"
                    }
                },
                "restartCount": 0,
                "image": "markllama/mongodb",
                "containerID": "docker://8f21d45e49b18b37b98ea7556346095261699bc
3664b52813a533edccee55a63"
            }
        }
    }
}


It's really long, so I've only included an excerpt inline. The full output is in a gist.

If you fish through it you'll find the same elements I used to create the pod, and lots, lots more.  The structure now contains both a desiredState and a currentState sub-structure, with very different contents.

Now a lot of this is just noise to us, but lines 59-72 of the full output are of particular interest.  They show the effects of the Service object that was created previously: the environment variables and network ports declared there are the values that a client container will use to connect to this service container.

Testing the MongoDB service


If you've read my previous blog post on creating a MongoDB Docker image you'll be familiar with the process I used to verify the basic operation of the service.

In that case I was running the container using Docker on my laptop.  I knew exactly where the container was running and I had direct access to the Docker CLI so that I could ask Docker about my new container.
I'd opened up the MongoDB port and told Docker to bind it to a random port on the host and I could connect directly to that port.

In a Kubernetes cluster there's no way to know a priori where the MongoDB container will end up; you have to ask Kubernetes where it is.  Further, you don't have direct access to the Docker CLI.

This is where that publicIPs key in the mongodb-service.json file comes in.  I set the public IP value of the db service to an external IP address of one of the Kubernetes minions: 10.245.2.2.  This causes the proxy on that minion to accept inbound connections and forward them to the db service pods, wherever they are.

The minion host is accessible from my desktop so I can test the connectivity directly.

echo "show dbs" | mongo 10.245.2.2
MongoDB shell version: 2.4.6
connecting to: 10.245.2.4/test
local 0.03125GB
bye

And now for QPID?


Creating and testing the QPID container within Kubernetes follows the same process as for MongoDB: create a JSON file which describes the QPID service and another for the pod, then submit them and test as before.
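For reference, here's a sketch of what the QPID pod description might look like. The pod id pulpmsg, the container name pulp-msg and the image name markllama/qpid are placeholders of mine; the important part is the name=msg label, which has to match the msg service selector.

{
    "kind": "Pod",
    "apiVersion": "v1beta1",
    "id": "pulpmsg",
    "labels": {
        "name": "msg"
    },
    "desiredState": {
        "manifest": {
            "version": "v1beta1",
            "id": "pulpmsg",
            "containers": [{
                "name": "pulp-msg",
                "image": "markllama/qpid",
                "ports": [{
                    "containerPort": 5672
                }]
            }]
        }
    }
}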

Summary


Now I have two running network services inside the Kubernetes cluster. Each consists of a Kubernetes Service object and a Kubernetes Pod running the image I'd created for that service application.

I can prove to myself that the application services are running and accessible, though for some of the detailed tests I still have to go under the covers of Kubernetes.

I have the information I need to craft images for the other Pulp services so that they can consume the database and messenger services.

Next Up


In the next post I mean to create the first Pulp service image, the Celery Beat server.  There are elements that all of the remaining images will have in common, so I'm going to first build a base image and then apply the last layer to differentiate the Beat server from the Pulp resource manager and the Pulp workers.

5 comments:

  1. Thanks for the post. Instead of using that python one-liner to work with JSON in a terminal, you could try the portable, single-binary tool called jq (http://stedolan.github.io/jq/).

  2. great post, very helpful as I attempt to understand and work with Kubernetes. I'm guessing that there is no current way to refer to a service by a DNS name in Kubernetes? thanks again.

  3. This post is obsoleted by the introduction of the kubectl client. I'm hoping sometime soon to edit it and replace the references to kubecfg with kubectl.

  4. This comment has been removed by the author.

  5. Great blog and site, thanks for all the time spent on these!!

    I have one point of clarity I'd like to offer - In your example 'echo "show dbs" | mongo 10.245.2.2' you are using the minion IP. The pod may move containers around, especially if they are frequently restarting or being reloaded, so you should always use the "portal IP" for this. It performs slb to the public minion IPs which have that pod/container running. Otherwise, if a pod moves containers off that minion, it won't have that service anymore. The portal IP goes to the service and it always has a real-time list of backend IPs running that service's pod. I hope this helps (and makes sense)!
