Under The Hood of Cloud Computing: March 2013

Friday, March 22, 2013

Installing (but not configuring) the broker service by hand

I'm working through a totally(?) manual installation of the OpenShift Origin service on Fedora 18. The last post on this topic was about building the RPMs on your own Yum repository. This time I'm going to install the broker service and make a few tweaks that are still required.

One seriously major thing to note is that I don't recommend actually doing this. I'm doing it to shed some light on some of the things still going on in the development process and to highlight the ways in which you can get some visibility into the installation and monitoring of the service.

If you're interested in building and running your own development environment or service for real, I suggest starting by reading through Krishna Raman's article on creating a development environment using Vagrant and Puppet and the puppet script sources themselves to see what's involved. Finally there's a comprehensive document that describes the procedure with fewer warts.

Ingredients

As usual, I start with a clean minimal install of Fedora 18. In addition this time I also have a yum repository filled with a bleeding-edge build from source as I described previously. Finally I have a prepared MongoDB server waiting for a connection.

I'm replacing my real URLs and access information with dummies for demonstration purposes.

Yum repo URL
http://myrepo.example.com/origin-server

MONGO_HOST_PORT="mydbhost.example.com:27017"
MONGO_USER="openshift"
MONGO_PASSWORD="dontuseme"
MONGO_DB="openshift"

Preparation

Since I'm building my own packages from source and placing them in a Yum repository, I need to add that repo to the standard set. I'll add a new file to /etc/yum.repod.d referring to my yum server.

Even if you're building from your own sources, there are still some packages you need to get that aren't in either the stock Fedora repositories or in the OpenShift sources. These are generally packages with patches that are in the process of moving upstream or are in the acceptance process for Fedora. Right now a set is maintained by the OpenShift build engineers. I need to add the repo file for that too:

[origin-server]
name=OpenShift Origin Server
baseurl=http://myrepo.example.com/openshift-origin
enable=1
gpgcheck=0

[origin-extras]
name=Custom packages for OpenShift Origin Server
baseurl=https://mirror.openshift.com/pub/openshift-origin/fedora-18/x86_64/
enable=1
gpgcheck=0

At this point you can install the openshift-origin-broker package.

yum install openshift-origin-broker
...
  urw-fonts.noarch 0:2.4-14.fc18                                                
  v8.x86_64 1:3.13.7.5-1.fc18                                                   
  xorg-x11-font-utils.x86_64 1:7.5-10.fc18                                      

Complete!

There are a set of Rubygems that are not yet packaged as RPMs. I need to install these as gems for now.

gem install mongoid
Fetching: i18n-0.6.1.gem (100%)
Fetching: moped-1.4.4.gem (100%)
Fetching: origin-1.0.11.gem (100%)
Fetching: mongoid-3.1.2.gem (100%)
Successfully installed i18n-0.6.1
Successfully installed moped-1.4.4
Successfully installed origin-1.0.11
Successfully installed mongoid-3.1.2
3 gems installed
Installing ri documentation for moped-1.4.4...
Building YARD (yri) index for moped-1.4.4...
Installing ri documentation for origin-1.0.11...
Building YARD (yri) index for origin-1.0.11...
Installing ri documentation for mongoid-3.1.2...
Building YARD (yri) index for mongoid-3.1.2...
Installing RDoc documentation for moped-1.4.4...
Installing RDoc documentation for origin-1.0.11...
Installing RDoc documentation for mongoid-3.1.2...

There are a number of gem version restrictions in the broker Gemfile which are not met by the current rubygem RPMs. I have to remove the version restrictions so that the broker application will use what is available. This risks breaking things due to interface changes, but will at least allow the broker application to start.

sed -i -f - <<EOF /var/www/openshift/broker/Gemfile
/parseconfig/s/,.*//
/minitest/s/,.*//
/rest-client/s/,.*//
/mocha/s/,.*//
/rake/s/,.*//
EOF

For some reason, even with the --without clause for :test and :development, bundle still wants the mocha rubygem. This should not be required for production, but right now you need to install it so that the Rails application will start.

yum install rubygem-mocha
...
Installed:
 rubygem-mocha.noarch 0:0.12.1-1.fc18

Dependency Installed:
  rubygem-metaclass.noarch 0:0.0.1-6.fc18

Verifying The Dependencies

Now that all of the software dependencies have been installed (mostly by RPM requirements through Yum, and finally through gem requirements and some version tweaking of the Gemfile) I can check that all of them resolve when I start the application. Rails will call bundler when the application starts so I'll call it explicitly before hand. I'm only interested in the production environment, so I'll explicitly exclude development and test.

cd /var/www/openshift/broker
bundle --local
Using rake (0.9.6) 
Using bigdecimal (1.1.0)
....
Using systemu (2.5.2)
Using xml-simple (1.1.2)
Your bundle is complete! Use `bundle show [gemname]` to see where a bundled gem is installed.

If I try to start the rails console now, though, I'll be sad. It won't connect to the database.

Configure MongoDB access/authentication

The OpenShift broker is (right now) tightly coupled to MongoDB. Recently it switched to using the rubygem-mongoid ODM module (which is a definite plus if you have to work on the code).

The last thing I need to do before I can fire up the Rails console with the broker application is to set the database connectivity parameters. One side effect of using an ODM is that it establishes a connection to the database the moment the application starts.

NOTE: when this is done I will not have a complete working broker server. I still need to configure the other external services: auth, dns and messaging.

Set the values listed in the Ingredients into /etc/openshift/broker.conf.

/etc/openshift/broker.conf
...
# Eg: MONGO_HOST_PORT="<host1:port1>,<host2:port2>..."
MONGO_HOST_PORT="mydbhost.example.com:27017"
MONGO_USER="openshift"
MONGO_PASSWORD="dontuseme"
MONGO_DB="openshift"
MONGO_SSL="false"

...

Now I can try starting the rails console. It should connect to the mongodb and offer an irb prompt:

To verify the database connectivity, take a look at this recent blog post.

Next up is configuring each plugin, one by one.

Gist Scripts

I'm trying something new. Rather than including code snippets inline, I'm going to post them as Github Gist entries.

Add Yum Repos - oo-add-repo.sh
Fix broker requirements - oo-broker-fix-requirements.sh

References

Thursday, March 21, 2013

Verifying the MongoDB DataStore with the Rails Console: Mongoid Edition

A few months ago I did several posts about how to verify the operation of the back end services of an OpenShift Origin broker service. Today I discovered that this one (mongod) is obsolete.

The data store behind the broker is a MongoDB. That one back end service isn't pluggable. It's actually been made more tightly coupled to Mongo, but in this case that's a good thing. What changed is that all of the Rails application model objects have been converted to use the Mongoid ODM rubygem. All of the object persistence is now managed in the background and all of the logic can just deal with the objects as... well... objects.

There are a couple of implications for broker service verification.

The broker connects to the database on startup
This means that if the database access/auth information is wrong, the rails app will fail to start.
The only simple way to test the connection is to create an object and observe the database.
This is both simpler to do, and potentially more difficult to diagnose on failure.

I think the second point won't be as much of a downside as I would fear at first. I suspect that if connectivity is good, the rest will be. If it's not, it will be fairly clear why.

Configuring the Broker Data Store

Configuring the datastore access information hasn't changed. The configuration information is still stored in /etc/openshift/broker.conf. The settings all have the MONGO_ prefix:

MONGO_HOST_PORT="data1.example.com:27017"
MONGO_USER="openshift"
MONGO_PASSWORD="dontuseme"
MONGO_DB="openshift"
MONGO_SSL="false"

Adjust these for your mongodb implementation. Remember to open the firewall for the broker on your database host. Configure the database to listen and test the connectivity locally.

Verifying Simple Connectivity

You also want to check the connectivity from your broker host before trying to fire up the broker itself.

broker> echo "show collections" | mongo --username openshift --password dontuseme data1.example.com:27017/openshift
MongoDB shell version: 2.2.3
connecting to: data1.example.com:27017/openshift
system.indexes<
system.users
bye

You can do this repeatedly and observe the mongodb log on the database host.

Observing the Mongo Database Logs

On the database host, take a look at the mongodb logs. You should see a new entry (successful or failed) each time a client connects.

data1> tail /var/log/mongodb/mongodb.log
Thu Mar 21 20:24:26 [conn15] authenticate db: openshift { authenticate: 1, nonce: "20d6f85f33f03dee", user: "openshift", key: "60639c7ce56851a25be56bcebd98c3ed" }

Starting the Rails Console

Now that you're sure that the database is running and accessible from your broker host you can try firing up the Rails console. This assumes that you've resolved all of the gem requirements. If not, the Rails console will complain about them and exit.

broker> cd /var/www/openshift/broker
broker> rails console
Loading production environment (Rails 3.2.8)
irb(main):001:0>

If you go this far you should have seen one more authentication log record on the mongodb server. (see above)

Create a Database Object

Now we can create a CloudUser object and watch it appear in the database

irb(main):001:0> user = CloudUser.create(login: "testuser")
=> #<CloudUser _id: 514b6f6cf3da7fa491000001, created_at: 2013-03-21 20:37:00 UTC, updated_at: 2013-03-21 20:37:00 UTC, login: "testuser", capabilities: {"subaccounts"=>false, "gear_sizes"=>["small"], "max_gears"=>100}, parent_user_id: nil, plan_id: nil, pending_plan_id: nil, pending_plan_uptime: nil, usage_account_id: nil, consumed_gears: 0>

You can see that this is more than your typical Ruby object. The ID and created_at and updated_att fields are artifacts of the ODM persistence. You won't see another log message because the database connection is persistent using the ODM. You will find that there's now a document in the openshift.cloud_user collection.

data> echo "db.cloud_users.find()" | mongo --username openshift --password dbsecret localhost/openshift
MongoDB shell version: 2.2.3<
connecting to: localhost/openshift
{ "_id" : ObjectId("514b6f6cf3da7fa491000001"), "consumed_gears" : 0, "login" : "testuser", "capabilities" : { "subaccounts" : false, "gear_sizes" : [ "small" ], "max_gears" : 100 }, "updated_at" : ISODate("2013-03-21T20:37:00.546Z"), "created_at" : ISODate("2013-03-21T20:37:00.546Z") }
bye

Removing the Test Object

Cleaning up is just as easy

irb(main):002:0> user.delete<
=> true

And to verify that it's been removed:

echo "db.cloud_users.find()" | mongo --username openshift --password dbsecret localhost/openshift
MongoDB shell version: 2.2.3
connecting to: localhost/openshift
bye

At this point you know both that your database is running and that the broker application can connect and read and write it.

Much simpler with an ODM.

References:

Friday, March 15, 2013

Lessons about Default Values

I have lots of "Rules of System Administration". Some day I'll write them all down and publish a book.

Today I got a reminder about the nuance of one:

Always provide reasonable defaults.

I happen to have been building an OpenShift Broker on Fedora 18, but it applies elsewhere. The OpenShift broker service is configured with the /etc/openshift/broker.conf file. The configuration file is a traditional line-oriented set of key/value pairs. That is, each pair is on one line, separated by an equals sign (=).

I was trying to set custom values to the Mongo database, but when I tried starting the broker application it would try to connect to the data store and fail. It indicated that it was trying to use the default value, ignoring my settings.

I fished around in the Rails environments files and several other places where I saw the same value until I figured out which one it was coming from (the right one, config/environments/production.rb).

I knew I'd set the values in my configuration file right, so why weren't they showing up?

It turns out that the reason was that I hadn't set them right. I'd left the equal sign out when I made the substitutions. That's my problem. And that's not the real issue.

In looking at that default which looked like a real, good, usable value, I realized it shouldn't.

For most settings a "reasonable default value" is one that will nominally work. For access information and authentication, the reasonable default is "you didn't tell me what to do. I'm going to sit on my hands until you do". The system shouldn't work until you successfully set your own values.

There's one other thing. The system should not just spew garbage if you haven't set the values. It should very politely inform you that you're not done yet and what needs to be done.

So, that's something I'm going to be thinking about for OpenShift next week.

Wednesday, March 6, 2013

The Bleeding Edge: Building the OpenShift RPMs from source

While the OpenShift Online service has been up for... sheesh almost 2 years now? (corrections welcome) the development activity has only accelerated over time. More than ever the admin tasked with implementing an On-Premise OpenShift Origin service is shooting at a moving target. There are released RPMs in the Fedora 18 distribution and updates, but even the updates aren't keeping pace with the source changes. (This is good, it gives *some* stability).

The experimenter will often find now that the tiny feature she needs is already in the source tree but hasn't yet made it to the released packages. They may even find the need (desire?) to make changes and contribute them back to the base. In both of those cases she will have to be prepared to create local builds of the OpenShift packages for development and testing.

There is a build toolset also on github for the origin-server package set. It's in a separate repository named origin-dev-tools. This follows the model of the original internal build and test environment. It's an all-in-one wrap-it-to-go kind of toolset. But this is the Under the Hood blog, so I'm going to crack the case open and see what's inside.

This post uses Fedora 18 but should be applicable to RHEL6+ as well.

Building a Build Site

If you're customizing the OpenShift Origin software, whether because you want to work with committed but pre-release software or because you're making changes on your own, the best way to manage the software life cycle is to create a proper build server. To start with I'll describe how to create the build server and to make the RPM repo available to server boxes. There are tools to automate the build/test/publish process as well but I won't deal with them yet.

The goal of this post is to outline the requirements and process for creating your own build

Building on a Base

As usually I start on a minimal system. I add the software I need explicitly and let yum manage the dependencies. The build system will need a fixed IP address and a well known DNS name so that you can reach it later from your OpenShift Origin servers.

When configured for my work environment (including Kerberos 5, and LDAP authentication) I start with about 245 packages.

The Tool Box of Modern Software Development

There's a lot of stuff that goes into making software packages appear automatically. The development and build process today commonly includes remote repository updating, automated testing, software tagging on top of the compilers, interpreters and language libraries.

Note that the first set of tools listed below are just those needed to manage the build and packaging process. Each package will also have additional build requirements, but those will be dealt with later.

Once, long ago building software meant having a compiler (which you built yourself from source code), and tar for unpacking it and make to automate the process. Today the same tasks apply but there's a lot more formalism to the process. Collaboration has required the creation of distributed software revision control tools. Software testing has become everyone's job. People have recognized that software is never finished, it evolves and grows over time. Users need to be able to know what they're running and where to get updates. The modern tool set reflects these needs.

Software Revision Control
Task Automation
Unit Testing
Release and Package Management
Publication

While most of the time these tools will just work it's often important to know what tools are doing what jobs and how they interact. This is critical either when things don't go as planned, or when contributing new software packages to the set. First I'll take a look at which tools OpenShift uses and then demonstrate how to install them (which is actually pretty trivial)

Software Revision Control: Git and Github

A distributed project today requires some kind of remote software revision control system. This allows developers to work together without having to be in one place. The Revision Control System (RCS) manages changes and flags conflicts. It allows tagging of releases.

The OpenShift Origin project uses git for revision control. It uses the Github service to hold the master repository and development forks and branches. You can pull down a cloned copy of the source tree without having an account on Github. To manage your local changes and to contribute back you'll need an account of your own. There are a number of good books or sites on how to use git. See the Github site itself for help learning how to create your own development fork and branches.

Task Automation: Ruby, Rubygems and Rake

To automate the unit testing OpenShift uses a rubygem called rake after the original GNU make. Rake implements dependencies and tasks in a way similar in behavior (but syntactically entirely different) from make.

Rake is implemented as a rubygem which is in turn a module packaging mechanism for Ruby code.

Unit Testing: Rspec 2

Many OpenShift components include unit tests written using the RSpec framework. RSpec is another rubygem. It has components for writing special expectations, mocks and hooks for testing Rails applications. rubygem-rspec-rails requires all of the other components, so we can install that and let yum handle the dependencies.

Build, Packaging and Release: Tito and rpm-build and rubygem-bundler

All of the software in OpenShift must be packaged for delivery in RPM format. This is both a requirement for inclusion in Fedora and RHEL releases as well as good general practice (use the native software packaging format). A number of components are also packaged as Rubygems. This adds the requirement for the rubygem-bundler package for building but these are not the deliverable format.

OpenShift uses a tool called tito to manage package builds and revision tags. Tito works with the standard RPM spec files and with rpmbuild and createrepo. When it runs successfully, tito not only builds the requested package, it increments the package version number and inserts it in a yum repository.

Documentation: rubygem-yard

The ruby community have created a set of tools which allow documentation to be automatically generated. The author of the code inserts specially formatted markup comments which the documentation generator uses to produce HTML or other documentation formats.

OpenShift is using the yard documentation tool to markup and auto-generate documentation for the ruby packages. Yard is installed with the rubygem-yard RPM

Publication: thttpd

Once the packages are built they're useless if your OpenShift Origin servers can't reach them. I typically use Apache2 for web service but these are static, so a light weight server like lighttpd or thttpd are in order. I'm going to use thttpd because I can configure it to serve the default yum repo location with a single sed command.

If you don't want to share the builds from the build server you can instead use a tool like rsync to push them where you need them to be for publication.

Installing The Software

I can compose the list now:

git - revision control
rake

ruby
rubygems
rubygems-devel
rubygem-rake
rubygem-bundler

rubygem-yard

rspec

rubygem-rspec-core
rubygem-rspec-mocks
rubygem-rspec-expectations
rubygem-rspec-rails

tito
rpm-build
lighttpd

Note that RPM package dependencies make the actual install list fairly small if you pick carefully:

Now that I have my list, installing the toolset is easy enough:

yum install -y git rubygems-devel rubygem-rake rubygem-yard rubygem-rspec-rails tito rpm-build lighttpd

This will actually cause the installation of almost 100 more packages due to dependencies.

When this software is all installed on my build system, the next step is get myself a copy of the source code.

Getting the Source from Github

Git was created by Linus Torvalds himself to replace a proprietary software revision control system which had been used for years to manage the Linux kernel source tree. Since then a number of services have sprung up to offer a place for people to host their projects. OpenShift Origin is hosted on Github.

You can get the git URL for the OpenShift Origin service software without an account, but if you want to make modifications or contributions you'll need to register and then create your own project fork. Github has some greate help and tutorials here:

https://help.github.com/

You will probably also want to look at the process for setting up SSH keys for Github so you don't have to type a password for every operation.

The OpenShift Origin server source code is here:

https://github.com/openshift/origin-server

Cloning the Source Code Repository(s)

Once you've created your account and forked the origin-server project you should find a git@github.com: URL on your fork page. You can cut-and-paste that and use it to clone a local copy of your workspace. (In the example below, replace the URL with your own)

git clone git@github.com:/openshift/origin-server.git --tags

Now you've got everything that the build process needs, but not what the software you're building needs.

Task Automation

The current official process uses the origin-dev-tools and has a certain amount of overhead. It's made for rigourous exhaustive build/test/release cycles.

What we need here is much simpler and self-contained.

The exploration that follows is captured in a Rakefile script I put on gist.github.com. When it's placed at the top of the origin-server source tree and set executable, it will execute the tasks described below.

NOTE: the oo-rake script is not part of the official origin-server sources. It will likely not be maintained and comes with no warranty. Use at your own risk.

cd origin-server
wget http://gist.github.com/markllama/5225912/raw/abcbeebed584bc1aae56b9091fa977e8636c316c/oo-rake
chmod a+x oo-rake
./oo-rake --tasks
rake all:builddep[answer]       # install all build requirements
rake all:rpm[repodir,test,yum]  # generate all RPMs and create yum repository
rake all:testrpm[repodir,yum]   # generate all test RPMs and create yum rep...
rake all:yard[destdir]          # generate comprehensive documentation

Package Build Requirements

Building most software requires more than just the build tools. Most software depends on other tools or libraries for its own build process. Because OpenShift is set up to build into packages and because the RPM mechanism has a feature to allow developers to call out the dependencies, we can find out what's needed and install it.

Packages and ".spec" files

Every component of OpenShift Origin must be packaged as an RPM. It's just the way things are. This gives us a hook to help identify each package and ultimately, to find the set of build prerequisites for each package.

The contents of each package must reside in a directory within the source code tree. Each package must have exactly one RPM .spec file. We can search the directory tree for these files and we'll know both the names of the packages and their locations within the source tree.

Assuming you've just cloned the origin-server repository into your current working directory you can find the list of packages with a little shell snippet like this:

find origin-server -name \*.spec

Build Requirements

Among other things, a package .spec file defines a set of packages that must be installed before the new package can be built. The required packages are specified with BuildRequires lines.

The yum-builddep program which is part of the yum-utils package will install the build requirements for a package:

yum-builddep <specfile> [<specfile>...]

This will install all of the build requirements for the listed packages.

The oo-rake script offers the all:builddeps target. Invoking this task will install all of the build requirements for the packages under the tree.

Building the Packages

The packages (and yum repository) are built by tito. Tito has to run in the root directory for each package (where the .spec file resides.) Since we already know how to find all the spec files we can find the directories which contain them fairly simply:

find origin-server -name \*.spec | xargs -i {} dirname {}

This will produce a list of directories which contain potential packages. We can just loop over that and call tito in each one to build the packages.

for PKGDIR in $(find origin-server -name \*.spec | xargs -i {} dirname {}) ; do 
    (cd $PKGDIR ; tito build --rpm)
done

This will change to each directory, build the RPM and place it in a yum repository at /tmp/tito. You can change where the output goes either by adding -o <directory> to the tito command or by setting a variable named PREBUILD_BASEDIR in the build user's ~/.titorc file.

The oo-rake provides a target: all:rpm which will build all of the packages in the tree below it. You can provide arguments to rake targets. The first argument to the all:rpm task is the destination for the packages.

Git Tags and Test RPMs

Tito depends on git and specifically on release tags. If you get any messages indicating that a tag is missing for a package, fetch the tags from your git repository as well

cd origin-server ; git fetch origin --tags

When you run the all:rpm target tito will build tagged release packages. That is, it will build from the last tagged commit. If you have checked in new versions of files, they will not be used.

To build packages from the head of the current branch, you want to build test packages.

The oo-rack script provides another target all:testrpm which will build test packages for the entire tree (and place them in the yum repository). Test packages get hashed names so that yum update will install the newer packages from the repository.

Publishing the Yum Repository

You don't have to publish the RPMs in the yum repository but you have to make the RPMs available somehow. I'm going to add a step here to make the yum repo available by HTTP using a lightweight http server, thttpd.

By default thttpd serves the contents of /var/www/thttpd. I want it to serve /tmp/tito. A single line sed command makes the adjustment:

sed -i -e 's|^dir=.*$|dir=/tmp/tito|'

Fedora 18 comes with the firewall daemon limiting remote access. We have to open access to port 80 so that thttpd can answer queries.

firewall-cmd --zone public --add-service http

We just have to enable the thttpd and we'll be able to have servers pull from it.

systemctl enable thttpd
systemctl start thttpd

If you have a web server established you could instead use rsync or something like it to move the build results to the web server.

What this doesn't include?

This is just the barest minimum information to build OpenShift Origin RPMs on Fedora 18. There are a bunch of tasks that aren't handled:

Triggering automatic re-build on developer commit
Running unit tests
Interpreting and handling build errors
Handling new package build requirements
Installing and configuring OpenShift servers

This should be enough though for someone who wants to extend or contribute to the OpenShift Origin project and needs to build their own packages.

References

git - http://git-scm.com/
Github - https://github.com
Ruby - http://www.ruby-lang.org
Rubygems - http://rubygems.org
Ruby on Rails - http://rubyonrails.org
RSpec 2 - https://www.relishapp.com/rspec
Tito - https://github.com/dgoodwin/tito
rpmbuild - http://www.rpm.org/
yum - https://fedoraproject.org/wiki/Yum
thttpd - http://www.acme.com/software/thttpd/