Wednesday, March 6, 2013

The Bleeding Edge: Building the OpenShift RPMs from source

While the OpenShift Online service has been up for... sheesh, almost 2 years now? (corrections welcome), development activity has only accelerated over time.  More than ever, the admin tasked with implementing an on-premise OpenShift Origin service is shooting at a moving target.  There are released RPMs in the Fedora 18 distribution and updates, but even the updates aren't keeping pace with the source changes. (This is good; it gives *some* stability.)

The experimenter will often find that the tiny feature she needs is already in the source tree but hasn't yet made it to the released packages.  She may even find the need (desire?) to make changes and contribute them back to the base.  In both cases she will have to be prepared to create local builds of the OpenShift packages for development and testing.

There is also a build toolset on Github for the origin-server package set.  It's in a separate repository named origin-dev-tools.  This follows the model of the original internal build and test environment. It's an all-in-one wrap-it-to-go kind of toolset.  But this is the Under the Hood blog, so I'm going to crack the case open and see what's inside.

This post uses Fedora 18 but should be applicable to RHEL6+ as well.

Building a Build Site

If you're customizing the OpenShift Origin software, whether because you want to work with committed but pre-release software or because you're making changes on your own, the best way to manage the software life cycle is to create a proper build server.  To start with I'll describe how to create the build server and to make the RPM repo available to server boxes.  There are tools to automate the build/test/publish process as well but I won't deal with them yet.

The goal of this post is to outline the requirements and process for creating your own build server.

Building on a Base

As usual I start on a minimal system.  I add the software I need explicitly and let yum manage the dependencies.  The build system will need a fixed IP address and a well known DNS name so that you can reach it later from your OpenShift Origin servers.

When configured for my work environment (including Kerberos 5, and LDAP authentication) I start with about 245 packages.

The Tool Box of Modern Software Development

There's a lot of stuff that goes into making software packages appear automatically.  The development and build process today commonly includes remote repository updating, automated testing and software tagging, on top of the compilers, interpreters and language libraries.

Note that the first set of tools listed below are just those needed to manage the build and packaging process.  Each package will also have additional build requirements, but those will be dealt with later.

Once, long ago, building software meant having a compiler (which you built yourself from source code), tar for unpacking it and make to automate the process.  Today the same tasks apply but there's a lot more formalism to the process.  Collaboration has required the creation of distributed software revision control tools.  Software testing has become everyone's job.  People have recognized that software is never finished; it evolves and grows over time.  Users need to be able to know what they're running and where to get updates.  The modern tool set reflects these needs.

While most of the time these tools will just work, it's often important to know which tools are doing what jobs and how they interact.  This is critical either when things don't go as planned or when contributing new software packages to the set.  First I'll take a look at which tools OpenShift uses and then demonstrate how to install them (which is actually pretty trivial).

Software Revision Control: Git and Github

A distributed project today requires some kind of remote software revision control system.  This allows developers to work together without having to be in one place.  The Revision Control System (RCS) manages changes and flags conflicts.  It allows tagging of releases.

The OpenShift Origin project uses git for revision control.   It uses the Github service to hold the master repository and development forks and branches.  You can pull down a cloned copy of the source tree without having an account on Github.  To manage your local changes and to contribute back you'll need an account of your own.  There are a number of good books or sites on how to use git.  See the Github site itself for help learning how to create your own development fork and branches.

Task Automation: Ruby, Rubygems and Rake

To automate the unit testing OpenShift uses a rubygem called rake, named after the original make utility.  Rake implements dependencies and tasks in a way that is similar in behavior to make, but entirely different syntactically.

Rake is implemented as a rubygem which is in turn a module packaging mechanism for Ruby code.

Unit Testing: Rspec 2

Many OpenShift components include unit tests written using the RSpec framework. RSpec is another rubygem.  It has components for writing special expectations, mocks and hooks for testing Rails applications. rubygem-rspec-rails requires all of the other components, so we can install that and let yum handle the dependencies.

Build, Packaging and Release: Tito and rpm-build and rubygem-bundler

All of the software in OpenShift must be packaged for delivery in RPM format.  This is both a requirement for inclusion in Fedora and RHEL releases and good general practice (use the native software packaging format).  A number of components are also packaged as Rubygems.  This adds the requirement for the rubygem-bundler package for building, but Rubygems are not the deliverable format.

OpenShift uses a tool called tito to manage package builds and revision tags.  Tito works with the standard RPM spec files and with rpmbuild and createrepo. When it runs successfully, tito not only builds the requested package, it increments the package version number and inserts it in a yum repository.

Documentation: rubygem-yard

The Ruby community has created a set of tools which allow documentation to be automatically generated.  The author of the code inserts specially formatted markup comments which the documentation generator uses to produce HTML or other documentation formats.

OpenShift is using the yard documentation tool to mark up and auto-generate documentation for the Ruby packages.  Yard is installed with the rubygem-yard RPM.

Publication: thttpd

Once the packages are built they're useless if your OpenShift Origin servers can't reach them.  I typically use Apache2 for web service but these are static files, so a lightweight server like lighttpd or thttpd is in order.  I'm going to use thttpd because I can configure it to serve the default yum repo location with a single sed command.

If you don't want to share the builds from the build server you can instead use a tool like rsync to push them where you need them to be for publication.

Installing The Software

I can compose the list now:

  • git  - revision control
  • rake
    • ruby
    • rubygems
    • rubygems-devel
    • rubygem-rake
    • rubygem-bundler
  • rubygem-yard
  • rspec
    • rubygem-rspec-core
    • rubygem-rspec-mocks
    • rubygem-rspec-expectations
    • rubygem-rspec-rails
  • tito
  • rpm-build
  • thttpd

Note that RPM package dependencies make the actual install list fairly small if you pick carefully.

Now that I have my list, installing the toolset is easy enough:

yum install -y git rubygems-devel rubygem-rake rubygem-yard rubygem-rspec-rails tito rpm-build thttpd

This will actually cause the installation of almost 100 more packages due to dependencies.
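Before moving on it can be worth confirming that the tools actually landed on the PATH. What follows is only a small sketch: the function name missing_tools is my own, and the command names in the example call are the executables I'd expect those packages to provide, which may differ on your system.

```shell
# Print the name of every command in the argument list that cannot be
# found on the PATH. Prints nothing when all of them are present.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call (command names are my assumption, adjust as needed):
# missing_tools git ruby gem rake rspec yard tito rpmbuild
```

An empty result means everything in the list was found; any names printed are the ones to chase down with yum.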

When this software is all installed on my build system, the next step is to get myself a copy of the source code.

Getting the Source from Github

Git was created by Linus Torvalds himself to replace a proprietary software revision control system which had been used for years to manage the Linux kernel source tree.   Since then a number of services have sprung up to offer a place for people to host their projects.  OpenShift Origin is hosted on Github.

You can get the git URL for the OpenShift Origin service software without an account, but if you want to make modifications or contributions you'll need to register and then create your own project fork.  Github has some great help and tutorials here:

https://help.github.com/

You will probably also want to look at the process for setting up SSH keys for Github so you don't have to type a password for every operation.

The OpenShift Origin server source code is here:


Cloning the Source Code Repository(s)

Once you've created your account and forked the origin-server project you should find a git@github.com: URL on your fork page.  You can cut-and-paste that and use it to clone a local copy of your workspace. (In the example below, replace the URL with your own)

git clone git@github.com:/openshift/origin-server.git --tags

Now you've got everything that the build process needs, but not what the software you're building needs.

Task Automation

The current official process uses the origin-dev-tools and has a certain amount of overhead. It's made for rigorous, exhaustive build/test/release cycles.

What we need here is much simpler and self-contained.

The exploration that follows is captured in a Rakefile script I put on gist.github.com. When it's placed at the top of the origin-server source tree and set executable, it will execute the tasks described below.

NOTE: the oo-rake script is not part of the official origin-server sources. It will likely not be maintained and comes with no warranty. Use at your own risk.

cd origin-server
wget http://gist.github.com/markllama/5225912/raw/abcbeebed584bc1aae56b9091fa977e8636c316c/oo-rake
chmod a+x oo-rake
./oo-rake --tasks
rake all:builddep[answer]       # install all build requirements
rake all:rpm[repodir,test,yum]  # generate all RPMs and create yum repository
rake all:testrpm[repodir,yum]   # generate all test RPMs and create yum rep...
rake all:yard[destdir]          # generate comprehensive documentation

Package Build Requirements


Building most software requires more than just the build tools.  Most software depends on other tools or libraries for its own build process.  Because OpenShift is set up to build into packages and because the RPM mechanism has a feature to allow developers to call out the dependencies, we can find out what's needed and install it.

Packages and ".spec" files


Every component of OpenShift Origin must be packaged as an RPM.  It's just the way things are.  This gives us a hook to help identify each package and, ultimately, to find the set of build prerequisites for each package.

The contents of each package must reside in a directory within the source code tree.  Each package must have exactly one RPM .spec file.  We can search the directory tree for these files and we'll know both the names of the packages and their locations within the source tree.

Assuming you've just cloned the origin-server repository into your current working directory you can find the list of packages with a little shell snippet like this:

find origin-server -name \*.spec
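If you want the package names alongside their locations, the same find can be wrapped in a small function. This is a sketch under one assumption: that each spec file is named after its package, which is the usual RPM convention but isn't checked here. The function name list_packages is mine.

```shell
# Print "<package-name> <directory>" for every .spec file under a tree.
# The package name is taken from the spec file name, not from the
# Name: tag inside the file.
list_packages() {
    local tree=${1:-origin-server}
    find "$tree" -name '*.spec' | sort | while IFS= read -r spec; do
        echo "$(basename "$spec" .spec) $(dirname "$spec")"
    done
}
```

Calling list_packages with no argument looks under origin-server in the current directory.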

Build Requirements


Among other things,  a package .spec file defines a set of packages that must be installed before the new package can be built.   The required packages are specified with BuildRequires lines.
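To get a feel for what the spec files are asking for before installing anything, the BuildRequires lines can be pulled out with ordinary text tools. This is a rough sketch, not a real spec parser: it doesn't expand RPM macros or evaluate conditionals, it expects one requirement per line or a comma-separated list, and it throws away version constraints. The function name list_buildrequires is my own.

```shell
# Print the unique package names found on BuildRequires lines in one
# or more spec files. Plain text scan only: macros are left unexpanded
# and anything after the package name (such as ">= 1.8") is dropped.
list_buildrequires() {
    sed -n 's/^BuildRequires:[[:space:]]*//p' "$@" |
        tr ',' '\n' |
        awk 'NF { print $1 }' |
        sort -u
}
```

Something like list_buildrequires $(find origin-server -name \*.spec) gives a single de-duplicated list for the whole tree.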

The yum-builddep program which is part of the yum-utils package will install the build requirements for a package:

yum-builddep <specfile> [<specfile>...]

This will install all of the build requirements for the listed packages.
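Combining this with the find from earlier gives a one-shot install for the whole tree. A minimal sketch, assuming it runs as root: the BUILDDEP variable is something I made up so the command can be swapped for echo to preview what would run; yum-utils knows nothing about it.

```shell
# Run yum-builddep once with every spec file under the tree.
# Set BUILDDEP=echo to print the argument list instead of installing.
install_builddeps() {
    local tree=${1:-origin-server}
    # -print0 / -0 keep odd file names intact; -r skips the run
    # entirely when no spec files are found.
    find "$tree" -name '*.spec' -print0 |
        xargs -0 -r ${BUILDDEP:-yum-builddep -y}
}
```

Running BUILDDEP=echo install_builddeps origin-server first is a cheap way to see which spec files will be handed to yum-builddep.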

The oo-rake script offers the all:builddep target. Invoking this task will install all of the build requirements for the packages under the tree.

Building the Packages


The packages (and yum repository) are built by tito. Tito has to run in the root directory for each package (where the .spec file resides). Since we already know how to find all the spec files we can find the directories which contain them fairly simply:

find origin-server -name \*.spec | xargs -I {} dirname {}

This will produce a list of directories which contain potential packages.  We can just loop over that and call tito in each one to build the packages.

for PKGDIR in $(find origin-server -name \*.spec | xargs -I {} dirname {}) ; do
    (cd "$PKGDIR" && tito build --rpm)
done

This will change to each directory, build the RPM and place it in a yum repository at /tmp/tito.  You can change where the output goes either by adding -o <directory> to the tito command or by setting a variable named RPMBUILD_BASEDIR in the build user's ~/.titorc file.
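When building a whole tree it's handy to keep going past a failed package and get a list of the failures at the end. Here is one way that could look, as a sketch only: build_all and the TITO variable are names I chose (TITO exists so you can substitute echo for a dry run), and the -o option is the one described above.

```shell
# Build every package under a tree, sending RPMs to the given output
# directory. A failed build is reported on stderr and the loop moves on.
build_all() {
    local tree=${1:-origin-server}
    local outdir=${2:-/tmp/tito}
    find "$tree" -name '*.spec' | sort | while IFS= read -r spec; do
        dir=$(dirname "$spec")
        # The subshell keeps the cd from leaking into the next pass;
        # stdin is redirected so the build cannot eat the file list.
        (cd "$dir" && ${TITO:-tito} build --rpm -o "$outdir") </dev/null ||
            echo "FAILED: $dir" >&2
    done
}
```

For example, build_all origin-server /var/tmp/repo 2>failures.log leaves the failed directories in failures.log (along with anything else the builds wrote to stderr).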

The oo-rake script provides a target, all:rpm, which will build all of the packages in the tree below it.  You can provide arguments to rake targets.  The first argument to the all:rpm task is the destination for the packages.

Git Tags and Test RPMs


Tito depends on git and specifically on release tags. If you get any messages indicating that a tag is missing for a package, fetch the tags from your git repository as well:

cd origin-server ; git fetch origin --tags

When you run the all:rpm target tito will build tagged release packages.  That is, it will build from the last tagged commit.  Any changes you have committed since that tag will not be included.

To build packages from the head of the current branch, you want to build test packages.
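With tito on its own, a test build for a single package is requested with the --test option. A minimal sketch: build_test_rpm and the TITO variable are my own names (again, TITO is only there so echo can stand in for a dry run). As far as I know a test build uses the latest commit, so uncommitted edits in the working tree would not be picked up.

```shell
# Build a test RPM from the head of the current branch for the package
# whose spec file lives in the given directory.
build_test_rpm() {
    local pkgdir=$1
    (cd "$pkgdir" && ${TITO:-tito} build --test --rpm)
}
```

Pass it one of the directories found by the spec file search, for example build_test_rpm "$PKGDIR".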

The oo-rake script provides another target, all:testrpm, which will build test packages for the entire tree (and place them in the yum repository).  Test packages get hashed names so that yum update will install the newer packages from the repository.

Publishing the Yum Repository


You don't have to publish the RPMs in the yum repository but you have to make the RPMs available somehow. I'm going to add a step here to make the yum repo available by HTTP using a lightweight http server, thttpd.

By default thttpd serves the contents of /var/www/thttpd. I want it to serve /tmp/tito. A single line sed command makes the adjustment:

sed -i -e 's|^dir=.*$|dir=/tmp/tito|' /etc/thttpd.conf

Fedora 18 comes with the firewall daemon limiting remote access. We have to open access to port 80 so that thttpd can answer queries.

firewall-cmd --zone public --add-service http

Now we just have to enable and start the thttpd service and servers will be able to pull from it.

systemctl enable thttpd
systemctl start thttpd

If you have a web server established you could instead use rsync or something like it to move the build results to the web server.
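On the OpenShift Origin servers, yum needs a repository definition pointing back at the build box. The sketch below writes one. The host name build.example.com, the repo id openshift-origin-local and the function name write_repo_file are all placeholders of mine. It also assumes the yum metadata (the repodata directory) sits at the top of what the web server publishes; if it lives in a subdirectory, the baseurl has to point there instead.

```shell
# Write a yum .repo file that points at the build server.
# gpgcheck is off because locally built packages are normally unsigned.
write_repo_file() {
    local buildhost=$1
    local dest=${2:-/etc/yum.repos.d/openshift-origin-local.repo}
    cat > "$dest" <<EOF
[openshift-origin-local]
name=Locally built OpenShift Origin packages
baseurl=http://$buildhost/
enabled=1
gpgcheck=0
EOF
}
```

Run as root on each server, write_repo_file build.example.com drops the file into /etc/yum.repos.d, after which the local packages show up in yum like any other repository.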

What This Doesn't Include


This is just the barest minimum information to build OpenShift Origin RPMs on Fedora 18.  There are a bunch of tasks that aren't handled:

  • Triggering automatic re-build on developer commit
  • Running unit tests
  • Interpreting and handling build errors
  • Handling new package build requirements
  • Installing and configuring OpenShift servers

This should be enough, though, for someone who wants to extend or contribute to the OpenShift Origin project and needs to build their own packages.


4 comments:

  1. Great article Mark! One idea might be to use yum-builddeps (installed via yum-utils) to process the spec file build dependencies. You should be able to just run that on each spec file and install all the deps. Keep up the great writing!

  2. Thanks Matt,

    I hadn't heard of yum-builddeps.

    I'd also got thinking that it would be cool to add a Rake task that would emit the list of build deps, and another that would run tito. By adding tasks to the hierarchy we could replace the "external" build tools with "internal" ones. At least some.

  3. I just submitted a pull-request that tries to implement a build process in-line with Rake.
    https://github.com/openshift/origin-server/pull/1640

    Take a look and feel free to comment (in the pull request please)

    1. The pull request has been cancelled as the gatekeepers prefer to avoid loading developers with the requirement to create a Rakefile for each package.

      Instead I've put the top level build script, which simplifies most of the steps, in a gist: https://gist.github.com/markllama/5225912

      Pull that to the top of the origin-server build tree, set the permissions to allow execute and you'll have targets to install build requirements and build the complete source tree.
