One of the fundamental challenges of distributed coding is deciding what/when to integrate. Sure, that patch your colleague just sent you looks good, but is it actually ready to go into master? At Loggly, we’ve been feeling our way towards a disciplined integration process. A year ago, our frontend developers were all making commits directly to trunk in a single SVN repo. Once every few weeks, we’d run `svn up` on our servers, and hope for the best. Today our code goes through peer review, unit testing, and static analysis before it even touches our master branch.
Like most projects these days, the process starts on github. Fork. Push a feature branch to your repo. Open a pull request. Go through a couple rounds of discussion and revision. Merge. Every change to our code goes through this process. At first we thought it would slow us down: we figured we'd want pull requests for the nontrivial code, and we'd just push to master for the easy stuff. After just a few days, we found that pull requests weren't slowing us down at all, and that we all enjoyed the greater transparency into our colleagues' work.
Once we merge, the automation kicks in — our default integration branch is ‘proposed’, so clicking merge doesn’t actually get the code into the master branch. Jenkins polls our ‘proposed’ branch once a minute, then runs a simple preflight script on the code.
Rather than keep that preflight in a Jenkins configuration page, we have it checked into the codebase so that any developer can run it too; this way there's no excuse for breaking the build — you should have seen it break locally =P
Here's our preflight script. Let's go through it line by line.

```shell
#!/bin/bash
DIR="$( cd "$( dirname "$0" )" && pwd )"
APP=$DIR/..
$DIR/purge_pyc
$DIR/syncenv
$DIR/runtests && $DIR/lint \
    $APP/billing/forms.py \
    $APP/billing/models.py \
    $APP/billing/views.py \
    $APP/customer/models.py \
    $APP/heroku/* \
    $APP/input/models.py \
    $APP/input/views.py \
    $APP/profile/models.py \
    $APP/registration/views.py \
&& echo SUCCESS
```
```shell
DIR="$( cd "$( dirname "$0" )" && pwd )"
```
First we figure out where we’re running, so that we can find the other scripts distributed with the app.
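The idiom is easy to verify in isolation. Here's a throwaway demonstration (the paths are invented for the example) showing that a script using this construct reports its own directory no matter where it's invoked from:

```shell
# Throwaway demo: a script using the $DIR idiom reports its own
# location even when invoked from an unrelated working directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/app/scripts"
cat > "$scratch/app/scripts/where.sh" <<'EOF'
#!/bin/bash
DIR="$( cd "$( dirname "$0" )" && pwd )"
echo "$DIR"
EOF
chmod +x "$scratch/app/scripts/where.sh"
cd /tmp                                # run from somewhere else entirely
out=$("$scratch/app/scripts/where.sh")
echo "$out"
```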
Next we purge pyc files. If a developer recently switched from a branch containing files that don't exist in this branch, the orphaned pyc files may still be lying around, and may be found by the interpreter.
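The post doesn't show purge_pyc itself, but a minimal version is a single find invocation. Here's a hedged sketch run against a scratch directory; the real script presumably walks the app root instead:

```shell
# Sketch of a purge_pyc helper, demonstrated on a scratch tree:
# delete stale bytecode so the interpreter can't import a module
# whose source no longer exists on this branch.
APP=$(mktemp -d)                      # stand-in for the app root
touch "$APP/models.py" "$APP/models.pyc" "$APP/ghost.pyc"
find "$APP" -name '*.pyc' -delete     # the essential line
ls "$APP"                             # only models.py remains
```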
Next we run a script to sync our python virtual environment, and ensure all requirements are present.
Here, of course, we run our unit tests. Each run prints a coverage report, so that as we recover from our testing debt, we can measure our progress.
Next, and this is important, we run pylint over the parts of our app that we expect to pass with no warnings. As we clean up our app, we continue to add modules to this list. Pylint does a few useful things for us. It looks for trivial name errors of the kind that could quickly cause code to stacktrace — using a module without importing it, etc. It also enforces certain kinds of coding discipline. Our functions and modules can’t exceed a certain length. The cyclomatic complexity of our functions is limited.
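Our exact thresholds aren't reproduced in this post; the values below are invented for illustration, but the knobs themselves are standard pylint options (function length and branching limits live in pylint's design checker, which serves as our proxy for cyclomatic complexity):

```ini
# Illustrative pylintrc fragment; these thresholds are examples,
# not our actual values.
[FORMAT]
max-module-lines=500

[DESIGN]
# Bound function size and branching.
max-statements=40
max-branches=12
```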
If all of this passes, Jenkins automatically pushes the checked-out commit to master. Since master is where we base our development, we're always building on known-vetted code.
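The promotion itself is just a fast-forward push of the commit that was tested. This demo reproduces the step with throwaway local repositories (the repo paths and commit message are invented):

```shell
# Demo of promoting a vetted commit: the same commit that was just
# tested on proposed gets pushed to master. Uses throwaway repos.
origin=$(mktemp -d); work=$(mktemp -d)
git init -q --bare "$origin"
cd "$work"; git init -q .
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "merged pull request"
git remote add origin "$origin"
git push -q origin HEAD:proposed      # the merge lands on proposed
# ...preflight passes, so the checked-out commit is promoted:
git push -q origin HEAD:master
```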
If any of it fails, Jenkins still has a couple more tricks to pull. Here’s our on-failure script:
This comes in two parts. The first runs a standard-issue git-bisect between origin/proposed and origin/master. Since origin/master has already been vetted by Jenkins (that's how it became master), we know the regression lies somewhere between those two commits. The bisect result goes into the session output, and is e-mailed to the relevant committers. Next, we roll the proposed branch back to the already-vetted master branch. Whatever pull request broke the build will have to be re-made from scratch.
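That script isn't reproduced here, but the two steps it describes can be sketched end to end. In this self-contained demo, the repo contents and the grep-based "preflight" are stand-ins for the real suite: bisect pinpoints the breaking commit, then the proposed branch is reset to the vetted base:

```shell
# Demo of the failure path on a throwaway repo: bisect between the
# vetted base and the broken tip, then roll proposed back.
repo=$(mktemp -d); cd "$repo"; git init -q .
ci() { git -c user.email=ci@example.com -c user.name=ci commit -qm "$1"; }
echo ok > status; git add status; ci "vetted base"    # plays master's role
base=$(git rev-parse HEAD)
echo broken > status; git add status; ci "bad change"
bad_commit=$(git rev-parse HEAD)
echo notes > notes; git add notes; ci "harmless change"
git branch -f proposed HEAD            # the broken integration branch
# Bisect: first argument is the bad tip, second the known-good base.
# "grep -q ok status" stands in for the real preflight run.
git bisect start proposed "$base" >/dev/null 2>&1
git bisect run sh -c 'grep -q ok status' >/dev/null 2>&1
found=$(git rev-parse refs/bisect/bad) # first bad commit, for the e-mail
git bisect reset >/dev/null 2>&1
# Roll proposed back to the vetted base, as described above:
git branch -f proposed "$base"
```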