Migrating to Jenkins 2: A Continuous Integration Odyssey

It doesn’t matter what language your application is written in or how it’s structured: if you want other people to use it, you need some way to take changes made by a developer and get them onto a production server connected to the rest of the world. Oh, and you probably want to test those changes along the way, too.

For a small, personal project that runs on a single t2.micro AWS EC2 instance or a shared server, this could be as simple as running tests locally and pulling your code onto the server with git, rsync, or what have you. For anything much larger, you’ll want a more robust, automated process and a way to run it, such as a Continuous Integration server.

Here at Rover, we rely on Jenkins to manage our testing and deployment pipelines. Jenkins has been an industry standard for years, but it had recently started to show its age. Jenkins may have kicked off the Continuous Integration movement and its vast ecosystem of plugins has kept it popular, but more recent entries to the field offer some innovative features, not to mention fancier UIs.

About a year and a half ago, Jenkins 2 was released. While it’s still the same digital butler we know and love, it borrows a few modern features from the newcomers. Most notably, Jenkins 2 supports code-driven pipeline jobs. This feature had been available for a while as a plugin, but now pipelines are a core feature and first-class citizen in Jenkins.

We didn’t upgrade immediately, but some time later we began investigating alternatives to Jenkins and decided to try out Jenkins 2 as part of that process. Our data and ops team started using Jenkins 2 to build their new projects while our old Jenkins server continued to run our main testing and deployment jobs. This worked for a while, but a few front-end bugs would soon change things.

If it ain’t broke…

We take testing pretty seriously at Rover. With over 21,000 unit tests, our Python engineers can be relatively sure that potential bugs will be caught before code goes live. For the front-end, we have unit tests along with a robust set of end-to-end tests, but when those ran sequentially after the Python unit tests, our master build job took around twenty minutes. For some organizations or applications, a twenty-minute build time might be acceptable, but we ship code to production anywhere from ten to twenty times a day and aim for a build time of less than ten minutes.

For a while we got away with running the front-end end-to-end tests in a separate job every few hours or manually if we were about to make a big change, but it wasn’t ideal. After a particularly nasty bug made it to production, we decided to investigate moving our main testing and deployment pipeline to Jenkins 2, where we could take advantage of concurrent testing steps. This would allow us to run the unit tests and end-to-end tests in parallel.

As my fellow engineer Albert described in a previous post, every full-stack engineer does a two-week rotation on our data and operations team to cross-train and understand the infrastructure our code is deployed to. The data and ops team does a great job of pairing people up with work that interests them during their rotation. Since I had managed Jenkins servers before, I was tasked with moving our testing and deployment pipeline to Jenkins 2 when my rotation came up.

Follow along as I cover our migration and the Good, the Bad, and the Ugly sides of Jenkins 2.

The Old Setup

Our deployment process previously consisted of three different Jenkins “projects”, or, as I like to call them, jobs. The three jobs that made up the old Rover build suite were the PR Builder Job, the Web Master Job, and the Web Deploy Job.

If you’re unfamiliar with what a job is in Jenkins, it’s essentially a set of steps to be executed and a set of conditions that trigger the job to run, which creates what is known as a build. The triggers that start a build can range from something as simple as clicking the “Build Now” button on the job’s page to something more complex, like building when a new commit is pushed to a branch of a git repository.

In a traditional Jenkins job, you can define some pre-build actions, like checking out a specific branch of a git repository before a build starts. The build itself consists of a number of steps which might involve running a suite of tests or arbitrary shell scripts. Finally, you can set up post-build actions like archiving the test results, notifying Slack with the status of the build, or kicking off a build of a different job.
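As a rough illustration of how those three phases carry over to a pipeline, here is a minimal scripted Jenkinsfile sketch. The agent label 'test-runner', the pytest command, and the reports/ path are assumptions made for the example, not Rover’s actual configuration; junit is the step provided by the JUnit plugin.

```groovy
// Minimal scripted pipeline sketch (hypothetical label, command, and report path).
node('test-runner') {
    // Pre-build action: check out the commit that triggered this build
    stage('Checkout') {
        checkout scm
    }

    try {
        // Build step: run the test suite and write JUnit-style XML results
        stage('Unit tests') {
            sh 'python -m pytest --junitxml=reports/unit.xml'
        }
    } finally {
        // Post-build action: archive test results whether the tests passed or failed
        junit testResults: 'reports/*.xml', allowEmptyResults: true
    }
}
```

Unlike a traditional job, nothing here happens unless the script asks for it; the finally block stands in for the post-build action you would otherwise tick in the web UI.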

We use git as our version control solution and create a new branch for every ticket or feature we work on at Rover. When a branch is ready, we create a Pull Request. The old PR Builder Job was configured to look for new or updated Pull Requests and subject them to our entire unit test suite. Once a Pull Request has passed the tests and undergone a thorough code review, it gets merged into our master branch.

The Web Master Job polled for changes to the master branch and, like the PR Builder, ran tests against new commits. Unlike the PR Builder, it ran a few additional tests, and if they all passed, it merged the master branch into the stable branch and triggered the Web Deploy Job.

The full details are a story for another time, but suffice it to say the Web Deploy Job takes the stable branch and deploys it to our production servers. There’s a whole lot of hand-waving, devops team magic, Bash, SSH keys, Salt states, and fabfiles involved in the deployment process, but for us full-stack engineers, “it just works” (we’re also in the process of overhauling and modernizing the deployment process, which is super exciting).

The Plan

We had three big goals we wanted to accomplish by upgrading to Jenkins 2: running our test suites in parallel, moving our pipeline logic to a Jenkinsfile, and consolidating the Web Master and PR Builder jobs into a single job.

Parallel testing would allow us to run our end-to-end tests before every deployment without adding their full running time on top of the Python unit tests.
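A minimal sketch of the idea in scripted pipeline syntax uses the parallel step, which runs each named branch concurrently and finishes when the slowest branch does. The 'test-runner' label and both test commands are placeholders, not our real configuration.

```groovy
// Sketch: run the Python unit tests and the end-to-end tests at the same time.
// Label and commands are hypothetical placeholders.
stage('Tests') {
    parallel(
        'Python unit tests': {
            node('test-runner') {
                checkout scm
                sh 'python -m pytest'
            }
        },
        'End-to-end tests': {
            node('test-runner') {
                checkout scm
                sh 'npm run test:e2e'
            }
        }
    )
}
```

Because each branch grabs its own executor, the stage takes roughly as long as the slower suite rather than the sum of both, provided enough executors are free.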

Moving our pipeline logic to a file in our main repository takes us one more step down the road of infrastructure as code. We make heavy use of Terraform and Salt to keep our infrastructure defined in a human-readable, version-controlled format, but our existing Jenkins jobs were configured via a web form on the Jenkins server (sure, the configuration is technically saved as XML on the server, but it’s not exactly human-readable).

Our third goal was simply a matter of digital housekeeping and embracing the DRY (Don’t Repeat Yourself) philosophy. The PR Builder and Web Master Jobs were already very similar, so it made sense to merge them.

The Execution

Migrating

The migration consisted of a few major phases. Before making major changes to our Web Master Job, we had to recreate the Web Deploy Job and the rest of our supporting jobs on Jenkins 2. This was straightforward enough, as these were all traditional Jenkins jobs. The most critical part of this work was verifying that the new versions worked.

I glossed over it a bit earlier, but there were a few other critical jobs on our old Jenkins server, such as the all-important rollback job. Even tests and code reviews won’t catch every single bug, so it’s important to have a way to quickly undo a change made to production. We wanted to make sure that deploys and rollbacks were working properly before fully cutting over to the new system. As with a lot of operational work, a thorough plan, a checklist, and manual verification were the keys to success here.

Pipelines and Jenkinsfiles

With a working system in place on Jenkins 2, it was time to write a Jenkinsfile and take advantage of Jenkins’ new features. Generally placed at the root of your project, where Jenkins can auto-detect it, a Jenkinsfile just describes how your project should be built. Since we already had the beginnings of one for running our end-to-end tests, I took that and ran with it.

As Jenkins is written in Java and runs on the JVM, Jenkinsfiles are written in Groovy, a JVM scripting language that draws inspiration from languages like Python, Ruby, and Smalltalk. You define a list of steps for Jenkins to execute and specify where those steps should run (our test steps need to run on our beefy test runner boxes, for example). To support some additional features, and to make things even more confusing, Jenkinsfiles are actually written in a DSL built on top of Groovy that adds Jenkins-specific features. To further complicate the situation, Jenkins offers not one but two similar, subtly different Pipeline DSL syntaxes: Declarative and Scripted.
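To make the Groovy-versus-DSL distinction concrete, here is a hypothetical scripted Jenkinsfile that mixes the two: the helper method and the list are ordinary Groovy, while node, stage, checkout, and sh are steps added by the Pipeline plugins. The 'test-runner' label controls which agents run the block; it and the make targets are assumptions for the example.

```groovy
// Plain Groovy: an ordinary method and a list, nothing Jenkins-specific.
def testCommand(String suite) {
    return "make test-${suite}"
}

def suites = ['unit', 'lint']

// Pipeline DSL: node(), stage(), checkout, and sh are steps from the Pipeline plugins.
// 'test-runner' is a hypothetical agent label; only agents with that label run this block.
node('test-runner') {
    checkout scm
    for (suite in suites) {
        stage("Test: ${suite}") {
            sh testCommand(suite)
        }
    }
}
```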

The Declarative syntax is newer and seems to be the direction the Jenkins developers want to move in. It’s much more of a DSL than the Scripted syntax, with a simpler, more opinionated structure that focuses on common build pipeline operations. I attempted to shoehorn our build process into a Declarative Jenkinsfile, but a few key operations we relied on were nigh impossible to implement with the more restrictive syntax. In theory it seems like a great idea, but honestly, it felt like an unfinished product.
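For a sense of the Declarative style, here is a small hypothetical pipeline. The structure (pipeline, agent, stages, steps, post) is fixed by the syntax, and the post section’s failure condition handles notifications without any try/catch. The agent label, test command, and Slack channel are placeholders, and slackSend assumes the Slack Notification plugin is installed.

```groovy
// Hypothetical Declarative Jenkinsfile: fixed structure, built-in post conditions.
pipeline {
    agent { label 'test-runner' }              // assumed agent label
    stages {
        stage('Unit tests') {
            steps {
                sh 'make test'                 // placeholder test command
            }
        }
    }
    post {
        failure {
            // Runs only when the build fails; channel name is an assumption
            slackSend(channel: '#builds', color: 'danger',
                      message: "Build failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}")
        }
    }
}
```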

The Scripted syntax, on the other hand, lets you do just about anything, but at the expense of making you do everything. There’s already a huge gulf between traditional Jenkins jobs, where with the click of a button you can do things like send out notifications when a build fails, and Jenkinsfile-driven jobs, where you have to script these kinds of actions yourself. The Scripted Pipeline syntax widens that gulf.

Want to notify a Slack channel when a build fails? You need to wrap each build step in a try… catch block and call the notifySlack function if it fails. Oh, and that notifySlack function? You have to manually pass it the build result and even specify what color Jenkins should make the message. Want to send a message to Slack only when a build succeeds after a string of failed builds? You need to write a custom Groovy function that executes as the root Jenkins user so that it has access to the build history. This function goes into a separate, privileged repository, where it can be called from pipeline jobs, which execute in a sandbox.
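A minimal sketch of that pattern in scripted syntax follows. The notifySlack helper here is a hypothetical reconstruction, not Rover’s actual function; it wraps the Slack Notification plugin’s slackSend step, and the channel name, label, and test command are assumptions. For brevity it wraps the whole build in one try/catch rather than each step, and it leaves out the “back to normal” check, which needs the privileged library described above.

```groovy
// Hypothetical helper: the caller supplies the build result, and the helper
// maps it to a Slack attachment color ('good', 'warning', 'danger').
def notifySlack(String result) {
    def colors = [SUCCESS: 'good', UNSTABLE: 'warning', FAILURE: 'danger']
    slackSend(channel: '#builds',                       // assumed channel
              color: colors[result] ?: 'danger',
              message: "${env.JOB_NAME} #${env.BUILD_NUMBER}: ${result} (${env.BUILD_URL})")
}

node('test-runner') {
    try {
        stage('Checkout') { checkout scm }
        stage('Unit tests') { sh 'make test' }          // placeholder command
    } catch (err) {
        currentBuild.result = 'FAILURE'
        notifySlack(currentBuild.result)
        throw err   // re-throw so the failure still stops the pipeline
    }
}
```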

This proved to be an overarching theme of creating the Jenkinsfile pipeline: every convenience you’re used to Jenkins providing needs to be re-implemented by your script. At the time it felt like a bit of a pain, but once the Jenkinsfile was finished, some of the advantages became more apparent. By simply wrapping the end-to-end testing and the “merge to the stable branch” stages of the pipeline in an if (env.BRANCH_NAME == 'master') conditional, I had effectively merged the functionality of the old Web Master and PR Builder jobs. Just the other day, I was helping a fellow engineer modify an older Jenkins job and found myself saying, “if only this had a Jenkinsfile…”
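A stripped-down sketch of that conditional follows. In a multibranch pipeline, env.BRANCH_NAME holds the branch name for branch builds and a name like PR-123 for pull request builds, so only master reaches the guarded stages. The label and make targets are placeholders, the end-to-end tests are shown sequentially for brevity, and the push assumes the agent already has credentials that can write to the repository.

```groovy
node('test-runner') {
    stage('Checkout') { checkout scm }

    // Every branch and pull request runs the unit tests
    stage('Unit tests') { sh 'make test' }                  // placeholder

    // Only master builds run the extra stages the old Web Master Job handled
    if (env.BRANCH_NAME == 'master') {
        stage('End-to-end tests') { sh 'make e2e' }         // placeholder
        stage('Merge to stable') {
            // Fast-forwards stable to the tested commit; without --force,
            // git rejects the push if stable has diverged
            sh 'git push origin HEAD:stable'
        }
    }
}
```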

I would say that the hardest part of the process was sorting out what you could and couldn’t do with the Declarative vs. Scripted pipeline syntaxes. The examples in the Jenkins docs didn’t always state which syntax they were using, and other references, tutorials, and Stack Overflow answers suffered from similar confusion. Creating an experimental Jenkinsfile to play around with was a key part of the process. I can also definitely see why the Jenkins devs want to move to the Declarative syntax. Compared to other modern CI tools, which embrace declarative configuration, the Scripted syntax is pretty verbose and quickly winds up looking more like application code than configuration. It’s also a shame that a lot of existing Jenkins plugins can’t easily be used in pipelines, but that didn’t affect us too much.

Pulling the trigger

With a shiny new Jenkinsfile in hand, I created a multibranch pipeline job and configured it to look at our repository on GitHub and build the master branch and any PR branches whenever they were updated. Jenkins immediately swooped into action: it scanned our repository, saw that it had yet to run any of our open PRs through the pipeline, and decided to RUN ALL OF THEM AT ONCE. Because we had around 100 open PRs at the time and Jenkins 2 was only using one of our three dedicated testing nodes, this caused Jenkins to crash. We quickly restarted it and cancelled the queued builds, which ended up not being a big deal, but it’s definitely something to be aware of. Similarly, since Jenkins is configured to rebase our PRs on top of the master branch before testing, and master changes frequently, re-scanning the repository re-triggers most of the open PR builds. We solved this by telling Jenkins not to periodically re-scan, but it still happens whenever we change the job’s configuration.

Confident that the new pipeline job was doing what it needed to, I disabled all jobs on the old Jenkins machine, asked my fellow engineers to hold off on merging for a bit, and added the final step to the Jenkinsfile: the step that merges successful master builds into our stable branch and triggers a deployment.
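One way such a final step could look, as a hypothetical sketch: sshagent (from the SSH Agent plugin) loads an SSH credential for the push, and the build step triggers a downstream job. The credential ID 'github-deploy-key' and the job name 'web-deploy' are placeholders rather than the names used at Rover, and the sketch assumes the repository was checked out over SSH so that the loaded key is actually used.

```groovy
if (env.BRANCH_NAME == 'master') {
    node('test-runner') {
        stage('Promote and deploy') {
            checkout scm
            // Make the named SSH credential available to git for the push
            sshagent(['github-deploy-key']) {
                // Point stable at the commit that just passed the pipeline
                sh 'git push origin HEAD:stable'
            }
            // Start the deployment job; wait: false lets this build finish
            // without waiting for the deployment to complete
            build job: 'web-deploy', wait: false
        }
    }
}
```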

I crossed my fingers and ran the Master job manually.

The Aftermath

Everything went as well as it could have. The build succeeded and was deployed. I ran a rollback to verify that our new rollback job was working, and the engineering team went back to shipping code to production. Over the next few days we made a few small improvements and decommissioned our old Jenkins instance. Our hard work was vindicated within a few days, when our end-to-end tests caught a subtle bug that would otherwise have shipped.

Despite some challenges along the way, Jenkins 2 has been a big win for our team. Our testing and deployment process has been streamlined and put under version control, we’re running more tests than ever before, and our time from merge to deploy has stayed in the 10-15 minute range. One unexpected benefit I’ve discovered is that developers who are unfamiliar with Jenkins feel much more comfortable making changes to a Jenkinsfile than modifying an old-style job through Jenkins’ web UI.

If you like solving interesting problems, learning from your experiences, and working with a team of similarly dedicated individuals, not to mention a bunch of cute dogs, we’re hiring.