It doesn’t matter what language your application is written in or how it’s structured: if you want other people to use it, you need some way to take changes made by a developer and get them onto a production server connected to the rest of the world. Oh, and you probably want to test those changes along the way, too.
For a small, personal project that runs on a single t2.micro AWS EC2 instance or a shared server, this could be as simple as running tests locally and pulling your code onto the server with git or rsync or what have you. For anything much larger you’ll want a process that is more robust and automated along with a way to run that process, like a Continuous Integration server.
Here at Rover, we rely on Jenkins to manage our testing and deployment pipelines. Jenkins has been an industry standard for years, but recently it has started to show its age. Jenkins may have kicked off the Continuous Integration movement, and its vast ecosystem of plugins has kept it popular, but more recent entries to the field offer some innovative features, not to mention fancier UIs.
About a year and a half ago, Jenkins 2 was released. While it’s still the same digital butler we know and love, it borrows a few modern features from the newcomers. Most notably, Jenkins 2 supports code-driven pipeline jobs. This feature had been available for a while as a plugin, but now pipelines are a core feature and first-class citizen in Jenkins.
We didn’t upgrade immediately, but some time later we began investigating alternatives to Jenkins and decided to try out Jenkins 2 as part of that process. Our data and ops team started to use Jenkins 2 to build their new projects while our old Jenkins server continued to run our main testing and deployment jobs. This worked for a while, but a few front-end bugs would soon change things.
If it ain’t broke…
We take testing pretty seriously at Rover. With over 21,000 unit tests, our Python engineers can be relatively sure that potential bugs will be caught before code goes live. For the front-end we have unit tests along with a robust set of end-to-end tests, but when run sequentially after the Python unit tests, our master build job took around twenty minutes to run. For some organizations or applications, a twenty-minute build time might be acceptable, but we ship code to production anywhere from ten to twenty times a day and aim for a build time of less than ten minutes.
For a while we got away with running the front-end end-to-end tests in a separate job every few hours or manually if we were about to make a big change, but it wasn’t ideal. After a particularly nasty bug made it to production, we decided to investigate moving our main testing and deployment pipeline to Jenkins 2, where we could take advantage of concurrent testing steps. This would allow us to run the unit tests and end-to-end tests in parallel.
As my fellow engineer Albert described in a previous post, every full stack engineer has a two-week rotation on our data and operations team to help cross-train and understand the infrastructure that our code is deployed to. The ops and data team does a great job of pairing people up with work that interests them during their rotation, and since I had managed Jenkins servers before, I was tasked with moving our testing and deployment pipeline to Jenkins 2 when my rotation came up.
Follow along as I cover our migration and the Good, the Bad, and the Ugly sides of Jenkins 2.
The Old Setup
Our deployment process previously consisted of three different Jenkins “projects”, or as I like to call them, jobs. The three jobs that made up the old Rover build suite were the PR Builder, the Master Job, and the Web Deploy Job.
If you’re unfamiliar with what a job is in Jenkins, it’s essentially a set of steps to be executed and a set of conditions that trigger the job to run, which creates what is known as a build. The triggers that start a build can range from something as simple as clicking the “Build Now” button on the job’s page to something more complex, like building when a new commit is pushed to a branch of a git repository.
In a traditional Jenkins job, you can define some pre-build actions, like checking out a specific branch of a git repository before a build starts. The build itself consists of a number of steps which might involve running a suite of tests or arbitrary shell scripts. Finally, you can set up post-build actions like archiving the test results, notifying Slack with the status of the build, or kicking off a build of a different job.
We use git as our version control solution and create a new branch for every ticket or feature we work on at Rover. When a branch is ready, we create a Pull Request. The old PR Builder Job was configured to look for new or updated Pull Requests and subject them to our entire unit test suite. Once a Pull Request had passed the tests and undergone a thorough code review, it got merged into our master branch.
The Web Master Job polled for changes to the master branch and, like the PR Builder, it ran tests against new commits. Unlike the PR Builder, it ran a few additional tests and, if they all passed, it would merge the master branch into the stable branch and trigger the Web Deploy Job.
The full details are a story for another time, but suffice it to say the Web Deploy Job takes the stable branch and deploys it to our production servers. There’s a whole lot of hand waving, devops team magic, bash, ssh keys, salt states, and fabfiles involved in the deployment process, but for us full-stack engineers, “it just works” (we’re also in the process of overhauling and modernizing the deployment process, which is super exciting).
We had three big goals we wanted to accomplish by upgrading to Jenkins 2: running our test suites in parallel, moving our pipeline logic to a Jenkinsfile, and consolidating the Web Master and PR Builder jobs into a single job.

Parallel testing would allow us to run our end-to-end tests before every deployment.
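For context, the Jenkins 2 primitive that makes this possible is the `parallel` step in a Scripted Pipeline. A minimal sketch (the node label and scripts here are placeholders, not our actual build steps):

```groovy
// Scripted-pipeline sketch: run both test suites concurrently.
// failFast: true aborts the other branch as soon as one fails.
parallel(
    failFast: true,
    'unit-tests': {
        node('test-runner') { sh './run_unit_tests.sh' }
    },
    'end-to-end': {
        node('test-runner') { sh './run_e2e_tests.sh' }
    }
)
```

Each entry in the map becomes a concurrently executing branch of the build, so the wall-clock time of the stage is roughly that of the slowest suite rather than the sum of both.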
Moving our pipeline logic to a file in our main repository takes us one more step down the road of infrastructure as code. We make heavy use of Terraform and Salt to keep our infrastructure defined in a human-readable and version-controlled format, but our existing Jenkins jobs were configured via a web form on the Jenkins server (sure, the configuration is technically saved as XML on the server, but it’s not exactly human-readable).
Our third goal was simply a matter of digital housekeeping and embracing the DRY philosophy (Don’t Repeat Yourself). The PR Builder and Web Master Jobs were already very similar, so it made sense to merge them.
The work to migrate consisted of a few major phases. Before making major changes to our Master job, we had to recreate the Web Deploy job and the rest of our supporting jobs on Jenkins 2. This was straightforward enough, as these jobs were all traditional Jenkins jobs. The most critical part of this work was verifying that the new versions were working.
I glossed over it a bit earlier, but there are a few other critical jobs we had on our old Jenkins server, such as the all important rollback job. Even tests and code reviews won’t catch every single bug, so it’s important to have a way to quickly undo a change made to production. We wanted to make sure that deploys and rollbacks were working properly before fully cutting over to the new system. As with a lot of operational work, a thorough plan, checklist, and manual verification were keys to success here.
Pipelines and Jenkinsfiles
With a working system in place on Jenkins 2, it was time to write a Jenkinsfile and take advantage of Jenkins’ new features. Generally placed at the root of your project, where Jenkins can auto-detect it, a Jenkinsfile describes how your project should be built. Since we already had the beginnings of one for running our end-to-end tests, I took that and ran with it.
As Jenkins is written in Java and runs on the JVM, Jenkinsfiles are written in Groovy, a JVM scripting language that takes inspiration from languages like Ruby and Python. A Jenkinsfile defines the stages of your build and where the steps should run (our test steps need to run on our beefy test runner boxes, for example). To support some additional features, Jenkinsfiles are actually written in a DSL built on top of Groovy that adds a few Jenkins-specific features. To further complicate the situation, Jenkins offers not one, but two similar and subtly different Pipeline DSL syntaxes: Declarative and Scripted.
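To make the difference concrete, here’s a minimal sketch of a trivial pipeline in the Declarative syntax (the node label and script are placeholders, not our real build steps):

```groovy
// Declarative: a structured, opinionated block format.
// The top-level sections (agent, stages, post) are fixed by the DSL.
pipeline {
    agent { label 'test-runner' }   // hypothetical node label
    stages {
        stage('Unit Tests') {
            steps {
                sh './run_unit_tests.sh'
            }
        }
    }
}
```

The Scripted equivalent is just Groovy code with Jenkins steps mixed in: `node('test-runner') { stage('Unit Tests') { sh './run_unit_tests.sh' } }`.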
The Declarative syntax is newer and seems to be the direction that the Jenkins developers want to move in. It’s much more of a DSL than the Scripted syntax, with a simpler, more opinionated structure that focuses on common build pipeline operations. I attempted to shoehorn our build process into a Declarative Jenkinsfile, but a few key operations we relied on were nigh impossible to implement using the more restrictive syntax. In theory, it seems like a great idea, but honestly it felt like an unfinished product.
The Scripted syntax, on the other hand, lets you do just about anything, but at the expense of making you do everything. There’s already a huge gulf between traditional Jenkins jobs, where with the click of a button you can do things like send out notifications when a build fails, and pipeline jobs, where you have to script these kinds of actions. The Scripted Pipeline syntax widens that gulf.
Want to notify a Slack channel when a build fails? You need to wrap each build step in a try/catch block and call the notifySlack function if it fails. Oh, and that notifySlack function? You have to manually pass it the build result and even specify what color Jenkins should make the message. Want to only send a message to Slack if a build succeeds after a string of failed builds? You need to write a custom Groovy function that executes as the root Jenkins user so that it has access to the build history. This function goes into a separate, privileged repository, where it can be called from pipeline jobs, which execute in a sandbox.
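As a rough sketch of that pattern (assuming the Slack Notification plugin provides the `slackSend` step; `notifySlack`, the node label, and the scripts are hypothetical names, not our exact code):

```groovy
// Hypothetical helper: in Scripted syntax you pass the build result
// and pick the message color yourself; slackSend is the Slack plugin's step.
def notifySlack(String buildResult) {
    def color = (buildResult == 'SUCCESS') ? 'good' : 'danger'
    slackSend color: color,
              message: "${env.JOB_NAME} #${env.BUILD_NUMBER}: ${buildResult}"
}

node('test-runner') {
    try {
        stage('Unit Tests') {
            sh './run_unit_tests.sh'
        }
        notifySlack('SUCCESS')
    } catch (err) {
        notifySlack('FAILURE')
        throw err   // re-throw so Jenkins still marks the build as failed
    }
}
```

Every step you want reported has to live inside that try/catch; nothing happens automatically the way it does with a traditional job’s post-build actions.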
This proved to be an overarching theme when creating the Jenkinsfile: every convenience you’re used to Jenkins providing needs to be re-implemented by your script. At the time, it felt like a bit of a pain, but when the Jenkinsfile was finished, some of the advantages became more apparent. By simply wrapping the end-to-end testing and the “merge to the stable branch” stages of the pipeline with an if (env.BRANCH_NAME == 'master') conditional, I had effectively merged the functionality of the old Web Master and PR Builder jobs. Just the other day, I was helping a fellow engineer modify an older Jenkins job and found myself saying, “if only this had a Jenkinsfile.”
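In Scripted syntax, that consolidation amounts to something like the following sketch (the stage names, node label, and scripts are illustrative, not our real ones):

```groovy
node('test-runner') {
    checkout scm

    // Every branch and PR runs the unit tests...
    stage('Unit Tests') {
        sh './run_unit_tests.sh'
    }

    // ...but only master builds run the extra tests and get merged
    // to stable, which is what lets one job replace both the old
    // PR Builder and Web Master jobs.
    if (env.BRANCH_NAME == 'master') {
        stage('End-to-End Tests') {
            sh './run_e2e_tests.sh'
        }
        stage('Merge to Stable') {
            sh 'git push origin HEAD:stable'
        }
    }
}
```

The multibranch job runs this same file for every branch; the conditional is the only thing distinguishing a PR build from a master build.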
I would say that the hardest part of the process was sorting out what you could and couldn’t do with the Declarative vs. Scripted pipeline syntaxes. The examples in the Jenkins docs didn’t always state which syntax they were using, and other references, tutorials, and Stack Overflow answers suffered from similar confusion. Creating an experimental Jenkinsfile to play around with was a really key part of the process. I can also definitely see why the Jenkins devs want to move to the Declarative syntax. Compared to other modern CI tools, which embrace declarative configuration, the Scripted syntax is pretty verbose and quickly winds up looking more like application code than configuration. It’s also a shame that a lot of existing Jenkins plugins can’t easily be used in pipelines, but that didn’t affect us too much.
Pulling the trigger
With a shiny, new Jenkinsfile in hand, I created a multibranch pipeline job and configured it to look at our repository on GitHub and build the master branch and any PR branches whenever they were updated. Jenkins immediately swooped into action: scanning our repository and seeing that it had yet to run any of our open PRs through the pipeline, it decided to RUN ALL OF THEM AT ONCE. Because we had around 100 open PRs at the time and Jenkins 2 was only using one of our three dedicated testing nodes, this caused Jenkins to crash. We quickly restarted things and cancelled the scheduled jobs, which ended up being not a big deal, but it’s definitely something to be aware of. Similarly, since Jenkins is configured to rebase our PRs on top of the master branch before building them, and master changes frequently, re-scanning the repository will re-trigger most of the open PR builds. We solved this issue by just telling Jenkins not to periodically re-scan, but it still happens when we make a change to the job’s configuration.
Confident that the new pipeline job was doing what it needed to, I disabled all jobs on the old Jenkins machine, asked that my fellow engineers hold off on merging things for a bit, and added the final step to the Jenkinsfile: the step that merges successful master builds into our stable branch and triggers the Web Deploy Job. I crossed my fingers and ran the Master job manually.
Everything went as well as it could have. The build succeeded and was deployed. I ran a rollback to verify that our new rollback job was still working, and the engineering team went back to shipping code to production. Over the next few days we made a few small improvements and decommissioned our old Jenkins instance. Our hard work was vindicated within a few days when our end-to-end tests caught a subtle bug that would have otherwise shipped.
Despite some challenges along the way, Jenkins 2 has been a big win for our
team. Our testing and deployment process has been streamlined and put under
version control, we’re running more tests than ever before, and our time from
merge to deploy has stayed in the 10-15 minute range. One unexpected benefit
I’ve discovered is that developers who are unfamiliar with Jenkins are much more
likely to feel comfortable making changes to a
Jenkinsfile rather than
modifying an old-style Job through Jenkins’ web UI.
If you like solving interesting problems, learning from your experiences, and working with a team of similarly dedicated individuals, not to mention a bunch of cute dogs, we’re hiring.