Tuesday, January 19, 2016

Understanding the CD Pipeline

One of the first things I'm likely to be tasked with in my new role is the creation of an automated continuous delivery/continuous deployment pipeline.  I've always been a bit flummoxed by the term 'CD Pipeline,' as it's yet another one of those phrases that practitioners claim is a concise representation of a concept, but which I suspect is really a surreptitious way of saying 'this is my community - stay out!'  Other words that quickly come to mind here are 'Enterprise,' 'Dependency Injection,' and 'Container.'  This goes hand in hand with the documentation rant I started in a previous post, wherein these terms are explained in the most byzantine way possible for the people who "really" get it (lawyers and financiers aren't immune to this either, so it's not just the software community).

Anyway, back to the CD pipeline.  After searching around, I found a few different posts that help outline what a pipeline is.  They're mostly in agreement, which is good, but one in particular - even if it's a bit older - does a very thorough job of outlining the steps.  Here's the original post.  Here's the relevant pipeline checklist (with a rough sketch of the automation after it):
  • Unit-tests
  • Acceptance tests
  • Code coverage and static analysis
  • Deployment to integration environment
  • Integration tests
  • Scenario/regression tests
  • Deployments to UAT and Performance test environment
  • More scenario/regression tests
  • Performance tests
  • Alerts, reports and Release Notes sent out
  • Deployment to release repository
  • All of which should be automated.
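
To make that checklist a bit more concrete, here's a rough sketch of what it might look like strung together as a single automated script.  To be clear, every command below is a placeholder of my own invention - a real pipeline would call whatever build, test, and deploy tooling the project already uses (and would more likely live in a CI server than in a bash script) - but it captures the shape of the thing: each stage runs without human intervention, and a failure stops the line.

Pipeline Sketch (placeholders only)
#!/bin/bash
set -e                                   # any failing stage stops the pipeline

./run_unit_tests.sh                      # unit tests
./run_acceptance_tests.sh                # acceptance tests
./run_coverage_and_static_analysis.sh    # code coverage and static analysis
./deploy.sh integration                  # deployment to integration environment
./run_integration_tests.sh               # integration tests
./run_regression_tests.sh integration    # scenario/regression tests
./deploy.sh uat && ./deploy.sh perf      # deployments to UAT and performance test environments
./run_regression_tests.sh uat            # more scenario/regression tests
./run_performance_tests.sh perf          # performance tests
./send_release_report.sh                 # alerts, reports and release notes sent out
./publish_to_release_repo.sh             # deployment to release repository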

In addition to that article, I found a few other interesting links that break things down into bite-sized chunks for me.  The first is the Atlassian series entitled A skeptic's guide to continuous delivery, which builds a pretty good case for anyone who doesn't believe in infrastructure (non-user-facing features) investment.  It also builds a pretty good case for those who do, or who need a gentle reminder during project planning.

    Atlassian also has an entire section on Continuous Delivery.  In the interest of full disclosure, I've only browsed through the section, but it does look pretty comprehensive.  This article also provides a good general outline of a CD pipeline.  Thoughtworks, creators of The Bible on CD, also have a section on Continuous Delivery, but that seems to be a bit harder to navigate and more narrowly focused than Atlassian's section.

Finally, you may ask - why not just read the aforementioned Continuous Delivery book?  The answer's pretty simple.  At this stage, I'm not willing to slap down $35 for a 512-page book that I may not finish.  A lot of people I know found it extremely useful, but I'm also aware that my tastes in documentation differ from others', so it may not be as useful for me.  However, its contents are listed on Amazon, so I hope to peck through each of the sections and do my own independent research where needed.

    Saturday, January 9, 2016

    An Exercise in Dockerfile Refactoring

Finally! I may have something of value to contribute to the greater community based on my recent Docker documentation exploration.  While going through the Docker tutorials, I took a look at the Dockerfile that accompanies the training/webapp image available on Docker Hub (coincidentally, I've been getting no results from Docker Hub for the past several hours, either via the website or via the CLI; this happened a day or two ago as well, and I'm worried that either (a) the site isn't stable or (b) it's applying weird default search parameters that restrict what I see):

    Original Docker File
    FROM ubuntu:14.04
    RUN apt-get update
    RUN DEBIAN_FRONTEND=noninteractive apt-get install -y -q python-all python-pip
    ADD ./webapp/requirements.txt /tmp/requirements.txt
    RUN pip install -qr /tmp/requirements.txt
    ADD ./webapp /opt/webapp/
    WORKDIR /opt/webapp
    EXPOSE 5000
    CMD ["python", "app.py"]
    And here's my refactored file:

    New Docker File
    FROM ubuntu:14.04

    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update && apt-get install -y -q python-all python-pip

    COPY requirements.txt /tmp/requirements.txt
    RUN pip install -qr /tmp/requirements.txt

    COPY ./webapp/app.py /opt/webapp/
    COPY ./webapp/tests.py /opt/webapp/

    WORKDIR /opt/webapp
    EXPOSE 5000
    ENTRYPOINT ["python"]
    CMD ["app.py"]
    So, what's the difference between the two?

• Well, obviously, the spacing - though that's less apparent in the example above - which splits the tasks into more natural groups.  I hesitated over whether comments would be useful in a file this small, but I wouldn't be averse to them here.
• DEBIAN_FRONTEND is now an ENV variable.  Having it sit in the middle of the apt-get install line looks a little unclear to me.  For those of you who are unaware, it stops apt-get (via debconf) from pausing to ask interactive configuration questions during installs - not the [y/N] prompt we're generally familiar with (the -y flag handles that), but the 'please configure this package' style of dialog.
• Most surprisingly, apt-get update and apt-get install are not &&'d together in the original file.  That bucks the advice found in the best practices.  In the original file, if we add another package to the install line, the cached 'apt-get update' layer gets reused rather than re-run, so the new package could be installed from a stale package index - potentially a different version than the rest of the app expects.  It's probably a minor point in this case, but this is the go-to training image used throughout the literature.
• Ditto on the literature for ADD vs. COPY - COPY is preferred unless you actually need ADD's extras (local tar auto-extraction or remote URLs).
• Files should be COPY'd over one at a time where practical; copying an entire directory means a change to any file in it invalidates the cached layer for everything else (see the above link for further details).
• Though you can't see it here, the original build keeps requirements.txt inside the webapp directory and copies it to /tmp, so it ends up in two places (webapp and /tmp).  I moved it to the top level of the build context when building the Docker image instead.
• I used ENTRYPOINT vs. CMD to start the app.  In this particular case, I believe it's largely semantics, but it could prove to be a useful differentiation in cases where the ENTRYPOINT is a bash script rather than an app (see the quick example after this list).  Further discussion on the difference starts here.
    • There was a Procfile for Flask in the webapp directory.  I removed it.  It looks like it's Heroku specific.  I'm not really sure why it was lying around.
    • On my box, my image is actually larger at 358 MB vs. 348 MB.  I'm not completely certain why that's the case, though I do suspect it may be due to the apt-get update vs. install nuance I mentioned above.
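
To illustrate that ENTRYPOINT/CMD point with a usage example: anything you pass after the image name at run time replaces the CMD, but it still gets handed to the ENTRYPOINT.  Assuming the refactored file above is built and tagged as training/webapp, something like this should work:

docker run -d -p 5000:5000 training/webapp   # default CMD: runs "python app.py", serving the Flask app on port 5000
docker run training/webapp tests.py          # trailing argument replaces CMD, so this runs "python tests.py" from the same WORKDIR instead
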
    So, that's it!  You may have known this all before, may not care, or may violently disagree, but I'm happy to say that I was finally able to read the literature and make a change based on what I've learned so far.

    Friday, January 8, 2016

    Easing Into 2015's Technology

Yeah, that 2015 isn't a typo - I'm starting to actually look at this interesting container ecosystem called 'Docker.' Maybe you've heard of it. I'm proud to say that, since my last post in December, I've become a pro at deploying my EC2 instance via the GUI. I haven't figured out the AWS CLI yet, however. Considering that I only have the budget to support the Free Tier offering, though, that shouldn't be a massive concern at this time. Anyway, here are the two things I love most about my EC2 instance:

    1. I have root access - no questions asked.  It makes me feel vaguely naughty, but not too much, because I know I can only shoot myself in the face.
    2. I can easily install docker via yum. (Yes, I'm easily entertained by simple things in life).
(2) is particularly helpful, as it allows me to actually set up Docker.  Unsurprisingly, this is difficult to do on a Chromebook, which doesn't easily allow kernel virtualization.  Sadly, it's also difficult on my current Linux workstation at work.
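
For anyone following along at home, the setup itself is only a handful of commands.  This assumes the stock Amazon Linux AMI (where Docker ships in the default yum repos) and the default ec2-user account; other distros will differ:

sudo yum update -y                    # pull in the latest package metadata and updates
sudo yum install -y docker            # item (2) above: Docker straight from yum
sudo service docker start             # start the Docker daemon
sudo usermod -a -G docker ec2-user    # let ec2-user run docker without sudo (takes effect on next login)
docker info                           # quick sanity check that the daemon is up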

    So, I went on a bit of a ramble about poor technical documentation in my last post.  As far as Docker is concerned, I'm pleased to say that, like the Chef documentation, it's very well laid out for someone who needs to digest things in bite-sized chunks.  I'm currently making my way through the user guide, and have actually understood over 95% of what I've read.

    I hope to end my post with something a bit more interesting than "Look at me, I can use yum!"  In one of my previous posts I wondered aloud as to what the pros and cons of Chef vs. Docker are.  I'm still not confident that I know all of the nuances, but here's my limited opinion as a dilettante:
    • Both allow for version control - Chef essentially via its entire repo, Docker via a Dockerfile.
    • Dockerfiles, without some serious exec bash script hackery, are more restrictive in their syntax and image set-up.
    • Chef, particularly because it leverages Ruby, (hopefully) allows more elegance for tricky configuration issues.
• Chef, through its data bags, elegantly solves the problem of password leakage via version control (there's a quick sketch of that after this list); Docker requires additional tools.
    • Chef certainly works well for long running VM images/instances that need configuration changes over time.
    • Docker's great for more ephemeral/simple set ups.
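
On that data bag point, for the curious: the usual Chef approach is an encrypted data bag, which keeps the secret material out of the repo while the (encrypted) item itself can still be versioned.  A minimal sketch, with made-up bag/item names and assuming a knife setup that already talks to a Chef server:

openssl rand -base64 512 > encrypted_data_bag_secret                             # generate a shared secret (kept out of version control)
knife data bag create passwords mysql --secret-file encrypted_data_bag_secret    # create the 'mysql' item in the 'passwords' bag, encrypted with that secret (opens $EDITOR for the JSON)
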
Based on the above, I'm leaning towards drinking more of the Docker Kool-Aid.  Hopefully, any image of a production instance that an organization maintains doesn't need configuration layer after configuration layer on top of it.  I'd assume (at this naive stage of my journey) that a Dockerfile should be sufficient to start up a production instance.  Of course, I'll let you know once I deploy something into production that isn't an exercise in navel gazing.