Monday, December 26, 2016

On Documenting Software Engineering Projects

I started working through my own fork of the Open Eats recipe site software.  To date, I've only made a few minor changes, such as placing (almost) the entire repository in an openeats module, so that the manage.py file can run correctly without further user setup.  I've also started documenting a lot of what I'm finding here.
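
To give a sense of what that restructuring buys, here's a minimal sketch of the kind of manage.py that works out of the box once everything lives under a single openeats package.  The openeats.settings path is my assumption about the layout, not something lifted from the actual repository.

Sketch manage.py (illustrative only)
#!/usr/bin/env python
# Assumes the Django settings module now lives at openeats.settings - the
# real Open Eats layout may differ.
import os
import sys

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "openeats.settings")
    from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)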

One of the first things I questioned was where my documentation should live.  I've essentially got access to this here blog, the above wiki, the issue tracker attached to Bitbucket, and the code itself.  I'm sure my preferences will change over time, but here are the current rules for documentation usage:

  • Code - Documentation - much to the surprise of a younger version of myself - should be minimal here.  If the 'how' of the code is unclear, the code should be refactored.  If there's a specific 'why' behind an algorithmic design, that's fair game, though I could argue that documentation belongs in the wiki.  Architectural/project design should be left to the wiki.  So, the code should really only behave as an actor, not as an explainer.
  • Issue Tracker - Document immediate issues that are either enhancements (which I'm also treating as stories) or bugs.  I'm not using any other types.  If this simple tracker had epics, I'd potentially use them to group bugs and enhancements, but it doesn't, so I won't.  Things here should be immediate calls to action with very specific exit criteria for fixing.  There may be some overlap with the wiki, and that's OK - we don't refactor issues or wikis, so duplicated information is fine.  That said, the wiki should be edited diligently to keep its information correct; issues need less diligent editing, since they're expected to be ephemeral and out of date at some point in the not-too-distant future.  Of course, anyone who's ever managed a backlog knows that isn't always true, but the principle still holds.
  • Wiki - This is the canonical documentation and requires the most care.  Wait, what!?  Don't you mean the code requires the most care, since it runs the site?  Well, maybe.  First, in most cases you'll be reading a lot more code than you'll be writing.  If you're not - even as part of greenfield development - you're doing it wrong, because you're copying and pasting from Stack Overflow without understanding what's going on.  Second, the code is, to some extent, self-correcting.  If you add a feature and it doesn't work correctly, you'll eventually find out.  If your documentation is wrong, you'll only wind up frustrating the people who expect it to impart correct information.  The wiki can be used for just about anything, and I'd say that erring toward more information is preferable to less, as long as you keep a decent narrative (so don't just barf up some documentation from somewhere else and expect that it will be sufficient for your wiki).  People are going to come to your wiki to understand just about everything, so give it the appropriate care.  What's more likely than not to show up in a wiki:
    • Architectural overviews and decisions.
    • Notes that are expected to last a long time and show why the decisions that are in place are in place.  
    • Notes describing particularly inscrutable or idiomatic code choices.  This reduces the text volume overhead in the code itself.
    • Discussions for future enhancements or extensive bug work.
  • Blog - I've debated a bit about whether both a blog and a wiki are necessary for documentation in this case.  If I had to jettison one, I'd lose the blog, but I think a blog does a good job of complementing the wiki.  A blog generally shades toward thought process, while the wiki shows the output of that thought (though a wiki can also show thought process).  A blog is good for discussing meta topics (like this one about choosing documentation standards).  A blog is time-based, whereas a wiki is generally a static snapshot.  So, I can write a lengthy post about documentation standards and then completely contradict myself next week without making edits to the wiki.  If I were promoting the recipe website, the blog would also serve as a separate marketing channel, which is useful in its own right.  Finally, a blog allows me to go off-script with various posts (say, if I want to review Rogue One tomorrow) without seeming too out of place.
A few final notes on documentation - 

Given the comprehensive abilities of Markdown files, I'm fine with the argument that a good README.md with children could supplant a traditional wiki.  In fact, as I write this, that may be the better option, since the repository and documentation would then be entirely self-contained.  For the moment, though, Bitbucket integrates issue tracking, code, and a wiki easily enough that the argument for Markdown is a little academic.

The most important part of this post (talk about burying the lede) is that documentation is difficult.  It requires a lot of writing and editing.  It's extremely time-intensive, and it can be frustrating to put in a lot of effort only to have people miss your point or not read it at all.  However, those aren't excuses not to document.  Code without documentation is essentially dead code.  Code is part of a living ecosystem and needs support - it never lives in isolation.

This holds more true in the commercial space than in the hobbyist space, but, even though you may know the code like the back of your hand, eventually you'll quit or move on to something else and someone will need to support what you've built.

Saturday, December 17, 2016

A New Hope

Yes, of course that's a crappy Rogue One joke, because a new Star Wars movie came out.  More to the point, though, it's sorta true.  I've been working in earnest on my own recipe website for the last few months.  Originally, it was Python on Google App Engine (no wait, it was Angular!), then Python, then Python on a VPS.  I've learned a lot - especially around standing up infrastructure and basic security principles.  Enough so that I've spent more time on the operations and infrastructure than on the actual feature code itself.  To move forward a little faster, though, I decided to see if there was an open source solution available.  Lo and behold, there was.  Before I get into that, let's go through my overall motivations for this project:

Motivations

  • I'm a software engineering manager now, not a software engineer.  In order to keep current on some skills, and learn some new ones, I wanted to get involved in a project.
  • I wanted to work on a project that has some nominal value for me.  Because I cook a lot, and because recipe sites are rife with ads that make browsing unbearably slow, I decided to write a recipe site.
  • Yes, I know I could do a ton of other things like install ad blocking software, utilize Google Drive, or probably just buy software, but I'm cheap and, also, see bullet point one.
  • I also want to learn as much as possible about running a small website.  This isn't startup territory, where I lose money when I don't ship features.  This is a labor of love and software engineering.  Here's a list of things I want to explore and improve: back end feature development, front end feature development and UX design, mobile development, DB administration, scalability, reliability, monitoring, infrastructure as code, data science, and documentation.  So, basically everything I can get my hands on (I'm sure there will be more).  I realize I may never get to all of those things, but it should be fun to try.
  • I do actually want to use the site, though, and if I concentrate on all of those things, I'll learn a lot (which is OK in this case) but won't actually have a product.
I realized after doing this, though, that one thing I liked about being a software engineer was being a code detective.  It annoys me to no end when people throw out old, working code to start over in search of a 'better' system rather than spend a little effort to improve what's there already.  In order to meet my goals above and adhere to my software engineering principles, I decided to find an open source recipe system I could work from.  In five minutes I found Open Eats by Googling for 'open source recipe software'.  Here's my copy of the repository.  Wish me luck!

Tuesday, January 19, 2016

Understanding the CD Pipeline

One of the first things I'm likely to be tasked with in my new role is the creation of an automated continuous delivery/continuous deployment pipeline.  I've always been a bit flummoxed by the term 'CD Pipeline,' as it's yet another one of those phrases that practitioners claim is a concise representation of a concept, but that I really suspect is a surreptitious way of saying 'this is my community - stay out!'  Other terms that quickly come to mind here are 'Enterprise,' 'Dependency Injection,' and 'Container.'  This goes hand in hand with the documentation rant I started in a previous post: these terms are explained in the most byzantine way possible, for people who "really" get it (lawyers and financiers aren't immune to this either, so it's not just the software community).

Anyway, back to the CD pipeline.  After searching around, I found a few different posts that help outline what a pipeline is.  They're mostly in agreement, which is good, but one in particular - even if it's a bit older - does a very thorough job of outlining the steps.  Here's the original post.  Here's the relevant pipeline checklist (I've sketched out what automating these stages might look like just after the list):
  • Unit-tests
  • Acceptance tests
  • Code coverage and static analysis
  • Deployment to integration environment
  • Integration tests
  • Scenario/regression tests
  • Deployments to UAT and Performance test environment
  • More scenario/regression tests
  • Performance tests
  • Alerts, reports and Release Notes sent out
  • Deployment to release repository
All of which should be automated.
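
To make the checklist a little more concrete, here's a rough sketch of what wiring those stages together might look like if I scripted it by hand.  The stage names and commands below are placeholders I made up for illustration - they're not from the post above or from any real CD tool.

Sketch pipeline runner (illustrative only)
# pipeline.py - placeholder commands only; a real pipeline would live in a CI server.
import subprocess
import sys

STAGES = [
    ("unit tests", ["python", "-m", "pytest", "tests/unit"]),
    ("acceptance tests", ["python", "-m", "pytest", "tests/acceptance"]),
    ("coverage and static analysis", ["python", "-m", "flake8", "."]),
    ("deploy to integration", ["./deploy.sh", "integration"]),
    ("integration tests", ["python", "-m", "pytest", "tests/integration"]),
]

def run_pipeline():
    for name, command in STAGES:
        print("Running stage: %s" % name)
        exit_code = subprocess.call(command)
        if exit_code != 0:
            # Fail fast: a broken stage stops the pipeline, just as a real CD tool would.
            print("Stage '%s' failed with exit code %d" % (name, exit_code))
            sys.exit(exit_code)
    print("All stages passed - ready to publish to the release repository.")

if __name__ == "__main__":
    run_pipeline()

Walking through the stages in code like this is mostly a thinking exercise, but it makes it obvious that each step is just 'run a command, stop on failure' - which is what every CD tool is ultimately automating.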

    In addition to that article, I found a few other interesting links that break things down into bite-sized chunks for me.  The first is the Atlassian series entitled A skeptic's guide to continuous delivery, which builds a pretty good case for anyone who doesn't believe in investing in infrastructure (non-user-facing features).  It also builds a pretty good case for those who do, or who just need a gentle reminder during project planning.

    Atlassian also has an entire section on Continuous Delivery.  In the interest of full disclosure, I've only browsed through the section, but it does look pretty comprehensive.  This article also provides a good general outline of a CD pipeline.  Thoughtworks, creators of The Bible on CD, also have a section on Continuous Delivery, but that seems to be a bit harder to navigate and more narrowly focused than Atlassian's section.

    Finally, you may ask - why don't you just read the aforementioned Continuous Delivery book?  The answer's pretty simple.  At this stage, I'm not willing to slap down $35 for a 512-page book that I may not finish.  A lot of people I know found it extremely useful, but I'm also aware that my tastes in documentation differ from others', so it may not be as useful for me.  However, its table of contents is listed on Amazon, so I hope to peck through each of the sections and do my own research where needed.

    Saturday, January 9, 2016

    An Exercise in Dockerfile Refactoring

    Finally!  I may have something of value to contribute to the greater community based on my recent Docker documentation exploration.  While going through the Docker tutorials, I came across the Dockerfile that accompanies the training/webapp image available on Docker Hub.  Here it is (coincidentally, I've been getting no results from Docker Hub for the past several hours, either via the website or via the CLI.  This happened a day or two ago as well, and I'm worried that either (a) the site isn't stable or (b) it's defaulting to weird search parameters that restrict what I can see):

    Original Docker File
    FROM ubuntu:14.04
    RUN apt-get update
    RUN DEBIAN_FRONTEND=noninteractive apt-get install -y -q python-all python-pip
    ADD ./webapp/requirements.txt /tmp/requirements.txt
    RUN pip install -qr /tmp/requirements.txt
    ADD ./webapp /opt/webapp/
    WORKDIR /opt/webapp
    EXPOSE 5000
    CMD ["python", "app.py"]
    And here's my refactored file:

    New Docker File
    FROM ubuntu:14.04

    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update && apt-get install -y -q python-all python-pip

    COPY requirements.txt /tmp/requirements.txt
    RUN pip install -qr /tmp/requirements.txt

    COPY ./webapp/app.py /opt/webapp/
    COPY ./webapp/tests.py /opt/webapp/

    WORKDIR /opt/webapp
    EXPOSE 5000
    ENTRYPOINT ["python"]
    CMD ["app.py"]
    So, what's the difference between the two?

    • Well, obviously, the spacing - it's less apparent in the example above, but it splits the tasks into more natural groups.  I hesitated over whether comments would be useful, given the size of the file, but I wouldn't be averse to them here.
    • DEBIAN_FRONTEND is now an ENV variable.  Having it sit in the middle of the apt-get install line reads a little unclear to me.  For those of you who are unaware, setting it to noninteractive tells apt (really, debconf) not to prompt for interactive configuration input during installs - not the [y/N] prompt we're generally familiar with, which the -y flag already handles, but the package configuration dialogs that would otherwise stall a build with no terminal attached.
    • Most surprisingly, apt-get update and apt-get install are not &&'d together in the original file.  That bucks the advice found in the best practices.  In the original file, if we add another package to the install line, 'apt-get update' isn't re-run because its layer is already cached, so the new package gets installed from a stale package index - it could be an older version than expected, or the download could fail outright.  It's probably a minor point in this case, but this is the go-to training image used throughout the literature.
    • Ditto on the literature for ADD vs. COPY - COPY is preferred unless you specifically need ADD's extra behavior (remote URLs and automatic tar extraction).
    • Files should be COPY'd individually where it makes sense; that keeps the build cache granular, so a change to one file only invalidates the layers that actually depend on it - which is exactly why requirements.txt is copied (and pip install run) before the application code.  (See the above link for further details.)
    • Though you can't see it here, the original setup keeps requirements.txt inside the webapp directory and copies it to /tmp.  I put it at the top level of the build context when building the Docker image; it's strange to have it in two places (webapp and /tmp).
    • I used ENTRYPOINT plus CMD to start the app instead of a single CMD.  In this particular case it's mostly semantics, but the split can be useful: ENTRYPOINT fixes the executable (python), while CMD supplies a default argument (app.py) that can be overridden at run time - say, to point the same image at tests.py instead - and it really pays off when the ENTRYPOINT is a bash script rather than the app itself.  Further discussion on the difference starts here.
    • There was a Procfile for Flask in the webapp directory.  I removed it.  It looks like it's Heroku specific.  I'm not really sure why it was lying around.
    • On my box, my image is actually larger at 358 MB vs. 348 MB.  I'm not completely certain why that's the case, though I do suspect it may be due to the apt-get update vs. install nuance I mentioned above.
    So, that's it!  You may have known this all before, may not care, or may violently disagree, but I'm happy to say that I was finally able to read the literature and make a change based on what I've learned so far.

    Friday, January 8, 2016

    Easing Into 2015's Technology

    Yeah, that 2015 isn't a typo - I'm starting to actually look at this interesting container ecosystem called 'Docker.' Maybe you've heard of it. I'm proud to say that, since my last post in December, I've become a pro at deploying my EC2 instance via the GUI. I still haven't figured out the AWS CLI yet, however. Considering that I only have the budget to support the Free Tier offering, though, that shouldn't be a massive concern at this time. Anyway, here are the two things I love most about my EC2 instance:

    1. I have root access - no questions asked.  It makes me feel vaguely naughty, but not too much, because I know I can only shoot myself in the face.
    2. I can easily install docker via yum. (Yes, I'm easily entertained by simple things in life).
    (2) is particularly helpful, as it lets me actually set up Docker.  Unsurprisingly, that's difficult to do on a Chromebook, which doesn't easily allow the kernel-level virtualization I'd need.  Sadly, it's also difficult to do on my current Linux workstation at work.

    So, I went on a bit of a ramble about poor technical documentation in my last post.  As far as Docker is concerned, I'm pleased to say that, like the Chef documentation, its docs are very well laid out for someone who needs to digest things in bite-sized chunks.  I'm currently making my way through the user guide, and I've actually understood over 95% of what I've read.

    I hope to end this post with something a bit more interesting than "Look at me, I can use yum!"  In one of my previous posts I wondered aloud what the pros and cons of Chef vs. Docker are.  I'm still not confident that I know all of the nuances, but here's my limited opinion as a dilettante:
    • Both allow for version control - Chef essentially via its entire repo, Docker via a Dockerfile.
    • Dockerfiles, short of some serious shell-script hackery, are more restrictive in their syntax and image set-up.
    • Chef, particularly because it leverages Ruby, (hopefully) allows for more elegant handling of tricky configuration issues.
    • Chef, through its encrypted data bags, elegantly solves the problem of leaking passwords into version control; Docker requires additional tooling for that.
    • Chef certainly works well for long running VM images/instances that need configuration changes over time.
    • Docker's great for more ephemeral/simple set ups.
    Based on the above, I'm leaning towards drinking more of the Docker Kool-Aid.  Hopefully, any production image an organization maintains doesn't need configuration layer after configuration layer piled on top of it.  I'd assume (at this naive point in my journey) that a Dockerfile should be sufficient to stand up a production instance.  Of course, I'll let you know once I deploy something into production that isn't an exercise in navel gazing.