Sunday, January 22, 2017

The Software Detective - Django Authentication

As I move through the Django upgrade of my OpenEats fork, I also continue to double down on my philosophy that it's better to build from a previous foundation than it is to just write new code.  In today's argument to that effect, I'll use the site's authentication system as an example.

The current OpenEats site uses a combination of django-registration-redux and the framework's authentication module.  It also places the login and sign up logic on the same page as separate tabs and calls those individual functions via ajax.  There's nothing inherently wrong with that approach, but the design seemed more complicated than it needed to be, so I decided to separate the two functions into their own pages without ajax calls.
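
To make that concrete, here's a minimal sketch of what the separated routing might look like.  This isn't the actual OpenEats configuration - it assumes django-registration-redux's default backend and Django's bundled auth URLs (and, depending on the redux version, the redux URLs may already pull the auth views in):

urls.py (sketch)
# Not the actual OpenEats routing - just an illustration of separate,
# non-ajax login and registration pages.
from django.conf.urls import include, url

urlpatterns = [
    # login, logout, and password reset pages from django.contrib.auth
    url(r'^accounts/', include('django.contrib.auth.urls')),
    # sign up / activation pages from django-registration-redux
    url(r'^accounts/', include('registration.backends.default.urls')),
]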

Now, if I were doing this as a green field project, I likely would have looked up the framework's authentication module and gotten a lot of the login/logout/password reset goodness that comes with it, but I wouldn't have found the additional registration functionality that comes with the redux app.  I may have stumbled onto redux while researching alternative authentication modules, but it would have been one of many.  The fact that I may have chosen something else isn't bad, but I probably would have spent too much time weighing the pros and cons of each module rather than simply selecting something, testing it, and moving on.  Because redux was already there, it reduced the decision space I had to wander through.  Also, because the redux functionality is pretty comprehensive but the documentation hasn't yet caught up to it, I needed to read through the redux source code (which is pretty easy to follow, by the way).  The end of the README also pointed me to a possible replacement for when I want to move over to OAuth2.

Finally, because the OpenEats code references a lot of different pieces of authentication logic, and these made no sense to me at the beginning of the project, I needed to delve into the Django documentation to learn more about its authentication mechanisms.  Even more to the point, I needed to learn how the code in front of me worked, so I had a clear stopping point when researching the documentation.  Again, if this were green field, I either would have copied and pasted code I didn't quite understand, or attempted to read as much documentation as possible to ensure I wasn't missing out on some valuable detail.
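
For reference, the core mechanism that documentation revolves around is fairly small.  A hand-rolled login view - simplified here for illustration, and not how OpenEats or redux actually implement it - boils down to something like:

views.py (sketch)
# A simplified illustration of Django's authentication machinery; the
# template and URL names ('login.html', 'home') are placeholders.
from django.contrib.auth import authenticate, login
from django.shortcuts import redirect, render

def sign_in(request):
    if request.method == 'POST':
        user = authenticate(username=request.POST.get('username'),
                            password=request.POST.get('password'))
        if user is not None:
            login(request, user)  # attaches the authenticated user to the session
            return redirect('home')
    return render(request, 'login.html')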

So, to recap, here's what I got out of not simply writing an authentication app from scratch:

  • I was able to leverage someone else's code to skip writing functionality that's generally considered tedious.
  • I was able to delve into other code on GitHub to better understand my own project (and, as we've been told time and time again, it's good to read others' code and get familiar with GitHub projects).
  • I had a ready definition of requirements for this portion of my project.
  • I was able to delve somewhat deep into the authentication documentation while making sure I didn't go so deep that I couldn't come up for air.
  • I found a lead for a possible future feature on my site.
Not too shabby for something that's generally considered maintenance programming.

Sunday, January 8, 2017

Everything New is Broken Again

I'm tackling the biggest programming task I've faced in quite some time as part of my fork of Open Eats.  The code is in good working order, but it's a few years old and relies on a lot of outdated libraries, most notably, Django.

It's at this stage that I could take a few paths - I could ignore what's there and write my own site entirely; I could extract the requirements and rewrite everything from scratch against them; I could leave everything in place and build off of the older foundation; or I could undertake the maintenance effort to upgrade the system.  I chose the last option.  Though I do generally opt for fixing what's there, the decision isn't always straightforward, and I think this article does a good job of outlining some of the reasons why.

In a professional atmosphere, one of the big questions companies need to ask themselves is - do we push forward to get the latest feature set out there and risk incurring technical debt, or do we work on technical debt and risk falling behind on the feature curve?  There's no definitive answer, as it requires a balancing act.  Startups, because they only have a toehold in an industry, usually opt for features.  More established companies have the luxury of working on infrastructure projects (in fact, anyone at a startup who's advocating for microservices is probably an overzealous follower of software engineering trends).

However, I feel that in both cases, small and large companies tend to do maintenance programming wrong.  Even large companies attempt rewrites of portions of their code base and throw their 'best' (usually read: most prolific or most pedantic) programmers at the problem.  This seems to be based on the human perception that if something's broken, starting anew will not only fix it, but improve it.  Unfortunately, when rewriting something, people often ignore the lessons of Dr. Heidegger and make the same mistakes over again while contributing new bugs to the rewrite effort.

The only time I could ever advocate rewriting something from scratch is when the following conditions hold - (a) the requirements are well defined, meaning the problem set the software solves is relatively self-contained; (b) no one at the organization either understands the project's code or has the ability to delve into it; and (c) everyone realizes that the new code will miss existing features and surface new bugs.

If those conditions aren't met, it's better to maintain what's there than it is to launch a new greenfield project.  If you're making arguments like (a) it'll take too much time to maintain the existing project, or (b) the existing code is too risky to change because there's no safe way to modify it (i.e. there's insufficient testing around the project), you're probably making the wrong argument.  For (a), you're ignoring the time you'll need to learn a new technology or process, and you're certainly ignoring the time to fix the new bugs and update the features you forgot about during the initial rewrite.  For (b), if you can't safely figure out what existing code can do, why do you think you can figure out something that has yet to exist?

This isn't to say that, via maintenance channels, you can't completely rewrite your entire stack.  You just go about it from a renovation point of view.  Don't bulldoze the house and blow up the foundation just to re-pour it.  If the floors are worn (or you need an ORM to replace your aging JDBC code), sand the floors and refinish them.  If the kitchen is outdated (or you need some semblance of an MVC framework to replace your hand-crafted version from 2000), install new counters and appliances.  If you need a new addition to the house (or need to get off Java 1.4 and have the option to use something newer, like Go), draw up the plans and do the work to make sure it fits well with the existing frame.

Sure, maintenance programming doesn't have the starry-eyed narrative of perfectly formed code springing forth from the head of a genius architect, but the ultimate result takes no less intellectual effort and is no less rewarding in the long run.

Monday, December 26, 2016

On Documenting Software Engineering Projects

I started working through my own fork of the Open Eats recipe site software.  To date, I've only made a few minor changes, such as placing (almost) the entire repository in an openeats module, so that the manage.py file can run correctly without further user setup.  I've also started documenting a lot of what I'm finding here.

One of the first things I questioned was where my documentation should live.  I've essentially got access to this here blog, the project wiki, the issue tracker attached to Bitbucket, and the code itself.  I'm sure my preferences will change over time, but here are the current rules for documentation usage:

  • Code - Documentation, much to the surprise of a younger version of myself, should be minimal here.  If the 'how' of the code is unclear, the code should be refactored.  If there's a specific 'why' behind an algorithmic design, that's fair game, though I could argue that documentation belongs in the wiki.  Architectural/project design should be left to the wiki.  So, the code should really only behave as an actor, not as an explainer.
  • Issue Tracker - Document immediate issues that are either enhancements (which I'm also treating as stories) or bugs.  I'm not using any other types.  If this simple bug tracker had them, I'd potentially use epics to group bugs and enhancements, but it doesn't, so I won't.  Things here should be immediate calls to action with very specific exit criteria for fixing.  There may be some overlap with the wiki on information, but that's OK.  We don't refactor issues or wikis, so duplicate information is OK.  We should, though, be very diligent in editing the wiki to ensure that its information is correct.  Issue tracking requires less diligent editing, as issues are expected to be ephemeral and out of date at some not too distant point.  Of course, anyone who's ever managed a backlog knows this isn't always true, but the principle still holds.
  • Wiki - This is the canonical documentation and requires the most care.  Wait, what!?  Don't you mean the code requires the most care, since it runs the site?  Well, maybe.  First, in most cases, you'll be reading a lot more code than you'll be writing.  If you're not - even as part of greenfield development - you're doing it wrong, because you're copying and pasting from Stack Overflow without understanding what's going on.  Second, the code is, to some extent, self-correcting.  If you add a feature and it doesn't work correctly, you'll eventually find out.  If your documentation is wrong, you'll only wind up frustrating the people who expect it to impart correct information.  The wiki can be used for just about anything, and I'd say that shading toward too much information is preferable to too little, as long as you're following along with a decent narrative (so, don't just barf up some documentation from somewhere else and expect that it will be sufficient for your wiki).  People are going to come to your wiki to understand just about everything, so you should give it the appropriate care.  What's more likely than not to show up in a wiki:
    • Architectural overviews and decisions.
    • Notes that are expected to last a long time and show why the decisions that are in place are in place.  
    • Notes describing particularly inscrutable or idiomatic code choices.  This reduces the text volume overhead in the code itself.
    • Discussions for future enhancements or extensive bug work.
  • Blog - I've debated a bit about whether or not both a blog and a wiki are necessary for documentation in this case.  If I had to jettison one, I'd lose the blog, but I think that a blog does a good job of complementing the wiki.  A blog generally shades toward thought process, while the wiki shows the output of that thought (though a wiki can also show thought process).  A blog is good for discussing meta topics (like this one about choosing documentation standards).  A blog is time-based, whereas a wiki is generally more of a static snapshot.  So, I can write a lengthy post about documentation standards and then completely contradict myself next week without making edits to the wiki documentation.  If I were promoting the recipe website, a blog would also serve as a separate marketing medium, which is useful.  Finally, a blog allows me to go off-script with various posts (say, if I want to review Rogue One tomorrow) without seeming too out of place.
A few final notes on documentation - 

Given the comprehensive abilities of markdown files, I'm fine with the argument that a good README.md with children could supplant a traditional wiki.  In fact, now, as I write this, that may be a better option, as the repository and documentation would then be entirely self-contained.  For the moment, Bitbucket integrates issue tracking, code, and a wiki easily enough that the argument for markdown is a little academic.

The most important part of this post (talk about burying the lede) is that documentation is difficult.  It requires a lot of writing and editing.  It's extremely time intensive, and it can be frustrating when you put in a lot of effort and people either don't get your point or don't read it.  However, those aren't excuses not to document.  Code without documentation is essentially dead code.  Code is part of a living ecosystem and needs support - it never lives in isolation.

This holds more true in the commercial space than in the hobbyist space, but, even though you may know the code like the back of your hand, eventually you'll quit or move on to something else and someone will need to support what you've built.

Saturday, December 17, 2016

A New Hope

Yes, of course that's a crappy Rogue One joke, because a new Star Wars movie came out.  More to the point, though, it's sorta true.  I've been working in earnest on my own recipe website for the last few months.  Originally, it was Python on Google App Engine (no wait, it was Angular!), then Python, then Python on a VPS solution.  I've learned a lot - especially around standing up infrastructure and basic security principles.  Enough so that I've spent more time on the operations and infrastructure than I have on the actual feature code itself.  However, to move forward a little faster, I decided to see if there was another open source solution available.  Lo and behold, there was.  Before I get into that, though, let's go through my overall motivations for this project:

Motivations

  • I'm a software engineering manager now, not a software engineer.  In order to keep current on some skills, and learn some new ones, I wanted to get involved in a project.
  • I wanted to work on a project that has some nominal value for me.  Because I cook a lot, and because recipe sites are rife with ads that make browsing unbearably slow, I decided to write a recipe site.
  • Yes, I know I could do a ton of other things like install ad blocking software, utilize Google Drive, or probably just buy software, but I'm cheap and, also, see bullet point one.
  • I also want to learn as much as possible about running a small website.  This isn't start up territory where I lose money when I don't ship features.  This is a labor of love and software engineering.  Here's a list of things I want to explore and improve: back end feature development, front end feature development and UX design, mobile development, DB administration, scalability, reliability, monitoring, infrastructure as code, data science, and documentation.  So, basically everything I can get my hands on (I'm sure there will be more).  I realize I may never get to all of those things, but it should be fun to try.
  • I do actually want to use the site, though, and, if I concentrate on all of those things, I'll learn a lot (which is OK in this case), but I won't actually have a product.
I realized after doing this, though, that one thing I liked about being a software engineer was being a code detective.  It annoys me to no end when people throw out old, working code to start over in search of a 'better' system rather than spend a little effort to improve what's there already.  In order to meet my goals above and adhere to my software engineering principles, I decided to find an open source recipe system I could work from.  In five minutes I found Open Eats by Googling for 'open source recipe software'.  Here's my copy of the repository.  Wish me luck!

Tuesday, January 19, 2016

Understanding the CD Pipeline

One of the first things I'm likely to be tasked with in my new role is the creation of an automated continuous delivery/continuous deployment pipeline.  I've always been a bit flummoxed by the term 'CD Pipeline,' as it's yet another one of those phrases that practitioners claim is a concise representation of a concept, but that I really suspect is a surreptitious way to claim 'this is my community - stay out!'  Other words that quickly come to mind in that vein are 'Enterprise,' 'Dependency Injection,' and 'Container.'  This goes a bit hand in hand with my start of a documentation rant in a previous post, wherein these terms are explained in the most byzantine way possible for people who "really" get it (lawyers and financiers aren't immune to this either, so it's not just the software community).

Anyway, back to the CD pipeline.  After searching around, I found a few different posts that help outline what a pipeline is.  They're mostly in agreement, which is good, but one in particular - even if it's a bit older - does a very thorough job of outlining the steps.  Here's the original post.  Here's the relevant pipeline checklist (a toy sketch of chaining these stages together follows the list):
  • Unit-tests
  • Acceptance tests
  • Code coverage and static analysis
  • Deployment to integration environment
  • Integration tests
  • Scenario/regression tests
  • Deployments to UAT and Performance test environment
  • More scenario/regression tests
  • Performance tests
  • Alerts, reports and Release Notes sent out
  • Deployment to release repository
  • All of which should be automated.
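
To make that last point concrete for myself, here's a toy sketch of a pipeline runner - mine, not something from the linked articles - where each stage is just a command run in order, failing fast.  The stage commands are placeholders for whatever a real project would use:

pipeline.py (sketch)
# A toy pipeline runner: run each stage's command in order and stop at the
# first failure.  The commands below are hypothetical placeholders.
import subprocess
import sys

STAGES = [
    ("unit tests", ["python", "manage.py", "test"]),
    ("static analysis", ["flake8", "."]),
    ("build image", ["docker", "build", "-t", "myapp", "."]),
    ("integration tests", ["python", "run_integration_tests.py"]),
]

def run_pipeline():
    for name, command in STAGES:
        print("Running stage: %s" % name)
        status = subprocess.call(command)
        if status != 0:
            print("Stage '%s' failed; stopping the pipeline." % name)
            sys.exit(status)
    print("All stages passed.")

if __name__ == "__main__":
    run_pipeline()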

In addition to that article, I found a few other interesting links that break things down into bite-sized chunks for me.  The first is the Atlassian series entitled A skeptic's guide to continuous delivery, which builds a pretty good use case for anyone who doesn't believe in infrastructure (non-user facing features) investment.  It also builds a pretty good case for those who do, or who need a gentle reminder during project planning.

Atlassian also has an entire section on Continuous Delivery.  In the interest of full disclosure, I've only browsed through the section, but it does look pretty comprehensive.  This article also provides a good general outline of a CD pipeline.  Thoughtworks, creators of The Bible on CD, also have a section on Continuous Delivery, but that seems to be a bit harder to navigate and more narrowly focused than Atlassian's section.

Finally, you may ask - why don't you read the aforementioned Continuous Delivery book?  The answer's pretty simple.  At this stage, I'm not willing to slap down $35 for a 512 page book that I may not finish.  A lot of people I know found it extremely useful, but I'm also aware that my tastes in documentation differ from others', so it may not be as useful for me.  However, its contents are listed on Amazon, so I hope to peck through each of the sections and do my independent research where needed.

Saturday, January 9, 2016

An Exercise in Dockerfile Refactoring

Finally! I may have something of value to contribute to the greater community based on my recent Docker documentation exploration.  While going through the Docker tutorials, here's the Dockerfile that accompanied the training/webapp image that's available on Docker Hub (coincidentally, I've been getting no results from Docker Hub for the past several hours, either via the website or via the CLI.  This happened a day or two ago as well, and I'm worried that either (a) the site isn't stable or (b) it's creating weird search parameters for me by default that restrict my viewing):

Original Dockerfile
FROM ubuntu:14.04
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y -q python-all python-pip
ADD ./webapp/requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
ADD ./webapp /opt/webapp/
WORKDIR /opt/webapp
EXPOSE 5000
CMD ["python", "app.py"]
And here's my refactored file:

New Dockerfile
FROM ubuntu:14.04

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y -q python-all python-pip

COPY requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt

COPY ./webapp/app.py /opt/webapp/
COPY ./webapp/tests.py /opt/webapp/

WORKDIR /opt/webapp
EXPOSE 5000
ENTRYPOINT ["python"]
CMD ["app.py"]
So, what's the difference between the two?

  • Well, obviously, the spacing - though that's less apparent in the example above - which splits the tasks up into more natural groups.  I hesitated as to whether or not comments would be useful, given the size of the file, but I wouldn't be averse to them here.
  • DEBIAN_FRONTEND is now an ENV variable.  Having it lie in the middle of the apt-get install line is a little unclear to me.  For those of you who are unaware, it prevents apt-get from prompting for informational responses (not the [y/N] we're generally familiar with, but the 'this will download 23 packages' type of message).
  • Most surprisingly, apt-get update and apt-get install are not &&'d together in the original file.  That bucks the advice found in the best practices.  In the original file, if we add another package to the install line, 'apt-get update' isn't re-run due to layer caching.  This means the new package could be installed against an outdated package index relative to the rest of the app!  It's probably a minor point in this case, but this is the go-to training document used throughout the literature.
  • Ditto on the literature for ADD vs. COPY.
  • Files should be COPY'd over one at a time; otherwise, changes within a directory could be missed due to the caching that's part of image layering.  (See the above link for further details.)
  • Though you can't see it here, the original webapp directory copies the requirements.txt file to /tmp.  I put it in a top level directory when building the Docker image.  It's strange to have it in two places (webapp and tmp).
  • I used ENTRYPOINT vs. CMD to start the app.  In this particular case, I believe it's just semantics, but it could prove to be a useful differentiation in cases where the ENTRYPOINT may be a bash script rather than an app (see the example after this list).  Further discussion on the difference starts here.
  • There was a Procfile for Flask in the webapp directory.  I removed it.  It looks like it's Heroku specific, and I'm not really sure why it was lying around.
  • On my box, my image is actually larger at 358 MB vs. 348 MB.  I'm not completely certain why that's the case, though I do suspect it may be due to the apt-get update vs. install nuance I mentioned above.
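
One aside on that ENTRYPOINT/CMD split - I don't tag the image anywhere in this post, so assume a hypothetical tag of my-webapp:

docker run my-webapp            # runs "python app.py" (ENTRYPOINT plus the default CMD)
docker run my-webapp tests.py   # the trailing argument replaces the CMD, so this runs "python tests.py"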
So, that's it!  You may have known this all before, may not care, or may violently disagree, but I'm happy to say that I was finally able to read the literature and make a change based on what I've learned so far.

Friday, January 8, 2016

Easing Into 2015's Technology

Yeah, that 2015 isn't a typo - I'm starting to actually look at this interesting container ecosystem called 'Docker.' Maybe you've heard of it. I'm proud to say that, since my last post in December, I've become a pro at deploying my EC2 instance via the GUI. I still haven't figured out the AWS CLI, however. Considering that I only have the budget to support the Free Tier offering, though, that shouldn't be a massive concern at this time. Anyway, here are the two things I love most about my EC2 instance:

1. I have root access - no questions asked.  It makes me feel vaguely naughty, but not too much, because I know I can only shoot myself in the face.
2. I can easily install Docker via yum. (Yes, I'm easily entertained by simple things in life).
(2) is particularly helpful, as it allows me to actually set up Docker.  Unsurprisingly, this is difficult to do with ease on a Chromebook that doesn't allow kernel virtualization.  Sadly, this is also difficult to do on my current Linux workstation at work.

So, I went on a bit of a ramble about poor technical documentation in my last post.  As far as Docker is concerned, I'm pleased to say that, like the Chef documentation, it's very well laid out for someone who needs to digest things in bite-sized chunks.  I'm currently making my way through the user guide, and have actually understood over 95% of what I've read.

I hope to end my post with something a bit more interesting than "Look at me, I can use yum!"  In one of my previous posts I wondered aloud as to what the pros and cons of Chef vs. Docker are.  I'm still not confident that I know all of the nuances, but here's my limited opinion as a dilettante:
  • Both allow for version control - Chef essentially via its entire repo, Docker via a Dockerfile.
  • Dockerfiles, without some serious exec bash script hackery, are more restrictive in their syntax and image set-up.
  • Chef, particularly because it leverages Ruby, (hopefully) allows more elegance for tricky configuration issues.
  • Chef, through its data bags, elegantly solves the problem of password leakage via version control.  Docker requires additional tools.
  • Chef certainly works well for long running VM images/instances that need configuration changes over time.
  • Docker's great for more ephemeral/simple set ups.
Based on the above, I'm leaning towards drinking more of the Docker Kool-Aid.  Hopefully, any image of a production instance that an organization maintains doesn't need configuration layer after configuration layer on top of it.  I'd assume (at this naive level of my journey) that a Dockerfile should be sufficient to start up a production instance.  Of course, I'll let you know once I deploy something into production that isn't an exercise in navel gazing.