Saturday, January 9, 2016

An Exercise in Dockerfile Refactoring

Finally! I may have something of value to contribute to the greater community, based on my recent exploration of the Docker documentation. While going through the Docker tutorials, I came across the Dockerfile that accompanies the training/webapp image available on Docker Hub. (Coincidentally, I've been getting no results from Docker Hub for the past several hours, either via the website or via the CLI. This happened a day or two ago as well, and I'm worried that either (a) the site isn't stable or (b) it's applying odd default search parameters that restrict what I can see.) Here it is:

Original Dockerfile
FROM ubuntu:14.04
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y -q python-all python-pip
ADD ./webapp/requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
ADD ./webapp /opt/webapp/
WORKDIR /opt/webapp
EXPOSE 5000
CMD ["python", "app.py"]
And here's my refactored file:

New Dockerfile
FROM ubuntu:14.04

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y -q python-all python-pip

COPY requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt

COPY ./webapp/app.py /opt/webapp/
COPY ./webapp/tests.py /opt/webapp/

WORKDIR /opt/webapp
EXPOSE 5000
ENTRYPOINT ["python"]
CMD ["app.py"]
So, what's the difference between the two?

  • Well, obviously, the spacing. That's less apparent in the example above, but blank lines split the tasks into more natural groups. Given the size of the file, I hesitated over whether comments would be useful here, but I wouldn't be averse to adding them.
  • DEBIAN_FRONTEND is now an ENV variable. Having it sit in the middle of the apt-get install line looks a little unclear to me. For those of you who are unaware, setting it to noninteractive tells debconf to accept default answers rather than stopping to ask package configuration questions (a time zone or keyboard layout prompt, for example). It doesn't handle the [Y/n] confirmation we're generally familiar with; that's the job of -y, while -q just trims apt-get's progress output.
  • Most surprisingly, apt-get update and apt-get install are not &&'d together in the original file. That bucks the advice found in the best practices. In the original file, if we add another package to the install line, Docker reuses the cached layer from the separate 'RUN apt-get update' step, so the install runs against a stale package index. The new package could then be an older version than expected, or fail to download at all if that version has since disappeared from the mirrors. It's probably a minor point in this case, but this is the go-to training document used throughout the literature.
  • Ditto on the literature for ADD vs. COPY: the best practices recommend COPY for plain local files, because ADD also has less obvious behavior such as extracting local tar archives and fetching remote URLs.
  • Files should be COPY'd separately when different steps depend on different files. Docker checksums the contents of copied files to decide whether a cached layer can be reused, so copying a whole directory in one step invalidates that layer, and every step after it, whenever any file in the directory changes. Copying requirements.txt on its own ahead of the pip install, for example, means the dependencies are only reinstalled when requirements.txt itself changes. (The best practices guide mentioned above has further details.)
  • Though you can't see it here, requirements.txt lives inside the original webapp directory, so the original file copies it twice: once to /tmp and again to /opt/webapp along with the rest of the directory. I moved it to the top level of the build context instead. It's strange to have it in two places (/opt/webapp and /tmp).
  • I used ENTRYPOINT together with CMD to start the app: ENTRYPOINT fixes the executable (python), and CMD supplies the default argument (app.py), which can be replaced on the docker run command line. In this particular case, I believe the difference is mostly semantic, but it could prove useful in cases where the ENTRYPOINT is a bash script rather than an app. Further discussion of the difference starts here.
  • There was a Procfile for Flask in the webapp directory. I removed it. It looks Heroku-specific, and I'm not really sure why it was lying around.
  • On my box, my image is actually larger: 358 MB vs. 348 MB for the original. I'm not completely certain why that's the case, though I suspect it may be due to the apt-get update vs. install nuance I mentioned above.
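For anyone who wants to poke at the size question, here is a sketch of one further variation on the refactored file. It hasn't been measured here, so treat it as an experiment rather than a result. The only functional change is on the apt-get line: following the pattern in Docker's best practices guide, the package index under /var/lib/apt/lists is deleted in the same RUN instruction that downloaded it. The comments restate the reasoning from the list above.

```dockerfile
FROM ubuntu:14.04

ENV DEBIAN_FRONTEND=noninteractive

# Update and install in one layer, so adding a package to this line
# also re-runs apt-get update instead of reusing a stale cached index.
# The index files are removed in the same RUN; deleting them in a later
# instruction would not shrink this layer.
RUN apt-get update \
    && apt-get install -y -q python-all python-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy only the dependency list first, so the pip layer is reused
# until requirements.txt itself changes.
COPY requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt

# Application code changes most often, so it is copied last.
COPY ./webapp/app.py /opt/webapp/
COPY ./webapp/tests.py /opt/webapp/

WORKDIR /opt/webapp
EXPOSE 5000

# ENTRYPOINT fixes the executable; CMD is only the default argument
# and can be replaced on the docker run command line.
ENTRYPOINT ["python"]
CMD ["app.py"]
```

Built with something like 'docker build -t my-webapp .' from the directory that holds requirements.txt and webapp/ (my-webapp is a hypothetical tag), 'docker run my-webapp' starts python app.py, while 'docker run my-webapp tests.py' would run python tests.py instead, assuming tests.py can be run directly as a script.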
So, that's it!  You may have known this all before, may not care, or may violently disagree, but I'm happy to say that I was finally able to read the literature and make a change based on what I've learned so far.
