Docker, A Misunderstood Tool

The last few years of development, at least in the PHP world, have been dominated by the idea of quickly and easily duplicating development environments. It is a big problem that a lot of developers, especially PHP developers, have to face when building tools that target specific systems. While things are better than the days of not being sure if you were running PHP 5.1, 5.2, or 5.3 (though that is still a big issue for some people), our web applications are becoming more than a collection of PHP scripts thrown up on a server. Not only do we need to contend with differences between developer stacks, but we also need to make sure that development environments closely match production.

The go-to tool for many people is Vagrant, which is, at its simplest, a wrapper around the VirtualBox virtualization software from Oracle. Vagrant makes it very easy to create and destroy virtual machines and configure them with a text file instead of having to fiddle around with a GUI. Vagrant also couples well with Puppet, a configuration management tool, so that you can have a virtual machine start up and automatically provision itself. You can use the same Puppet manifests on your virtual machines as you do in production, or start to use Puppet in production based on your development manifests. Non-techies can easily download a few files from a repository, type vagrant up, and have a full stack ready to go with all the correct software and tools needed. I've moved more than one group to using Vagrant with great success.

The thing is, virtualization is not a new tool. I know I've played around with it since around 2000 when VMware released their Workstation software. Vagrant made everything simpler for the average developer and removed the barrier of needing to know how to set up a server. You even have sites like PuPHPet and Phansible that will build a Vagrant setup for you.

Virtualization has a huge downside though - resources. It takes an entire computer and attempts to emulate it, which eats a lot of resources even with modern hardware support from Intel and AMD and better software from Oracle, VMware, or the devs working on KVM/QEMU. There is also a huge performance hit since many parts of the computer, like hard drives and networking, are being virtualized. This is the trade-off for having an easily reproducible stack.

Docker, The Old New Kid on the Block

Last year a new piece of software named Docker burst onto the scene. Docker, much like Vagrant, is a wrapper around another technology that was a bit harder to use - LXC (LinuX Containers). What LXC, and others that existed before it like BSD jails or Solaris Containers, do is replicate the operating system and not the underlying hardware. This means that while the container looks like a full computer, it isn't. It's sharing its host's resources directly while providing a "virtual" operating system. The OS is generally some flavor of Linux since that is what LXC was built to provide.

There is a big difference though. Where Vagrant/VirtualBox emulate an entire PC, LXC simply runs the OS inside of another OS. And really it's not even doing that - it's running processes inside of a container, so it's not even a full operating system being run. You aren't booting Ubuntu inside of LXC (though you can); you are simply providing an OS environment to a process. This makes it much, much, much more lightweight than virtualization.

Like virtualization, containerization is not really anything new. Solaris had it back in 2004 with Solaris 10, and FreeBSD has had jails for even longer, since FreeBSD 4.0 in 2000. LXC is relatively new by itself, and Docker made LXC much easier to work with by providing a great wrapper around it.

A Quick Overview

There are plenty of blog posts out there about getting started with Docker, so I'll skip that for the most part and get to the powerful part. One of Docker's features is providing an easy-to-use and easy-to-find list of base images available at https://registry.hub.docker.com/. These images generally contain everything needed to run something like, say, Node.js or nginx or MySQL. Want to fire up a temporary MySQL server without installing MySQL?

docker run -d -p 3306:3306 --name mysql -e MYSQL_ROOT_PASSWORD=password mysql

Docker will download the MySQL image, create a container from it, and start the MySQL server. It will publish port 3306 on the local machine so your apps can use localhost to connect to it. You can then start and stop the container as needed, and the data will persist until you destroy the container. The best part is that the resource hit isn't much more than running MySQL normally, and it won't install anything on your local machine.
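
A rough sketch of that start/stop/destroy cycle is below. The function names are made up for illustration; the only thing taken from the command above is the container name mysql. They are wrapped in shell functions so nothing runs until you call one.

```shell
# Hypothetical helper names; "mysql" is the container name from the
# docker run command above.

# Stop the server but keep the container (and its data) around.
mysql_pause() {
  docker stop mysql
}

# Bring the same container, with the same data, back up.
mysql_resume() {
  docker start mysql
}

# Throw the container away for good. -v also removes the container's
# anonymous volume, which (assuming the official mysql image) is where
# the data files live.
mysql_destroy() {
  docker stop mysql && docker rm -v mysql
}
```

Call mysql_pause and mysql_resume as often as you like; only mysql_destroy throws the data away.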

You can then link things together. Let's say you have a PHP app inside of a Docker container that boots nginx and php-fpm and want it to talk to the MySQL container we just started:

docker run -d -p 80:80 --name webapp --link mysql:mysql php:5.5

Now your PHP container can see the MySQL container by using 'mysql' as a hostname.
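
To see what the link actually gives you, here is a small sketch that starts a throwaway container with the same link and prints the hosts entry and the environment variables Docker injects. It assumes the mysql container from earlier is running and that you are on Docker's default bridge network; show_mysql_link is a made-up name.

```shell
# Hypothetical helper: print what the mysql link exposes inside a container.
show_mysql_link() {
  # --rm removes the throwaway container as soon as the command exits.
  docker run --rm --link mysql:mysql php:5.5 \
    sh -c 'grep mysql /etc/hosts; env | grep "^MYSQL_"'
}
```

You should see a hosts line for mysql along with variables such as MYSQL_PORT_3306_TCP_ADDR, which is all your app needs to find the database.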

You don't even have to use the existing images. You can create Dockerfiles (think Puppet manifests + Vagrantfiles) that will set up an entire system for you:

FROM phusion/baseimage:0.9.10

ENV HOME /root

RUN /etc/my_init.d/00_regen_ssh_host_keys.sh

CMD ["/sbin/my_init"]

# Nginx-PHP Installation
RUN apt-get update
RUN apt-get install -y vim git curl wget build-essential python-software-properties\
               php5-cli php5-fpm php5-mysql php5-pgsql php5-sqlite php5-curl\
               php5-gd php5-mcrypt php5-intl php5-imap php5-tidy mysql-client

RUN sed -i "s/;date.timezone =.*/date.timezone = UTC/" /etc/php5/fpm/php.ini
RUN sed -i "s/;date.timezone =.*/date.timezone = UTC/" /etc/php5/cli/php.ini

RUN sed -i "s/upload_max_filesize =.*/upload_max_filesize = 250M/" /etc/php5/fpm/php.ini
RUN sed -i "s/post_max_size =.*/post_max_size = 250M/" /etc/php5/fpm/php.ini

RUN apt-get install -y nginx

RUN echo "daemon off;" >> /etc/nginx/nginx.conf
RUN sed -i -e "s/;daemonize\s*=\s*yes/daemonize = no/g" /etc/php5/fpm/php-fpm.conf
RUN sed -i "s/;cgi.fix_pathinfo=1/cgi.fix_pathinfo=0/" /etc/php5/fpm/php.ini

RUN mkdir           /var/www
ADD build/default   /etc/nginx/sites-available/default
RUN mkdir           /etc/service/nginx
ADD build/nginx.sh  /etc/service/nginx/run
RUN chmod +x        /etc/service/nginx/run
RUN mkdir           /etc/service/phpfpm
ADD build/phpfpm.sh /etc/service/phpfpm/run
RUN chmod +x        /etc/service/phpfpm/run

EXPOSE 80 22
# End Nginx-PHP

VOLUME /var/www
VOLUME /etc/nginx
VOLUME /etc/php5/
VOLUME /var/log

RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

I'm glossing over a ton of stuff, and I really recommend looking through the Docker documentation. It is incredibly well written and easy to follow.
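
For the curious, building and running an image from a Dockerfile like the one above could look roughly like this. The image tag myapp, the container name, and host port 8080 are all placeholders, and the sketch assumes the build/ directory referenced by the ADD lines sits next to the Dockerfile and that the mysql container from earlier exists.

```shell
# Hypothetical helper: build the image, then start a container from it.
build_and_run() {
  # Build an image from the Dockerfile in the current directory.
  docker build -t myapp . || return 1

  # Run it detached: host port 8080 maps to container port 80, the mysql
  # container is linked in, and the current directory is mounted over
  # the /var/www volume declared in the Dockerfile.
  docker run -d -p 8080:80 --name myapp --link mysql:mysql \
    -v "$PWD":/var/www myapp
}
```

Run build_and_run from the directory that holds the Dockerfile; the site should then answer on port 8080 of the host.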

Why Should You Use Docker?

Docker is a great tool for containing processes. I'm currently using it to host many of my projects because I know that every time I build the container, it will be 100% the same as the last time. If you are familiar with basic system administration (and if you aren't, tell your local conference to let me give my Sysadmin talk :)) you can see that the Dockerfile is basically a list of shell commands to run, with some extra sugar for ports and such.

This isn't much different from Puppet and Vagrant. At my current job we use Vagrant so that every person has the correct version of PHP and the same settings across the board so we can avoid the "It works on my machine" debugging. For this use case, Docker and Vagrant/Puppet are almost identical.

Docker shines in the container aspect. In my above Dockerfile, there is no MySQL installed on that system. I can link to an external container, or simply point it to a standalone server, and it acts just like any other PHP process. What I can do though is build my app using Docker, push it up to the server, build the image there, and it will work. I can't do that with Puppet/Vagrant because I don't run VirtualBox on my webserver. Yes, I can use Puppet to enforce configuration, but the development VM is still not exactly the same as production.

If I want to separate out my processes I can do that as well. I am no longer running an entire virtual machine to run a single website; I can run MySQL, nginx, and PHP in different containers with much less overhead. Want to test a different version of PHP? Boot up a different Docker container.
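
Trying another PHP version really is close to a one-liner. The sketch below assumes the official php image publishes a tag for the version you ask for (this post already uses php:5.5); php_version is a made-up helper name.

```shell
# Hypothetical helper: print the PHP version shipped in a given image tag.
php_version() {
  # $1 is an image tag such as 5.4 or 5.6; --rm cleans up the container
  # as soon as php -v has printed its output.
  docker run --rm "php:$1" php -v
}
```

Calling php_version 5.4 and then php_version 5.6 pulls each image on first use and leaves nothing installed on the host itself.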

In my specific use case, my build system for my web apps will build a Docker container using the specified version of PHP, or without PHP. If I need to move it, a few commands to quickly back up, copy, and restore the container are all that is needed. The base server only has Docker and ssh installed; everything else is contained.
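
One way such a move could be sketched is with docker commit, docker save, and docker load. This is an assumption about the workflow, not the exact commands used; the function names, image tag, file names, and the newserver host are placeholders. Note that docker commit does not capture data stored in volumes, so anything under a VOLUME path has to be copied separately.

```shell
# Hypothetical helpers for moving a container between servers.

# Snapshot a container as an image and write that image to a tar file.
#   $1 = container name, $2 = image tag to create, $3 = tar file to write
backup_container() {
  docker commit "$1" "$2" && docker save -o "$3" "$2"
}

# Load an image from a tar file produced by backup_container.
#   $1 = tar file to read
restore_image() {
  docker load -i "$1"
}

# On the old server:  backup_container webapp webapp-backup webapp-backup.tar
# Copy the tarball:   scp webapp-backup.tar newserver:
# On the new server:  restore_image webapp-backup.tar
```

After the load, a normal docker run against the restored image tag brings the app back up on the new box.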

Docker is best used in situations where you need to package up something to distribute in a controlled way, not build a virtual machine to contain your application.

Why You Shouldn't Use Docker

Docker is not a replacement for virtualization, and they make that very clear. That doesn't stop people from equating it to virtualization though. If you need to virtualize an entire computer, use virtualization. If your shop runs VMware ESXi, by all means use VMware Workstation to build VMs and push them to ESXi. If you use Hyper-V and your devs like Hyper-V, use Hyper-V.

Don't treat Docker like a virtualization system. It's not. It's a containment system for keeping processes out of each other's face. MySQL doesn't need to worry about the PHP configuration, and nginx doesn't need to worry about MySQL.

Docker is also a Linux-based tool. Yes, there are things like boot2docker which will boot a virtual machine that will run Docker inside of it, faking a Docker environment on OS X and, to an extent, Windows, but the added complexity of running Docker inside of a VM introduces its own special set of problems that you won't run into in production. Mounting volumes with boot2docker is an incredible pain. Docker really shines in a pure Linux environment.

Many people also treat the containers as entire operating systems and want things like FTP or SSH inside of them. That's not what the spirit of Docker is about, though you can easily start an SSH shell inside of a container. Docker is more about separating out the concerns of the parts of your application, or providing an environment for your application to live and run in.

If you don't plan on running Docker in production, I have a hard time recommending it. It's not that Docker isn't awesome, as I'm building great things with it, but it's a tool that solves a deployment issue more than a development issue.

Docker is Awesome, But Not a Silver Bullet

I love Docker. I build a bunch of tools around it and it's made my sysadmin life much easier. I have it installed on all my Linux boxes. It's not going to solve any development problems you have though, and if you aren't running Linux I would honestly skip it entirely. boot2docker is a decent stopgap until you really need to build complex systems.

If you are interested in using Docker for development, check out the documentation. Spend some time reading up on it and how to use it before saying "I have to use Docker for everything, it's the new hotness!" Chances are it won't solve your development problems, but it can be a great tool in your development toolbox.

