Friday, September 04, 2015

I have migrated my blog to http://vinceyuan.github.io.

This site is ABANDONED. Please visit http://vinceyuan.github.io.

Monday, June 08, 2015

Managing Docker images with tags

When you build Docker images repeatedly, you will find it necessary to manage them with tags.
If you build an image without specifying a tag, the default tag 'latest' is used automatically:
docker build -t company/myimage .
The complete name is company/myimage:latest.

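When you build a new version of company/myimage, you don't need to create a separate image such as company/myimage_v2. Instead, build an image with the same name and a new tag:
docker build -t company/myimage:v2 .
Then point the latest tag at it:
docker tag -f company/myimage:v2 company/myimage:latest
(-f forces Docker to move an existing latest tag. Later Docker releases deprecated and then removed -f, because docker tag now overwrites an existing tag by default; on those versions, drop the flag.)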

So your new image has two tags: v2 and latest. When you start a container based on company/myimage, company/myimage:latest will be used.
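The tag rule can be illustrated with a small JavaScript sketch. It is a simplification, not Docker's real reference parser: it assumes names of the form [registry[:port]/]repository[:tag], ignores digests, and the function name parseImageRef is hypothetical.

```javascript
// Simplified sketch of how an image reference resolves to repository + tag.
// Assumption: the form is [registry[:port]/]repository[:tag]; digests are ignored.
function parseImageRef(ref) {
  const lastSlash = ref.lastIndexOf("/");
  const lastColon = ref.lastIndexOf(":");
  // A colon only introduces a tag if it comes after the last slash;
  // otherwise it belongs to a registry host:port such as localhost:5000/...
  if (lastColon > lastSlash) {
    return { repository: ref.slice(0, lastColon), tag: ref.slice(lastColon + 1) };
  }
  // No tag given: Docker uses "latest".
  return { repository: ref, tag: "latest" };
}

console.log(parseImageRef("company/myimage"));
// { repository: 'company/myimage', tag: 'latest' }
console.log(parseImageRef("company/myimage:v2"));
// { repository: 'company/myimage', tag: 'v2' }
console.log(parseImageRef("localhost:5000/company/myimage"));
// { repository: 'localhost:5000/company/myimage', tag: 'latest' }
```

The slash check matters because a registry address such as localhost:5000 also contains a colon that is not a tag.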

Saturday, May 30, 2015

Docker Tricks and Tips

Recently I deployed my node.js app, Redis, Postgres and Nginx with Docker and wrote a tutorial. I want to share some tips and tricks about Docker with you.

Choose Debian as the base image.

The Debian image is much smaller than other OS images, and it is recommended in Docker's official best practices. One problem is that the packages in Debian's apt repositories are not always the latest; Debian tends to ship only very stable versions. You may need to spend some time installing the proper packages.

Choose the same base image for all your own images.

If your images are based on the same base image, you will save a lot of disk space, because they share that base image on disk instead of each storing its own copy.
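A small JavaScript sketch shows why sharing a base saves space: layers with the same ID are stored once, so disk usage counts each layer once rather than once per image. The layer IDs and sizes below are hypothetical, chosen only for illustration.

```javascript
// Hypothetical images: each is a list of layers (id + size in MB).
// Layers with the same id are stored only once on disk.
const images = {
  myredisclient: [
    { id: "debian7", size: 85 },
    { id: "redis-client", size: 3 },
  ],
  myredispgclient: [
    { id: "debian7", size: 85 },
    { id: "redis-client", size: 3 },
    { id: "pg-client", size: 45 },
  ],
};

// Sum of every image's size, as if nothing were shared.
function naiveTotal(imgs) {
  return Object.values(imgs)
    .flat()
    .reduce((sum, layer) => sum + layer.size, 0);
}

// Disk usage when identical layers are shared: count each layer id once.
function sharedTotal(imgs) {
  const seen = new Map();
  for (const layer of Object.values(imgs).flat()) {
    seen.set(layer.id, layer.size);
  }
  let sum = 0;
  for (const size of seen.values()) sum += size;
  return sum;
}

console.log(naiveTotal(images));  // 221
console.log(sharedTotal(images)); // 133
```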

Build images for each step.

If your node.js app needs to access Redis and Postgres, you'd better create a Redis client image, a Postgres client image, a node.js image and your app image, each based on the previous one. If you have 2 node.js apps, you don't need to install the Redis and Postgres clients and node.js again; just use the node.js image as the base.

Put your important data at the host instead of the container.

If a container is deleted, all files in the container are deleted too. Use a volume for important data.

Be very careful with images whose Dockerfile declares VOLUME.

If you run such an image without mounting a host folder there, Docker creates a volume under /var/lib/docker/vfs/dir/. When the container is deleted, that volume is not deleted automatically (unless you remove the container with docker rm -v). If big data is stored there, it can use up your disk space. You can install this useful tool https://github.com/cpuguy83/docker-volumes on your host to find and remove the dangling volumes.

You can run docker without sudo.

# Add theuser to the docker group to run docker as a non-root user
# You MUST log out and log back in for this to take effect
usermod -aG docker theuser

Delete the unnecessary files in the Dockerfile to save disk space.

For example,
FROM myredisclient
RUN apt-get update \
 && apt-get install -y wget \
 && echo 'deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main' >> /etc/apt/sources.list.d/pgdg.list \
 && wget --no-check-certificate --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
 && apt-get update \
 && apt-get install -y --force-yes postgresql-client \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/*



Friday, May 29, 2015

Restoring/Backing up Postgres Database in a Docker Container

In the previous tutorial, I showed you how to deploy a web app, Redis, Postgres and Nginx with Docker. This post shows you how to restore and back up a Postgres database that is running in a Docker container. You can get all source code at https://github.com/vinceyuan/DockerizingWebAppTutorial.

I use pg_dump to dump a database to a text file, then compress it with gzip and upload it to Amazon S3 with a handy command-line tool, s3cmd. I can't provide my S3 access_key and secret_key in this tutorial, so you won't be able to run it until you get your own.

Let's install s3cmd on the host:
apt-get install -y s3cmd
Run s3cmd --configure and input your S3 access_key and secret_key. It creates .s3cfg in your home directory.


Restore

In the previous tutorial, after running Postgres in a container, we should restore the database from the backup. First, we need to get the backup:
cd /mydata && mkdir db_restore && cd db_restore
List the backup files in your S3 bucket:
s3cmd ls s3://db_dumps/
Download a backup file and unzip it:
s3cmd get s3://db_dumps/dump2015-05-22T09:15:13+0800.txt.gz
gunzip dump2015-05-22T09\:15\:13+0800.txt.gz
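Because every dump name uses the same fixed-width timestamp and the same +0800 offset, sorting the names as plain strings also sorts them by time. Here is a minimal JavaScript sketch for picking the newest backup from a list of file names (for example, copied from the s3cmd ls output); the function name newestDump and the first sample name are hypothetical.

```javascript
// Pick the newest dump file name. This relies on the names sharing the
// fixed-width pattern dumpYYYY-MM-DDTHH:MM:SS+0800.txt.gz, so plain
// string comparison orders them chronologically.
function newestDump(names) {
  const dumps = names.filter((n) =>
    /^dump\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+0800\.txt\.gz$/.test(n)
  );
  if (dumps.length === 0) return null;
  return dumps.reduce((a, b) => (b > a ? b : a));
}

const names = [
  "dump2015-05-19T10:00:02+0800.txt.gz", // hypothetical older dump
  "dump2015-05-22T09:15:13+0800.txt.gz",
  "notes.txt", // ignored: does not match the pattern
];
console.log(newestDump(names)); // dump2015-05-22T09:15:13+0800.txt.gz
```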

Run a container with the Postgres client to restore the database from the dump.

docker run -d --name myredispgclient --link mypostgres:postgres -v /mydata/db_restore/:/tmp/ -i -t myredispgclient /bin/bash
Because -d starts the container in the background, it may not show the container's console. If so, run this to get it:
# docker exec -i -t myredispgclient bash
$ env | grep POSTGRES_
$ psql -h $POSTGRES_PORT_5432_TCP_ADDR -p $POSTGRES_PORT_5432_TCP_PORT -U postgres
> \l   # List all databases. Make sure mynodeappdb exists
> \q   # Quit
$ psql -h $POSTGRES_PORT_5432_TCP_ADDR -p $POSTGRES_PORT_5432_TCP_PORT -U postgres -d mynodeappdb < /tmp/dump2015-05-22T09\:15\:13+0800.txt
$ exit # Exit the console of the container
# docker stop myredispgclient

Backup

Let's build the mys3cmd image, which has s3cmd 1.0 installed. The latest version is 1.5.2, but it fails to upload big files for me, so I have to use 1.0. This is the Dockerfile. It is based on the myredispgclient image.
FROM myredispgclient
RUN apt-get update \
 && wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | apt-key add - \
 && wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list \
 && apt-get update \
 && apt-get install -y --force-yes --no-install-recommends s3cmd=1.0.0-4 \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/*

Build it.
cd /DockerizingWebAppTutorial/dockerfiles/mys3cmd
docker build -t mys3cmd .

Let's build the mydbbackup2s3 image for backing up. It took me many attempts to write a working Dockerfile for mydbbackup2s3, which is why I created the mys3cmd image: with it, I don't need to download and install s3cmd every time I rebuild mydbbackup2s3. In the Dockerfile of mydbbackup2s3, we copy .pgpass and .s3cfg to /root/ (you need to provide .s3cfg yourself). .pgpass stores the database password, and we have to chmod 600 it, because libpq ignores a .pgpass file that group or others can access. dump_db_and_upload.sh is a script I wrote to dump, compress and upload the backup to S3.
FROM mys3cmd
COPY .pgpass /root/
COPY .s3cfg /root/
COPY dump_db_and_upload.sh /root/
RUN chmod 600 /root/.pgpass
VOLUME /db_dumps
CMD ["/root/dump_db_and_upload.sh"]

Each line of .pgpass has the format hostname:port:database:username:password. The first 'postgres' is the alias of the linked container:
postgres:5432:mynodeappdb:postgres:postgres
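The lookup that libpq performs on this file can be sketched in JavaScript. It follows the documented .pgpass format but is a simplified illustration, not libpq's code; parsePgpassLine and lookupPassword are hypothetical names, and details such as comment lines and the special handling of localhost are left out.

```javascript
// Simplified sketch of a .pgpass lookup.
// Format: hostname:port:database:username:password
// - "*" in one of the first four fields matches any value
// - "\:" and "\\" stand for a literal colon and backslash
// - the first matching line wins
function parsePgpassLine(line) {
  const raw = [];
  let cur = "";
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === "\\" && i + 1 < line.length) {
      cur += ch + line[i + 1]; // keep the escape pair; unescape later
      i++;
    } else if (ch === ":") {
      raw.push(cur); // an unescaped colon ends a field
      cur = "";
    } else {
      cur += ch;
    }
  }
  raw.push(cur);
  if (raw.length < 5) return null; // not a valid entry

  const unescape = (s) => s.replace(/\\(.)/g, "$1");
  // null marks a wildcard; an escaped "\*" stays a literal "*"
  const [host, port, database, user] = raw
    .slice(0, 4)
    .map((f) => (f === "*" ? null : unescape(f)));
  return { host, port, database, user, password: unescape(raw[4]) };
}

function lookupPassword(pgpassText, want) {
  for (const line of pgpassText.split(/\r?\n/)) {
    const entry = parsePgpassLine(line);
    if (entry === null) continue;
    const ok = ["host", "port", "database", "user"].every(
      (k) => entry[k] === null || entry[k] === String(want[k])
    );
    if (ok) return entry.password;
  }
  return null;
}

const pgpass = "postgres:5432:mynodeappdb:postgres:postgres";
const want = { host: "postgres", port: 5432, database: "mynodeappdb", user: "postgres" };
console.log(lookupPassword(pgpass, want)); // postgres
```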

dump_db_and_upload.sh
#!/bin/bash
# A script to dump db and compress it and then upload the file to S3.
# Make it executable first, e.g. 'chmod +x dump_db_and_upload.sh'
# Stop on the first error; pipefail makes a failing pg_dump fail the pipeline
# instead of being hidden by gzip's exit status
set -e -o pipefail
FILENAME=$(TZ=Asia/Hong_Kong date +"dump%Y-%m-%dT%H:%M:%S+0800.txt.gz")
FULLDIR="/db_dumps/"
FULLPATH="$FULLDIR$FILENAME"
S3PATH="s3://db_dumps/"
echo "Begin to dump mynodeappdb to $FULLPATH"
# We don't use $POSTGRES_PORT_5432_TCP_ADDR as the host; we use the link alias postgres.
# $POSTGRES_PORT_5432_TCP_ADDR can change, but the link alias postgres does not.
# We also use the link alias postgres in .pgpass
pg_dump -h postgres -U postgres mynodeappdb | gzip > "$FULLPATH"
echo "Done"
echo "Begin to upload the dump to $S3PATH"
s3cmd put "$FULLPATH" "$S3PATH"
echo "Done"
echo "Delete the local dump"
rm "$FULLPATH"
echo "Finished dump and upload"
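The FILENAME line relies on Hong Kong being UTC+8 all year (Hong Kong does not use daylight saving time), which is why the +0800 suffix can be hard-coded. A JavaScript sketch of the same naming, assuming that fixed offset; dumpFileName is a hypothetical name.

```javascript
// Build a dump file name like dump2015-05-22T09:15:13+0800.txt.gz for
// Hong Kong time. Assumption: Hong Kong is a fixed UTC+8 offset, so we shift
// the UTC time by 8 hours instead of using a time zone database.
function dumpFileName(date) {
  const hk = new Date(date.getTime() + 8 * 60 * 60 * 1000);
  const pad = (n) => String(n).padStart(2, "0");
  return (
    "dump" +
    hk.getUTCFullYear() + "-" + pad(hk.getUTCMonth() + 1) + "-" + pad(hk.getUTCDate()) +
    "T" + pad(hk.getUTCHours()) + ":" + pad(hk.getUTCMinutes()) + ":" + pad(hk.getUTCSeconds()) +
    "+0800.txt.gz"
  );
}

// 01:15:13 UTC on 2015-05-22 is 09:15:13 in Hong Kong.
console.log(dumpFileName(new Date(Date.UTC(2015, 4, 22, 1, 15, 13))));
// dump2015-05-22T09:15:13+0800.txt.gz
```

Zero-padding every field keeps the names fixed-width, so they also sort chronologically as strings.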

Build the image.
cd /DockerizingWebAppTutorial/dockerfiles/mydbbackup2s3
docker build -t mydbbackup2s3 .

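Run the container. We don't add --restart=always here, because this container is meant to run once and exit; with --restart=always, Docker would restart it after every backup.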
docker run -d --name mydbbackup2s3 --link mypostgres:postgres -v /mydata/db_dumps:/db_dumps mydbbackup2s3

It dumps the database and uploads the backup immediately. The container exits after the backup is done.

Auto-backup

We should back up the database regularly and automatically. Let's create a cron job by running this command:
crontab -e

Input the following lines, then save and quit. Cron will start the mydbbackup2s3 container every Tuesday to back up the database. The schedule assumes the server clock is in UTC, so 02:00 is 10:00 Hong Kong time.
#MIN (0-59)  HOUR (0-23)  DoM (1-31)  MONTH (1-12)  DoW (0-7)     CMD
#Dump mynodeappdb and upload to S3 every Tuesday at 10:00 Hong Kong time (02:00 UTC)
     0            2           *           *            2          docker start mydbbackup2s3
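The line 0 2 * * 2 means minute 0, hour 2, any day of the month, any month, day of week 2 (Tuesday). Assuming the server clock is UTC, which is what makes 02:00 equal 10:00 in Hong Kong, this JavaScript sketch checks when such a line fires. cronMatches is a hypothetical helper that supports only plain numbers and *, not the full cron syntax.

```javascript
// Minimal matcher for 5-field cron lines that use only plain numbers and "*".
// Real cron also supports ranges, lists, steps and names; this sketch does not.
// Assumption: the server clock is UTC, so the UTC getters are used.
function cronMatches(expr, date) {
  const [min, hour, dom, month, dow] = expr.trim().split(/\s+/);
  const matches = (spec, value) => spec === "*" || Number(spec) === value;

  const weekday = date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const domOk = matches(dom, date.getUTCDate());
  const dowOk = matches(dow, weekday) || (dow === "7" && weekday === 0); // 7 is also Sunday

  // Cron rule: if both day fields are restricted, either one may match.
  const dayOk = dom !== "*" && dow !== "*" ? domOk || dowOk : domOk && dowOk;

  return (
    matches(min, date.getUTCMinutes()) &&
    matches(hour, date.getUTCHours()) &&
    matches(month, date.getUTCMonth() + 1) &&
    dayOk
  );
}

const schedule = "0 2 * * 2";
// Tuesday 2015-05-26 02:00 UTC, i.e. 10:00 in Hong Kong
console.log(cronMatches(schedule, new Date(Date.UTC(2015, 4, 26, 2, 0)))); // true
// Wednesday 2015-05-27 02:00 UTC
console.log(cronMatches(schedule, new Date(Date.UTC(2015, 4, 27, 2, 0)))); // false
```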

I also wrote some tips about Docker in the post Docker Tricks and Tips.

Deploying a Web App, Redis, Postgres and Nginx with Docker

This tutorial shows how to deploy a web app, Redis, Postgres and Nginx with Docker on the same server. In this tutorial, the web app is a node.js (Express) app. We use Redis as a cache store, Postgres as the database, and Nginx as the reverse proxy server. You can get all source code at https://github.com/vinceyuan/DockerizingWebAppTutorial.


Why Docker

Docker is a container-based virtualization technology. The key feature I like most is that it provides resource isolation. The traditional way of building a (low-traffic) website is to install the web app, cache, database and Nginx directly on one server. It's not easy to change the settings or the content much, because they are all in the same environment, and changing one may affect the others. With Docker, we can put each service in its own container. It keeps the host server very clean, and we can easily create, delete, change and re-create containers.


Install Docker on the host

At the time of writing, Docker runs on 64-bit Linux only. If your Linux is 32-bit, you have to reinstall the 64-bit version. My original OS was 32-bit CentOS; now I am using 64-bit Debian 8. The main reason I chose Debian is that its distribution size is small and Docker recommends it in its Best Practices (it's ridiculous that almost all examples at docker.com use Ubuntu). Actually, the host's OS can be different from the container's OS. I chose Debian instead of 64-bit CentOS because I don't want to spend any time on the differences. For example, the package management tools differ: Debian uses apt, while CentOS uses yum.

Currently, Docker's official installation instructions for Debian 8 do not work. Instead, run the following commands as root (theuser is the user of the host OS):
sudo su
curl -sSL https://get.docker.com/ | sh
# Add theuser to the docker group to run docker as a non-root user
# You MUST log out and log back in for this to take effect
usermod -aG docker theuser
# Start docker service
systemctl enable docker.service
systemctl start docker.service

Prepare

cd /
git clone https://github.com/vinceyuan/DockerizingWebAppTutorial.git
The folder /DockerizingWebAppTutorial contains everything we need. mynodeapp is a very simple node.js (Express) app. It just reads a number from Redis and gets a query result from Postgres. There are several Dockerfiles in the dockerfiles folder; we will use them to build images.
Create folders:
cd / && mkdir mydata && cd mydata
mkdir redis_data && mkdir postgres_data && mkdir nginx_data
mkdir log_mynodeapp && mkdir log_nginx
Let's run the first container.

Redis

We use the official Redis image. Run it directly with this command:
docker run -d -v /mydata/redis_data:/data --name myredis --restart=always redis
-v /mydata/redis_data:/data means we mount the host folder /mydata/redis_data as the volume /data in the container. Redis will save dump.rdb to /mydata/redis_data on the host. If we don't mount a volume, Redis will save dump.rdb inside the container, and when the container is deleted, dump.rdb will be deleted too. So we should always mount a volume for important data, e.g. database files and logs.
--name myredis names this container myredis.
--restart=always means Docker restarts the container whenever it exits. It also makes the container start automatically after the server reboots.
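The effect of -v can be pictured as a path mapping: a file written under the container path lands under the host path, and a file outside every mount stays inside the container. A JavaScript sketch, assuming bind mounts of the form host:container[:options] with Linux paths that contain no colons; parseVolumeSpec and hostPathFor are hypothetical names.

```javascript
// Sketch: where does a file written inside the container end up on the host?
// Assumption: bind mounts only, written as "host:container[:options]",
// with Linux paths that contain no colons.
function parseVolumeSpec(spec) {
  const [hostPath, containerPath] = spec.split(":");
  if (!hostPath || !containerPath) {
    throw new Error("expected host:container[:options], got " + spec);
  }
  return { hostPath, containerPath };
}

// Return the host path for a file inside the container, or null when no mount
// covers it (the file then lives only in the container and is lost with it).
function hostPathFor(volumeSpecs, containerFile) {
  let best = null;
  for (const spec of volumeSpecs) {
    const { hostPath, containerPath } = parseVolumeSpec(spec);
    const c = containerPath.replace(/\/+$/, ""); // drop trailing slashes
    const h = hostPath.replace(/\/+$/, "");
    const covered = containerFile === c || containerFile.startsWith(c + "/");
    // If several mounts cover the file, the most specific (longest) one wins.
    if (covered && (best === null || c.length > best.c.length)) {
      best = { c, h };
    }
  }
  return best === null ? null : best.h + containerFile.slice(best.c.length);
}

const redisMounts = ["/mydata/redis_data:/data"];
console.log(hostPathFor(redisMounts, "/data/dump.rdb")); // /mydata/redis_data/dump.rdb
console.log(hostPathFor(redisMounts, "/etc/hostname"));  // null
```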

That command outputs:
$ docker run -d -v /mydata/redis_data:/data --name myredis --restart=always redis
Unable to find image 'redis:latest' locally
latest: Pulling from redis
7a3e804ed6c0: Pull complete 
b96d1548a24e: Pull complete 
5ba9a5b9710f: Pull complete 
37f07aacbfe5: Pull complete 
ec7f3a6b5dc6: Pull complete 
499b313c4d4e: Pull complete 
4416945429c6: Pull complete 
0daf71066555: Pull complete 
1f86439b265d: Pull complete 
9e6288fa06c0: Pull complete 
3c083702089f: Pull complete 
71cc4c7123fc: Pull complete 
91e5e3734476: Pull complete 
8d7fb9bd09ab: Pull complete 
e6b7cf8bf1b1: Pull complete 
96182c1bd121: Pull complete 
4b7672067154: Already exists 
redis:latest: The image you are pulling has been verified. Important: image verification is a tech preview feature and should not be relied on to provide security.
Digest: sha256:01b59520487a9ada4b8e31558c0580930a4e5f2a565a1cb85b66efe7c6ce810d
Status: Downloaded newer image for redis:latest

a96b6d2555e9f9fb1f70fea60f8cf75326cd331ebef9d4b667e322cea899d48c

It downloads the redis:latest image from Docker Hub. Let's check whether the myredis container is running:
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
a96b6d2555e9        redis:latest        "/entrypoint.sh redi   16 minutes ago      Up 16 minutes       6379/tcp            myredis
We can see myredis is running.

We need to run redis-cli in this container to set a value in Redis.
$ docker exec -i -t myredis bash
root@a96b6d2555e9:/data# redis-cli
127.0.0.1:6379> set number 1
OK
127.0.0.1:6379> save
OK
127.0.0.1:6379> exit
root@a96b6d2555e9:/data# exit


Postgres

We use the official Postgres image too. Just run it directly.
docker run -d --name mypostgres -e POSTGRES_PASSWORD=postgres -v /mydata/postgres_data:/var/lib/postgresql/data --restart=always postgres
-e POSTGRES_PASSWORD=postgres means we set the environment variable POSTGRES_PASSWORD to postgres.
-v /mydata/postgres_data:/var/lib/postgresql/data mounts /mydata/postgres_data as a volume. This is very important: it keeps the database files safe on the host.
Create mynodeappdb:
$ docker exec -i -t mypostgres bash
root@11602c44f706:/# psql -U postgres
psql (9.4.2)
Type "help" for help.

postgres=# create database mynodeappdb;
CREATE DATABASE
postgres=# \q

root@11602c44f706:/# exit

We can see mypostgres and myredis are running.
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED              STATUS              PORTS               NAMES
11602c44f706        postgres:latest     "/docker-entrypoint.   About a minute ago   Up About a minute   5432/tcp            mypostgres
a96b6d2555e9        redis:latest        "/entrypoint.sh redi   32 minutes ago       Up 32 minutes       6379/tcp            myredis


Redis client and Postgres client

The Dockerfile for the Redis client:
FROM debian:7
RUN apt-get update \
 && apt-get install -y redis-server \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/*
RUN service redis-server stop
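It's based on debian:7. The redis-server package actually installs both the Redis server and the client, but we only need the client, so the Dockerfile stops redis-server. (A process started during a build step does not keep running in containers created from the image anyway; a container only runs its own command.)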
Build it:
cd /DockerizingWebAppTutorial/dockerfiles/myredisclient
docker build -t myredisclient .

The Dockerfile for Postgres client:
FROM myredisclient
RUN apt-get update \
 && apt-get install -y wget \
 && echo 'deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main' >> /etc/apt/sources.list.d/pgdg.list \
 && wget --no-check-certificate --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \
 && apt-get update \
 && apt-get install -y --force-yes postgresql-client \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/*
It's based on myredisclient, because our web app needs to access both Redis and Postgres. The annoying thing is that the default postgresql-client in Debian's apt repository is a very old version, and pg_dump refuses to dump a server whose major version is newer than its own. This Dockerfile installs the latest version from the PostgreSQL apt repository (currently 9.4).
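The version problem can be stated as a small rule: pg_dump refuses to dump a server whose major version is newer than its own. Debian 7 ships a 9.1 client, while the server here reports 9.4.2. This JavaScript sketch encodes the rule; majorCode and pgDumpCanDump are hypothetical names, and it does not model the oldest server versions that a given pg_dump still supports.

```javascript
// Major versions: before 10 they are the first two numbers (9.1, 9.4);
// from 10 on, the first number alone. Encode them as comparable integers.
function majorCode(version) {
  const [a, b = 0] = version.split(".").map(Number);
  return a >= 10 ? a * 100 : a * 100 + b;
}

// True if pg_dump of version `client` can dump a server of version `server`:
// same or older major version only. (The lower bound on very old servers
// is not modeled here.)
function pgDumpCanDump(client, server) {
  return majorCode(client) >= majorCode(server);
}

console.log(pgDumpCanDump("9.1", "9.4.2"));   // false: Debian 7's client vs. the 9.4 server
console.log(pgDumpCanDump("9.4.2", "9.4.2")); // true
```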

Build it
cd /DockerizingWebAppTutorial/dockerfiles/myredispgclient
docker build -t myredispgclient .

We can see there are 5 images on the host:
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
myredispgclient     latest              78b18351c561        6 minutes ago       132.5 MB
myredisclient       latest              bb2ac4846244        8 minutes ago       87.7 MB
postgres            latest              1636d90f0662        2 days ago          214 MB
redis               latest              4b7672067154        4 days ago          111 MB
debian              7                   b96d1548a24e        9 days ago          84.97 MB


Node.js

Let's build a Node.js image. In the Dockerfile for the mynodejs image, we install node.js, Express and forever, and then set NODE_ENV to production. In this example, I am not using the latest version of node.js.
FROM myredispgclient
RUN apt-get update \
 && apt-get install -y --force-yes --no-install-recommends \
      apt-transport-https \
      build-essential \
      curl \
      ca-certificates \
      git \
      lsb-release \
      python-all \
      rlwrap \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/*
RUN curl https://deb.nodesource.com/node/pool/main/n/nodejs/nodejs_0.10.30-1nodesource1~wheezy1_amd64.deb > node.deb \
 && dpkg -i node.deb \
 && rm node.deb
RUN npm install -g express@3.4.7 \
 && npm install -g forever \
 && npm cache clear
ENV NODE_ENV production

Build it.
cd /DockerizingWebAppTutorial/dockerfiles/mynodejs
docker build -t mynodejs .

mynodeapp

Then we build an image for mynodeapp. In its Dockerfile, we run npm install and use forever to run the node.js app. We don't use forever start, because it runs the app as a daemon in the background; the container's main process would then exit, and the container would stop immediately.
FROM mynodejs
COPY . /src
RUN cd /src && npm install
VOLUME /log
CMD ["forever", "-l", "/log/forever.log", "-o", "/log/out.log", "-e", "/log/err.log", "/src/app.js"]

Build it
cd /DockerizingWebAppTutorial/mynodeapp
docker build -t mynodeapp .

Actually, we could merge these 4 Dockerfiles into one to create a single image. I build 4 images so that they can be reused. For example, if we want to build an image for another node.js app, we can write a Dockerfile based on the mynodejs image. If we want to replace node.js with Go, we can write a Dockerfile based on myredispgclient.

The core code of mynodeapp:
var express = require('express');
var http = require('http');
var pg = require('pg');
var redis = require('redis');
var app = express();
// ... app configuration (port, views, middleware) omitted; see app.js in the repository

var conString;
if ('development' == app.get('env')) {
  app.use(express.errorHandler());
  conString = "postgres://vince:@localhost/mynodeappdb"; // Use your db, user and password
} else {
  conString = "postgres://postgres:postgres@localhost/mynodeappdb"; // Use your db, user and password
}

var pgClient = new pg.Client(conString);
pgClient.connect(function(err) {
  if (err) return console.error('Could not connect to postgres', err);
  console.log('Connected to postgres');
});

var redisClient = redis.createClient(6379, '127.0.0.1', {});

app.get('/', function(req, res) {
  pgClient.query('SELECT NOW() AS "theTime"', function(err1, result) {
    redisClient.get("number", function(err2, reply) {
      // Errors (err1, err2) are ignored in this example. You should check them in a real project.
      res.render('index', { pgTime: result.rows[0].theTime, number: reply });
    });
  });
});

http.createServer(app).listen(app.get('port'), function() {
  console.log('Express server listening on port ' + app.get('port'));
});

There is a problem: we are using localhost or 127.0.0.1 as the host address of Redis and Postgres. That works only when they are installed on the same server, but now they are in different containers. Even with --link, we cannot reach them via localhost or 127.0.0.1. We can use the following code to get the correct host and port instead:
var redis_host = process.env.REDIS_PORT_6379_TCP_ADDR || '127.0.0.1';
var redis_port = process.env.REDIS_PORT_6379_TCP_PORT || 6379;
var db_host = process.env.POSTGRES_PORT_5432_TCP_ADDR || 'localhost';
var db_port = process.env.POSTGRES_PORT_5432_TCP_PORT || 5432;
var redisClient = redis.createClient(redis_port, redis_host, {});
var conString = "postgres://postgres:postgres@" + db_host + ":" + db_port + "/mynodeappdb"; // Use your db, user and password

Docker creates REDIS_PORT_6379_TCP_ADDR when you run a container with --link myredis:redis. You can get the Postgres port, and even the linked container's settings such as the password, from environment variables too.
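The fallback pattern above can be wrapped in a helper. With legacy links, Docker sets ALIAS_PORT_<port>_TCP_ADDR and ALIAS_PORT_<port>_TCP_PORT, and copies each environment variable of the linked container as ALIAS_ENV_<name>, so the password passed with -e POSTGRES_PASSWORD to mypostgres appears as POSTGRES_ENV_POSTGRES_PASSWORD. A JavaScript sketch under those assumptions; linkedService and pgConnectionString are hypothetical names. Another option, used by dump_db_and_upload.sh, is to use the link alias (postgres) directly as the host name.

```javascript
// Connection settings for a service linked with --link <container>:<alias>.
// Assumption: legacy Docker links, which set <ALIAS>_PORT_<port>_TCP_ADDR and
// <ALIAS>_PORT_<port>_TCP_PORT in the linking container.
function linkedService(env, alias, port, fallbackHost) {
  const prefix = alias.toUpperCase() + "_PORT_" + port + "_TCP";
  return {
    host: env[prefix + "_ADDR"] || fallbackHost,
    port: Number(env[prefix + "_PORT"] || port),
  };
}

// Postgres URL for mynodeappdb. Links also copy the linked container's
// environment as <ALIAS>_ENV_<NAME>, so mypostgres's POSTGRES_PASSWORD shows
// up as POSTGRES_ENV_POSTGRES_PASSWORD. "postgres" is only a fallback here.
function pgConnectionString(env) {
  const pg = linkedService(env, "postgres", 5432, "localhost");
  const password = env.POSTGRES_ENV_POSTGRES_PASSWORD || "postgres";
  return (
    "postgres://postgres:" + encodeURIComponent(password) +
    "@" + pg.host + ":" + pg.port + "/mynodeappdb"
  );
}

// Inside the mynodeapp container these values come from process.env.
const redisConf = linkedService(process.env, "redis", 6379, "127.0.0.1");
console.log("Redis at " + redisConf.host + ":" + redisConf.port);
```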

Run a container based on the mynodeapp image. We also name the container mynodeapp, but you can name it whatever you like.
docker run -d --name mynodeapp --link mypostgres:postgres --link myredis:redis -v /mydata/log_mynodeapp:/log -p 3000:3000 --restart=always mynodeapp

By default, each container is isolated. --link allows a container to access another container. --link mypostgres:postgres means we can reach the mypostgres container by the alias 'postgres', just as localhost refers to 127.0.0.1.
-v /mydata/log_mynodeapp:/log mounts a volume, because we want to keep logs on the host.
-p 3000:3000 maps the host's port 3000 to the container's port 3000. It is not mandatory, but with it we can run curl localhost:3000 on the host to check whether the mynodeapp container runs correctly.
$ curl localhost:3000
<!DOCTYPE html><html><head><title></title><link rel="stylesheet" href="/stylesheets/style.css"></head><body><H1>mynodeapp</H1><p>Number from Redis: 1</p><p>Time from Postgres: Fri May 29 2015 09:47:54 GMT+0000 (UTC)</p></body></html>

The web app runs correctly in the container. 

Nginx

Now we set up Nginx. In the Dockerfile, we create the directory /mynodeapp/public, where a host folder will be mounted.
FROM nginx
# Create folder for static files
RUN mkdir /mynodeapp && mkdir /mynodeapp/public
# copy sslcert files to /etc/nginx/ for https
#COPY mydomain.* /etc/nginx/
# copy conf
COPY nginx-docker.conf /etc/nginx/nginx.conf

In nginx-docker.conf, we use mynodeapp as the server address, because that is the alias of the linked container:
    upstream mynodeapp_upstream {
        server mynodeapp:3000;
        keepalive 64;
    }

Build the image:
cd /DockerizingWebAppTutorial/dockerfiles/mynginx
docker build -t mynginx .

Run the mynginx container:
docker run -d --name mynginx --link mynodeapp:mynodeapp -v /mydata/nginx_data:/var/cache/nginx -v /mydata/log_nginx:/var/log/nginx -v /DockerizingWebAppTutorial/mynodeapp/public:/mynodeapp/public -p 80:80 -p 443:443 --restart=always mynginx
--link mynodeapp:mynodeapp links the mynodeapp container to the mynginx container. We don't link myredis and mypostgres, because mynginx does not access them directly.
We also mount three host folders: /mydata/nginx_data for Nginx's cache, /mydata/log_nginx for logs, and the app's public folder for static files.
-p 443:443 is for HTTPS. However, this example does not provide SSL certificate files.

Run curl localhost and curl localhost/stylesheets/style.css to check whether mynginx runs correctly:
# curl localhost
<!DOCTYPE html><html><head><title></title><link rel="stylesheet" href="/stylesheets/style.css"></head><body><H1>mynodeapp</H1><p>Number from Redis: 1</p><p>Time from Postgres: Fri May 29 2015 10:12:35 GMT+0000 (UTC)</p></body></html>
# curl localhost/stylesheets/style.css
body {
  padding: 50px;
  font: 14px "Lucida Grande", Helvetica, Arial, sans-serif;
}

a {
  color: #00B7FF;

}

Now we have finished deploying a web app, Redis, Postgres and Nginx with Docker. It took me a lot of time to deploy my real app with Docker. Luckily, I tested everything in a VirtualBox VM, and with Docker I could delete and re-create images and containers back and forth easily.

One important part is missing: restoring and backing up the database. I will show you how in another tutorial. I also wrote some tips about Docker in the post Docker Tricks and Tips.


Friday, May 08, 2015

High memory usage of Emojis on iOS

Emoji are very popular. I am making a keyboard that contains many emoji. However, I found that emoji use too much memory. When the keyboard's memory usage gets high enough, iOS kills the keyboard immediately (the limit appears to be about 40 MB on iOS 8.3).

Emoji are just Unicode characters. Apple renders them as lovely icons, which causes the high memory usage. That by itself is OK, because we all know icons use a lot of memory. The problem is that after you have destroyed the view that uses emoji, their memory is not released. It looks like iOS keeps an emoji cache in the app or app extension. That's acceptable in an app, but not in an app extension.

What I had to do was remove many emoji from my keyboard.