One of the challenges of developing applications on Mesos is how to handle stateful applications. Mesos describes itself as a project that abstracts CPU, memory, storage, and other resources to enable fault-tolerant and elastic distributed systems. That is certainly the goal, but the project is still evolving. It works well with the Docker ecosystem.

While Mesos does claim to abstract away storage, CPU, and so on, the reality is that the abstraction is currently not that clean. Mesos and Docker each have their own approaches to networking, and storage is a bit more complicated. The biggest challenge with storage is how to give a container persistent storage.

Some of the storage solutions include:

  • Docker Data Volumes
  • Flocker
  • Local file system - This requires reserving specific machines for specific applications
  • Distributed file systems
  • Docker Volume Drivers (redesigned and updated in Docker 1.9)

Docker 1.9 also brought updates to networking. Prior to 1.9, there were a number of approaches to multi-host Docker environments, including ambassador containers, pipework, and Weave. Many of these solutions require more complex application design or specific infrastructure to be installed.

Within Mesos, another approach is to rely on application ids and use Mesos-DNS for service discovery. Mesos-DNS is another Mesosphere product that provides service discovery in Mesos clusters. It is one more piece of software installed on the cluster, and DNS resolution on the agent nodes is then configured to use Mesos-DNS.
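For illustration, the agent-side configuration amounts to little more than pointing each node's resolver at the machine running Mesos-DNS; a minimal sketch (the ARM template used later in this post handles this automatically, and the 10.0.0.5 address is only an example):

# Prepend the Mesos-DNS server to the agent's resolver list, keeping existing resolvers as fallback
$ sudo sed -i '1i nameserver 10.0.0.5' /etc/resolv.conf

# An app deployed via Marathon with a hypothetical id of "myapp" then resolves as:
$ dig +short myapp.marathon.mesos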

For the development of this application, the following decisions were made:

  • A simple CRUD application: the default PostgreSQL Docker container for the backend, using GlusterFS for storage, and a simple NodeJS app for the front end
  • Service discovery using Mesos-DNS
  • Set up a GlusterFS distributed file system
  • Leverage Docker Volume Drivers to mount the GlusterFS volume in the container

The basis for the Mesos deployment is this ARM script. Based on the chosen inputs, the ARM script creates a cluster resembling:

Mesos Architecture.

Once you have the cluster configured and running, the next step is to configure and deploy each component of the application via Marathon. Both Mesos and Marathon have UIs for monitoring and tweaking bits of the system; that said, it is recommended to deploy applications to Marathon using the REST API [1].
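As a quick sanity check that the API is reachable (the master address here is illustrative), listing the currently deployed applications is a one-liner:

$ curl -s http://10.0.0.5:8080/v2/apps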

To use the Marathon REST API, a json config file needs to be set up with the specifics of the container. A general overview can be found here.

For the PostgreSQL container, we need to determine:

  • A path for where to store the information in the GlusterFS
  • Username and password for the PostgreSQL database
  • An ID for the container. The ID will be used to figure out where the database is located in the Mesos system.

For the data path, recall from the ARM script above that GlusterFS storage nodes were deployed and a volume was defined within GlusterFS; the default name is “gfsvol”. On each of the agent machines, the Docker volume plugin for GlusterFS has been deployed and is running as a service. On the container side, the container (and Docker instance) needs to be configured to access the GlusterFS filesystem. Typically, on the Docker CLI, one would use the --volume-driver and --volume command-line parameters. Launching the PostgreSQL container from the Docker CLI with those settings would resemble:

$ docker run -e PGDATA=/data/postgres --volume-driver glusterfs --volume gfsvol:/data postgres

The above mounts the GlusterFS volume at /data and PostgreSQL will use /data/postgres for storing its data.
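To double-check the mount from the agent side, something along the following lines should work (the container id is whatever docker ps reports):

# Show the filesystem backing /data inside the container
$ docker exec <container id> df -h /data

# Show the mounts Docker set up for the container
$ docker inspect --format '{{json .Mounts}}' <container id>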

To configure the same for Marathon, you create a json configuration file. Adapting the Docker CLI command above and adding a user and password for the database results in a Marathon config file resembling:

{
  "id": "postgresnode",
  "cpus": 1,
  "mem": 1024.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "postgres",
      "network": "BRIDGE",
      "parameters": [
        { "key": "hostname", "value": "postgresnode" },
        { "key": "volume-driver", "value": "glusterfs" },
        { "key": "volume", "value": "gfsvol:/gfs" }
      ],
      "portMappings": [
        { "containerPort": 5432, "hostPort": 5432, "protocol": "tcp" }
      ]
    }
  },
  "env": {
    "PGDATA": "/gfs/node/postgresnode/postgresql/data",
    "POSTGRES_USER": "pgadmin",
    "POSTGRES_PASSWORD": "postgresnodefoofoo"
  }
}

In this case, the GlusterFS volume is mounted within the container at /gfs and PostgreSQL is instructed to use a directory within the mount. Visually, it looks like this:

Storage node.

If you look here, the full application source and config files are in the repository. The NodeJS application and the Dockerfile for building the NodeJS app's container are also there. The container image has been pushed up to Docker Hub as jmspring/nt3.
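If you want to poke at the front end outside of Mesos first, the image can be pulled and run directly; the environment variables mirror the Marathon config below, and the database host and credentials are whatever your own PostgreSQL instance uses:

$ docker pull jmspring/nt3
$ docker run -e DBHOST=<postgres host> -e DBPORT=5432 -e DBUSER=<user> \
    -e DBPASSWORD=<password> -e DBNAME=kv -p 8080:8080 jmspring/nt3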

The Marathon configuration file for the NodeJS application has an interesting tidbit disguised as the DBHOST value. The configuration is:

{
  "id": "crudnode",
  "cpus": 0.5,
  "mem": 512.0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "jmspring/nt3",
      "network": "BRIDGE",
      "parameters": [
        { "key": "hostname", "value": "crudnode" }
      ],
      "portMappings": [
        { "containerPort": 8080, "hostPort": 8080, "protocol": "tcp" }
      ]
    }
  },
  "env": {
    "DBUSER": "pgadmin",
    "DBPASSWORD": "postgresnodefoofoo",
    "DBHOST": "postgresnode.marathon.mesos",
    "DBPORT": "5432",
    "DBNAME": "kv"
  }
}

Note that DBHOST has the value postgresnode.marathon.mesos. If you examine the PostgreSQL Marathon configuration, notice that the “id” for the container is postgresnode. When the service resolves the DBHOST value, the marathon.mesos suffix triggers the lookup within Mesos-DNS and routes requests to the proper host/container. It should be noted that if one runs multiple instances of a container, each of those has the same name and Mesos-DNS will return the address of each in random order; see here.
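Mesos-DNS also publishes SRV records, which carry the host port of each running instance; those follow the _<app id>._<protocol>.<framework>.<domain> naming convention, so enumerating the PostgreSQL instances would look roughly like:

$ dig +short _postgresnode._tcp.marathon.mesos SRV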

Launching The Application

Once the Marathon config files are created and everything is in place, it is time to launch the application. To use the REST API, one needs to be able to reach one of the master nodes. The Azure Resource Manager template that set up the Mesos cluster includes a jumpbox one can log into in order to reach machines in the cluster.

Logging into the jumpbox and launching each of the above resembles the following:

$ curl -X POST http://10.0.0.5:8080/v2/apps -d @postgresnode.json -H "Content-type: application/json"
{
    "id": "/postgresnode",
    "cmd": null,
    "args": null,
    "user": null,
    "env": {
        "PGDATA": "/gfs/node/postgresnode/postgresql/data",
        "POSTGRES_USER": "pgadmin",
        "POSTGRES_PASSWORD": "postgresnodefoofoo"
    },
    "instances": 1,
    "cpus": 1.0,
    "mem": 1024.0,
    "disk": 0.0,
    "executor": "",
    "constraints": [],
    "uris": [],
    "storeUrls": [],
    "ports": [0],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
        "type": "DOCKER",
        "volumes": [],
        "docker": {
            "image": "postgres",
            "network": "BRIDGE",
            "portMappings": [{
                "containerPort": 5432,
                "hostPort": 5432,
                "servicePort": 0,
                "protocol": "tcp"
            }],
            "privileged": false,
            "parameters": [{
                "key": "hostname",
                "value": "postgresnode"
            }, {
                "key": "volume-driver",
                "value": "glusterfs"
            }, {
                "key": "volume",
                "value": "gfsvol:/gfs"
            }],
            "forcePullImage": false
        }
    },
    "healthChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
        "minimumHealthCapacity": 1.0,
        "maximumOverCapacity": 1.0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "version": "2015-12-02T23:00:03.778Z",
    "tasksStaged": 0,
    "tasksRunning": 0,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [{
        "id": "daae17c6-276d-4c06-95b8-83aefeba3c86"
    }],
    "tasks": []
}

$ curl -X POST http://10.0.0.5:8080/v2/apps -d @crudnode.json -H "Content-type: application/json"
{
    "id": "/crudnode",
    "cmd": null,
    "args": null,
    "user": null,
    "env": {
        "DBPASSWORD": "postgresnodefoofoo",
        "DBHOST": "postgresnode.marathon.mesos",
        "DBUSER": "pgadmin",
        "DBNAME": "kv",
        "DBPORT": "5432"
    },
    "instances": 1,
    "cpus": 0.5,
    "mem": 512.0,
    "disk": 0.0,
    "executor": "",
    "constraints": [],
    "uris": [],
    "storeUrls": [],
    "ports": [0],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
        "type": "DOCKER",
        "volumes": [],
        "docker": {
            "image": "jmspring/nt3",
            "network": "BRIDGE",
            "portMappings": [{
                "containerPort": 8080,
                "hostPort": 8080,
                "servicePort": 0,
                "protocol": "tcp"
            }],
            "privileged": false,
            "parameters": [{
                "key": "hostname",
                "value": "crudnode"
            }],
            "forcePullImage": false
        }
    },
    "healthChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
        "minimumHealthCapacity": 1.0,
        "maximumOverCapacity": 1.0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "version": "2015-12-02T23:02:15.364Z",
    "tasksStaged": 0,
    "tasksRunning": 0,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [{
        "id": "0de50da4-62eb-4b6d-ae5e-6266eb0236a2"
    }],
    "tasks": []
}

If there is an issue with the deployment, the resulting json would contain the error.
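To watch a deployment converge, Marathon also exposes endpoints for in-flight deployments and for the detailed state of a single app (same illustrative master address as above):

# In-flight deployments; an empty list means everything has converged
$ curl -s http://10.0.0.5:8080/v2/deployments

# Detailed status of one app, including its running tasks
$ curl -s http://10.0.0.5:8080/v2/apps/postgresnode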

Now, if one were to open the Marathon UI, http://<master node ip>:8080/, one would see the status of the two containers just launched:

Marathon application status.

Further, if one were to open the Mesos UI, http://<master node ip>:5050/, one can not only see the status of the apps launched, but also which agent each is running on:

Mesos application status.

Note that in the screenshot, crudnode is running on c1agent5 and postgresnode is running on c1agent4.

Exploring Service Discovery and DNS

Recall the earlier mention of Mesos-DNS and how a container can be looked up by its id with marathon.mesos appended. Note that this needs to be done on one of the nodes configured to use Mesos-DNS, i.e. an agent or master. Let's take a quick look:

admin@c1agent1:~$ dig postgresnode.marathon.mesos

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> postgresnode.marathon.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63396
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;postgresnode.marathon.mesos.       IN      A

;; ANSWER SECTION:
postgresnode.marathon.mesos. 1      IN      A       10.0.0.53

;; Query time: 69 msec
;; SERVER: 10.0.0.5#53(10.0.0.5)
;; WHEN: Wed Dec 02 23:15:18 UTC 2015
;; MSG SIZE  rcvd: 61

Notice that the IP address for the postgresnode is 10.0.0.53. In the Mesos status above, postgresnode was running on c1agent4. Pinging the agent directly:

admin@c1agent1:~$ ping c1agent4
PING c1agent4.45rs3zsobv3e5pp05qusvjqjxg.dx.internal.cloudapp.net (10.0.0.53) 56(84) bytes of data.
64 bytes from 10.0.0.53: icmp_seq=1 ttl=64 time=4.20 ms

Both have the same IP address. The PostgreSQL Marathon configuration also specified a particular port be exposed on the agent, 5432.

Using telnet, connecting either directly to the host or to the container name on port 5432 results in a connection.

admin@c1agent1:~$ telnet c1agent4 5432
Trying 10.0.0.53...
Connected to c1agent4.45rs3zsobv3e5pp05qusvjqjxg.dx.internal.cloudapp.net.
Escape character is '^]'.
^]

telnet>  Connection closed.

admin@c1agent1:~$ telnet postgresnode.marathon.mesos 5432
Trying 10.0.0.53...
Connected to postgresnode.marathon.mesos.
Escape character is '^]'.
^]

telnet> Connection closed.
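For a check that goes beyond telnet, a PostgreSQL client (if one happens to be installed on the node) can connect through the same name using the credentials from the Marathon config; it will prompt for the POSTGRES_PASSWORD value:

$ psql -h postgresnode.marathon.mesos -p 5432 -U pgadmin -d postgres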

Playing With The Application

The NodeJS application exposes a very simple API for storing key / value pairs. The API is:

  • GET /api/kv — returns all stored keys and values
  • GET /api/kv/<key> — returns the value of a specific key
  • POST /api/kv/<key>/<value> — store (or update) a value for the specified key
  • DELETE /api/kv/<key> — delete a particular key

Creating a few values:

admin@c1agent1~$ for f in {1..5};
    do curl -X POST http://crudnode.marathon.mesos:8080/api/kv/foo$f/bar$f;
    done
{"insert":"succeeded"}
{"insert":"succeeded"}
{"insert":"succeeded"}
{"insert":"succeeded"}
{"insert":"succeeded"}

Retrieve one of the values:

admin@c1agent1:~$ curl http://crudnode.marathon.mesos:8080/api/kv/foo1
{"pkey":"foo1","pval":"bar1"}

Now, let's restart the PostgreSQL container so that it ends up on another agent.
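One way to do this is Marathon's restart endpoint, which kills the running task and relaunches it wherever resources are offered next (not guaranteed to be a different agent, though here it landed on one):

$ curl -X POST http://10.0.0.5:8080/v2/apps/postgresnode/restart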

Mesos application status update.

Notice that postgresnode is now running on c1agent1.

Let’s retrieve the same key:

admin@c1agent1:~$ curl http://crudnode.marathon.mesos:8080/api/kv/foo1
{"pkey":"foo1","pval":"bar1"}

The values are identical.

Take A Look At A Storage Node

Exploring the setup script for the storage nodes, you might note that two disks are allocated and set up in a RAID 0 configuration and exposed at /datadrive. Additionally, the GlusterFS file system lives just under that path as brick. The PostgreSQL container specified PGDATA to be node/postgresnode/postgresql/data.
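Before digging into the files themselves, the volume layout can be confirmed with the GlusterFS CLI on a storage node (the volume name comes from the setup script, gfsvol by default):

# Shows the bricks backing the volume and its settings
$ sudo gluster volume info gfsvol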

Going to the storage node:

root@c1storage1:~# cd /datadrive/brick
root@c1storage1:/datadrive/brick# ls
node
root@c1storage1:/datadrive/brick# cd node/postgresnode/postgresql/data
root@c1storage1:/datadrive/brick/node/postgresnode/postgresql/data# ls -l
total 144
drwx------ 7 999 999  4096 Dec  1 19:00 base
drwx------ 2 999 999  4096 Dec  2 23:37 global
drwx------ 2 999 999  4096 Dec  1 18:57 pg_clog
drwx------ 2 999 999  4096 Dec  1 18:57 pg_dynshmem
-rw------- 2 999 999  4496 Dec  1 18:59 pg_hba.conf
-rw------- 2 999 999  1636 Dec  1 18:57 pg_ident.conf
drwx------ 4 999 999  4096 Dec  1 18:57 pg_logical
drwx------ 4 999 999  4096 Dec  1 18:57 pg_multixact
drwx------ 2 999 999  4096 Dec  2 23:37 pg_notify
drwx------ 2 999 999  4096 Dec  1 18:57 pg_replslot
drwx------ 2 999 999  4096 Dec  1 18:57 pg_serial
drwx------ 2 999 999  4096 Dec  1 18:57 pg_snapshots
drwx------ 2 999 999  4096 Dec  1 18:59 pg_stat
drwx------ 2 999 999  4096 Dec  2 23:48 pg_stat_tmp
drwx------ 2 999 999  4096 Dec  1 18:57 pg_subtrans
drwx------ 2 999 999  4096 Dec  1 18:57 pg_tblspc
drwx------ 2 999 999  4096 Dec  1 18:57 pg_twophase
-rw------- 2 999 999     4 Dec  1 18:57 PG_VERSION
drwx------ 3 999 999  4096 Dec  1 18:57 pg_xlog
-rw------- 2 999 999    88 Dec  1 18:57 postgresql.auto.conf
-rw------- 2 999 999 21288 Dec  1 18:59 postgresql.conf
-rw------- 2 999 999    37 Dec  2 23:37 postmaster.opts
-rw------- 2 999 999    99 Dec  2 23:37 postmaster.pid
root@c1storage1:/datadrive/brick/node/postgresnode/postgresql/data#

PostgreSQL data in all its glory.

Some Caveats About The Process

Persistent storage on Mesos is still evolving. There is an issue tracking the addition of persistent storage, but there is no ETA yet for when it will land. Additionally, given that Mesosphere and Docker are separate companies with differing agendas and competing products, it is unclear what the best-supported approach will be. Sticking with a clustered/distributed file system and using the Docker Volume Drivers is probably the best current approach.

As mentioned previously, if you launch multiple instances of a container in Marathon, all instances end up with the same name, and a Mesos-DNS lookup will return the list of all instances. However, if you want the container to be stateful, it is necessary to derive a path into the storage system that is unique to that container; two PostgreSQL containers sharing the same data directory would be a bad thing. One will either need to run a single instance or do some form of discovery to figure out which resource to use. This will be a topic for a future blog post.
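One crude workaround, sketched below, would be a wrapper entrypoint that derives the data directory from the task identity Marathon injects into each container's environment (MESOS_TASK_ID); this is illustrative only and does not solve the underlying coordination problem:

#!/bin/bash
# Hypothetical wrapper entrypoint for the postgres image: give each task its
# own PGDATA under the shared GlusterFS mount, keyed on the Marathon task id.
export PGDATA="/gfs/node/${MESOS_TASK_ID:-standalone}/postgresql/data"
mkdir -p "${PGDATA}"
exec docker-entrypoint.sh postgres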

Networking in Docker and Mesos is a bit of a mess. Docker has its own ideas about networking, introduced in Docker 1.9. Looking at the docs for multi-host networking, it is unclear how this fits cleanly within the Mesos infrastructure. The application, as well as the cluster setup in this post, is fairly basic and straightforward. More complicated networking scenarios will be explored in the future.

The deployment template and scripts used in this demo relied on the latest versions of Docker, Marathon, Mesos, etc. An earlier Mesos cluster was running Docker 1.8, and within the same machine / Docker host, pinging containers by the hostname specified for the Docker container worked. That functionality went away when the cluster was rebuilt with Docker 1.9. The key takeaway: things are changing quickly, and functionality that works in one release may no longer work in the next.

Resources Used In This Post

The primary resources used in this post are:

  • An Azure Resource Manager template that deploys and configures the Mesos cluster with GlusterFS storage nodes, here.
  • The NodeJS code and Dockerfile as well as the Marathon configuration files, here.
[1] Currently, deploying a Docker container via the Marathon UI without specifying a command to run is broken. See Issue 2374.