Migrating a CouchDB database with Joyent & Stud

tl;dr

A step-by-step guide to installing couchdb on ubuntu. But really, you should use iriscouch for your production couchdb needs. If you need help, don’t forget to go to #couchdb on irc.freenode.org; these guys are incredibly helpful

If you love node.js don’t forget to give nodejitsu a try too!

Intro

This week I had to migrate my first production database to a new environment. This post documents the process in the hope that others find it useful

In my case I was upgrading an old couchdb, mostly for two reasons:

This particular version of couchdb had a bug in handling ssl
Couchdb versions prior to 1.2.0 didn’t automatically resume replication after a restart, and didn’t offer auto compaction

In this tutorial you will find information on how to upgrade your couchdb, keep production running and safely “switch needles” after your new environment is tested and in production

Node.js

In this tutorial I’m going to use a lot of node.js tools. If you don’t have it installed you can do:

mkdir /opt/install  
cd /opt/install  
wget http://nodejs.org/dist/node-latest.tar.gz  
tar xvf node-latest.tar.gz  
cd node-v*/  
./configure 
make  
make install  
#
# Install some cool tools i use all the time
# and might be referenced in this article
#
npm install -g jsontool nave ghcopy nd futon cdir

Joyent

I’m a big fan of joyent so I decided to use them in this tutorial. However, I decided not to use smartos since at the time of this writing it does not support openssl 1.0.1

To use joyent you first need to download and install the smartdc client from npm:

npm install -g smartdc

sdc-setup

This should have installed smartdc and configured it with your help. However, if you need some more pointers please refer to the official smartdc documentation

If you also intend to create a hot standby replica of your production system, you will want to follow these steps for both machines but place them in different data centers. You can see the list of available data centers by doing:

sdc-listdatacenters \
  -u https://us-east-1.api.joyentcloud.com \
  -a username \
  -k keyname

This assumes your username is username and you wish to authenticate using the key information you store in joyent at set up time as keyname

Here is what the response to this request currently looks like:

{
  "us-east-1": "https://us-east-1.api.joyentcloud.com",
  "us-west-1": "https://us-west-1.api.joyentcloud.com",
  "us-sw-1": "https://us-sw-1.api.joyentcloud.com",
  "eu-ams-1": "https://eu-ams-1.api.joyentcloud.com"
}

In this tutorial I’m going to use the https://us-east-1.api.joyentcloud.com data center for the production couchdb and https://eu-ams-1.api.joyentcloud.com for the hot standby replica
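Every sdc-* call in this tutorial repeats the same -u, -a and -k flags, so a tiny helper can cut down on typing. This is only a sketch: sdc_flags, SDC_USER and SDC_KEY are names made up for illustration and are not part of smartdc, and the URL pattern is simply the one visible in the data center list above.

```shell
# Hypothetical helper, not part of smartdc: prints the -u/-a/-k flags
# shared by the sdc-* commands used in this tutorial.
# SDC_USER and SDC_KEY are illustrative variable names.
SDC_USER="username"
SDC_KEY="keyname"

sdc_flags() {
  # $1 is a data center name such as us-east-1 or eu-ams-1
  printf '%s %s %s %s %s %s\n' \
    -u "https://$1.api.joyentcloud.com" \
    -a "$SDC_USER" \
    -k "$SDC_KEY"
}

# Example (commented out because it needs a joyent account).
# The unquoted $(...) is intentional so the flags split into words:
# sdc-listdatacenters $(sdc_flags us-east-1)
```

The helper relies on word splitting, so it will not cope with usernames or key names that contain spaces.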

We now need to select the operating system we are going to install as well as the size of our virtual machine. Joyent calls the available bundled virtual machine images datasets and the virtual machine sizes packages. If you are curious about what others do, you can check the documentation of pkgcloud for a unified vocabulary

sdc-listdatasets \
  -u https://us-east-1.api.joyentcloud.com \
  -a username \
  -k keyname \
  | json -a urn \
  | grep ubuntu

We are going to use the latest ubuntu, a.k.a. sdc:jpc:ubuntu-12.04:2.3.1

sdc:sdc:ubuntu-10.04:1.0.1
sdc:jpc:ubuntu-12.04-enstratus-public:2.0.2
sdc:jpc:ubuntu-12.04:2.1.2
sdc:admin:ubuntu-10.04-enstratus-public:1.0.1
sdc:jpc:ubuntu-12.04:2.2.1
sdc:jpc:ubuntu-12.04:2.3.1

Now to select the size of our virtual machine:

sdc-listpackages \
  -u https://us-east-1.api.joyentcloud.com \
  -a username \
  -k keyname \
  | json -a name

Here is the list as of today. To understand this fully, check the details on the joyent website

extra small 512 mb
small 1gb
medium 2gb
medium 4gb
large 8gb
large 16gb
xxl 48gb
xl 32gb

Depending on how big your databases are you should select a different package. Unfortunately it seems like in joyent disk space is coupled with memory and number of vcpus, which is not great for couchdb. Feel free to reach out to them and ask them why (or for a custom build with more disk space)

In this tutorial I’m picking the medium 2gb for the hot standby replica, and a large 8gb for the live system

Now you can create your live couchdb:

sdc-createmachine \
  -u https://us-east-1.api.joyentcloud.com \
  -a username \
  -k keyname \
  --name couch-joyent-0 \
  --dataset sdc:jpc:ubuntu-12.04:2.3.1 \
  --package "Large 8GB"

This will output server details. Make sure you log these somewhere

Now create the replica

sdc-createmachine \
  -u https://eu-ams-1.api.joyentcloud.com \
  -a username \
  -k keyname \
  --name couch-joyent-1 \
  --dataset sdc:jpc:ubuntu-12.04:2.3.1 \
  --package "Medium 2GB"

Ubuntu

Let’s start by connecting to our virtual machines. I would recommend iterm2 so you can switch between local, live and replica.

ssh root@165.255.222.111

ssh root@37.255.222.112

I would also change your ps1 so you can easily distinguish between the two machines:

vi ~/.bashrc  
# Edit the PS1 lines and replace with something like:
# Live:
# PS1='${debian_chroot:+($debian_chroot)}\u@couch-live-us:\w\$ '
# Replica:
# PS1='${debian_chroot:+($debian_chroot)}\u@couch-replica-eu:\w\$ '
. ~/.bashrc

Some ubuntu machines don’t ship with git and make, so let’s upgrade all our packages and install these two:

apt-get update  
apt-get upgrade  
apt-get install git make gcc build-essential -y

These machines might not have node.js, so follow the steps you did before to install

CouchDB

I would recommend you follow the couchdb wiki on installing couchdb on ubuntu.

However, I’m going to document here the exact steps I took

mkdir /opt/install  
cd /opt/install  
# make sure you update this if a new version is out
wget http://mirrors.fe.up.pt/pub/apache/couchdb/releases/1.2.0/apache-couchdb-1.2.0.tar.gz  
apt-get install -y erlang-dev erlang-manpages erlang-base-hipe erlang-eunit erlang-nox erlang-xmerl erlang-inets libmozjs185-dev libicu-dev libcurl4-gnutls-dev libtool  
tar xvzf apache-couchdb-1.2.0.tar.gz  
cd apache-couchdb-*  
./configure
make  
make install

CouchDB is now built but we still need to create a user for couch to use, and set appropriate permissions and ownership

useradd -d /var/lib/couchdb couchdb  
chown -R couchdb: /usr/local/var/{lib,log,run}/couchdb /usr/local/etc/couchdb  
chmod 0770 /usr/local/var/{lib,log,run}/couchdb/  
chmod 664 /usr/local/etc/couchdb/*.ini  
chmod 775 /usr/local/etc/couchdb/*.d

Finally we want to set up init.d scripts so we can daemonize couchdb and manage its service like all other ubuntu processes

# In case Ubuntu has some trash from a default installation
# (-f so this does not fail when the files are not there)
rm -f /etc/logrotate.d/couchdb /etc/init.d/couchdb
ln -s /usr/local/etc/logrotate.d/couchdb /etc/logrotate.d/couchdb  
ln -s /usr/local/etc/init.d/couchdb  /etc/init.d/couchdb  
update-rc.d couchdb defaults

Let’s checkpoint here and make sure everything worked:

service couchdb start  
curl localhost:5984  
service couchdb stop

If something failed, it’s likely you will want to kill any couchdb processes you left lying around. You can execute this command to kill everything running as the couchdb user:

ps -u couchdb -o pid= | xargs kill -9

Let’s get our couchdb running:

service couchdb start

stud

stud stands for the scalable tls unwrapping daemon, and it’s a great ssl terminator that works on top of libev and openssl

I decided not to expose couchdb via regular http at all. For https, stud will be our front end to couchdb.

Installing stud in ubuntu is incredibly simple:

apt-get install libev4 libssl-dev libev-dev -y  
cd /opt/install  
git clone git://github.com/bumptech/stud.git  
cd stud  
make  
make install

stud doesn’t come bundled with all the nice things couchdb does, so we need to create similar artifacts:

mkdir /var/run/stud  
mkdir /usr/local/var/run/stud  
mkdir /usr/local/etc/stud  
touch /usr/local/etc/stud/stud.conf

You will also need a valid certificate for the domain you wish to use to expose your couchdb database. Get the pemfile and place it in /usr/local/etc/stud/stud.pem. A pemfile will include a private key and certificate information

touch /usr/local/etc/stud/stud.pem  
vi /usr/local/etc/stud/stud.pem
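Before starting stud it can save some head scratching to confirm that the pemfile really contains both pieces. A minimal sketch, assuming the standard PEM header lines; check_pem is a hypothetical helper name, and it only looks for the headers, it does not validate the key or the certificate.

```shell
# Hypothetical helper: returns 0 when the file holds both a certificate
# block and a private key block, judged only by their PEM header lines.
check_pem() {
  # $1 is the path to the pemfile
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1" &&
  grep -q -- '-----BEGIN .*PRIVATE KEY-----' "$1"
}

# Example (commented out, path taken from the steps above):
# check_pem /usr/local/etc/stud/stud.pem && echo "pemfile looks complete"
```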

Let’s make sure we handle security properly:

useradd -d /var/lib/_stud _stud  
chown _stud: /usr/local/etc/stud/stud.pem  
chown _stud: /var/run/stud  
chown -R _stud: /usr/local/var/run/stud /usr/local/etc/stud  
chmod 0770 /usr/local/var/run/stud/  
chmod 664 /usr/local/etc/stud/*.conf  
chmod 600 /usr/local/etc/stud/stud.pem  
mkdir /etc/stud  
mkdir /etc/default  
touch /etc/stud/stud.conf

Ubuntu has an init.d script for stud. However, I had to tweak it a bit to make it work with a custom installation, namely because it checked for the daemon before allowing me to change the configuration.

You can download the init.d script from this gist

rm -f /etc/init.d/stud
curl https://gist.github.com/dscape/4470972/raw/stud > /etc/init.d/stud  
chmod +x /etc/init.d/stud

We installed stud from source and we need to provide the script the paths of our custom installation:

vi /etc/default/stud

In my case these were the changes I needed to make:

PATH=/usr/local/bin:/sbin:/usr/sbin:/bin:/usr/bin  
DAEMON=/usr/local/bin/stud  
CHROOT="/usr/local/var/run/stud"  
COMMON_OPTIONS="-r $CHROOT -u $USER --config /usr/local/etc/stud/stud.conf"

Final step is to put our stud configuration in /usr/local/etc/stud/stud.conf

frontend="[*]:6984"  
backend="[127.0.0.1]:5984"  
pem-file="/usr/local/etc/stud/stud.pem"  
ssl=on  
workers=2  
syslog=on

Start stud:

update-rc.d stud defaults  
service stud start

We can test that this is working. Go to your local machine and try it out:

$  curl 165.255.222.111:5984
curl: (7) couldn't connect to host  
# -k means ignore ssl errors, cause the certificate is for a domain not ip
$  curl https://165.255.222.111:6984 -k
{"couchdb":"Welcome","version":"1.2.0"}

You should do the same check for the replica database 37.255.222.112
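If you would rather script this check than eyeball it, a small filter can pull the version out of the welcome response. A minimal sketch, assuming the response looks like the one shown above; couch_version is a hypothetical helper name and the sed expression is not a real JSON parser.

```shell
# Hypothetical helper: reads couchdb's welcome JSON on stdin and
# prints the value of the "version" field (nothing if it is missing).
couch_version() {
  sed -n 's/.*"version" *: *"\([^"]*\)".*/\1/p'
}

# Example against a live server (commented out, needs network access):
# curl -sk https://165.255.222.111:6984 | couch_version
```

Running it against both the live and the replica machine is a quick way to confirm they ended up on the same couchdb version.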

Now go to your dns provider and make sure you point something like my-couch.mydomain.com to the ip of the machine (a record). Do the same for your replica database. If you try to do curl against the domain you will see it now works without the -k option

A piece of unsolicited advice: use multiple dns providers in case one of them goes down. It happened once, it might happen again

Configuring CouchDB

We now need to configure our couchdb server. By default it comes in admin party mode but normally we want couchdb to be accessible only with a valid username and password

Browse to your futon:

https://my-couch.mydomain.com:6984/_utils/

And click on fix this to create our admin username and password

This will create an admin user but futon will still be visible in a read only capacity without authentication. To force authentication you should edit the local.ini file:

vi /usr/local/etc/couchdb/local.ini

Add

[couchdb]
delayed_commits = false
[couch_httpd_auth]
# some lines before
require_valid_user = true

This will take effect on server restart, but since we are editing this file let’s add our auto-compaction configuration. Compaction is a cpu/disk intensive operation so it should be scheduled accordingly. The auto-compaction feature was introduced in couchdb 1.2.0. Here we are going to use a simple configuration, but I strongly advise you to check the documentation instead of blindly copying

[daemons]
compaction_daemon={couch_compaction_daemon, start_link, []}
[compaction_daemon]
check_interval = 300  
min_file_size = 131072
[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}]
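To get a feel for what the 70% threshold means: as the couchdb documentation describes it, fragmentation should be the share of the file that is not live data, (file_size - data_size) / file_size * 100, and files smaller than min_file_size are skipped. The sketch below is just that arithmetic in shell. The function names are made up for illustration, and this is not how the compaction daemon itself is implemented.

```shell
# Hypothetical helpers illustrating the thresholds in the config above.

# Fragmentation as a whole-number percentage (rounded down):
#   (file_size - data_size) * 100 / file_size
# $1 is the file size on disk in bytes, $2 is the live data size in bytes.
# Assumes $1 is greater than zero.
fragmentation() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Returns 0 when a file would qualify for compaction:
# at least min_file_size bytes on disk and fragmentation >= threshold.
# $1 file size, $2 data size, $3 threshold percent, $4 min_file_size
needs_compaction() {
  [ "$1" -ge "$4" ] && [ "$(fragmentation "$1" "$2")" -ge "$3" ]
}

# Example: a 1,000,000 byte file holding 300,000 bytes of live data
# fragmentation 1000000 300000
# needs_compaction 1000000 300000 70 131072 && echo "would compact"
```

The time window (from, to, strict_window) is not modelled here; it only restricts when a qualifying compaction may run.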

Having worked with other databases that do compactions, I would advise you to have at least 2 cpus per database and to run compactions when your database is receiving few writes

Now restart couchdb:

service couchdb stop

service couchdb start

Ok, browse back to futon and see that it now requires username and password to access

Migrating the data

We now need to migrate our data from our production CouchDB to our new live system. For this we will set a replicator job that will continuously replicate from the active production couch

You don’t want to use futon for this, because continuous replications get cancelled on restart when done from futon. To do this right you need to use the _replicator database which was introduced in couchdb 1.2.0.

Just do this for each production database you want to continuously replicate.

function register_replication() {  
  # $1 is database name
  # $2 is https://localuser:localpass@localhost:6984
  # $3 is https://remoteuser:remotepass@remote:6984
  DATA='{"source": "'$3'/'$1'","target": "'$1'","connection_timeout": 60000,"retries_per_request": 20,"http_connections": 30, "continuous":true, "user_ctx": { "roles": [ "_admin" ] }}'
  echo "database: "$1
  echo "local: "$2
  echo "remote: "$3
  echo "data: "$DATA
  echo
  echo "proceed? (control+c to cancel)"
  read
  curl -k -vX PUT $2/$1
  curl \
    -X POST \
    -k \
    -H "Content-type: application/json" \
    $2/_replicator \
    --data "$DATA"
}

You can now call this for each of the databases you want to replicate

register_replication \
  foobar \
  https://u1:pw2@localhost:6984 \
  https://u2:pw2@my-couch.mydomain.com:6984

And for your replica:

register_replication \
  foobar \
  https://u3:pw3@my-couch-replica.mydomain.com:6984 \
  https://u1:pw2@localhost:6984
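The JSON body in register_replication is stitched together with nested quotes, which is easy to get wrong. Here is a hedged sketch of the same idea using printf, plus a filter to read back the _replication_state field that couchdb writes into a replicator document once it picks the job up. Both function names are hypothetical, and the body is trimmed to the essential fields, so add the timeout and retry options from the function above if you need them.

```shell
# Hypothetical helper: builds the JSON body for a continuous pull
# replication. $1 is the database name, $2 is the remote base url
# (including credentials), same meaning as in register_replication.
replication_doc() {
  printf '{"source":"%s/%s","target":"%s","continuous":true,"user_ctx":{"roles":["_admin"]}}\n' \
    "$2" "$1" "$1"
}

# Hypothetical helper: reads a _replicator document on stdin and
# prints its _replication_state (for example triggered or error).
replication_state() {
  sed -n 's/.*"_replication_state" *: *"\([^"]*\)".*/\1/p'
}

# Example (commented out, needs a running couchdb; DOC_ID is a placeholder):
# replication_doc foobar https://u2:pw2@my-couch.mydomain.com:6984
# curl -sk https://u1:pw2@localhost:6984/_replicator/DOC_ID | replication_state
```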

Pulling the switch

When we decide to migrate to the new couch we can change the pointer to the new database and the old one will stop getting documents. After that we can remove the replications from the new live system. As for our replica, it will pull from our new live system and will always be in standby mode

You will need to change your configuration files so the correct server gets called. However, before doing that it is advisable that you warm up your views. In couchdb, view indexes are only built on the first request. This means that if you migrate your system and you have a lot of traffic, the first couple of requests will probably time out, which is not that great

We need to connect to both the live and replica servers and make sure all views are created. (sidenote: if you are also adding new stuff to design documents don’t forget to do it right or have exactly the same problem as described above)

You can now build the couchdb views by using the couchdb-build-views script:

npm install -g couchdb-build-views

couchdb-build-views --help

Now just call the script:

couchdb-build-views --couch https://u3:pw3@my-couch-replica.mydomain.com:6984  
couchdb-build-views --couch https://u2:pw2@my-couch.mydomain.com:6984
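If you only have a handful of views, warming one by hand just means querying it once and waiting for the response, since couchdb builds the index on the first read. The sketch below only builds the url for such a request. view_warm_url, the design document name and the view name are made up for illustration, and this is not necessarily how couchdb-build-views works internally.

```shell
# Hypothetical helper: prints the url that queries a single view.
# limit=1 keeps the response small; the index is still built first.
# $1 couch base url, $2 database, $3 design document name, $4 view name
view_warm_url() {
  printf '%s/%s/_design/%s/_view/%s?limit=1\n' "$1" "$2" "$3" "$4"
}

# Example (commented out, needs a running couchdb; "app" and "by_date"
# are placeholder names):
# curl -sk "$(view_warm_url https://u2:pw2@my-couch.mydomain.com:6984 foobar app by_date)"
```

On a large database the request can take a long time to return, so give curl a generous timeout or run it in the background.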

Now that you are done, don’t forget to delete that silly directory and clean your history:

cd  
rm -rf ~/deletemelater/  
rm ~/.bash_history  
history -c  
touch ~/.bash_history

We are all done and ready for a new adventure: test this new environment in terms of load, and api. So don’t forget to check the couchdb changelog and test appropriately before switching