Engineering

Node.js databases: embedded database using LevelDB

Node.js was designed for fast and efficient network I/O. Its event-driven streams make it ideal for use as a kind of smart proxy, often working as the glue between back-end systems and clients. Node was originally created with that intention in mind, but it has since been used successfully to build traditional Web applications: an HTTP server that serves HTML pages or replies with JSON messages and uses a database to store the data. Even though Web frameworks on other platforms and in other languages have preferred to stick with traditional open-source relational databases like MySQL or PostgreSQL, most of the existing Node Web frameworks (like Express, Hapi and others) don’t impose any particular database, or even any type of database at all. This bring-your-own-database approach has been fed in part by the explosion in the variety of database servers now available, but also by the ease with which the Node module system and NPM let you install and use third-party libraries.

In this series we will analyse some of the existing solutions for interacting with several types of databases from Node.js.

Meet LevelDB

LevelDB is a fast key-value store written by Google engineers that has the following characteristics:

  • Keys and values are arbitrary byte arrays.
  • It stores data sorted by key.
  • It provides basic put(key, value), get(key) and delete(key) primitives.
  • It allows multiple changes to be performed in one atomic batch.
  • It allows you to read a transient consistent snapshot of the data.

LevelDB can be used from within Node.js by installing the level package, which compiles LevelDB and exposes it as a library that you can access from JavaScript.

LevelDB is thread-safe, but is not suited to being accessed from multiple processes. This means that you should only have a LevelDB database open from a single Node process. If you have multiple Node processes, a LevelDB database cannot be shared between them.

Also, LevelDB exposes just a simple key-value store API, on top of which a multitude of extension plugins exist, implementing features as diverse as job queues, map-reduce views, partitioning, replication, locking, network exposure, and others. In this article we will explore some of these.

Installing LevelDB

To install LevelDB you should add the level package to your package.json application manifest, using NPM from the command line:

$ npm install level --save

This command adds the level dependency to your manifest and installs it. Now you can require it and use it inside your application:

db.js:

var level = require('level');  
var path = require('path');
var dbPath = process.env.DB_PATH || path.join(__dirname, 'mydb');  
var db = level(dbPath);
module.exports = db;


Here we created a singleton module that exports a LevelDB database. We start by requiring the level module and use it to instantiate the database, providing it with a path. This path is the directory where LevelDB will store its files. The directory should be fully dedicated to LevelDB, and it need not exist beforehand. In our case the path for the database is taken from the DB_PATH environment variable or, if that's not defined, defaults to the mydb directory inside the directory of the current source file.
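
Since only one process can keep the database open, it can be useful to close it explicitly when you're done with it (before handing it over to another process, for instance); a minimal sketch using level's close method:

var db = require('./db');

// releases LevelDB's lock on the data directory
db.close(function(err) {
  if (err) {
    return console.error('error closing the database:', err);
  }
  console.log('database closed');
});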

Using LevelDB

Now, from another module, we can get a reference to the database by requiring our singleton db.js module:

app.js:

var db = require('./db');

After this we can use it immediately to put some values:

db.put('key 1', 'value 1');  
db.put('key 2', 'value 2');

You can also use the db object to get some values:

db.get('key 1', function(err, value) {  
  if (err) {
    return handleError(err);
  }
  console.log('value:', value);
});

or even delete some entries:

db.del('key 1');

All of these operations accept an optional callback as the last argument. When an error happens while performing the operation, this callback gets the error object as its first argument:

db.put('key 1', 'value 1', function(err) {  
  if (err) {
    console.error('error putting key 1:', err);
  }
});

Encodings

LevelDB supports any binary value as a key or a value: you can pass in buffer objects, but in the previous examples we’re using strings, which LevelDB by default assumes are UTF-8 encoded. You can, if you wish, use other key or value encodings by setting keyEncoding or valueEncoding in the constructor options object:

var options = {  
  keyEncoding: 'hex',
  valueEncoding: 'base64'
};
var db = level(dbPath, options);

Here we’re telling LevelDB to assume that keys are hex-encoded and that values are base64-encoded. You can use any encoding that the Buffer.toString() method allows, namely "hex", "utf8", "ascii", "binary", "base64", "ucs2", or "utf16le". With the options above, keys must now be given in their hex-encoded form ('6b6579' is the string "key" hex-encoded):

db.get('6b6579', function(err, value) {  
  if (err) {
    return handleError(err);
  }
  console.log('base64-encoded value:', value);
});
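
Writing works the same way; a small sketch, assuming the database was opened with the options above:

// reuse the hex-encoded key from above, and base64-encode the value ourselves
var value = new Buffer('some value').toString('base64');
db.put('6b6579', value, function(err) {
  if (err) {
    return handleError(err);
  }
  console.log('stored the value under the hex-encoded key');
});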

Using JSON for encoding values

You can specify json as a special value-encoding type, which allows you to use arbitrary JavaScript objects as values:

use_json.js:

var level = require('level');  
var path = require('path');  
var assert = require('assert');
var dbPath = process.env.DB_PATH || path.join(__dirname, 'mydb');
var options = {  
  keyEncoding: 'binary',
  valueEncoding: 'json'
};
var db = level(dbPath, options);
db.put(new Buffer([1, 2, 3]), { some: 'json' }, function(err) {  
  if (err) {
    return console.error(err);
  }
  db.get(new Buffer([1, 2, 3]), function(err, value) {
    if (err) {
      return console.error(err);
    }
    assert.deepEqual(value, { some: 'json' });
    console.log(value);
  });
});

One common use of the json encoding is storing and retrieving complex JavaScript objects without the need to do the encoding and decoding yourself.

Batch operations

A powerful feature of LevelDB is that it allows you to group several operations into a batch that is executed atomically. Here is an example of a batch operation:

var batch = db.batch();
batch.put('key 1', 'value 1');
batch.put('key 2', 'value 2');
batch.del('key 3');
batch.write(function(err) {
  if (err) {
    console.error('batch operation failed:', err);
  }
});

Here you can see that we created a batch by calling db.batch(), and that we queued put and del operations on it. At the end we execute the entire batch by calling batch.write().

Each batch queuing operation returns the batch itself, allowing us to add operations in a chained style:

db.batch()  
  .put('key 1', 'value 1')
  .put('key 2', 'value 2')
  .del('key 3')
  .write(function(err) {
    if (err) {
      console.error('batch operation failed:', err);
    }
  });

In this form you need to terminate the batch of commands with batch.write, providing it with a completion callback. If you get an error on the callback you provided to batch.write(), you can safely assume that all the operations contained in the batch have failed. If you don't get any error, you can safely assume that all the operations have succeeded and have been committed.

Instead of using the chaining API, you can use the array version of .batch(), which can be useful for mapping objects into operations:

var users = // ...
var operations = users.map(function(user) {  
  return { type: 'put', key: user.id, value: user };
});
db.batch(operations, function(err) {  
  if (err) {
    console.error('error in user batch:', err);
  }
});

Here you are passing an array containing all the operations of the batch. Each operation is described by an object with a type attribute that specifies the operation type (either put or del). Every operation must define a key attribute, and an operation of type put must also provide a value attribute.
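
For instance, a sketch of an array batch mixing both operation types:

db.batch([
  { type: 'put', key: 'key 1', value: 'value 1' },
  { type: 'put', key: 'key 2', value: 'value 2' },
  { type: 'del', key: 'key 3' } // del operations need no value attribute
], function(err) {
  if (err) {
    console.error('batch operation failed:', err);
  }
});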

Using a readable stream

We can create a readable stream from a LevelDB database like this:

var stream = db.createReadStream();

This stream emits one data event per record in the database:

stream.on('data', function(data) {  
  console.log('%s = %j', data.key, data.value);
});

The emitted data is an object that contains a key attribute and a value attribute. Since LevelDB stores data sorted by key, we can expect this stream to emit the records in ascending key order.
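
If you’re only interested in the keys, level also offers db.createKeyStream(); a minimal sketch using it to check the ordering, assuming the db from our db.js module:

var assert = require('assert');
var db = require('./db');

var lastKey = null;
db.createKeyStream().on('data', function(key) {
  if (lastKey !== null) {
    assert(key > lastKey); // keys arrive in ascending order
  }
  lastKey = key;
});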

Besides emitting data, this readable stream emits some other interesting events:

stream.once('end', function() {  
  console.log('no more data');
});
stream.once('close', function() {  
  console.log('stream closed');
});
stream.once('error', function(err) {  
  console.error('stream emitted error:', err);
});

The end event is emitted once the stream has emitted all the data, and the close event is emitted immediately after that. The error event may be emitted if an error happens while reading the data.

Using ranges

In the previous example we were streaming the entire contents of the database. You can, alternatively, limit the emitted data to a range of keys by using the gt (greater-than), gte (greater-than-or-equal-to), lt (less-than), and lte (less-than-or-equal-to) options:

ranges.js:

var assert = require('assert');
var stream = db.createReadStream({  
  gte: 'a',
  lte: 'z'
});
stream.on('data', function(record) {  
  assert(record.key >= 'a' && record.key <= 'z');
});

Here we’re creating a read stream that will emit a data event for every record that contains a key that is both greater than or equal to the string "a", and less than or equal to the string "z". Because of the inclusiveness we're using, if our database contains a record with a key "a" or "z", that record will be emitted. On the other hand, if it includes a record with a key named "z1", it will not be emitted (because "z1" is greater than "z").
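
If you don’t want the bounds themselves included, you can use the exclusive gt and lt options instead; a small sketch, again assuming our db module:

var assert = require('assert');
var db = require('./db');

var stream = db.createReadStream({
  gt: 'a', // a record with the key "a" is not emitted
  lt: 'z'  // neither is a record with the key "z"
});
stream.on('data', function(record) {
  assert(record.key > 'a' && record.key < 'z');
});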

Limiting the number of results

Instead of streaming the entire contents of your database or your range, you can limit the maximum number of emitted records using the limit option:

limit.js:

var assert = require('assert');
var stream = db.createReadStream({  
  limit: 10 // only interested in the first 10 records
});
var count = 0;  
stream.on('data', function(record) {  
  assert(++count <= 10);
});
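
Combining limit with the range options gives you simple pagination: read one page, remember the last key you saw, and start the next page just after it. This is a common pattern rather than a built-in level feature; a sketch, assuming our db module and a page size of 10:

var db = require('./db');
var pageSize = 10;

function readPage(afterKey, cb) {
  var options = { limit: pageSize };
  if (afterKey) {
    options.gt = afterKey; // start just after the previous page's last key
  }
  var page = [];
  db.createReadStream(options)
    .on('data', function(record) {
      page.push(record);
    })
    .once('error', cb)
    .once('end', function() {
      cb(null, page);
    });
}

// read the first page, then the one after it
readPage(null, function(err, firstPage) {
  if (err) {
    return console.error(err);
  }
  console.log('first page:', firstPage);
  if (firstPage.length === pageSize) {
    readPage(firstPage[firstPage.length - 1].key, function(err, secondPage) {
      if (err) {
        return console.error(err);
      }
      console.log('second page:', secondPage);
    });
  }
});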

A consistent snapshot

One of the many advantages of using a readable stream is that you get a consistent snapshot read. This means that, if any write or batch write happens after the stream begins, these writes will not be reflected in the stream. Let’s see this in action:

consistent.js:

var assert = require('assert');  
var db = require('./db');
var batch = [  
  {type: 'put', key: 'a', value: 'A'},
  {type: 'put', key: 'b', value: 'B'},
  {type: 'put', key: 'c', value: 'C'},
  {type: 'put', key: 'd', value: 'D'}
];
var index = 0;
db.batch(batch, function(err) {  
  var stream;
  if (err) {
    console.error(err);
  }
  else {
    stream = db.createReadStream({gte: 'a', lte: 'd'});
    stream.on('data', onData);
    db.batch(batch.map(function(op) {
      return {type: op.type, key: op.key, value: 'other value'};
    }), function(err) {
      if (err) {
        console.error(err);
      }
      else {
        console.log('saved batch replacing with other value, resuming');
      }
    });
  }
});
function onData(record) {  
  console.log('read: %j', record);
  assert.equal(record.value, batch[index++].value);
}

If you run this script you will get the following result:

$ node consistent
read: {"key":"a","value":"A"}  
saved batch replacing with other value, resuming  
read: {"key":"b","value":"B"}  
read: {"key":"c","value":"C"}  
read: {"key":"d","value":"D"}

Somewhere in between the reads, the batch write replacing all the values with the string "other value" succeeded, but it is never reflected in the readable stream that was already open: the stream keeps reading from the snapshot taken when it was created.

Using ranges to partition the data

LevelDB is a key-value store: an arbitrary value is stored under an arbitrary key. This may sound limiting, but since all keys are stored in sorted order, you can partition the key namespace into an unlimited set of separate namespaces.

For instance, if you’re programming some kind of a social network and you have a database where you store user records, you can also use it to store all the user relationships.

bookface_populate.js:

var path = require('path');  
var level = require('level');
var dbPath = process.env.DB_PATH || path.join(__dirname, 'bookface');  
var db = level(dbPath, {  
  valueEncoding: 'json'
});
var batch = [];
var users = require('./users');
users.forEach(function(user) {
  batch.push({
    type: 'put',
    key: user.email,
    value: {
      email: user.email,
      createdAt: new Date()
    }
  });
  user.friends.forEach(function(friend) {
    batch.push({
      type: 'put',
      key: user.email + '!friendships!' + friend,
      value: {
        source: user.email,
        target: friend,
        createdAt: new Date()
      }
    });
  });
});
db.batch(batch, function(err) {  
  if (err) {
    throw err;
  }
  console.log('populated successfully');
});

There are a few things going on in this last file. First, we build a batch containing all the initial users and their friendships, storing it in the batch variable. Then we issue the db.batch command, inserting all the users and their friendships into our database in one go. For this to work, you will need to define the users.js module:

users.js:

module.exports = [  
  {
    email: 'person1@example.com',
    friends: [
      'person2@example.com',
      'person3@example.com'
    ]
  },
  {
    email: 'person2@example.com',
    friends: [
      'person1@example.com',
      'person3@example.com'
    ]
  },
  {
    email: 'person3@example.com',
    friends: [
      'person1@example.com'
    ]
  }
];

We can now populate our bookface database:

$ node bookface_populate
populated successfully

Next, we can devise a command-line query script:

bookface.js:

var path = require('path');  
var level = require('level');
var dbPath = process.env.DB_PATH || path.join(__dirname, 'bookface');  
var db = level(dbPath, {  
  valueEncoding: 'json'
});
function user(email, cb) {  
  db.get(email, cb);
}
function friendships(email) {  
  var key = email + '!' + 'friendships!';
  return db.createValueStream({
    gte: key,
    lte: key + '\xff'
  });
}
var email = process.argv[2];
user(email, function(err, user) {  
  console.log('got user:', user);
});
var friends = friendships(email).on('data', function(friend) {  
  console.log('friend:', friend.target);
});
friends.once('end', function() {  
  console.log('no more friends');
});

We can now use this script to query users and their relationships from our database:

$ node bookface person1@example.com
got user: { email: 'person1@example.com',  
  createdAt: '2015-01-06T15:43:01.196Z' }
friend: person2@example.com  
friend: person3@example.com  
no more friends

Here we’re storing each friendship inside the <user email>!friendships! namespace. In general, we're using the ! character to separate the elements in our keys. The trick to retrieving the data is in the range query, as in this example:

function friendships(email) {  
  var key = email + '!' + 'friendships!';
  return db.createValueStream({
    gte: key,
    lte: key + '\xff'
  });
}

Perhaps counter-intuitively, the upper bound (key + '\xff') is particularly important. Every friendship key starts with the prefix and is longer than it, so it sorts after the prefix itself; a query with lte: key alone would therefore match nothing. Appending '\xff' produces a bound that sorts above every key in this namespace but below the keys of any later namespace, so we don't get more records than we want.
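
A small sketch making this concrete, assuming the db opened as in bookface.js above:

var prefix = 'person1@example.com!friendships!';

// matches nothing: no key equals the bare prefix
db.createKeyStream({ gte: prefix, lte: prefix }).on('data', function(key) {
  console.log('never reached:', key);
});

// matches every key in the friendships namespace
db.createKeyStream({ gte: prefix, lte: prefix + '\xff' }).on('data', function(key) {
  console.log('friendship key:', key);
});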

Using level-sublevel

Instead of creating the keys by hand, you can use the level-sublevel NPM module to partition the key space at will. Let's install it:

$ npm install level-sublevel --save

We’ll also need to generate random unique IDs, so let’s take the chance to install the cuid NPM module:

$ npm install cuid --save

We can now use level and level-sublevel to create the two sub-databases — one for holding user records and another for holding user messages:

sublevels.js:

var level = require('level');  
var path = require('path');  
var sublevel = require('level-sublevel');
var dbPath = process.env.DB_PATH || path.join(__dirname, 'sublevels');  
var db = sublevel(level(dbPath, {  
  valueEncoding: 'json'
}));
exports.base = db;  
exports.users = db.sublevel('users');  
exports.messages = db.sublevel('messages');

Here we’re wrapping the database returned by level with sublevel and storing the result in a variable named db. We then create the two sub-databases by calling db.sublevel('<DATABASE NAME>'), one for users and another for messages.

Now we can use this sublevels module to populate the database:

sublevels_populate.js:

var cuid = require('cuid');  
var db = require('./sublevels');
var user = {  
  name: 'John',
  email: 'user1@example.com'
};
db.users.put(user.email, user, function() {
  var userMessages = db.messages.sublevel(user.email);
  for (var i = 1; i <= 20; i++) {
    userMessages.put(cuid(), {
      from: 'user' + i + '@example.com',
      to: 'user1@example.com',
      subject: 'Hey!',
      body: 'hey there, how you doing?'
    });
  }
});

Here we’re creating a user record, and then creating 20 messages addressed to them. You can see how we can create a sublevel inside another sublevel: we’re storing the user messages in a sub-database named after the user email inside the messages sub-database.

You may also have noticed that each message gets a unique ID given to us by the cuid module. Since cuids start with a timestamp component, these message keys will sort roughly by creation time, which fits nicely with LevelDB's sorted key order.

Let’s call our script:

$ node sublevels_populate

Now we can create a script to query our database:

sublevels_query.js:

var db = require('./sublevels');
var email = process.argv[2];
db.users.get(email, function(err, user) {  
  if (err) {
    throw err;
  }
  console.log('User: %j', user);
  var userMessages = db.messages.sublevel(email);
  userMessages.createValueStream().on('data', function(message) {
    console.log('Message: %j', message);
  })
  .once('end', function() {
    console.log('no more messages');
  });
});

We can now test this script, using it to query our user:

$ node sublevels_query.js user1@example.com
User: {"name":"John","email":"user1@example.com"}  
Message: {"from":"user1@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user2@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user3@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user4@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user5@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user6@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user7@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user8@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
Message: {"from":"user9@example.com","to":"user1@example.com","subject":"Hey!","body":"hey there, how you doing?"}  
...
no more messages

Batch in different sublevels

Besides letting us use one LevelDB database as a set of separate databases, level-sublevel also allows us to perform a batch of updates spanning two or more distinct sublevels, all in one atomic operation. For instance, when a message is created, you may want to atomically insert it into the sender's outbox and the receiver's inbox:

sublevels_insert_message.js:

var db = require('./sublevels');  
var cuid = require('cuid');
exports.insert = insertMessage;
function insertMessage(to, from, subject, body, cb) {  
  var id = cuid();
  var message = {
    to: to,
    from: from,
    subject: subject,
    body: body
  };
  var batch = [
    {
      type: 'put',
      key: id,
      value: message,
      prefix: db.messages.sublevel(from).sublevel('out')
    },
    {
      type: 'put',
      key: id,
      value: message,
      prefix: db.messages.sublevel(to).sublevel('in')
    }
  ];
  db.base.batch(batch, cb);
}

Here we can see that we’re using db.base.batch to atomically insert two records, each into a different sublevel. We specify which sublevel each operation affects in the prefix property of the corresponding batch entry.

You will also notice that we’re inserting the batch into the base database, and not into any sublevel. This is because the sublevel is already specified inside each batch entry.
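
A quick usage sketch of this module (the message values are just illustrative):

var messages = require('./sublevels_insert_message');

// insertMessage(to, from, subject, body, cb)
messages.insert('user2@example.com', 'user1@example.com', 'Hey!',
  'hey there, how you doing?', function(err) {
    if (err) {
      throw err;
    }
    console.log('message stored in both the outbox and the inbox');
  });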

Hooks

level-sublevel really is the Swiss army knife of LevelDB, and it doesn't end here. Using db.pre we can hook into changes before they are committed and add operations of our own to them. We can use this to store a log of all the changes performed on a database (or on a specific sub-level) in a separate sub-level. For instance, we could hook into our users sub-level and record every change made to it:

sublevels_user_hook.js:

var cuid = require('cuid');  
var db = require('./sublevels');  
var userChanges = db.base.sublevel('userchanges');
db.users.pre(function(change, add) {  
  add({
    type: 'put',
    key: cuid(),
    value: {
      when: new Date(),
      change: change
    },
    prefix: userChanges.sublevel(change.key)
  });
});

We can now activate this hook by including this module in our sublevels.js file:

bottom of sublevels.js:

exports.base = db;  
exports.users = db.sublevel('users');  
exports.messages = db.sublevel('messages');
require('./sublevels_user_hook');

We can now make a few user changes by running our sublevels_populate script a few times:

$ node sublevels_populate
$ node sublevels_populate
$ node sublevels_populate

Next, we can create a script to query the changes made to a given user:

sublevels_user_log.js:

var db = require('./sublevels').base.sublevel('userchanges');
var email = process.argv[2];
var userChanges = db.sublevel(email);
userChanges.createValueStream().on('data', function(message) {  
  console.log('User Change: %j', message);
})
.once('end', function() {
  console.log('no more changes');
});

And use it:

$ node sublevels_user_log.js user1@example.com
User Change: {"when":"2015-01-08T12:19:43.154Z","change":{"key":"user1@example.com","value":{"name":"John","email":"user1@example.com"},"prefix":["users"],"type":"put"}}  
User Change: {"when":"2015-01-08T12:19:43.946Z","change":{"key":"user1@example.com","value":{"name":"John","email":"user1@example.com"},"prefix":["users"],"type":"put"}}  
User Change: {"when":"2015-01-08T12:19:44.555Z","change":{"key":"user1@example.com","value":{"name":"John","email":"user1@example.com"},"prefix":["users"],"type":"put"}}  
no more changes

In the next article we’ll look into an open-source in-memory database named Redis and how we can interface with it using Node.js.
