Node.js has been designed for quick and efficient network I/O. Its event-driven streams make it ideal for use as a kind of smart proxy, often working as the glue between back-end systems and clients. Node was originally designed with that intention in mind, but it has since been successfully used to build traditional Web applications: an HTTP server that serves HTML pages or replies with JSON messages and uses a database to store the data. While Web frameworks on other platforms and languages have preferred to stick with traditional open-source relational databases like MySQL or PostgreSQL, most of the existing Node Web frameworks (like Express, Hapi and others) don’t impose any particular database, or even any type of database at all. This bring-your-own-database approach has been fed in part by the explosion in the variety of database servers now available, but also by the ease with which the Node module system and NPM allow you to install and use third-party libraries.
In this series we will analyse some of the existing solutions for interacting with some types of databases.
Meet LevelDB
LevelDB is a fast key-value store written by Google engineers that has the following characteristics:
- Keys and values are arbitrary byte arrays.
- It stores data sorted by key.
- It provides basic put(key, value), get(key) and delete(key) primitives.
- It allows multiple changes to be performed in one atomic batch.
- It allows you to read a transient consistent snapshot of the data.
LevelDB can be used from within Node.js by installing the level package. When doing so, it compiles and installs LevelDB as a library, which you can then access through JavaScript.
LevelDB is thread-safe, but it is not suited to being accessed from multiple processes: a given LevelDB database should only be open from a single Node process at a time, so if you have multiple Node processes, a LevelDB database cannot be shared between them.
Also, LevelDB just exposes a simple key-value store API, on top of which a multitude of extension plugins exist, implementing functions as diverse as job queues, map-reduce views, partitioning, replication, locking, network exposure, and others. In this chapter we will explore some of these.
Installing LevelDB
To install LevelDB you should add the level package to your package.json application manifest, using NPM from the command line:
$ npm install level --save
This command adds the level dependency to your manifest and installs it. Now you can require it and use it inside your application:
db.js:
Here we created a singleton module that exports a LevelDB database. We start by requiring the level module and use it to instantiate the database, providing it with a path. This path is the path of the directory where LevelDB will store its files. This directory should be fully dedicated to LevelDB, and it does not need to exist beforehand. In our case the path for the database is contained in the DB_PATH environment variable or, if not defined, defaults to the mydb directory inside the directory of the current source file.
Using LevelDB
Now, from another module we can get a reference to the database by requiring our singleton db.js module:
app.js:
var db = require('./db');
After this we can use it immediately to put some values:
You can also use the db object to get some values:
or even delete some entries:
db.del('key 1');
Any of these operations take an optional callback as the last argument. When an error happens while performing the operation, this callback gets the error object as the first argument:
Encodings
LevelDB supports any binary value as a key or a value: you can pass in buffer objects, but in our previous cases we’re using strings which, by default, LevelDB assumes are UTF-8 encoded. You can, if you wish to, use other key or value encodings by setting the keyEncoding or the valueEncoding in the constructor options object:
Here we’re telling LevelDB to assume that keys are hex-encoded and that values are base64-encoded. Alternatively, you can use any encoding that the Buffer.toString() method allows: namely "hex", "utf8", "ascii", "binary", "base64", "ucs2", or "utf16le".
Using JSON for encoding values
You can specify json as a special value-encoding type, which allows you to use arbitrary JavaScript objects as values:
use_json.js:
One common use of the json encoding is storing and retrieving complex JavaScript objects without the need to do the encoding and decoding yourself.
Batch operations
A powerful feature of LevelDB is that it allows you to group operations in a batch to be executed atomically. Here is an example of a batch operation:
Here you can see that we created a batch by calling db.batch(), and that we queued put and del operations. At the end we try to execute the entire batch by calling batch.write().
Any batch queuing operation always returns the batch itself, permitting us to add operations using a chained style:
In this form you need to terminate the batch of commands with batch.write, providing it with a completion callback. If you get an error on the callback you provided to batch.write(), you can safely assume that all the operations contained in the batch have failed. If you don't get any error, you can safely assume that all the operations have succeeded and have been committed.
Instead of using the chaining API, you can use the array version of .batch(), which can be useful for mapping objects into operations:
Here you are passing an array containing all the operations of the batch. Each operation is described by an object containing a type attribute that specifies the operation type (either put or del). Any operation must define a key attribute, and an operation with type put must also provide a value attribute.
Using a readable stream
We can create a readable stream from a LevelDB database like this:
var stream = db.createReadStream();
This stream emits one data object per record in the database:
The emitted data is an object that contains a key attribute and a value attribute. Since LevelDB stores data sorted by key, we can expect this stream to emit the keys in a sorted order.
Besides emitting data, this readable stream emits some other interesting events:
The end event is emitted once the stream has emitted all the data, and the close event is emitted immediately after it. Also, the error event may be emitted if an error happens while reading the data.
Using ranges
In the previous example we were streaming the entire contents of the database. You can, alternatively, limit the emitted data to a range of keys by using the gt (greater-than), gte (greater-than-or-equal-to), lt (less-than), and lte (less-than-or-equal-to) options:
ranges.js:
Here we’re creating a read stream that will emit a data event for every record that contains a key that is both greater than or equal to the string "a", and less than or equal to the string "z". Because of the inclusiveness we're using, if our database contains a record with a key "a" or "z", that record will be emitted. On the other hand, if it includes a record with a key named "z1", it will not be emitted (because "z1" is greater than "z").
Limiting the number of results
Instead of streaming the entire contents of your database or your range, you can limit the maximum number of emitted records using the limit option:
limit.js:
A consistent snapshot
One of the many advantages of using a readable stream is that you get a consistent snapshot read. This means that, if any write or batch write happens after the stream begins, these writes will not be reflected in the stream. Let’s see this in action:
consistent.js:
If you run this script you will get the following result:
At some point while the values were being read, the batch write replacing all the values with the string "other value" succeeded, but it is never reflected in the readable stream that was already open.
Using ranges to partition the data
LevelDB is a key-value store: an arbitrary value is stored under an arbitrary key, but this doesn’t need to limit the type of values you store. Since it stores all the keys in sorted order, you can partition the key namespace into an unlimited set of separate namespaces.
For instance, if you’re programming some kind of a social network and you have a database where you store user records, you can also use it to store all the user relationships.
bookface_populate.js
In this last file there are a few things going on. First, we're building a batch operation that contains all the initial users and their friendships, storing it in the batch variable. Next we're issuing the db.batch command, inserting all the users and their friendships into our database in one go. For this to work, you will need to define the users.js module:
users.js:
We can now populate our bookface database:
Next, we can devise a command-line query script:
bookface.js:
We can now use this script to query users and their relationships from our database:
Here we’re storing each friendship inside the <email>!friendship! namespace. Generically, we're using the ! character to separate elements in our keys. The trick in retrieving the data is in the range query, as in this example:
Perhaps counter-intuitive, but particularly important, is the higher-end limit (key + '\xff'), which ensures we don't get more records than we want.
Using level-sublevel
Instead of creating the keys by hand, you can use the level-sublevel NPM module to partition the key space at will. Let's then install it:
$ npm install level-sublevel --save
We’ll also need to generate random unique IDs, so let’s take the chance to install the cuid NPM module:
$ npm install cuid --save
We can now use level and level-sublevel to create the two sub-databases — one for holding user records and another for holding user messages:
sublevels.js:
Here we’re wrapping the database returned by level with sublevel, storing it in a temporary variable named db. We then create the two sub-databases by calling db.sublevel('<DATABASE NAME>'), one for users and another for messages.
Now we can use this sublevels module to populate the database:
sublevels_populate.js:
Here we’re creating a user record, and then creating 20 messages addressed to them. You can see how we can create a sublevel inside another sublevel: we’re storing the user messages in a sub-database named after the user email inside the messages sub-database.
You may also have noticed that each message has a unique ID given to us by the cuid module.
Let’s call our script:
$ node sublevels_populate
Now we can create a script to query our database:
sublevels_query.js:
We can now test this script, using it to query our user:
Batch in different sublevels
Besides being able to use one LevelDB database as a set of different databases, we can also use level-sublevel to perform a batch of updates across two or more distinct sublevels, all in one atomic operation. For instance, when a message is created, you may want to atomically insert it into the sender outbox and the receiver inbox:
sublevels_insert_message.js:
Here we can see that we’re using db.batch to atomically insert two records, each into different sublevels. We specify which sublevel the operation is going to affect in the prefix property of each batch entry.
You will also notice that we’re inserting the batch into the base database, and not into any sublevel. This is because the sublevel is already specified inside each batch entry.
Hooks
The level-sublevel module is really the Swiss army knife for LevelDB, and it doesn't end here. Using db.pre we can hook into changes before they are committed and atomically add operations of our own. We can use this to store a log of all the changes performed on a database (or a specific sub-level) in a separate sub-level. For instance, we could hook into our users sub-level and record every change made to it:
sublevels_user_hook.js:
We can now activate this hook by including this module in our sublevels.js file:
bottom of sublevels.js:
We can now make a number of user changes using our sublevels_populate script:
We can now create a script to query our changes to a given user:
sublevels_user_log.js:
And use it:
In the next article we’ll look into an open-source in-memory database named Redis and how we can interface with it using Node.js.