“DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.” (source)
Batching is DataLoader’s primary job. You create a loader by giving it a batch function:
- The batch function receives an array of keys;
- It returns a promise for an array of values, with the same length and order as the keys.
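As a minimal sketch of that contract, here is a batch function over an in-memory map. The `users` data and the `batchGetUsers` name are made up for illustration; a real batch function would query a database or an API.

```javascript
// Hypothetical in-memory "backend"; a real one would be a database or an HTTP API.
const users = new Map([
  [1, { id: 1, name: 'Ada' }],
  [2, { id: 2, name: 'Grace' }],
]);

// Batch function: receives an array of keys and returns a promise for an
// array of values with the same length and order as the keys.
async function batchGetUsers(keys) {
  // Keys with no match map to null so the positions still line up.
  return keys.map((key) => users.get(key) ?? null);
}

// With the dataloader package installed, it would be wired up as:
//   const DataLoader = require('dataloader');
//   const myLoader = new DataLoader(batchGetUsers);
```

Keeping the batch function a plain async function makes it easy to test on its own, before any loader is involved.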
Whaaat? how?
There are a few options you can pass to the dataloader; check them in the docs.
We can later load data from the dataloader we created:
myLoader.load(key)
To better understand the dataloader and where its “magic” lies, check the enqueuePostPromiseJob function, where Node.js’s process.nextTick(fn) is called.
To sum up, batching works by collecting all the load calls made during the current tick of the event loop. Using process.nextTick, DataLoader schedules the dispatch right after the current operation finishes, before the event loop moves on (and before any I/O), and sends all the collected keys to the batch function together.
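To make the scheduling idea concrete, here is a stripped-down sketch of a batching loader. `TinyLoader` is a hypothetical stand-in written for this post, not DataLoader’s actual source; it only mirrors the idea of queueing keys and dispatching them once via process.nextTick.

```javascript
// Simplified stand-in for DataLoader's batching, NOT the library's real code.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Only the first load of a batch schedules the dispatch.
      if (this.queue.length === 1) {
        // Let pending promise callbacks run first, then dispatch on nextTick.
        Promise.resolve().then(() => process.nextTick(() => this.dispatch()));
      }
    });
  }

  dispatch() {
    const batch = this.queue;
    this.queue = [];
    // One call to the batch function for every key collected so far.
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (error) => batch.forEach((item) => item.reject(error))
    );
  }
}

// Usage: three loads issued in the same tick end up in one batch call.
const batchCalls = [];
const doubler = new TinyLoader(async (keys) => {
  batchCalls.push(keys);
  return keys.map((key) => key * 2);
});

const demo = Promise.all([doubler.load(1), doubler.load(2), doubler.load(3)]);
demo.then((values) => console.log(values, batchCalls));
```

The batch function runs once with the keys 1, 2 and 3, even though load was called three times.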
Note that most databases support batch loading, and there are a few examples of how to do it in the facebook/dataloader examples.
Caching
If we call the load function twice with the same key, the second call returns the cached result instead of fetching again (caching is only skipped if you explicitly set { cache: false } in the options). Here are some characteristics of the caching functionality:
- The dataloader does in-memory caching
- The cache lives as long as the loader instance; it is only released when the loader itself is garbage collected
- …but you can clear the cache by calling clear(key) or clearAll()! This is useful when you have mutations and want to invalidate the existing cache
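The behaviour in the list above can be sketched with a small per-key cache. `CachingLoader` and `fetchOne` are hypothetical names for this illustration; the sketch leaves batching out and only shows how a cache of promises, clear(key) and clearAll() fit together.

```javascript
// Simplified stand-in for DataLoader's per-key cache, NOT the library's real code.
class CachingLoader {
  constructor(fetchOne) {
    this.fetchOne = fetchOne;
    this.cache = new Map(); // key -> promise for the value
  }

  load(key) {
    if (!this.cache.has(key)) {
      this.cache.set(key, this.fetchOne(key));
    }
    return this.cache.get(key);
  }

  clear(key) {
    this.cache.delete(key);
    return this;
  }

  clearAll() {
    this.cache.clear();
    return this;
  }
}

let fetchCount = 0;
const cachedLoader = new CachingLoader(async (key) => {
  fetchCount += 1;
  return `value-${key}`;
});

async function cacheDemo() {
  await cachedLoader.load('a'); // miss: fetches
  await cachedLoader.load('a'); // hit: served from the cache
  const afterHit = fetchCount;
  cachedLoader.clear('a'); // e.g. right after a mutation
  await cachedLoader.load('a'); // miss again: fetches fresh data
  return { afterHit, afterClear: fetchCount };
}
```

Caching the promise rather than the resolved value means two loads for the same key share one fetch, even if the first has not finished yet.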
Football use case!
Imagine that we have a football team with all the great players! A football team has players, coaches, games, etc. There is a relationship between Player and Coach:
Let’s say, for the purpose of our example, that the team has a main Coach, but each Player position may have an assistant Coach: goalkeepers have their own assistant Coach, forwards have their own, and so on. Taking that into consideration, our data could be the following:
What a team we have! As you can see, the same coach can show up for several players, which is a good reason to use DataLoader and avoid duplicate requests. If our DataLoader keys are the ids, the coach requests would be the following:
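As a rough illustration with made-up data (the ids and positions below are hypothetical, not the original team listing), this is how duplicated coach requests collapse into a smaller set of keys:

```javascript
// Hypothetical data: ids and positions are invented for this sketch.
const players = [
  { id: 1, position: 'goalkeeper', coachId: 10 },
  { id: 2, position: 'forward', coachId: 11 },
  { id: 3, position: 'forward', coachId: 11 },
  { id: 4, position: 'midfielder', coachId: 12 },
];

// Every player asks for a coach, so the raw requests contain duplicates.
const requestedCoachIds = players.map((player) => player.coachId);

// De-duplicated keys, in first-seen order: what actually needs fetching.
const uniqueCoachIds = [...new Set(requestedCoachIds)];
```

Four requests turn into three distinct keys here; with a real squad sharing a handful of coaches the saving is much larger.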
DataLoader’s batch function could also be invoked if we’re querying all players because we are (probably) requesting them from the same database table. Let’s use SQL as an example:
- we would do a SELECT * FROM players to fetch all players;
- to fetch a specific set of players, let’s say the lineup for a game, we would do SELECT * FROM players WHERE id IN (id1, id2, …, idn). Yay! Batching really works in SQL!
- Please note that there is no magic: your database has to support batch queries in order to do it ;)
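Here is a small sketch of turning a list of ids into that IN query. `buildPlayersByIdsQuery` is a hypothetical helper, and the "?" placeholder style is an assumption: drivers differ (some use $1, $2, …), so adapt it to yours.

```javascript
// Builds a parameterized "WHERE id IN (...)" query for a list of ids.
// The "?" placeholder style is an assumption; check what your driver expects.
function buildPlayersByIdsQuery(ids) {
  if (ids.length === 0) {
    // An empty "IN ()" is rejected by most SQL dialects, so fail early.
    throw new Error('ids must not be empty');
  }
  const placeholders = ids.map(() => '?').join(', ');
  return {
    text: `SELECT * FROM players WHERE id IN (${placeholders})`,
    values: [...ids],
  };
}
```

Using placeholders instead of concatenating the ids into the string keeps the query safe from SQL injection.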
Adding DataLoader to GraphQL
In order to use DataLoader in GraphQL, you need to pass it through the GraphQL context. Let’s use the coachLoader example we mentioned earlier:
It can now be used in the resolvers:
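A minimal sketch of both steps, under a few assumptions: `createCoachLoader` here is a stub that only exposes load(id), `createContext` is a hypothetical helper, and how the context reaches your server depends on the GraphQL library you use.

```javascript
// Stand-in for the real coachLoader (hypothetical): only load(id) is needed here.
function createCoachLoader() {
  const coaches = { 10: { id: 10, name: 'Coach A' } };
  return { load: async (id) => coaches[id] ?? null };
}

// Build a fresh context (and fresh loaders) for every request, so cached
// data is never shared between different users.
function createContext() {
  return { coachLoader: createCoachLoader() };
}

// Resolver map: GraphQL resolvers receive (parent, args, context, info).
const resolvers = {
  Player: {
    coach: (player, args, context) => context.coachLoader.load(player.coachId),
  },
};
```

Creating the loader per request is the important design choice: the cache then lives exactly as long as the request does.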
And the coachLoader.js implementation:
batchCoaches is the function passed to the dataloader: it receives an array of keys (ids) and returns a promise for an array of values. db.findCoachesByIds is the function that demystifies most of the magic: it only calls the database batch query. If we are using the Sequelize ORM, it could look like the following:
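The sketch below shows one way these two functions could fit together. The `Coach` object is an in-memory stand-in for a Sequelize model so the example runs on its own; with real Sequelize, passing an array as a where value produces an IN query. The re-ordering step is an addition of this sketch, there because databases do not promise to return rows in the order the ids were given.

```javascript
// In-memory stand-in for a Sequelize model (hypothetical). With real Sequelize,
// Coach.findAll({ where: { id: ids } }) issues a "WHERE id IN (...)" query.
const Coach = {
  rows: [
    { id: 10, name: 'Coach A' },
    { id: 11, name: 'Coach B' },
  ],
  async findAll({ where }) {
    return this.rows.filter((row) => where.id.includes(row.id));
  },
};

const db = {
  findCoachesByIds: (ids) => Coach.findAll({ where: { id: ids } }),
};

// Batch function: one query for all ids, then re-order the rows to match
// the keys, since the query result order is not guaranteed.
async function batchCoaches(ids) {
  const coaches = await db.findCoachesByIds(ids);
  const byId = new Map(coaches.map((coach) => [coach.id, coach]));
  // Ids with no matching row resolve to null.
  return ids.map((id) => byId.get(id) ?? null);
}

// With the dataloader package installed:
//   const coachLoader = new DataLoader(batchCoaches);
```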
That’s all! There’s really no magic! Node.js’s process.nextTick and a database that supports batch queries do the trick!
Resources
- Facebook/dataloader github
- Video: DataLoader — Source code walkthrough by Lee Byron
- DataLoader and caching in Apollo