In this post I'll show you how to build a simple JavaScript library that handles asynchronous callbacks using generators.

So, what are JavaScript generators and why should you care?

The short answer

...Because generators can make asynchronous code look attractive.

The long answer

Anyone who's been working with JavaScript / NodeJS for even a short while should be familiar with callbacks. A lot has been said about JavaScript's callback mechanism, and specifically about how attractive the code looks (or doesn't) when callbacks are used for serial asynchronous operations. A classic example is using NodeJS to make a number of serial asynchronous IO calls:

var db = require('db');
db.get( /** some query */, function(err, result){
    /** handle error, and / or: */
    db.get( /** another query */, function(err, result){
        /** handle error, and / or: */
        db.insert( /** ... */, function(err){
            /** handle error, and / or: */
            db.update( /** ... */, function(err){
                /** ... */
            });
        });
    });
});

The code becomes extremely convoluted when the control flow becomes more complex and we have to handle errors in each step. To make the code more attractive we usually want two things:

  1. Handling errors in one place.
  2. Having a visual separation of logical blocks.

How it was done before generators

Numerous libraries were built to provide new patterns for asynchronous code. Most of them provide some mechanism for error propagation and for untangling nested callbacks. Personally, I have made extensive use of the async library, which implements a good selection of asynchronous control flows (parallel, serial, waterfall, etc.) and an error propagation mechanism that lets us handle errors in one place (and halts the control flow once an error occurs).

With 'async' the above example looks a bit more sane:

var db = require('db'),
    async = require('async');

async.waterfall([
    function(next){
        db.get( /** some query */, next);
    },

    function(result, next){
        db.get( /** another query */, next);
    },

    function(result, next){
        db.insert( /** ... */, next);
    },

    function(next){
        db.update( /** ... */, next);
    }
],
function(err){
    /** handle error */
});

Much better.

Other solutions use different approaches to achieve the same goals. Promises are one such solution, and they have become very prominent recently, perhaps due to their extensive use in several projects (such as AngularJS).

With promises we wrap asynchronous code in a promise object. We defer the result of the asynchronous operation to a later time, at which the promise is 'resolved' (or 'rejected' on error). We can bind our own callbacks to events in the promise's lifecycle (such as when it is resolved or rejected).
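To make that description concrete, here is a minimal sketch using the standard Promise constructor. The getAsync helper and its values are hypothetical stand-ins for a real asynchronous call such as db.get; they are not part of any library.

```javascript
// Hypothetical helper: wraps a callback-style timer in a promise.
// It stands in for a real asynchronous call such as db.get.
function getAsync(value, shouldFail) {
    return new Promise(function (resolve, reject) {
        setTimeout(function () {
            if (shouldFail) return reject(new Error('failed: ' + value));
            resolve(value);
        }, 10);
    });
}

var log = [];

var chain = getAsync('first')
    .then(function (result) {
        log.push(result);
        return getAsync('second');      // returning a promise chains the next step
    })
    .then(function (result) {
        log.push(result);
        return getAsync('third', true); // this step is rejected
    })
    .then(function (result) {
        log.push(result);               // skipped, because the previous step failed
    })
    .catch(function (err) {
        log.push(err.message);          // every error ends up in one place
    });
```

Once the chain settles, log holds the two successful results followed by the error message, so both of our goals (one place for errors, visually separate steps) are met.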

Personally, I am less fond of this mechanism, but it's important to be aware of it.

None of these mechanisms introduce new functionality to JavaScript itself; rather, they do sophisticated manipulations on our asynchronous functions in order to make our code more readable.

How it's done after generators

First, let's discuss what generators are.

Generators are function-like entities which behave like iterators.

What are iterators? Iterators are entities which provide a sequential list of values. From the outside, an iterator is just an object with some methods to help us control it. For example, iterators have a next() method which we call to retrieve the next value. Obviously iterators have a state which is preserved between calls to next().
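As a rough illustration, an iterator can be written by hand as a plain object whose next() method closes over its state. The makeCounter name is made up for this sketch; the { value, done } result shape follows the ES6 iterator protocol.

```javascript
// Hypothetical factory: returns an iterator over the integers from..to
function makeCounter(from, to) {
    var current = from; // state preserved between calls to next()
    return {
        next: function () {
            if (current > to) return { value: undefined, done: true };
            return { value: current++, done: false };
        }
    };
}

var counter = makeCounter(1, 3);
var first = counter.next();  // { value: 1, done: false }
var second = counter.next(); // { value: 2, done: false }
var third = counter.next();  // { value: 3, done: false }
var last = counter.next();   // { value: undefined, done: true }
```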

This concept should be familiar to anyone coming from languages such as Python, Java, C++, etc. In those languages, iterators are implemented with classes, which we use to create these 'iterator' objects. The classes define the internals of the iterator (to make it stateful) and the methods which the iterator must have according to the specifications of the language.

A generator is very much like an iterator, with the difference that its state is preserved in the scope of the defining function instead of in some class instance. Just as an iterator iterates over a finite or infinite list of values, a generator yields its values one at a time. An iterator remembers its state between iterations by using a backing object, whereas a generator remembers its state by keeping the function's scope suspended between yields. Each call to next() lets the function run until it reaches the next yield expression, where it is suspended again. Once the generator is exhausted (which might never happen if it yields an infinite series of values), the function's scope is discarded.

Let's illustrate. Following is a generator which generates an infinite series of even numbers, starting from some given number:

// the '*' indicates this is a generator
function * generateEvens(start) {
    if (start % 2 !== 0) start++;
    while (true) {
        yield start;
        start += 2;
    }
}

var evens = generateEvens(5);
evens.next().value; // 6
evens.next().value; // 8
evens.next().value; // 10
evens.next().value; // 12
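
The generator above never finishes. For contrast, here is a small sketch of a finite generator (takeEvens is a made-up name) showing what next() returns once the function body has run to completion.

```javascript
// Hypothetical finite variant: yields `count` even numbers, then finishes
function * takeEvens(start, count) {
    if (start % 2 !== 0) start++;
    for (var i = 0; i < count; i++) {
        yield start + i * 2;
    }
}

var gen = takeEvens(5, 2);
var a = gen.next(); // { value: 6, done: false }
var b = gen.next(); // { value: 8, done: false }
var c = gen.next(); // { value: undefined, done: true }
var d = gen.next(); // still { value: undefined, done: true }

// generators are iterable, so for...of drains them for us
var collected = [];
for (var n of takeEvens(1, 3)) collected.push(n); // [2, 4, 6]
```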

Why is this important? I'll repeat:

...a generator remembers its state by keeping the scope of the function suspended between yields.

Imagine we could suspend the run of a function while it waits for some asynchronous callback to complete. Well, with generators we can. And it's rather simple. But we'll get to that in a minute.

First we need to discuss how to transfer information to and from a generator. We already know that next() is used to get the next yielded value from the generator. We can also use next( /** some value */ ) to send a value back to the yield expression at which the generator is currently suspended. Inside the generator function, whatever we send with next becomes the result of that yield expression. This is very important, because it means that when the generator resumes from its suspended state in order to yield the next value, we can actually change its internal state.

OK, another example. This time we want to generate a series of even numbers, but be able to reverse the order of iteration.

function * generateEvens(start) {
    var dir = 1; // up
    if (start % 2 !== 0) start++;
    while (true) {
        var tmp = yield start;
        if (tmp === 'up') dir = 1;
        if (tmp === 'down') dir = -1;
        start += dir * 2;
    }
}

var evens = generateEvens(5);
evens.next().value; // 6
evens.next().value; // 8
evens.next('down').value; // 6
evens.next().value; // 4
evens.next().value; // 2
evens.next('up').value; // 4
evens.next().value; // 6

So yield start; evaluates to whatever we send to next(). Pretty cool.
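
One detail worth spelling out: the value passed to the very first next() is discarded, because at that point no yield expression is waiting for it. The sketch below (echo is a made-up name) records what each yield evaluates to.

```javascript
var received = [];

// Hypothetical generator: records whatever is sent in through next()
function * echo() {
    received.push(yield 'first');   // receives the argument of the 2nd next()
    received.push(yield 'second');  // receives the argument of the 3rd next()
    return 'finished';
}

var it = echo();
var r1 = it.next('ignored'); // { value: 'first', done: false }
var r2 = it.next('a');       // { value: 'second', done: false }
var r3 = it.next('b');       // { value: 'finished', done: true }
// received is now ['a', 'b']
```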

So what does all of this have to do with asynchronous code, you ask? Good question. Many articles across the net show how to control the flow of asynchronous operations by using generators with promises, or show how to use some external library. But I want neither. I'm not a fan of promises, and I want to know what is actually going on behind the scenes. As it turns out, it's not very complicated.

Let's contrive another useless example:

var foo, bar;

function getFoo(callback) {
    setTimeout(function() {
        callback('hello');
    }, 200);
}

function getBar(callback) {
    setTimeout(function() {
        callback('world');
    }, 200);
}

getFoo(function(val) {
    foo = val;
    getBar(function(val) {
        bar = val;
        console.log(foo, bar); // hello world
    });
});

As expected, after 400ms 'hello world' will be printed. Now let's use generators to make this code more readable.

We want to be able to suspend our code while we wait for the result of some asynchronous operation. We have already seen how generator functions can be suspended. Great, let's wrap our code with a generator function. Every time we need to wait for an asynchronous operation, we will call yield. Whenever the asynchronous operation finishes we will call next() with the result, thus passing the result back to our code.

We want something like this:

function * run() {
    var foo = yield getFoo();
    var bar = yield getBar();
    console.log(foo, bar);
}
run();

Running this code will do nothing, as we never actually call next to start and resume the generator. We need to do that from outside the generator. But timing is important: we must resume the generator only when the value from the asynchronous operation is ready. We have no choice but to delegate this responsibility to the asynchronous functions themselves. Luckily, they accept a callback function which is run once the asynchronous operation is complete. Whenever they are ready with the value, they will resume our generator for us. To allow this, we will pass the generator a special function, resume, which the generator in turn passes as the callback to the asynchronous functions.

var r = run(resume);
r.next(); // start

function resume(value) {
    r.next(value);
}

function * run(resume) {
    var foo = yield getFoo(resume);
    var bar = yield getBar(resume);
    console.log(foo, bar); // hello world
}

Simple. resume sends the result (value) of the asynchronous operation back into the generator, in place of the last yield expression. Things still happen asynchronously, but instead of writing nested callbacks that react in a specific way to each asynchronous operation, we just wait for the operation to complete, send the result back to our main flow of control, and continue as if things were happening synchronously.

Some may argue that imposing synchronicity on asynchronous operations defeats the purpose of asynchronicity. True. But that's not what's happening here. We wait only when we must wait for an asynchronous operation to complete, exactly as we would with classical nested callbacks.
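
To see that nothing is blocked, the following sketch reuses the resume pattern from above and logs the order in which things happen. getFoo mirrors the earlier example; the events array and the extra timer are added purely for illustration.

```javascript
var events = [];

function getFoo(callback) {
    setTimeout(function () {
        callback('hello');
    }, 50);
}

function * run(resume) {
    events.push('generator started');
    var foo = yield getFoo(resume); // suspended here; the event loop stays free
    events.push('generator resumed with ' + foo);
}

var r = run(resume);

function resume(value) {
    r.next(value);
}

r.next(); // start
events.push('code after r.next() keeps running');

setTimeout(function () {
    events.push('unrelated timer fired');
}, 10);
```

The generator is suspended for about 50ms, yet the line after r.next() and the unrelated 10ms timer both run before it resumes.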

Make it a library

Building on the above example, we can handle any generator whatsoever. Let's build a NodeJS module to do that. We will show only the basics here, upon which we can build cool things like centralized error handling, waiting for parallel asynchronous operations, etc. See this repository for more.

/**
 * Take a generator, provide it with a 'resume' function
 * and run it.
 *
 * assume callbacks are in standard NodeJS form:
 *
 *     function(err, result)
 *
 */
module.exports = function(generator){

    var run;

    function resume(err, result){

        // if the callback returned an error
        // make the generator throw it.
        if (err) return run.throw(err);

        // pass the result to the last 'yield'
        // expression by calling generator's
        // 'next' with the result
        run.next(result);

    }

    run = generator(resume);
    run.next(); // start the generator immediately

};

That's it. Of course, a full-fledged library has to deal with edge cases, validate arguments, etc. There are a few nice ones out there, such as suspend, galaxy and genny. Check them out.
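
As a usage sketch, here is how the runner above might be used with centralized error handling. The runner is repeated inline (under the made-up name runGenerator) so the snippet runs on its own; readValue and failingRead are hypothetical callback-style functions, not part of any real API.

```javascript
// the runner from above, inlined so the snippet is self-contained
function runGenerator(generator) {
    var run;
    function resume(err, result) {
        if (err) return run.throw(err);
        run.next(result);
    }
    run = generator(resume);
    run.next();
}

// Hypothetical NodeJS-style asynchronous functions
function readValue(value, callback) {
    setTimeout(function () { callback(null, value); }, 10);
}
function failingRead(callback) {
    setTimeout(function () { callback(new Error('read failed')); }, 10);
}

var output = [];

runGenerator(function * (resume) {
    try {
        var foo = yield readValue('hello', resume);
        output.push(foo);
        var bar = yield failingRead(resume); // resume calls run.throw(err)
        output.push(bar);                    // never reached
    } catch (err) {
        output.push('caught: ' + err.message); // one place for all errors
    }
});
```

Because run.throw(err) raises the error at the suspended yield, an ordinary try / catch inside the generator covers every asynchronous step. Note that this sketch assumes the callbacks fire asynchronously; calling resume synchronously, while the generator is still running, would throw.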

Where to go from here

JavaScript generators are only available in engines that implement ECMAScript 6. On NodeJS 0.11, remember to run node with the --harmony flag to enable generators; more recent NodeJS releases support them without any flag, so you can grab the latest version and start hacking.
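
If you're unsure whether your engine supports generators, one possible check is to try compiling a generator function at runtime. This is only a sketch, and supportsGenerators is a made-up name.

```javascript
// Hypothetical helper: returns true if the engine can parse generator syntax
function supportsGenerators() {
    try {
        // compiled at runtime, so a syntax error here is catchable
        new Function('return function * () {};');
        return true;
    } catch (e) {
        return false;
    }
}

var hasGenerators = supportsGenerators();
```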