Monday, December 23, 2013

Comparison of two worlds - Await and Yield in C# and JavaScript

A few months ago, I published the source code of NWarpAsync.Yield, a C# library that emulated "yield" (a C# language feature) on top of "await" (another C# language feature).
This project is possible because await and yield are essentially the same feature. Some members of the JavaScript developer community noticed this before I did and have exploited it to their benefit: JavaScript (as of ECMAScript version 6, which is still not finalized) already enables C#-style simple asynchronous programming.

I'll cover the main differences between the two approaches here, as well as why the JavaScript approach wouldn't work as nicely in C#, even though something similar would be possible. I do have a working prototype showing how it can be done (check the first link in the "Further Reading" section).

Required Knowledge

In order to understand this post, you should understand what generator functions are. Knowing C# and JavaScript will help greatly.

Yield in C#

I'd like to start this post by pointing out that yield in C# is not identical to yield in JavaScript. Unfortunately, as we'll see, yield in C# is less flexible than yield in JavaScript.

In C#, generator functions are those that return IEnumerable, IEnumerable<T>, IEnumerator or IEnumerator<T>. For the purposes of this post, we'll focus on IEnumerable<T>.
For those who are not familiar with C# terminology, an IEnumerable<T> is an object that can be iterated on.

Here's what a typical C# generator function looks like:
public IEnumerable<int> GetValues(int count)
{
   for (int i = 0; i < count; ++i)
   {
      yield return i;
   }
}
This method can now be used like any other method.

Enumerables in C# are objects whose class implements a method named "GetEnumerator" that in turn returns an IEnumerator<T>. "Enumerator" is the C# name for "iterator".

It is possible to iterate multiple times on the same instance returned by our GetValues method. Every time GetEnumerator is called, the function starts from scratch.

IEnumerator itself is an interface. The most important members of this interface are MoveNext and Current.
When MoveNext is called, the function is executed up to the point of the next "yield return" (or the end of the method, if there aren't any more "yield return" statements).
MoveNext returns true if the generator reached a "yield return" statement, false if it reached the end.

The value returned by the "yield return" statement can be read by the "Current" property.

This has two important implications:
  1. No code in the function is executed until "MoveNext" is first called;
  2. The code is lazily evaluated, meaning that if, after a yield return statement is executed, MoveNext is never called again, then no more code is executed.

The second implication means we can get away with generators that return infinite collections without having the application hang, provided that the caller stops calling "MoveNext" (which will always return true in this case) at some point.
public IEnumerable<int> GetAllIntegers()
{
   // This method is valid.
   // Just don't use it in a foreach loop without a break statement.
   int i = int.MinValue;
   for (;;)
   {
      // Wraps around back to negative numbers on overflow.
      yield return unchecked(i++);
   }
}
So let's review the contract associated with yield in C#:
  1. The caller calls a method that returns an IEnumerable
  2. The caller calls enumerable.GetEnumerator()
  3. The caller calls enumerator.MoveNext() followed by enumerator.Current as many times as it wants to.
To handle finally blocks, the IEnumerator<T> interface also defines a Dispose method. When the Dispose method is called, the code in the applicable "finally" blocks is executed and the enumerator is considered terminated.

Yield in JavaScript

Yield in JavaScript is recent. So recent, in fact, that the standard it is described in is still just a draft.

Let's take a look at what yield in JavaScript looks like:
function * generator(count) {
   for (var i = 0; i < count; ++i) {
      yield i;
   }
}
Dynamic typing aside, the main differences we can see so far between JavaScript and C# are:
  1. Generator functions in JavaScript must be explicitly marked as such with the "*" after the "function" keyword (in C# it was inferred by looking at the function body);
  2. "yield" is used instead of "yield return".

However, the two languages do not have identical semantics.
In C#, generator functions can either return an enumerable or an enumerator.
In JavaScript, the generator function always returns the iterator (enumerator) directly.
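
One practical consequence is that a JavaScript generator object is single-use, whereas a C# IEnumerable<T> can hand out a fresh enumerator every time. The sketch below illustrates this with a made-up generator function called countTo (it is not part of any library); to iterate again, the generator function has to be called again.

```javascript
// Hypothetical generator function, used only for illustration.
function* countTo(count) {
   for (let i = 0; i < count; ++i) {
      yield i;
   }
}

// Calling the generator function returns the iterator itself.
const iterator = countTo(3);

// Array.from drains the iterator by calling next() until done is true.
const firstPass = Array.from(iterator);   // [0, 1, 2]

// The same iterator is now exhausted, so a second pass finds nothing.
const secondPass = Array.from(iterator);  // []

// To iterate again, call the generator function again.
const freshPass = Array.from(countTo(3)); // [0, 1, 2]
```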

The iterator is an object with a method called "next", which replaces "MoveNext" and "Current". The next() function returns an object that indicates the yielded value and whether the function is over.
function * generator() {
   yield 1;
   return 2;
}
var g = generator();
g.next(); //returns { value: 1, done: false }
g.next(); //returns { value: 2, done: true }
g.next(); //returns { value: undefined, done: true }
Note that "return" is a valid statement in generator functions, as opposed to C#, where mixing the two styles in one function results in a compile error.
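
As in C#, the body of a JavaScript generator function is lazy: nothing runs until next() is first called, and nothing runs past a yield unless next() is called again. Here is a small sketch of that behaviour; the lazyGenerator function and the log array are made up for this example.

```javascript
const log = [];

// Hypothetical generator that records when each part of its body runs.
function* lazyGenerator() {
   log.push("started");
   yield 1;
   log.push("resumed");
   yield 2;
   log.push("finished");
}

const lazy = lazyGenerator();
log.push("created");          // The body has not run yet.

lazy.next();                  // Runs up to the first yield.
log.push("after first next");

lazy.next();                  // Runs up to the second yield.

// next() is never called a third time, so "finished" is never logged.
// log now holds: created, started, after first next, resumed
console.log(log.join(", "));
```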

Another big difference is that yield in C# is a statement, but yield in JavaScript can be used in an expression. This means the following is valid:
function * generator() {
   return yield 1;
}
var g = generator();
g.next(); //returns { value: 1, done: false }
g.next(); //returns { value: undefined, done: true }
So now, what does the "yield" expression return? In this case, it returned undefined but clearly it wouldn't be very useful if that was always the case.
Indeed, the yield expression returns the argument of the "next" function. Since we did not specify any arguments, undefined was returned, but that can change:
function * generator() {
   return yield 1;
}
var g = generator();
g.next(); //returns { value: 1, done: false }
g.next(42); //returns { value: 42, done: true }
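
Because every argument passed to next() becomes the result of the pending yield, a generator can hold a two-way conversation with its caller. A minimal sketch, using a made-up runningTotal generator: note that the first next() call only starts the generator, since no yield is waiting for a value yet.

```javascript
// Hypothetical generator: yields the running total of the numbers sent in.
function* runningTotal() {
   let total = 0;
   for (;;) {
      // Hand the current total to the caller, then wait for the next number.
      const received = yield total;
      total += received;
   }
}

const totals = runningTotal();
const start = totals.next();       // { value: 0, done: false }
const afterFive = totals.next(5);  // { value: 5, done: false }
const afterSeven = totals.next(7); // { value: 12, done: false }
```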
This isn't the only feature iterators have. Generator objects have another method called "throw". When "throw" is called, execution of the generator function resumes. However, instead of returning a value, yield throws an exception.
function * generator() {
   try {
      yield 1;
      return "success";
   } catch (e) {
      return e;
   }
}
var g = generator();
g.next(); // { value: 1, done: false }
g.next(); // { value: 'success', done: true }
g = generator();
g.next(); // { value: 1, done: false }
g.throw('my exception'); // { value: 'my exception', done: true }
As we'll see, these two differences -- yield being an expression and JavaScript iterators having a "throw" function -- are what make asynchronous programming easy with JavaScript yield but cumbersome with C# yield, which ultimately explains why C# needed something new.

Asynchronous programming

In many ways, asynchronous functions are a lot like generators. They perform operations, but are then paused while waiting for other events to happen before they can be resumed. Consider the following pseudocode:
page := Wait for Document to be retrieved from the Internet
words := page.Words
return words.Count
In languages such as Java, the easiest way to do this is to start a new thread and run the operations in that thread, blocking whenever a slow operation is found.
We wouldn't be using threads to take advantage of the multi-core age or anything like that. We'd be using threads to avoid blocking the main program thread because the alternative - to manually set up events and listeners - is boring, ugly and error-prone. Of course, threads have their own overhead.
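
To make the "events and listeners" alternative concrete, here is a rough callback-style sketch of the word count pseudocode. The downloadDocument function is a made-up stand-in that simulates a slow network call with setTimeout; it does not belong to any real library.

```javascript
// Hypothetical stand-in for a network call: reports its result later via callbacks.
function downloadDocument(url, onSuccess, onError) {
   setTimeout(function () {
      if (url) {
         onSuccess({ words: ["hello", "async", "world"] });
      } else {
         onError(new Error("no url given"));
      }
   }, 0);
}

// Callback-style version of the word count pseudocode.
// The result cannot be returned directly, so it is passed on to yet more callbacks.
function getWordCount(url, onSuccess, onError) {
   downloadDocument(url, function (page) {
      let count;
      try {
         count = page.words.length;
      } catch (e) {
         // Errors have to be forwarded by hand.
         onError(e);
         return;
      }
      onSuccess(count);
   }, onError);
}

// Logs "word count: 3" once the simulated download completes.
getWordCount("http://example.com", function (count) {
   console.log("word count:", count);
}, function (error) {
   console.error("failed:", error.message);
});
```

Even in this tiny case, the success path and the error path have to be threaded through every function by hand, which is exactly the bookkeeping that pausable functions remove.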

If we can "pause" and "resume" functions, we no longer need to use threads for basic asynchronous programming.
In C#, the chosen syntax for this was:
public async Task<int> GetWordCount(string url) {
   var page = await DownloadDocumentAsync(url);
   var words = page.Words;
   return words.Count();
}
Note that asynchronous methods are marked with the "async" keyword and return a Task instance.
In JavaScript, there are libraries (e.g. Q) that implement async/await using yield.
var getWordCount = Q.async(function*(url) {
   var page = yield downloadDocument(url);
   var words = page.words;
   return words.count();
});
The end result is similar. But let's see what Q does to transform yield into await.

Q is a library that implements promises in JavaScript. In Q, promises are objects with a few special functions. One of the most important is "then". The then function takes two callbacks as arguments: the first is called when the operation behind the promise completes successfully, and the second is called when it terminates with an exception.

Q calls next() on the generator object. The generator function then runs until it yields a promise, returns its final value or throws an exception.
If it throws an exception, then the promise returned by the Q.async function fails, so its error callback is called.
If it returns a value, then the promise returned by the Q.async function succeeds with that value.
If it yields a promise, then Q calls the then function on that promise to register two callbacks. The success callback of that promise goes back to the beginning of these steps to call next() again (with the argument of next being the result of the promise) and so on. The error callback of that promise uses the iterator.throw function to notify the generator function that an error occurred. As such, errors in the promise can be caught by a normal try...catch statement.
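
The steps above can be sketched in a few dozen lines. This is not Q's actual source, just a minimal illustration of the same loop built on the standard Promise object; the names runAsync and fakeDownload are made up for this example.

```javascript
// Minimal sketch of a Q.async-like helper (not Q's real implementation).
// Wraps a generator function so that calling it returns a promise.
function runAsync(generatorFunction) {
   return function (...args) {
      const iterator = generatorFunction.apply(this, args);

      return new Promise(function (resolve, reject) {
         // Resumes the generator by calling next(value) or throw(error).
         function step(method, argument) {
            let result;
            try {
               result = iterator[method](argument);
            } catch (error) {
               // The generator threw: the overall promise fails.
               reject(error);
               return;
            }
            if (result.done) {
               // The generator returned: the overall promise succeeds.
               resolve(result.value);
               return;
            }
            // The generator yielded: wait for the promise, then resume.
            Promise.resolve(result.value).then(
               function (value) { step("next", value); },
               function (error) { step("throw", error); }
            );
         }
         step("next", undefined);
      });
   };
}

// Hypothetical stand-in for a slow download.
function fakeDownload(url) {
   if (!url) {
      return Promise.reject(new Error("no url given"));
   }
   return Promise.resolve({ words: ["hello", "async", "world"] });
}

const getWordCount = runAsync(function* (url) {
   try {
      const page = yield fakeDownload(url);
      return page.words.length;
   } catch (error) {
      // Failed promises surface here through iterator.throw.
      return -1;
   }
});

getWordCount("http://example.com").then(function (count) {
   console.log(count); // 3
});
getWordCount("").then(function (count) {
   console.log(count); // -1
});
```

Note how the driver only relies on the two features discussed earlier: yield being an expression (so next(value) can deliver the result) and the iterator having a throw function (so failures reach the generator's own try...catch).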

Emulating Await in C#

In C#, yield wouldn't work as nicely as a replacement for await.
For starters, yield is more restrictive than await. Yield cannot be used in anonymous delegates/lambdas. But that's not the main reason.

"value = await promise;" is very useful, but it only compiles because "await promise" is an expression, whereas "yield return" in C# is a statement. Additionally, unlike await and JavaScript yield, C# yield offers no way to notify the generator function that an exception has occurred.

Still, it can be done. It's not pretty, but it can be done. I've updated my NWarpAsync project to include a library (EmulateAwait) that does exactly this. This is what it looks like:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using NWarpAsync.EmulateAwait;

class Program
{
    static IEnumerable<TaskIteration<int>> AsyncMethod(AsyncContext<int> context)
    {
        int arg1 = context.Argument<int>(0);
        string arg2 = context.Argument<string>(1);

        yield return context.Await(Task.FromResult(arg1 + arg2.Length));
        int result = context.GrabLastValue();

        yield return context.Return(result * 2);
    }

    public static void Main()
    {
        Task<int> task = AsyncBuilder.FromGenerator<int>(AsyncMethod, 10, "Hello");
        Console.WriteLine(task.Result); //Prints 30
    }
}
And that's it. I have reused the TPL here because it existed even before await was added to C#. The goal of this library is to let developers use yield in place of await, so whatever the library uses internally is fair game.

Whereas NWarpAsync.Yield had a few practical uses, this one does not. It is merely an exercise in "Could X be done to do Y?"
Well, the answer is: Yes, it can. But it shouldn't.

If you're a C# user, stick to await.
But if you're someone developing a new programming language and deciding which features to include, I hope this post was helpful.

Merry Christmas.

Further Reading

luiscubal/NWarpAsync
The official GitHub repository for NWarpAsync (including EmulateAwait).

Coroutine - Wikipedia, the free encyclopedia
Coroutines are special functions that can be suspended and resumed. Yield and Async/Await are two coroutine mechanisms.

luiscubal's code sketchbook: Introducing NWarpAsync - Emulating yield return using await
A previous blog post describing the reverse (implementing yield on top of await).

How yield will transform Node.js
A blog post describing how yield and promises can be used to simplify node.js development.