Don't Call Us, We'll Call You

Monadic IO and the Sugarloaf Transformation

Saturday, January 7, 2017 · 3 min read

Look. I get it. Monad tutorials are a meme, memes about monad tutorials are a meme, and we’re almost at the point where memes about memes about monad tutorials are a meme. Before we go any farther down this infinite stack of turtles, though, I want to share one way to think about monads that doesn’t involve burritos or boxes or any Haskell code at all. I present to you, the Sugarloaf Transformation.

The overall idea makes sense to me because I’m a fan of visual metaphors. Your mileage may vary.


Consider the dataflow programming paradigm (think of Quartz Composer or LabView-G). A dataflow language takes the idea of an abstract syntax tree and extends it to the tree’s close relative, the directed acyclic graph. A dataflow program is a DAG where the edges entering and leaving each node are ordered. Each edge is associated with a value, and a node computes the values of its out-edges based on the values of its in-edges. Nodes with no in-edges are called sources, and nodes with no out-edges are called sinks. The program below computes the difference between 3 and 1, and prints the number 2. The two nodes on the left are sources that return constants; the print node is a sink.

+---+
| 3 |--+
+---+  |   +-------+
       +-->|       |   +-------+
           | minus |-->| print |
       +-->|       |   +-------+
+---+  |   +-------+
| 1 |--+
+---+

To interpret a dataflow program, we have a variety of strategies. We could begin at sinks and recursively descend into in-edges until we reach sources, which can be evaluated with no further work. This corresponds to a recursive-descent interpreter of an AST. Alternatively, we could begin by collecting the set of sources, evaluating them, assigning the results to the respective out-edges. Then, we iteratively find and evaluate nodes whose in-edges have all been assigned values, until at last we have evaluated the entire program.

The point I wish to make is that the order in which nodes are evaluated is nondeterministic. Thus, the following “obvious” program to subtract two numbers is not well-defined. If given inputs “3, 1” by a human, it may output “2” or “-2” depending on the order in which the two read nodes were evaluated.

+------+
| read |--+
+------+  |   +-------+
          +-->|       |   +-------+
              | minus |-->| print |
          +-->|       |   +-------+
+------+  |   +-------+
| read |--+
+------+

The problem is that nondeterministic evaluation order for IO results in the evaluator potentially diverging. This is really bad! How can we force a deterministic order of execution for IO primitives? Well, notice that the only way to force node x to evaluate before node y is to make y depend on the output of x. We can have each node output an “ordering sentinel” that becomes the input to future nodes: this allows us to impose a partial ordering on node evaluation, as shown below.

+-------+   +------+     +------+
| start |-->|      |---->|      |
+-------+   | read |     | read |
            |      |-+   |      |-+
            +------+ |   +------+ |
                     |            |   +-------+
                     |            +-->|       |   +-------+
                     |                | minus |-->| print |
                     +--------------->|       |   +-------+
                                      +-------+

The start node simply emits an ordering sentinel, to kick off the computation.

This looks really good: this program is, in fact, well-defined in its execution order. Sadly, a partial ordering still allows divergent programs. Why? It’s because, in the words of Philip Wadler, “Truth is free.” Once you have computed a sentinel, you can use it wherever you want. What prevents us from doing this?

               +------+
           +-->|      |
           |   | read |      +-------+
           |   |      |----->|       |   +-------+
+-------+  |   +------+      | minus |-->| print |
| start |--+             +-->|       |   +-------+
+-------+  |   +------+  |   +-------+
           +-->|      |  |
               | read |  |
               |      |--+
               +------+

Nothing. Yikes.


If the problem is that “truth is free,” we might be able to solve the problem by simply restricting the flow of information created by IO primitives. One solution is to sandbox all computation that uses the output of an IO node. An easy way to do this is to “lock” the output of an IO node in a special pipe that can only be unlocked by entering a sandbox. Of course, sandboxes must only allow locked values to exit—otherwise you can trivially pass a value through the null sandbox to unlock it. Our program then becomes the following (the dotted boxes denote sandboxes, and fat arrows are locked).

            .................................................
+------+    :                                               :
| read |===>@---------------+                               :
+------+    :               |                               :
            :             ..|............................   :
            :             : |                           :   :
            :             : |   +-------+               :   :
            :             : +-->|       |   +-------+   :==>:==> ...
            : +------+    :     | minus |-->| print |==>:   :
            : | read |===>@---->|       |   +-------+   :   :
            : +------+    :     +-------+               :   :
            :             :                             :   :
            :             :.............................:   :
            :                                               :
            :...............................................:

Notice that since a sandbox only unlocks one pipe at a time, and does not allow the unlocked value to escape, our previous divergence hack becomes impossible. minus takes two unlocked inputs, so they must be computed in nested sandboxes which ensure sequential execution. Indeed, by generalizing, it is not hard to see that now there is only one way to compute the contents of a locked pipe. This means that the result of a locked pipe is literally the same as the computation that goes into evaluating its contents—and so our programs must have a well-defined execution order.

In Haskell, locked pipes correspond to values “boxed” in a value of type IO a (this is where the burrito metaphor stems from). Sandboxes are lambdas, and the unlocking mechanism @ is the monadic binding operator >>=. The return function in Haskell is simply a node that locks its input—this lets us return an arbitrary value from a sandbox. Finally, do-notation is just syntactic sugar to save programmers from reasoning about lots of nested sandboxes, in the same spirit that C has in letting you write 1+2+3+4 instead of 1+(2+(3+4)).


P.S. The title of this post refers to the idea that this transformation in a way inverts the traditional flow of control. Normally, minus would call read twice to get the values of its inputs. Here, however, the reads are calling minus to create their output. “Don’t call us, we’ll call you.”

◊ ◊ ◊