Learning to Play the Chaos Game
I, Thomasina Coverly, have found a truly wonderful method whereby all the forms of nature must give up their numerical secrets and draw themselves through number alone…
Friday, December 25, 2020 · 7 min read
It’s Christmastime again! Shall we talk about trees? This post is about my new holiday hobby: hallucinating tree-shaped fractals using our old friend, gradient descent.
But — let me start at the beginning. For a variety of reasons, the cover image of Douglas Hofstadter’s I Am a Strange Loop has been on my mind this month. I have been thinking a lot about this wonderful image created by pointing a camera at a projector that displays what the camera is seeing. In modern terms: what happens if you screen-share the screen-sharing window? This happens:
Such “feedback loop” recursions often rapidly converge to breathtaking fixed-points. The NYC MoMath even has an art installation that lets you play with a camera-and-projector setup to make wonderful patterns.
As you can imagine, there is a mathematical theory here. At risk of ruining the mystery, I will tell you about it, though without writing any equations. :-) The theory is the theory of Iterated Function Systems, which is almost exactly what it sounds like. Start with a set of affine functions (affine because that’s what camera-projector systems do) and repeatedly apply them to a set of points, taking the union at each step. Regardless of where you start, you will soon end up with a fractal structure which is the fixed point of the system. Why fractal? Because the self-similarity comes from the infinitely nested composition of the affine functions: the fixed point is built out of smaller affine copies of itself.
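To make the “iterated” part concrete, here is a minimal sketch of one such system applied densely — every map applied to every point, union at each step. I’m using the classic Sierpinski-triangle maps as a stand-in (the vertex placement is my own illustrative choice, not anything from the post):

```python
import numpy as np

# The three Sierpinski maps: each pulls a point halfway toward one
# vertex of a triangle, f_i(p) = 0.5 * (p + v_i).
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
maps = [lambda p, v=v: 0.5 * (p + v) for v in vertices]

# Start from an arbitrary set of points and iterate:
# apply every map to every point, then take the union.
points = np.random.rand(50, 2)
for _ in range(8):
    points = np.concatenate([f(points) for f in maps])

# After a few iterations, `points` clusters around the Sierpinski
# triangle -- but the set triples in size at every step.
print(points.shape)   # (50 * 3**8, 2)
```

Notice the exponential blowup of the point set; that is exactly the expense the “chaos game” below will spare us.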
There are theorems that ensure convergence towards and uniqueness of this fixed point in the limit, but… it’s believable enough, don’t you think? Following your intuition, you can build Sierpinski Triangles and Barnsley Ferns and all sorts of other beautiful structures using an IFS. I refer you to Section 8.2 of The Algorithmic Beauty of Plants by Prusinkiewicz and Lindenmayer (yes, that Lindenmayer) for more botanical connections.
Now here is the question that has been on my mind: if I give you a picture of a fractal-fern, can you give me a set of affine functions whose fixed point is that fern? This is the inverse problem for IFSes. According to Wikipedia, this problem has real-world applications in image compression, but it is hard to do in general.
Hmm. Interesting. Can our favorite tool, gradient descent, come to the rescue? At least in an approximate sense? Recall that, in homogeneous coordinates, a two-dimensional affine function is really just a 3x3 matrix (with 6 free parameters) that encodes the details of the transformation. If we could easily compute images of IFS fixed points from a set of matrices, then perhaps we could try to optimize the parameters of these matrices to make the fixed point look the way we want it to.
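For concreteness, here is one way to pack those 6 free parameters into a 3x3 matrix acting on a homogeneous point [x, y, 1] (the helper name and parameter names here are my own, not code from the notebook):

```python
import torch

def affine_matrix(a, b, c, d, tx, ty):
    # 6 free parameters: a 2x2 linear part plus a translation.
    # The bottom row is fixed at [0, 0, 1], so the matrix acts on
    # homogeneous points [x, y, 1].
    return torch.tensor([[a, b, tx],
                         [c, d, ty],
                         [0., 0., 1.]])

M = affine_matrix(0.5, 0.0, 0.0, 0.5, 0.25, 0.0)
p = torch.tensor([1.0, 1.0, 1.0])   # the point (1, 1)
print(M @ p)                        # tensor([0.7500, 0.5000, 1.0000])
```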
The challenge, of course, is efficiently finding the fixed point — repeatedly rasterizing and compositing affine transformations on images sounds expensive — and doing so differentiably sounds like a scalability nightmare! But there is a wonderful solution to this problem, which is to play the so-called “chaos game.” Instead of densely applying all the functions to all the points on the plane, you can sparsely approximate the fixed point as follows: Start with a point — any point! — and randomly select one of the affine functions of the IFS and apply it to the point. Repeat this process with the new point. It turns out that in the limit, the “trail” left behind by this wandering point converges to the fixed point of the IFS. (This, too, is a theorem, but again I think it is believable enough that I will not demand a proof.) This is what the “chaos game” looks like for the Sierpinski triangle; notice how the salt-and-pepper spattering of points soon converges boldly into the fractal we know and love (it’s an animation — stare for a few seconds).
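Here is a minimal sketch of the chaos game for the Sierpinski triangle, with the same three maps as before written as 3x3 matrices (helper names are my own; the starting point really is arbitrary):

```python
import numpy as np

# f(p) = 0.5 * p + 0.5 * v, pulling the point halfway toward vertex v.
def toward(vx, vy):
    return np.array([[0.5, 0.0, 0.5 * vx],
                     [0.0, 0.5, 0.5 * vy],
                     [0.0, 0.0, 1.0]])

F = [toward(0.0, 0.0), toward(1.0, 0.0), toward(0.5, 1.0)]

rng = np.random.default_rng(0)
v = np.array([0.3, 0.7, 1.0])   # start anywhere (homogeneous coordinates)
trail = []
for _ in range(20_000):
    v = F[rng.integers(len(F))] @ v   # apply one randomly chosen map
    trail.append(v[:2])

# Scatter-plot `trail` (perhaps dropping the first handful of burn-in
# points) and the Sierpinski triangle appears.
```

Twenty thousand cheap matrix-vector products, and the whole triangle appears — compare that with the exponential blowup of the dense iteration above.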
So here is the plan: we start with a point, and play the chaos game with our current IFS matrices to obtain a point cloud that stochastically approximates the fixed-point. Then, we compare this point cloud to our target fern-image to obtain a “loss.” Finally, we update our IFS matrix parameters to minimize this loss, until at last the fixed-point converges to the target image. It’s just “machine learning,” really: we’re learning to play the chaos game!
Here is an outline of the algorithm (with some details elided):
```python
import random
import torch

# initialize IFS: four random affine maps, each a 3x3 matrix whose
# entries we will optimize (so each needs requires_grad=True)
F = [random_3x3_affine() for _ in range(4)]
o = torch.optim.Adam(F, lr=0.0001)

for step in range(100_000):
    o.zero_grad()

    # start at the origin (in homogeneous coordinates)
    v = torch.tensor([0., 0., 1.])

    # play the chaos game once
    trace = []
    for _ in range(200):
        # applying an affine function is just
        # a matrix multiplication!
        v = torch.matmul(random.choice(F), v)
        trace.append(v)

    # treat the trace as a point cloud
    loss = compare(target_image, trace)

    # update parameters
    loss.backward()
    o.step()
```
Ah! Actually, there is one detail that I should explain: how do you compare a point cloud and an image? My idea is to convert the target image to a second point cloud by uniformly sampling points from it, for example by rejection-sampling. Then you can compare the two point clouds by the so-called “chamfer distance,” which is the mean distance from each point to its nearest neighbor in the opposite point cloud. This is quadratic-time to compute, and the most expensive part of the whole operation, but with ~100 points in each set it is quite doable.
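For the curious, here is roughly what that comparison can look like in PyTorch — a sketch of one common chamfer variant, assuming both clouds are (N, 2) tensors, with torch.cdist doing the quadratic-time all-pairs work (the usage names below, like target_points, are hypothetical):

```python
import torch

def chamfer(cloud_a, cloud_b):
    # cloud_a: (N, 2) and cloud_b: (M, 2) point clouds.
    # torch.cdist gives the full N x M distance matrix (the quadratic
    # part); then take the mean nearest-neighbor distance both ways.
    d = torch.cdist(cloud_a, cloud_b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# hypothetical usage: compare the chaos-game trail to points
# sampled from the target image
# loss = chamfer(torch.stack(trace)[:, :2], target_points)
```

Including both directions keeps either cloud from “hiding” by collapsing onto a small corner of the other.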
A pedagogical aside: notice that the “trick” here is really a change in representation. This optimization problem is easier to solve on point clouds than on raster images! An important lesson that applies across the ML-for-graphics domain… and, more broadly, to all differentiable-programming enterprises.
Remarkably, this zany scheme works quite well! Let’s try it with a heart. (A heart, because it’s an easy low-entropy shape, but also because it felt symbolically appropriate in relation to Hofstadter’s broader philosophical project of souls-within-souls-within-souls.)
When we initialize the system with 4 random affine transformations, the trail left behind by the chaos game is very unimpressive. The sparsity is because I simulate only 200 steps of the chaos game for each step of gradient descent — there’s a tradeoff between noise and computation time, of course, as there is with any form of SGD.
Okay. Time to optimize! Here is what it looks like after 100K steps of optimization with an Adam optimizer (that’s about 40 minutes of wall-clock time on my old MacBook). It looks pretty good, don’t you think? The chamfer distance scheme works!
Now we can export the four affine matrices from PyTorch and work out the “true” fixed point with a more expensive offline computation (for example, by simulating a few million steps of the chaos game). It looks like this — not perfect, and indeed quite crayon-scribble-ey, but clearly something interesting has happened, and the result is convincingly heartlike.
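That offline computation is nothing fancy; a sketch (assuming the learned matrices have been detached into plain NumPy arrays, with helper names of my own invention) might look like:

```python
import numpy as np

def simulate(F, steps=3_000_000, burn_in=100):
    # F: list of 3x3 affine matrices exported from PyTorch,
    # e.g. [m.detach().numpy() for m in learned_F].
    rng = np.random.default_rng()
    v = np.array([0.0, 0.0, 1.0])
    points = np.empty((steps, 2))
    for i in range(steps):
        v = F[rng.integers(len(F))] @ v
        points[i] = v[:2]
    # discard the first few points, before the trail has settled
    return points[burn_in:]

# scatter-plot (or rasterize) the result to see the "true" fixed point
```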
Where is the fractal structure hidden in this heart? This needs some visualization tooling to see clearly. After a bit of JS-canvas-hacking: here is a GIF of the heart that reveals its fixed-point structure. Hearts-in-hearts-in-hearts!
I will admit to blinking a few times when I first saw this animation play out in full — I did not expect it to work, and even now I find myself marveling at this creation? discovery? of a heart encoded in 24 numbers.
But it still doesn’t look quite right — the fractalness isn’t obvious. How come? It’s because our affine transformations stretch and squash the heart into almost unrecognizable blobs. A final, natural improvement is to force our affine transformations to be rigid, so that there isn’t any squishing or skewing in the fractal. This is actually not too bad to implement: the solution to a constraint problem like this is typically a reparametrization, and indeed in this case we can simply reparametrize the matrices in terms of a rotation, an (isotropic) scale, and a translation. Now we’re down from 6 parameters per matrix to just 4 — even more magical! The result is a much more “obviously” fractal-ey fixed point, simply because your eye can more easily pick out the structural recursion when it is made of rigid motions.
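A minimal sketch of that reparametrization (the variable names and the log-scale trick are my own choices; the post only specifies rotation, isotropic scale, and translation):

```python
import torch

def rigid_matrix(theta, log_s, tx, ty):
    # 4 parameters: rotation angle, (log of) isotropic scale, translation.
    # Parametrizing the scale by its log keeps it positive.
    s = torch.exp(log_s)
    c, n = torch.cos(theta), torch.sin(theta)
    row1 = torch.stack([s * c, -s * n, tx])
    row2 = torch.stack([s * n,  s * c, ty])
    row3 = torch.tensor([0.0, 0.0, 1.0])
    return torch.stack([row1, row2, row3])

# Each scalar can be a requires_grad leaf, so gradients flow from the
# chamfer loss back to just 4 numbers per map.
theta = torch.tensor(0.3, requires_grad=True)
log_s = torch.tensor(-0.5, requires_grad=True)
tx = torch.tensor(0.1, requires_grad=True)
ty = torch.tensor(0.2, requires_grad=True)
M = rigid_matrix(theta, log_s, tx, ty)
```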
Let’s take it for a — pardon — spin!
Since I promised you trees, here is some fractal foliage. The target image was a picture of a maple leaf! I’d never thought about this before, but indeed a maple leaf contains within it an echo of the whole tree. (Full disclosure: it takes a couple of randomized restarts to get a really impressive result — the low dimensionality of the fractal’s parametrization leads to the same stuck-in-a-local-optimum woes that differentiable rendering folks face all the time…)
This one uses just three matrices — experimentally, three matrices tend to be enough to get really good results, and more just lead to very busy, crowded fractals. (I don’t know if there is a word for the number of maps in an IFS, but let’s call it the IFS’s arity. In this terminology, I like ternary IFSes.)
And finally, since I promised you Christmas trees…
This last animation is actually absolutely baffling to me. In part, this is because of how treelike the fractal turned out — here it is overlaid on the silhouette I optimized against. Can you imagine this is encoded by just 12 parameters? Odd.
But even more bafflingly: the actual geometry of the recursion is nothing like the tree recursion geometry you and I are used to from CS106! Compare the GIF above to Barnsley’s Fern: while Barnsley turns each full leaf into two smaller leaves with “semantic” affine maps, this Christmas tree does all sorts of bizarre, uninterpretable cartwheels and somehow almost magically works itself out in the end. It is mesmerizing to me; I can stare at it for minutes at a time.
When you think about it, it is quite shocking that a Christmas tree is the unique fixed point of these 3 rigid affine maps. “Beautiful” isn’t the word for it — neither is “grotesque” (though both words have been used by various people I have shared this with). There is something just unsettlingly fascinating about the way something magical pops out from three rectangles. Maybe I should have demanded a proof of that theorem after all…
There is so much more to say about all of this. What shapes are easiest to IFSify, and why? In higher dimensions, can we treat the IFS as a kind of SVD-esque dimensionality reduction scheme for data (i.e. literal point clouds)? What kind of data would it make sense to do this for — where do you find spatial self-similarity? Do point clouds have an inherent IFS “dimensionality” (in terms of, say, the number of affine transformations you need to get a good approximation)? With some more work, could SGD be a legitimate solution to the open problem of fractal compression?
It’s a lot to think about — but for now… well, there is a Calvin and Hobbes gag that begins with Calvin saying “I’ve been thinking” and Hobbes interrupting “On a weekend?” That’s how I’ve been feeling this winter break! Now it is time to shut down the Jupyter kernel and get some rest — Comfortably Numbered will return in 2021!
(Jupyter notebook for this blog post available on GitHub.)
Update (Dec 27): Check out some improvements made by a reader!
Update (Dec 29): I was pointed to a 1998 paper (Melnik and Pollack) that (independently!) tells exactly the same story. I’m struck by how closely the two expositions align. There is something deeply comforting about rediscovering a place someone else has visited once-upon-a-time, like finding a flag buried under the snow on a mountaintop — a kind of storybook enchantment that sits across the table from loneliness.