It’s the end of week 4 of the quarter and you’ve decided it’s finally time to do some laundry. You somehow transport the massive mound of clothing in your closet to the laundry room, occupy all three machines and also the dishwasher, and finally transport the now-slightly-less-smelly mound of clothing back up the stairs to your room. Then you turn on your favorite Pink Floyd album and begin folding…
You pull out items of clothing from the heap one by one. Shirts and pants you can fold immediately, but socks pose some difficulty: you can’t put them away until they have been paired. So you put them aside on your bedside table to be processed later. Soon, however, the pile of socks on your bedside table grows too large, and the table can only fit so much sock on it.
So you decide to try and manage the situation by eagerly pairing up socks. If you pull a sock out of the heap of clothes and see its partner on the bedside table, you pair them up and put them away. Otherwise, you put the unpaired sock on the bedside table.
Suppose you own an infinite number of socks, each of which is uniformly one of ($ k $) colors. How big does your bedside table need to be? Let’s call the pile of clean clothes the “stream,” and the bedside table the “buffer.” Then the question is, what is the distribution of the buffer size over time as a function of ($ k $)?
Here is one way to think about this: if ($ k = 1 $) then the buffer size is 0 half the time and 1 half the time; the distribution has mean 0.5 and variance 0.25. For larger ($ k $), we can think of this as just a superposition of multiple buffers, one for each color. The means and variances add, so for large ($ k $) we expect the distribution to be approximately normal with mean ($ k/2 $) and standard deviation ($ \sqrt{k}/2 $).
Here is another way to think about this: we have a Markov process where the transition from state ($ i $) to ($ i + 1 $) is with probability ($ (k - i)/k $) and to ($ i - 1 $) is with probability ($ i/k $) for ($ 0 \leq i \leq k $). We can create a matrix representing this transition system; then, eigenvectors of this matrix (with eigenvalue 1, and normalized to sum 1) represent equilibrium probability distributions.
So, incredibly, we’ve shown that the coordinates of the ($ \lambda=1 $) eigenvector for this matrix can be approximated using the normal CDF. What a fascinating correspondence!
If you don’t believe it, the graph below shows the normal-CDF predictions (orange line) and the elements of the eigenvector (blue dots) for ($ k = 100 $). The IPython notebook that generated this graph is here.
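If you would rather check the correspondence yourself, here is a minimal Python sketch. It is not the notebook's code. It assumes the equilibrium distribution is the binomial one suggested by the superposition argument, checks that one step of the Markov chain leaves that distribution unchanged (which is what being a ($ \lambda = 1 $) eigenvector means), and then compares it with the normal-CDF prediction. The function names and the half-unit bins around each integer are my own choices for illustration.

```python
import math

def step(dist, k):
    """Apply one transition of the buffer-size chain to a distribution over 0..k."""
    out = [0.0] * (k + 1)
    for i, prob in enumerate(dist):
        if i < k:
            out[i + 1] += prob * (k - i) / k  # new sock has no partner waiting: buffer grows
        if i > 0:
            out[i - 1] += prob * i / k        # new sock matches one in the buffer: buffer shrinks
    return out

def binomial_distribution(k):
    """Candidate equilibrium: k independent colors, each 'in the buffer' half the time."""
    return [math.comb(k, i) / 2**k for i in range(k + 1)]

def normal_prediction(k):
    """Normal with mean k/2 and standard deviation sqrt(k)/2, binned around each integer."""
    mu, sigma = k / 2, math.sqrt(k) / 2
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return [cdf(i + 0.5) - cdf(i - 0.5) for i in range(k + 1)]

k = 100
equilibrium = binomial_distribution(k)
after_one_step = step(equilibrium, k)

# How far the candidate is from being a fixed point of the chain.
fixed_point_error = max(abs(a - b) for a, b in zip(equilibrium, after_one_step))
# How far the normal-CDF prediction is from the equilibrium distribution.
normal_error = max(abs(a - b) for a, b in zip(equilibrium, normal_prediction(k)))

print(fixed_point_error, normal_error)
```

Both errors should come out tiny for ($ k = 100 $): the first because the binomial distribution really is stationary, the second because the normal approximation is already very good at that size.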
Quick back-of-the-envelope analysis of a windmill, just for fun.
According to National Wind Watch a typical windmill is a GE 1.5-megawatt model which has 116-ft blades.
Let’s say the wind speed is ($ v $), which is typically 10-15 m/s on a good day. Then in time ($ t $) the volume of air that passes across the windmill is a cylinder of height ($ tv $) and area ($ \pi r^2$). Air’s density, ($ \rho $), is around 1.225 kilograms per cubic meter. This gives a total mass of ($ M = \rho tv\pi r^2 $). Taking ($ v $) to be a modest 10 m/s, this works out to 48 metric tons of air per second.
Now let’s apply conservation of energy to this mass of air. Initial energy is ($ Mv^2 / 2 $) and final energy is ($ Mw^2/2 $), where ($ w $) is the speed of the wind after it has passed the windmill. The difference is on the order of the energy output, ($ Pt $), where ($ P $) is the power output, assuming reasonable efficiency in the turbine.
So, we have ($ \rho tv\pi r^2 v^2 / 2 - \rho tv\pi r^2 w^2 /2 = Pt $). Time cancels, which is comforting since this is a continuous process. The equation that’s left suggests that power output of a windmill is proportional to the cube of the wind speed!
Another thing to think about is how much the wind slows down. Solving for ($ w $) and taking ($ P $) to be 1.5 MW, we have ($ w = \sqrt{ \frac{2P/(\rho \pi r^2) - v^3 }{-v}} $). For 15 m/s winds, this means the wind slows down by around 10% because of the windmill.
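For anyone who wants to redo the arithmetic, here is a small Python sketch of the two numbers above. It assumes the 116-ft blade length is the radius ($ r $) of the swept disk and uses the same air density and rated power as the text; the variable and function names are just illustrative.

```python
import math

FEET_TO_METERS = 0.3048

r = 116 * FEET_TO_METERS   # blade length, taken as the radius of the swept disk (m)
rho = 1.225                # density of air (kg/m^3)
area = math.pi * r**2      # swept area (m^2)

def mass_flow(v):
    """Mass of air crossing the swept disk per second (kg/s) at wind speed v (m/s)."""
    return rho * v * area

def exit_speed(v, power):
    """Wind speed w after the windmill, from rho*v*area*(v^2 - w^2)/2 = power."""
    return math.sqrt(v**2 - 2 * power / (rho * area * v))

mass_per_second = mass_flow(10.0)   # roughly 48 metric tons per second
w = exit_speed(15.0, 1.5e6)         # 1.5 MW output in a 15 m/s wind
slowdown = 1 - w / 15.0             # fraction by which the wind slows down

print(f"{mass_per_second / 1000:.1f} t/s, w = {w:.2f} m/s, slowdown = {slowdown:.1%}")
```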
Actually this wasn’t “just for fun,” it was to show off initial progress on my side-project. :-)
Oh, and an unresolved question: because air is coming into the windmill faster than it’s going out, the momentum flux across the plane of the windmill is net negative. Applying Gauss’ law, we would expect a buildup of air in the windmill, which obviously doesn’t happen. What’s wrong?
I was in Los Angeles over winter break, and on the long drive back home I began thinking about a billboard I saw just at the edge of the city advertising the “closest casino to anywhere in LA.”
This is a fascinating claim. Let me rephrase it, at least the way I interpret it: the claim is that wherever you are in LA, the closest casino is the one advertised on the billboard. (This is confusingly distinct from the claim that the casino is closest to anywhere in LA, in the sense that the placement of the casino minimizes the distance to the nearest bit of LA soil. Of course, practically speaking this latter claim is useless because any casino within LA trivially has the minimal distance of zero to “anywhere in LA.”)
The question is this: what region does the billboard’s claim imply is devoid of casinos?
Let’s start with a simple case to get some intuition. Suppose LA is a 15-mile-radius disk, and the casino is at the center of the disk. Then, according to the billboard, there must be no other casinos within LA’s 15-mile radius (otherwise, if you were in LA, you might be closer to that other casino!). But actually, the claim is quite a bit stronger: there cannot be any casinos within thirty miles of the center. Why? Well, suppose there was a casino 20 miles from the center of LA. Then, someone just within the city borders would be 5 miles away from that other casino, but 15 miles from the city-center casino.
One more simple case: suppose LA is a line segment of length 15 miles, and the casino is located at one endpoint. Can you imagine what the region in question must be? It is a disk of radius 15 miles, centered at the other endpoint! This is not entirely obvious, and you might need to draw a picture to convince yourself that this is true.
Now let’s consider a much trickier case: suppose LA is a circle again, but the casino is located along the circumference. Suddenly, it’s much harder to picture what’s going on — sitting in a car without pencil or paper, I had no idea what the region might look like. My instinct was “circle centered at the diametrically-opposite point on the circumference,” but it turns out that this is wrong!
In the rest of this post, we’ll build up some mathematical machinery to answer this question correctly. If you don’t want to work through the math, however, then feel free to just scroll to the red-bordered squares and enjoy the interactive demos.
Definition. Given some region ($ R \subset \mathbb{R}^2 $), the casino closure with respect to some point ($ p \in \mathbb{R}^2 $) is defined as
\[ R^p = \left\{ c \in \mathbb{R}^2 ~\middle|~ \exists z \in R, || p - z || > || c - z || \right\} \]
Or, informally: the casino closure is the set of possible casino locations ($ c $) such that there exists a person ($ z $) in the region ($ R $) who is closer to ($ c $) than to ($ p $).
From here on out, I’m only going to worry about “nice” regions, i.e. closed, connected regions with smooth boundaries. I’m also going to use ($ x, p, z, c $) to range over points in ($ \mathbb{R}^2 $), but really most math that follows is equally applicable in ($ \mathbb{R}^n $). It’s just harder to visualize.
Lemma. Let the disk ($ D(z, r) $) be the set of all points in ($ \mathbb{R}^2 $) within ($ r $) of point ($ z $). Then, \[ R^p = \bigcup_{z~\in~R} D(z, ||p - z||) \]
That is, we can construct ($ R^p $) by combining all the disks at all the points in ($ R $) that have ($ p $) on their circumference.
Proof sketch. This should make sense “by construction.” If point ($ c $) is in this constructed ($ R^p $), then it lies in some disk ($ d $), and thus the point ($ z \in R $) that created disk ($ d $) fulfills the existence criterion of our definition.
Lemma. If region ($ R $) has a boundary ($ B \subset R $), then ($ B^p = R^p $).
Proof sketch. Clearly, ($ B^p \subset R^p $) if you believe the first lemma — adding more points to ($ B $) should only increase the union of disks.
The harder direction to show is ($ R^p \subset B^p $). Consider some point ($ z \in R$). Extend the ray ($ \overrightarrow{pz} $) until it intersects with ($ B $) at point ($ z^\prime $). Then, we can relate the disks created by ($ z $) and ($ z^\prime $): ($ D(z, ||p-z||) \subset D(z^\prime, ||p-z^\prime||) $). Why? Because the disk from ($ z $) is smaller and internally tangent to the disk from ($ z^\prime $).
Mapping this argument over the entire union, it makes sense that ($ R^p \subset B^p $).
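Here is a minimal Python sketch of the two lemmas in action. It tests membership in ($ R^p $) straight from the definition, but, trusting the boundary lemma, only looks at sampled boundary points. The sampling density and the helper names are assumptions for illustration, so points very close to the edge of ($ R^p $) could be misclassified.

```python
import math

def boundary_circle(center, radius, samples=3600):
    """Sample points on the boundary B of a disk-shaped region R."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * i / samples),
             cy + radius * math.sin(2 * math.pi * i / samples))
            for i in range(samples)]

def in_closure(c, p, region_points):
    """True if some sampled z is strictly closer to c than to p, i.e. c lies in D(z, |p - z|)."""
    return any(math.dist(p, z) > math.dist(c, z) for z in region_points)

# First simple case: LA is a 15-mile-radius disk with the casino at its center.
la = boundary_circle((0.0, 0.0), 15.0)
print(in_closure((20.0, 0.0), (0.0, 0.0), la))   # a casino 20 miles out is ruled out
print(in_closure((30.1, 0.0), (0.0, 0.0), la))   # just beyond 30 miles is allowed

# Trickier case: the casino p sits on the circumference of LA.
la_shifted = boundary_circle((15.0, 0.0), 15.0)
p = (0.0, 0.0)
c = (22.5, 37.5)
# c lies outside the disk centered at the diametrically-opposite point (30, 0) through p...
print(math.dist(c, (30.0, 0.0)) > 30.0)
# ...yet it is still in the casino closure, so the "obvious" guess is too small.
print(in_closure(c, p, la_shifted))
```

The last two lines are the interesting ones: they exhibit a point that the diametrically-opposite-disk guess misses.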
Okay, time for some empirical verification of all this theory. Try drawing the boundary of a region (in black) here! The disks will show up in red. Notice how filling in your region with black dots doesn’t change the red blob at all. (Click to clear.)
The boundary-circle lemma gives us a nicer characterization of ($ R^p $): it is the region bounded by the curve that is tangent to all of the disks created by the points on ($ B $). It turns out that there is a very nice mathematical theory of “the curve that is tangent to all curves in a given family of curves,” and that is the theory of envelopes. The account below is paraphrased from “What is an Envelope?”, a lovely 1981 paper.
Let the function ($ F(x, t) : \mathbb{R}^2 \times \mathbb{R} \rightarrow \mathbb{R} $) define a family of curves parameterized by ($ t $), in the sense that ($ F(x, 0) = 0 $) defines a curve and ($ F(x, 1) = 0 $) defines another curve, and so on. Then, we seek to characterize the envelope curve which is tangent to every curve in ($ F $).
Lemma. If the boundary of ($ R $) can be (periodically) parameterized as ($ B(t), t \in \mathbb{R} $) then the boundary of ($ R^p $) is the envelope with respect to ($ t $) of \[ F(x, t) = || x - B(t) ||^2 - || p - B(t) ||^2 \]
Proof sketch. Again, this should make sense “by construction”: ($ F $) is chosen to correspond to circles centered at ($ B(t) $) passing through ($ p $).
Ah, but how do we find the envelope? Here we need a tiny bit of multivariable calculus.
Let ($ X(t) $) be the parameterization of the envelope of ($ F $). Then for all ($ t $), we have that ($ F(X(t), t) = 0 $) because the envelope must lie on the respective curve in the family (“tangent” means “touch”!). We also have that the curve ($ X(t) $) must be parallel to the member of family ($ F $) at ($ t $). We can then express this condition by saying that the gradient (with respect to ($ X $)) of ($ F $) at ($ t $) is perpendicular to the derivative of ($ X $) at ($ t $). Or:
\[ d X(t) / dt \cdot \nabla_X F(X(t), t) = 0 \]
In two dimensions, with ($ X(t) = (x(t), y(t)) $), this equation manifests itself as ($ x^\prime(t)\partial F(x, y, t) / \partial x + y^\prime(t)\partial F(x, y, t) / \partial y = 0 $).
The left hand side is oddly reminiscent of the multivariable chain rule. Indeed, if we took the total derivative of our equation ($ F(X(t), t) = 0 $) with respect to ($ t $), we would get:
\[ dF(X, t)/dt = dX/dt\cdot\nabla_X F(X, t) + dt/dt\cdot \partial F(X, t)/\partial t = 0 \]
So we must have ($ \partial F(X, t) / \partial t = 0$).
There is a simpler but less rigorous derivation if you believe that the envelope is exactly the points of intersection of infinitesimally close curves in the family ($ F $). Then we want every point ($ X $) on the envelope to satisfy both ($ F(X, t) = 0 $) and ($ F(X, t + \delta) = 0 $) for some ($ t $) and some infinitesimal ($ \delta $). Taking the limit as ($ \delta $) approaches zero gives the same condition that ($ \partial F(X, t) / \partial t = 0 $). The paper above discusses how this notion is subtly different in some strange cases, but it suffices to say that for all “nice” ($ R $), we’re fine.
Almost-a-theorem. The boundary of ($ R^p $) is given by the parameterized vectors ($ X $) that satisfy ($ F(X, t) = 0 $) and ($ \partial F (X, t) / \partial t = 0 $) for the ($ F $) defined above.
Almost-a-proof-sketch. Almost! In general, the solution for ($ X $) might self-intersect, so we want to take only the “outermost” part of ($ X $). But this is easy to work out on a case-by-case basis.
Example. Suppose ($ R $) is the unit disk centered at ($ (a, 0) $), and ($ p $) is located at the origin. Then ($ B(t) = (\cos t + a, \sin t) $) and ($ p = (0, 0) $). We have
\[ F((x, y), t) = ((x - (\cos t + a))^2 + (y - \sin t)^2) - ((\cos t + a)^2 + \sin^2 t) \]
\[ F((x, y), t) = -2ax + x^2 - 2x\cos t + y^2 - 2y \sin t = 0 \]
We also have
\[ \partial F((x, y), t) / \partial t = 2x\sin t - 2y\cos t = 0 \]
Solving these by eliminating ($ t $) is a simple exercise in polar coordinates. Discover from the second equation that ($ t = \theta $), then recall that ($ r^2 = x^2 + y^2 $). The resulting boundary of ($ R^p $) is (almost!) the curve ($ r = 2(1 + a\cos\theta) $). In other words, it’s (almost!) a limaçon! As ($ a $) varies, the character of the limaçon varies, and at the critical points ($ a = \pm 1 $), we get a cardioid with a cusp. Beyond those critical points, the curve has an inner loop that we have to ignore; hence, “almost!”
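If the polar-coordinates step feels too quick, a numerical sanity check is easy. The Python sketch below plugs points of the curve ($ r = 2(1 + a\cos\theta) $) back into the two envelope conditions with ($ t = \theta $). The value ($ a = 0.5 $) and the function names are arbitrary choices for illustration.

```python
import math

def F(x, y, t, a):
    """Circle centered at B(t) = (cos t + a, sin t) passing through the origin p."""
    bx, by = math.cos(t) + a, math.sin(t)
    return (x - bx)**2 + (y - by)**2 - (bx**2 + by**2)

def dF_dt(x, y, t):
    """Partial derivative of F with respect to t, as computed in the text."""
    return 2 * x * math.sin(t) - 2 * y * math.cos(t)

def limacon_point(theta, a):
    """Point on r = 2(1 + a cos(theta)), converted to Cartesian coordinates."""
    r = 2 * (1 + a * math.cos(theta))
    return r * math.cos(theta), r * math.sin(theta)

a = 0.5
worst = 0.0
for i in range(360):
    theta = 2 * math.pi * i / 360
    x, y = limacon_point(theta, a)
    # Both envelope conditions should vanish (up to rounding) when t = theta.
    worst = max(worst, abs(F(x, y, theta, a)), abs(dF_dt(x, y, theta)))

print(worst)
```

The printed residual should be at the level of floating-point rounding, which is what we expect if the limaçon really does satisfy both ($ F = 0 $) and ($ \partial F / \partial t = 0 $).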
Okay, time for more empiricism. Move your mouse around in the square below to see how the relative placement of LA and the casino affects the envelope. Notice also the inner loop predicted by the envelope, which we should of course ignore for the purposes of bounding ($ R^p $).
An amazing fact is that a cardioid is the same shape you get on the surface of your coffee mug when you put it under a light! Well, not quite — it depends subtly on where the light source is. Read more about caustics at Chalkdust, from whom I also borrowed the image below:
Further reading: Wikipedia has great diagrams to accompany. Dan Kalman’s article “Solving the Ladder Problem on the Back of an Envelope” includes a nice pedagogically-oriented discussion of envelope subtleties. It cites Courant’s Differential and Integral Calculus (vol 2), which is freely available on Archive.org.
I recently read this Atlantic piece on input methods for computers, and it reminded me of a mathematical adventure I had this summer that I should have blogged about at the time.
Back in July, I was exploring the City Museum of New York when I came across this lovely Crown typewriter from the late 1800s. By our modern standards, it uses a rather clunky input mechanism: you manually shift the pointer to the character you want to type, press the button, rinse, repeat.
“What a waste of time,” you say. Ah, but even in the late 19th century time was the essential ingredient, and in the modern world there was no time. Notice, then, that the designers of the Crown typewriter did not place the letters in alphabetical order. Instead, they placed commonly-together letters near each other. You can type “AND” and “THE” rather quickly with just 2 shifts apiece; “GIG,” on the other hand, takes 15 shifts from the G to the I, and 15 shifts back to the G.
The question, then, begs to be asked: is this the most efficient permutation of letters? For example, Q and U are on opposite ends of the typewriter; surely it would be better to put them together?
I have a doubly disappointing answer to this question: first, this is not the most efficient permutation of letters (based on my digraph frequency map), but I also don’t know what the most efficient permutation is! That is, I’ve found permutations that are more efficient, but cannot prove that they are optimal — after all, exploring all 26-factorial permutations is not an option.
My “cost” metric here is defined as “Ignoring non-alphabetic characters, how many shifts does it take to type all of Pride and Prejudice followed by all of The Time Traveller followed by all of The Adventures of Sherlock Holmes?” (All three are classic 19th-century novels available from Project Gutenberg.)
By this metric, the naive alphabetical order (ABCDEFGHIJKLMNOPQRSTUVWXYZ) requires a whopping 9,630,941 shifts. In comparison, the Crown typewriter (XQKGBPMCOFLANDTHERISUWYJVZ) requires only 6,283,692 shifts. But the ComfortablyNumbered typewriter (ZKVGWCDNIAHTESROLUMFYBPXJQ) requires a mere 5,499,341 shifts. This means we can save nearly one in eight shifts from the Crown typewriter!
I found this permutation with the following algorithm: start with a random permutation, then “optimize” it by repeatedly swapping characters such that the post-swap permutation has a lower cost than the pre-swap permutation. Continue “optimizing” until no further swaps can be made. Now, do this procedure for a large number of random starting permutations, and pick the lowest-cost optimized permutation.
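In Python, the procedure might look roughly like the sketch below. This is a reconstruction from the description above, not the code I actually ran: the function names, the order in which swaps are tried, and the decision not to count any shift before the first letter are all assumptions.

```python
import random
from collections import Counter
from itertools import combinations

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CROWN = "XQKGBPMCOFLANDTHERISUWYJVZ"

def digraph_counts(text):
    """Count consecutive letter pairs, ignoring non-alphabetic characters."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    return Counter(zip(letters, letters[1:]))

def cost(layout, digraphs):
    """Total shifts needed to type the text summarized by `digraphs`."""
    pos = {ch: i for i, ch in enumerate(layout)}
    return sum(n * abs(pos[a] - pos[b]) for (a, b), n in digraphs.items())

def optimize(layout, digraphs):
    """Keep swapping pairs of letters while a swap lowers the cost."""
    layout = list(layout)
    best = cost(layout, digraphs)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(layout)), 2):
            layout[i], layout[j] = layout[j], layout[i]
            candidate = cost(layout, digraphs)
            if candidate < best:
                best, improved = candidate, True
            else:
                layout[i], layout[j] = layout[j], layout[i]  # undo the swap
    return "".join(layout), best

def search(digraphs, restarts=20, seed=0):
    """Optimize several random starting permutations and keep the cheapest result."""
    rng = random.Random(seed)
    results = [optimize(rng.sample(ALPHABET, len(ALPHABET)), digraphs)
               for _ in range(restarts)]
    return min(results, key=lambda result: result[1])

print(cost(CROWN, digraph_counts("GIG")))  # 30: fifteen shifts out, fifteen back
sample = digraph_counts("the quick brown fox jumps over the lazy dog")
print(search(sample, restarts=5))
```

The loop always terminates because each accepted swap strictly lowers an integer cost. To reproduce the real experiment, `sample` would be replaced with the digraph counts of the three novels.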
Encouragingly, it turns out that many different random starts get optimized to my solution. What do I mean by that? Well, the graph below shows two datasets: the costs of 1,000 random permutations and the costs of those permutations once optimized. Clearly, optimization is doing great good.
But let’s zoom in on the optimized permutations. Notice that the two lowest-cost bars are almost an order of magnitude more frequent than the remaining bars. In fact, the lower-cost bar’s “bucket” in the histogram is populated only with permutations of cost 5,499,341! This suggests that 5,499,341 is a “hard floor” for my optimizer, rather than the furthest point on the tip of a long tail that my optimizer samples from.
Of course, this is no proof: there might be a single highly-efficient permutation that is hard to reach by optimizing a random permutation. But that feels unlikely!
So: I leave this as an open problem for readers to explore.
Update (Dec 15): I found this blog post by Dennis Yurichev that tackles the same problem, but restated as “in what order do I mount these devices on a rack if I want to minimize the total length of cables between them”? Dennis finds an optimal solution for 8 devices with Z3… perhaps the solution scales to 26 “devices”? Intriguingly, his post was published just weeks before my visit to the City Museum!
Update (Jan 28): @rjp on GitHub has posted a blog post with several new approaches to this problem!
Welcome to this month’s edition of the Aperiodical Carnival of Mathematics! The Carnival is a monthly roundup of exciting mathematical blog posts. Last month, it was hosted by Paul at the Aperiodical. This month, it is my honor to host it here at Comfortably Numbered. But first…
Let’s play a game, shall we?
Pick a number. Not too large, though! You’re about to do some quick math on it. (I’ll play along with 4.)
Okay. Ready? Good.
Now take your number and square it. (4 squared is 16.) Then, add your original number to the square. (16 plus 4 is 20.) Finally, add forty-one. (20 plus 41 is 61.)
And now – your result – is it prime? Ha! I thought so. (61 certainly is.)
This little trick is due to Euler, who pointed out in 1772 that the polynomial ($ f(x) = x^2 + x + 41$) returns prime numbers for small integers — indeed, all nonnegative integers up to and including 39. Since then, the quest for other such “prime-generating” polynomials has fascinated number theorists from around the world. As a little exercise, you may try convincing yourself that there is no perfect prime-generating polynomial; that is, that there will always be at least one integer input that gives a composite output.
But I digress. Here is what matters: The integers ($ x $) for which ($ f(x) $) is composite are the deviants, the rebels, the ones who refuse to play along with Euler’s little game.
Forty is the first such integer.
One hundred fifty-nine is another.
Welcome to the 159th Carnival of Mathematics.
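For the skeptics in the audience, here is a quick Python check of the claims above. It uses plain trial division, which is plenty for numbers this small; the function names are just illustrative.

```python
def f(x):
    """Euler's prime-generating polynomial."""
    return x * x + x + 41

def is_prime(n):
    """Trial division; fine for the small values used here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True

# The "deviants": inputs below 160 for which f(x) is composite.
deviants = [x for x in range(160) if not is_prime(f(x))]

print(all(is_prime(f(x)) for x in range(40)))  # True: 0 through 39 all give primes
print(deviants[0], f(40))                      # 40 1681, and 1681 is 41 squared
print(159 in deviants)                         # True
```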
The Carnival always has a special place in its heart for clever ways to teach children various math concepts. And this one’s no exception. In Set Theory for Second Grade, Manan talks about how he designed an engaging lesson on set theory (and common multiples!) for second graders. A quote from his students: “Can you hang this in the hall so that everyone can see the college math we did?”
What an amazing moment — new symbols, new concepts, no problems! At this point, I made sure to remind them that what they are learning right now is no different from what I would teach in college. And that if today, here in second grade they could do college math, then in third grade they can do third grade math, in fourth grade they can do fourth grade math, and that they can always do math! More than a few students’ faces lit up.
In British Mathematical Colloquium, days 3 and 4, Peter Cameron recounts in excellent detail the last couple days of the Colloquium (it reads like a mini-Carnival!). Days 1 and 2 are linked within.
An induced subgraph of a graph is obtained by throwing away some vertices and the edges incident with them; you are not allowed to throw away an edge within the set of vertices you are keeping. Paul began with the general problem: given a graph H, can you determine the structure of graphs G containing no induced copy of H? … The answer is known in embarrassingly few cases … Not even for a 4-cycle is the answer known!
In Sum of Cubes is Square of Sum… And More!, Pat Ballew begins with a fact that most high-schoolers are taught, and then rather suddenly finds himself deep in a fascinating rabbit hole. (Editor’s note: I encourage you to read the author’s On This Day in Math series; I would list all thirty of the past month if I could…)
Like many teachers at the upper level high school math classes, over the years I’ve presented the sum of the Cubes of the natural numbers formula above many dozens of times. Then, perhaps like many others, I would point out how nice it is that it turns out to be the square of the nth triangular number, a happy coincidence that would make it easier to remember. Usually then, we would challenge them to extend the idea to fourth powers and see if they could do the induction proof, even though there was no really nice simplification (to my knowledge) of the sums of fourth powers.
But then I reread a book that has been in my library for about six years, and realized that many of those teachers may have known a different approach to sums of cubes equaling square of sums that I had been completely unaware of. In case there are other teachers who somehow also didn’t know, I share my newfound ancient knowledge.
In Thinking about the Law of Quadratic Reciprocity, Dan McQuillan gives a fast-paced overview of one of my personal favorite theorems in number theory.
Mathematics, the way it is currently written, can be difficult to read. Sometimes it helps to see how people think about a topic or theorem before (or after, or during) the reading of a proper treatment or rigorous proof. The purpose of this post is to provide such a view regarding the proof of the famous law of quadratic reciprocity. There are many details missing, on purpose, and the hope is that it reads like a good story that’s both interesting, believable and easily verifiable.
Just for fun!
In Magical Learning, John Cook reports the results from an informal Twitter poll he conducted: “If a genie offered to give you a thorough understanding of one theorem, what theorem would you choose?”
In case you missed it, Christian at the Aperiodical is running The Big Internet Math-Off. It may not have the intense moment-by-moment drama of the World Cup, but the daily tidbits of math are definitely worth subscribing for.
In Math Explained through Anagrams, Ben Orlin constructs a frankly impressive number of anagrams for various parts of math. And they’re all illustrated! (Editor’s note: I also enjoyed this piece by the same author. I don’t want to spoil it, but here is a wonderful quotation: “I sometimes think that there are no puddles in math;” says Orlin, “there are only oceans in disguise.”)
That’s all I have for you this month! Come back next time when Robin at Theorem of the Day will host the 160th Carnival of Mathematics!
Earlier this month, Frederick Koh, a Singaporean math tutor, invited me to write a guest post for his website. Of course I accepted: blogging is, after all, a conversation.
But what could I write about?
I spent the month of November on the lookout for an idea worth sharing, and last week I found one. I follow a lot of math-educator blogs (blogging is, after all, a conversation!), and one of my favorites is that of Dan Meyer. Recently, Dan posted a rather provocatively-titled piece, “Dismantling the Privilege of the Mathematical 1%.” He makes the case that those with a mathematical education, those who are mathematically privileged, those who make up the mathematical 1%, are the ones with the responsibility to define what mathematics is. Dan says, eloquently as always, that “through our action or inaction we create systems that preserve our status as the knowers and doers of mathematics.”
Dan’s post made me think deeply about where I myself fit into his spectrum: I’m hardly one of the 1% — and certainly won’t graduate with a math degree — but at the same time, I love math, both as an activity and as a set of truths.
And I realized that this gives me a privileged position, one where I can comment on mathematical topics from a somewhat neutral perspective. I am neither the person who barely scraped through high school calculus, nor the person who skipped high school calculus because it was too easy. So, trapped between “math people” and “not a math person people”, I wish to use this post to explore what exactly creates this divide — or, rather, explore it in a way slightly different from what you have probably heard a dozen times already.
It is almost universally acknowledged that “education is good.” More education, says society, will lead to a happier, more prosperous world. Educate everyone, says society.
I agree, of course.
And yet, paradoxically, we are so attached to the notion that education — college, in my case, grad school or high school for others — is a means to distinguish oneself from one’s peers. A degree, we argue, makes us stand out in both the workplace and in society. The more competitive the institution, the more valuable the degree.
Less enthusiastically, I agree with this as well: reality forces me to concede the point. Why else would college admissions be so competitive? Why else would classes be graded on curves? Why else would we even have grades in the first place? As disturbed as I am by it, society cares very much about my academic performance, especially in relation to others.
And therein lies the rub. Reader, are these not contradictory notions? If the purpose of an education for an individual is to separate him or her from the general populace, then what follows is the absurd notion that universal education is self-defeating: that the more people we educate, the less an education is worth to the individual.
How does one reconcile this? Can education in the limit benefit both the individual and the society? As an optimist, I wish it could. And in fact I believe it can, but only if we rethink what education means to the individual.
Here is what I think. I believe that too often, we conceive education to be a ladder that lifts us — above others, if we’re fast enough — rung by rung. The more you climb, the higher you get.
But too often, we forget that you can also climb down a ladder. That an education can also lower the arrogant to humility and place them alongside the less privileged, on common ground. That education builds capacity for empathy and communication, empowering the individual but also society at large to have a dialogue. That this view resolves the paradox of the previous section, because both individuals and society benefit from the capacity for having that dialogue.
What do I mean by this? Let’s return to mathematics for a moment.
If you are reading this blog, you most likely have been at a gathering of mathematicians at some point in your lives. It is quite a marvellous thing to behold: a congregation of brilliant minds sharing ideas. Mathematics as a community has its own folklore, its own in-jokes, and its own language. You need only to glance at sites like The Art of Problem Solving or mathmo.org to see this community in all its glory.
I love it. There is charm to the way mathematicians celebrate their field, unlike any other profession I have come across.
And yet, imagine being an outsider for a moment. Imagine being part of the mathematical 99%. How would you feel if someone responded to a question of yours with a grin, saying “left as an exercise to the reader”? Or if someone made you use this weird software package with lots of backslashes to write up your homework? Or if someone went off on a tangent about why their coffee mug said “donut” on it? Or if someone made an arithmetic mistake and then said “it’s true in base 12” before you even noticed the error? Or if someone claimed a very unobvious-to-you solution was “trivial”?
These phrases aren’t meant to be exclusionary. Some are common inside jokes. Others are part of the mathematical vocabulary. The word “trivial,” after all, has a very specific mathematical meaning — think of the “trivial group” with one element, for example. I use it all the time.
Yet to an outsider, they make the mathematical community seem simply impenetrable. How am I ever going to understand all this? I can’t think that fast!
“Math people” tend to be remarkably self-selecting, and I believe this is one of the reasons why: there is a divide between the initiated and the uninitiated, and far too few resources for the latter.
Education, I believe, should be tasked with bridging this divide — rather than exacerbating it as it does now. Education should give the 99% the opportunity to join the magical world of mathematics, but education should also show the 1% how to open up the world to new members. It should teach students to write about mathematics, finding a middle ground between dense manuscripts weighed down by Greek-letter jargon, and airy puff-pieces that contain nothing of substance. Where will future Ian Stewarts, Martin Gardners, and Brian Hayes come from? I myself would almost certainly be very firmly a “not a math person” person were it not for an Ian Stewart book in my dad’s bookshelf that taught me about Fermat’s Last Theorem and the Mandelbrot Set when I was very young.
Education should also encourage the next generation of professors to move away from lessons that consist of the copying of lecture notes onto a chalkboard, and provide them with the tools they need to create engaging, interactive lessons that appeal not only to those whom we believe are pre-ordained to be mathematicians, but to artists, musicians, writers, and athletes. It should teach students about the bigger picture of where their mathematics fits into society, about who produces and who consumes mathematics, and why.
At the same time, it should explain to the mathematical 99% what exactly those math people are on about all the time. It should teach them the mathematical canon, help them learn the language, and help them discover the beauty that the 1% have already found.
Yes, such an education will produce a more diverse generation of mathematicians, not only in terms of demographics, but also in terms of ways of thinking. It will produce a generation that writes not only more effective grant proposals, but also clearer papers. That’s what the individuals get out of it.
But it will also inspire the mathematical community to rise up against the tyranny of Alice-gives-Bob-three-bananas standardized tests, to use their passion for mathematics and their perception of its beauty to guide the development of curriculums and lesson plans. Not to reject non-math-people as heretics who refuse to see the light, but rather to see them as evidence of a broken education system that failed to convey the beauty of mathematics.
To redefine mathematics to be the way they see it, because, whether or not they realize it, the way the mathematical 1% sees mathematics — as the pursuit of beautiful truth — is far from what the mathematical 99% sees it as — the painfully rushed manipulation of symbols on a midterm. To care.
That’s what society gets out of it.
This post first appeared on White Group Maths.
Welcome to this month’s edition of the Aperiodical Carnival of Mathematics! The Carnival is a monthly roundup of exciting mathematical blog posts. Last month, it was hosted by Lucy at Cambridge Maths. This month, it is my honor to host it here at Comfortably Numbered. But first: a story…
A long time ago, there lived a great and powerful king. His kingdom was rich and his subjects content; ambassadors from afar would often come to his palace gates to offer presents from their distant homes. One day, such a traveler approached the king and challenged him to a game of chess. The king, always welcoming to his guests, accepted the challenge. He sent a minister to fetch a chessboard and pieces for the game. Soon, they were playing an intense game; the court watched in a hushed silence as the great king and the foreign traveler concentrated on the board. At last, the king won.
Now, this particular traveler was quite an arrogant young man. Outraged that anyone could defeat him, he rose and smashed the chessboard with his fist. The carved wood shattered into a number of pieces, splitting at the delicate joints between the black and white squares. The pieces clattered noisily on the marble floor.
Normally, such behavior would be punishable by death. But this king was a merciful king: he understood the insecurities of youth, and he believed in the healing powers of recreational math. So, he decided to give the traveler a second chance. He said,
“Young man, do you notice anything about the shattered pieces of the chessboard?”
The traveler, embarrassed, paused before answering,
“Your majesty, each piece seems to be a perfect square: there are three 4-by-4 squares, twelve 1-by-1 squares, and one 2-by-2 square.”
“Very good,” replied the king, “very good indeed. Now, if you can tell me how many ways there are to break a chessboard into (indistinguishable) square pieces, I shall spare your life.”
The next day, the traveler returned to the palace, escorted by guards. Standing before the throne, he handed the king a card upon which was written his answer: 148, which coincidentally happens to be this month’s Aperiodical Carnival of Mathematics!
Ayliean MacDonald entrances us by drawing a giant dragon curve! She says there was a bit of a learning curve involved, but fortunately for us, her video is a magnificent timelapse.
I love dragon curves, so I drew a gigantic one. I regretted starting it almost immediately. Let’s just say this was my first attempt and its a learning curve.
Rachel Traylor extends sets to “fuzzy” sets, which are just like normal sets, but, well, fuzzy!
What if we relax the requirement that you either be in or out? Here or there? Yes or no? What if I allow “shades of grey” to use a colloquialism? Then we extend classical sets to fuzzy sets.
Dr. Nira Chamberlain explains how important mathematical modeling is, and in particular, uses an extension of the gambler’s ruin problem to show how a model’s assumptions and limitations are related.
To me, mathematical modelling is about looking into the real world; translating it into mathematics, solving that mathematics and then applying that solution back into the real world.
Edmund Harriss exploits Desmos to draw some fantastic images.
One of my courses was to use Desmos to help develop thinking on functions and start to get to some of the ideas of calculus (without the need for the algebra). Here are the example calculators that I set up for the course.
David H. Bailey examines a surprisingly large collection of published papers that assert (incorrect!) values of pi.
Aren’t we glad we live in the 21st century, with iPhones, Teslas, CRISPR gene-editing technology, and supercomputers that can analyze the most complex physical, biological and environmental phenomena? and where our extensive international system of peer-reviewed journals produces an ever-growing body of reliable scientific knowledge? Surely incidents such as the Indiana pi episode are well behind us?
Jeremy Kun embeds Boolean logic in polynomials, thus revealing why finding the roots of multivariate polynomials is NP-hard.
This trick is used all over CS theory to embed boolean logic within polynomials, and it makes the name “boolean algebra” obvious, because it’s just a subset of normal algebra.
Ben Orlin evaluates whether or not you should ever buy two lottery tickets.
Gambling advice from mathematicians is usually pretty simple. In fact, it’s rarely longer than one word: Don’t! My advice is gentler…
Jimmy Soni and Rob Goodman emphasize that Claude Shannon’s wife Betty Shannon was a brilliant mathematician herself, and in fact instrumental in many of his successes.
Shannon valued the help. Though his ideas were very much his own, Betty turned them into publishable work. Shannon was prone to thinking in leaps—to solving problems in his mind before addressing all the intermediary steps on paper. Like many an intuitive mind before him, he loathed showing his work. So Betty filled in the gaps.
Peter Cameron elaborates on the conference held in honor of his 70th birthday, in Lisbon.
There is far too much, and far too diverse, mathematics going on here for me to describe all or even most of it. Nine plenary lectures on the first day! … I didn’t mention the film that the organisers have made about me (based mostly on old photographs). I am really not used to being in the spotlight to this extent!
Patrick Honner elucidates an erroneous exam question, and along the way tells us the story of a 16-year-old student who exposed the error and started a Change.org petition in response.
So, this high-stakes exam question has no correct answer. And despite the Change.org petition started by a 16-year-old student that made national news, the New York State Education Department refuses to issue a correction.
Danesh Forouhari extracts a fantastic math problem from properties of the factorization of 2016.
As we were approaching end of 2016, I was wondering if I could come up with a math puzzle which ties up 2016 & 2017, hence this puzzle.
Rachael Horsman extrapolates a lesson about measuring to explain the entire number line!
As we write the framework, activities such as pacing off form critical junctions between various areas in mathematics … Making these explicit to teachers and pupils helps cultivate their understanding of the connections that make up mathematics.
Vijay Kathotia enlightens us by revealing where the roman numeral for “10” might have come from.
What is the Roman numeral for ten? If you answered ‘the letter X’, it may not be quite right. It may well be what you write – but do you know why? There are at least two stories for explaining how X came to represent ten.
Lucy Rycroft-Smith engages UK-based artist MJ Forster in an interview about how math influences his art.
His latest series of paintings seem inherently mathematical; but just how explicit is the mathematics in his art, and how does he feel about the subject?
Brian Hayes estimates pi using rational numbers and an HP-41C calculator!
Today, I’m told, is Rational Approximation Day. It’s 22/7 (for those who write dates in little-endian format), which differs from pi by about 0.04 percent. (The big-endians among us are welcome to approximate 1/pi.)
Finally, the world expresses its sorrow: Fields medalist Maryam Mirzakhani passed away this July after a long battle with cancer. In keeping with the Carnival’s tradition, I offer you two blog posts from the mathematical community.
Ken Regan: on the significance and brilliance of her work
She made several breakthroughs in the geometric understanding of dynamical systems. Who knows what other great results she would have found if she had lived: we will never know. Besides her research she also was the first woman and the first Iranian to win the Fields Medal.
Terence Tao: a more personal perspective
Maryam was an amazing mathematician and also a wonderful and humble human being, who was at the peak of her powers. Today was a huge loss for Maryam’s family and friends, as well as for mathematics.
If you would like to learn more about her life and work, I encourage you to read some of the articles and watch the video on the AMS’ tribute to Maryam Mirzakhani.
This concludes the 148th edition of the Carnival of Mathematics. Please do join us next time, when Mel from Just Maths hosts the 149th!
I started thinking about these ideas in late May, but haven’t gotten a chance to write about them until now…
If you want to take a boat from the Puget Sound to Lake Washington, you need to go across the Ballard Locks, which separate the Pacific Ocean’s saltwater from the freshwater lakes. The locks are an artificial barrier, built in the early 1900s to facilitate shipping.
Today, the locks have a secondary function: they are a picnic spot. A while back, I visited the locks on a sunny and warm day. A band was playing in the park by the water, and there were booths with lemonade and carnival games. Every few minutes, a boat would enter the locks, be raised or lowered, and continue on its way.
If you walk across the locks, you can check out the fish ladder, a series of raised steps designed to help fish — in this case, salmon — migrate, since the locks cut off their natural path between the water bodies. There is usually a crowd around the fish ladder. Around once a minute, a salmon leaps out of the water and goes up a step; the children gasp and cheer as they watch over the railing.
This is the idyllic scene that we will soon destroy with the heavy hammer of mathematical statistics. You see, it turns out that a little bit of thought about these salmon can give us a way to use historical earthquake data to approximate ($ e $).
But I’m getting ahead of myself. Let’s start at the beginning.
What is the probability that a fish jumps out of the water right now? This is a tricky question to answer. Suppose there’s a 10% chance that a fish jumps out of the water right now. That means the probability that a fish doesn’t jump is 90%. In the next instant of time, there’s again a 10% chance that the fish jumps. So, the laws of probability tell us that over the course of ($ n $) instants, there’s a ($ 0.90^n $) probability that no fish-jumps occur.
But there’s an infinite number of instants in every second! Time is continuous: you can subdivide it as much as you want. So the probability that no fish-jumps occur in a one-second period is ($ 0.90^\infty $), which is… zero! Following this reasoning, a fish must jump at least every second. And this is clearly a lie: empirically, the average time between fish-jumps is closer to a minute.
Okay, so “probability that a fish jumps right now” is a slippery thing to define. What can we do instead? Since the problem seems to be the “right now” part of the definition, let’s try to specify a time interval instead of an instant. For example, what is the probability that we will observe ($ n $) fish-jumps in the next ($ t $) seconds?
Well, we’re going to need some assumptions. For simplicity, I’m going to assume from now on that fish jump independently, that is, if one fish jumps, then it does not affect the behavior of any other fish. I don’t know enough about piscine psychology to know whether or not this is a valid assumption, but it doesn’t sound too far-fetched.
While we’re on the subject of far-fetchedness: the math that follows is going to involve a lot of handwaving and flying-by-the-seat-of-your-pants. We’re going to guess at functions, introduce constants whenever we feel like it, evaluate things that may or may not converge, and, throwing caution and continuity to the wind, take derivatives of things that might be better left underived.
I think it’s more fun this way.
Yes, we could take the time to formalize the ideas with lots of definitions and theorems and whatnot. There’s a lot to be said about mathematical rigor, and it’s really important for you, the reader, to be extremely skeptical of anything I say. In fact, I encourage you to look for mistakes: the reasoning I’m about to show you is entirely my own, and probably has some bugs here and there. (The conclusions, for the record, match what various textbooks say; they just derive them in a slightly different way.)
A couple of lemmas here and there might make the arguments here much more convincing. But they will also make this post tedious and uninspiring, and I don’t want to go down that road. If you’re curious, you can look up the gnarly details in a book. Until then, well, we’ve got bigger fish to fry!
Okay, back to math. We can model the probability we’re talking about with a function that takes ($ n $) and ($ t $) as inputs and tells you the probability, ($ P(n, t) $), that you see ($ n $) fish-jumps in the time period ($ t $). What are some things we know about ($ P $)?
Well, for starters, ($ P(0, 0) = 1 $) and ($ P(n, 0) = 0 $) for every ($ n \geq 1 $), since in no time, there’s no way anything can happen.
What about ($ P(n, a + b) $)? That’s the probability that there are ($ n $) fish-jumps in ($ a + b $) seconds. We can decompose this based on how many of the fish-jumps occurred in the “($ a $)” and “($ b $)” periods:
\begin{align} P(n, a+b) & = P(0, a)P(n, b) \\ & + P(1, a)P(n-1, b) \\ & + \ldots \\ & + P(n, a)P(0, b) \end{align}
Hmm. This looks familiar… perhaps…
Yes! Isn’t this what you do to the coefficients of polynomials when you multiply them? The coefficient of ($ x^n $) in ($ a(x)b(x) $) is a similar product, in terms of the coefficients of ($ x^i $) and ($ x^{n-i} $) in ($ a(x) $) and ($ b(x) $), respectively.
This can’t be a coincidence. In fact, it feels appropriate to break out this gif again:
Let’s try to force things into polynomial form and see what happens. Let ($ p_t(x) $) be a polynomial where the coefficient of ($ x^n $) is the probability that ($ n $) fish-jumps occur in time ($ t $):
\begin{align} p_t(x) &= P(0, t)x^0 + P(1, t)x^1 + \ldots \\ &= \sum_{n=0}^\infty P(n, t)x^n \end{align}
(Yes, fine, since ($ n $) can be arbitrarily large, it’s technically a “power series”, which is just an infinitely long polynomial. Even more technically, it’s a generating function.)
We know that ($ p_0(x) = 1 $), because nothing happens in no time, i.e. the probability of zero fish-jumps is “1” and the probability of any other number of fish-jumps is “0”. So ($ p_0(x) = 1x^0 + 0x^1 + \ldots $), which is equal to just “1”.
What else do we know? It should make sense that ($ p_t(1) = 1 $), since if you plug in “1”, you just add up the coefficients of each term of the polynomial. Since the coefficients are the probabilities, they have to add up to “1” as well.
Now, taking a leap of faith, let’s say that ($ p_{a+b}(x) = p_a(x)p_b(x) $), because when the coefficients multiply, they work the same way as when we decomposed the probabilities above.
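If the leap of faith feels too leapy, here is a small sketch that checks the underlying fact on two short polynomials. The coefficients are arbitrary numbers picked for illustration (they are not fish probabilities), and the function names are my own.

```python
def multiply_polynomials(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    product = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            # x^i times x^j contributes to the coefficient of x^(i+j)
            product[i + j] += a_i * b_j
    return product


def coefficient_by_decomposition(a, b, n):
    """Sum a[i] * b[n-i], the same shape as P(0,a)P(n,b) + ... + P(n,a)P(0,b)."""
    return sum(
        a[i] * b[n - i]
        for i in range(n + 1)
        if i < len(a) and n - i < len(b)
    )


a = [1, 2, 3]  # 1 + 2x + 3x^2 (arbitrary example)
b = [4, 5]     # 4 + 5x (arbitrary example)
print(multiply_polynomials(a, b))  # [4, 13, 22, 15]
```

Each entry of the product is exactly the sum-of-products that the decomposition writes out, which is all the "leap" really relies on.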
Why is this property interesting? We’re turning a property about addition into a property about multiplication. That sounds awfully like something else we’re used to: exponentials, which are just logarithms run backwards! Forgetting for a moment that ($ p $) is a power series, maybe we can “solve” for the function ($ p_t(x) $) by messing around with something like this:
\[ p_t(x) = e^{tx} \]
Okay, ($ e^{tx} $) doesn’t quite work because we want ($ p_t(1) = 1 $). Maybe ($ e^{t(x-1)} $) will work? It seems to have all the properties we want…
Let’s take a moment to stop and think. At this point, it’s not even clear what we’re doing. The whole point of defining ($ p_t(x) $) was to look at the coefficients, but when we “simplify” it into ($ e^{t(x-1)} $) we no longer have a power series.
Or do we?
Recall from calculus class that you can expand out some functions using their Taylor Series approximation, which is a power series. In particular, you can show using some Fancy Math that
\begin{align} e^x &= \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \ldots \\ &= \sum_{n=0}^\infty \frac{x^n}{n!} \end{align}
If you haven’t taken calculus class yet, I promise this isn’t black magic. It’s not even plain magic. It’s just a result of a clever observation about what happens to ($ e^x $) when you increase ($ x $) by a little bit.
If you have taken calculus, bet you didn’t think this “series approximation” stuff would ever be useful! But it is, because a quick transformation gives us the series representation for ($ p_t(x) $):
\[ e^{t(x-1)} = e^{tx}/e^t = \sum_{n=0}^\infty \frac{(tx)^n}{n!e^t} \]
and so the coefficient of ($ x^n $) gives us ($ P(n, t) = t^n/(e^t n!) $).
Now we have a new problem: this formula doesn’t depend at all on the type of events we’re observing. In particular, the formula doesn’t “know” that the salmon at Lake Washington jump around once a minute. We never told it! Fish at other lakes might jump more or less frequently, but the formula gives the same results. So the formula must be wrong. Sad.
But it might be salvageable! Let’s go back and see if we can add a new constant to represent the lake we’re in. Perhaps we can call it ($ \lambda $), the Greek letter “L” for lake. Where could we slip this constant in?
Our solution for ($ p_t(x) $) was:
\[ p_t(x) = e^{t(x-1)} \]
but in retrospect, the base ($ e $) was pretty arbitrarily chosen. We could make the base ($ \lambda $) instead of ($ e $), but that would mess up the Taylor Series, which only works with base ($ e $). That would be inconvenient.
However, we know that we can “turn” ($ e $) into any number by raising it to a power, since ($ e^{\log b} = b $). If we want base ($ b $), we can replace ($ e $) with ($ e^{\log b} $). This suggests that ($ \lambda = \log b $) could work, making our equation:
\[ p_t(x) = \left(e^\lambda\right)^{t(x-1)} = e^{(\lambda t) (x-1)} \]
This seems to fit the properties we wanted above (you can check them if you want). Going back to our Taylor Series expansion, we can just replace ($ t $) with ($ \lambda t $) to get:
\[ P(n, t) = \frac{\left(\lambda t\right)^n}{e^{\lambda t} n!} \]
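If you do want to check the properties, here is a minimal sketch that does it numerically. The rate and time values are made up for the test, and `poisson_pmf` is just my name for the formula above.

```python
from math import exp, factorial


def poisson_pmf(n, t, lam):
    """P(n, t): probability of n events in time t when events occur at rate lam."""
    return (lam * t) ** n / (exp(lam * t) * factorial(n))


lam = 0.5  # hypothetical rate, chosen only for this check

# Property 1: the probabilities over all n should add up to 1.
# (Truncating at 60 terms is an assumption; the tail is negligible here.)
total = sum(poisson_pmf(n, 3.0, lam) for n in range(60))

# Property 2: P(n, a+b) should equal the sum over i of P(i, a) * P(n-i, b).
a, b, n = 1.0, 2.0, 4
lhs = poisson_pmf(n, a + b, lam)
rhs = sum(poisson_pmf(i, a, lam) * poisson_pmf(n - i, b, lam) for i in range(n + 1))

print(total, lhs, rhs)
```

Both checks agree up to floating-point rounding, which is reassuring but of course not a proof.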
Let’s step back and think about what we’re claiming. Knowing only that fish jump randomly, and roughly independently, we claim to have an expression for the probability that ($ n $) fish-jumps occur in a time interval ($ t $).
“Okay, hold up,” you say, “something smells fishy about this. This is pretty bold: we know nothing about how fish think, or fluid dynamics, or whatever other factors could influence a fish’s decision to jump. And yet we have this scary-looking expression with ($ e $) and a factorial in there!”
That’s a fair point. I’m just as skeptical as you are. It would be good to back up these claims with some data. Sadly, I didn’t spend my time in Seattle recording fish-jumping times. But, in a few more sections, I promise there will be some empirical evidence to assuage your worries. Until then, let’s press on, and see what else we can say about fish.
We have a way to get the probability of some number of fish-jumps in some amount of time. What’s next?
One thing we can do is compute the average number of fish-jumps in that time interval, using expected value. Recall that to find expected value, you multiply the probabilities with the values. In this case, we want to find:
\[ E_t[n] = \sum_{n=0}^\infty P(n, t)n \]
This looks hard… but also oddly familiar. Remember that
\[ p_t(x) = \sum_{n=0}^\infty P(n, t)x^n \]
because, y’know, that’s how we defined it. Using some more Fancy Math (“taking the derivative”), this means that
\[ \frac{dp_t(x)}{dx} = \sum_{n=0}^\infty P(n, t)nx^{n-1} \]
and so ($ E_t[n] = p^\prime_t(1) $).
That… still looks hard. Derivatives of infinite sums are no fun. But remember from the last section that we also have a finite way to represent ($ p_t(x) $): what happens if we take its derivative?
\begin{align} p_t(x) &= e^{(\lambda t) (x-1)} \\ p^\prime_t(x) &= (\lambda t)e^{(\lambda t) (x-1)} \\ p^\prime_t(1) &= E_t[n] = \lambda t \end{align}
Aha! The average number of fish-jumps in time ($ t $) is ($ \lambda t $). If ($ t $) has units of time and ($ \lambda t $) has units of fish-jumps, this means that ($ \lambda $) has units of fish-jumps-per-time. In other words, ($ \lambda $) is just the rate of fish-jumps in that particular lake! For Lake Washington, ($ \lambda_w = 1/60 \approx 0.0167 $) fish-jumps-per-second, which means that the probability of seeing two fish-jumps in the next thirty seconds is:
\[ P(2, 30) = \frac{(0.0167\times30)^2}{e^{0.0167\times30}2!} \approx 0.076 \]
I think that’s pretty neat.
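Here is the same arithmetic as a short sketch, in case you want to plug in your own lake. The once-a-minute rate is the rough figure from above; the function name and the 50-term cutoff for the average are my own choices.

```python
from math import exp, factorial


def poisson_pmf(n, t, lam):
    """P(n, t): probability of n events in time t when events occur at rate lam."""
    return (lam * t) ** n / (exp(lam * t) * factorial(n))


LAKE_RATE = 1 / 60  # fish-jumps per second (the rough once-a-minute estimate)

# Probability of exactly two fish-jumps in the next thirty seconds.
two_in_thirty = poisson_pmf(2, 30, LAKE_RATE)

# Average number of jumps in thirty seconds, summing n * P(n, t) directly.
mean_jumps = sum(n * poisson_pmf(n, 30, LAKE_RATE) for n in range(50))

print(round(two_in_thirty, 3))  # 0.076
print(round(mean_jumps, 3))     # 0.5
```

The brute-force average lands on half a jump per thirty seconds, matching the rate times the interval.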
What about the standard deviation of the number of fish-jumps? That sounds ambitious. But things have been working out pretty well so far, so let’s go for it.
Standard deviation, or ($ \sigma $), the Greek letter “sigma”, is a measure of “how far, on average, are we from the mean?” and as such seems easy to define:
\[ \sigma = E[n-\lambda t] \]
Well, this isn’t hard to evaluate. Knowing that expected values add up, we can do some quick math:
\begin{align} \sigma &= E[n] - E[\lambda t] \\ &= \lambda t - \lambda t = 0 \end{align}
Oops. We’re definitely off by a little bit on average, so there’s no way that the standard deviation is 0. What went wrong?
Well, ($ n - \lambda t $) is negative if ($ n $) is lower than expected! When you add the negative values to the positive ones, they cancel out.
This is annoying. But there’s an easy way to turn negative numbers positive: we can square them. Let’s try that.
\begin{align} \sigma^2 &= E[(n-\lambda t)^2] \\ &= E[n^2 - 2n\lambda t + (\lambda t)^2] \end{align}
Now what? We don’t know anything about how ($ E[n^2] $) behaves.
Let’s go back to how we figured out ($ E[n] $) for inspiration. The big idea was that
\[ \frac{dp_t(x)}{dx} = \sum_{n=0}^\infty P(n, t)nx^{n-1} \]
Hmm. What if we take another derivative?
\[ \frac{d^2p_t(x)}{dx^2} = \sum_{n=0}^\infty P(n, t)n(n-1)x^{n-2} \]
We get an ($ n(n-1) $) term, which isn’t quite ($ n^2 $), but it’s degree-two. Let’s roll with it. Following what we did last time,
\begin{align} p_t(x) &= e^{(\lambda t)(x - 1)} \\ p^\prime_t(x) &= (\lambda t)e^{(\lambda t)(x - 1)} \\ p^{\prime\prime}_t(x) &= (\lambda t)(\lambda t)e^{(\lambda t)(x - 1)} \\ E[n(n-1)] &= p^{\prime\prime}_t(1) \\ &= (\lambda t)^2 \end{align}
And now we have to do some sketchy algebra to make things work out:
\begin{align} \sigma^2 &= E[(n-\lambda t)^2] \\ &= E[n^2 - 2n\lambda t + (\lambda t)^2] \\ &= E[n^2 - n - 2n\lambda t + n + (\lambda t)^2] \\ &= E[(n^2 - n) - 2n\lambda t + n + (\lambda t)^2] \\ &= E[n^2 - n] - E[2n\lambda t] + E[n] + E[(\lambda t)^2] \\ &= (\lambda t)^2 - 2(\lambda t)(\lambda t) + \lambda t + (\lambda t)^2 \\ &= \lambda t \end{align}
…which means ($ \sigma = \sqrt{\lambda t} $).
Seems like magic.
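To make it feel slightly less like magic, here is a sketch that skips the derivative trick entirely and just adds up the series term by term. The 80-term cutoff and the helper names are assumptions of mine; the rate and interval are the thirty-second example from earlier.

```python
from math import exp, factorial, sqrt


def poisson_pmf(n, t, lam):
    """P(n, t): probability of n events in time t when events occur at rate lam."""
    return (lam * t) ** n / (exp(lam * t) * factorial(n))


def moments(t, lam, terms=80):
    """Return (E[n], E[n(n-1)], variance), each computed as a truncated sum."""
    probs = [poisson_pmf(n, t, lam) for n in range(terms)]
    mean = sum(n * p for n, p in enumerate(probs))
    second_factorial = sum(n * (n - 1) * p for n, p in enumerate(probs))
    variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, second_factorial, variance


mean, second_factorial, variance = moments(t=30, lam=1 / 60)
print(mean, second_factorial, variance, sqrt(variance))
```

With the rate times the interval equal to 0.5, the three sums should come out near 0.5, 0.25, and 0.5, so the brute-force variance agrees with the sketchy algebra.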
Okay, fine, we have this fancy function to model these very specific probabilities about fish-jump-counts over time intervals. But the kids watching the fish ladder don’t care! They want to know what’s important: “how long do I need to wait until the next fish jumps?”
Little do they know, this question opens up a whole new can of worms…
Until now, we’ve been playing with ($ n $) as our random variable, with ($ t $) fixed. Now, we need to start exploring what happens if ($ t $) is the random variable. This needs some new ideas.
Let’s start with an easier question to answer. What is the probability that you need to wait longer than five minutes (300 seconds) to see a fish-jump? (Five minutes is way longer than my attention span when looking at fish. But whatever.)
It turns out that we already know how to answer that question. We know the probability that no fish jump in five minutes: that’s equal to ($ p_{300}(0) $). Why? Well, when we plug in ($ x = 0 $), all the ($ x $) terms go away in the series representation, and we’re only left with (the coefficient of) the ($ x^0 $) term, which is what we want.
Long story short, the probability that you need to wait longer than five minutes is ($ e^{0.0167\times300(0-1)} \approx 0.0067 $). This means that the probability that you will see a fish-jump in the next five minutes is ($ 1 - e^{0.0167\times300(0-1)} $), which is around 0.9933. This is the probability that you have to wait less than five minutes to see a fish-jump. For an arbitrary time interval ($ T $), we have ($ P(t<T) = 1 - e^{-\lambda T} $), where ($ t $) is the actual time you have to wait.
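As a tiny sketch, here is that waiting-time formula as a function, evaluated for the five-minute wait. The function name is mine; the rate is the once-a-minute estimate.

```python
from math import exp


def prob_wait_less_than(T, lam):
    """P(t < T): chance that the first event arrives within time T at rate lam."""
    return 1 - exp(-lam * T)


lam = 1 / 60  # fish-jumps per second
print(round(prob_wait_less_than(300, lam), 4))  # 0.9933
```

Trying a few values of `T` is a quick way to see how fast the probability climbs toward 1.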
Sanity check time! This gets close to 1 as ($ T $) gets higher, which sounds about right: the longer you’re willing to wait, the likelier it is that you’ll see a fish jump. Similarly, if fish jump at a higher rate, ($ \lambda $) goes up, and the probability gets closer to 1, which makes sense. Indeed, encouragingly enough, this equation looks very close to the equation we use for half-lives and exponential radioactive decay…
Now things are going to get a bit hairy. What is the probability that you have to wait exactly ($ T $), that is, ($ P(t = T) $)? This should be zero: nothing happens in no time. But let’s be reasonable: when we say “exactly” ($ T $), we really mean a tiny window between, say, ($ T $) and ($ T + dT $) where ($ dT $) is a small amount of time, say, a millisecond.
The question then is, what is ($ P(T < t < T + dT) $), which isn’t too hard to answer: it’s just ($ P(t < T + dT) - P(t < T) $), that is, you need to wait more than ($ T $) but less than ($ T + dT $). In other words,
\[ P(t \approx T, dT) = P(t < T+dT) - P(t < T) \]
where ($ dT $) is an “acceptable margin of error”.
This looks awfully like a derivative! We’re expressing the change in probability as a function of change in time: if I wait ($ dT $) longer, how much likelier am I to see a fish-jump?
Let’s rewrite our above equation to take advantage of the derivativey-ness of this situation.
\begin{align} P(t \approx T, dT) &= \left(\frac{P(t < T+dT) - P(t < T)}{dT}\right)dT\\ &= \left(\frac{d P(t < T)}{dT}\right)dT \\ &= \left(\frac{d (1-e^{-\lambda T})}{dT}\right)dT \\ &= \lambda e^{-\lambda T} dT \end{align}
By the way, this might give a simpler-but-slightly-less-satisfying answer to our initial question, “what is the probability that a fish jumps out right now?” If we set ($ T $) to 0, then we get ($ P(t \approx 0, dT) = \lambda dT $). In other words, if fish jump out of the water at a rate ($ \lambda $), then for a tiny period of time ($ dT $), the probability of seeing a fish jump in that time is ($ \lambda dT $). This is one of those facts that seems really straightforward one day, and completely mindblowing the next day.
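A small sketch of how good that approximation is: it compares the exact probability of seeing a jump within a window (from the waiting-time formula) against the shortcut of rate times window, for shrinking windows. The window sizes are arbitrary picks of mine.

```python
from math import expm1


def exact_prob(lam, dT):
    """1 - e^(-lam * dT), computed with expm1 to stay accurate for tiny dT."""
    return -expm1(-lam * dT)


def relative_error(lam, dT):
    """How far the shortcut lam * dT is from the exact value, as a fraction."""
    exact = exact_prob(lam, dT)
    return abs(lam * dT - exact) / exact


lam = 1 / 60  # fish-jumps per second
for dT in (10.0, 1.0, 0.001):  # window sizes in seconds, chosen for illustration
    print(dT, exact_prob(lam, dT), lam * dT, relative_error(lam, dT))
```

The relative error shrinks roughly in proportion to the window, which is what you would hope for from something pretending to be a derivative.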
Anyway. Now that we have an approximation for the probability that you need to wait a specific time ($ T $), we can find an expected value for ($ t $) by taking the sum over discrete increments of ($ dT $):
\[ E[t] = \sum^\infty_{k=0} P(t \approx T, dT) \times T \]
where ($ T = k\times dT $). Since we’re talking about the limit as ($ dT $) gets smaller and smaller, it seems reasonable to assume that this thing turns into
\begin{align} E[t] &= \int^\infty_0 P(t \approx T, dT) \times T \\ &= \int^\infty_0 T \lambda e^{-\lambda T} \, dT \end{align}
You can integrate that by parts, or just use WolframAlpha, which tells you that ($ E[t] = \lambda^{-1} $).
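If you trust neither integration by parts nor WolframAlpha, here is a sketch that goes back to the sum over small increments and adds it up directly. The step size and the cutoff (fifty average waits) are assumptions I picked so that the ignored tail is negligible.

```python
from math import exp


def expected_wait(lam, dT=0.01, cutoff_multiple=50):
    """Midpoint Riemann sum of T * lam * e^(-lam*T) dT from 0 to cutoff_multiple/lam."""
    steps = int(cutoff_multiple / (lam * dT))
    total = 0.0
    for k in range(steps):
        T = (k + 0.5) * dT  # midpoint of the k-th increment
        total += T * lam * exp(-lam * T) * dT
    return total


print(expected_wait(1 / 60))  # should land very close to 60 seconds
```

For the once-a-minute salmon, the sum comes out at about a minute, as the closed form predicts.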
…which is kind of obvious, isn’t it? Remember that ($ \lambda $) was the rate at which our fish jumped. If fish jump once a minute, shouldn’t we expect to have to wait a minute to see a fish jump? Isn’t this similar to the way wavelength and frequency are related?
The answer is, “yes and no”. “Yes”, the value ($ \lambda^{-1} $) is indeed pretty sensible in retrospect. A simpler way to derive it might have been to note that for any time period ($ T $), the expected number of fish-jumps is ($ \lambda T $) (as we found out above), and so the average time interval between fish-jumps would be ($ T / (\lambda T) = \lambda^{-1} $). The fact that the average interval between fish-jumps corresponds to the expected interval is captured by the surprisingly well-known acronym “PASTA”: Poisson Arrivals See Time Averages (I’m not making this up!).
But “no”, it’s not “obvious” that you should have to wait the average inter-fish time!
Suppose that you, like Rip Van Winkle, woke up after a very long sleep, and you wanted to know “how much longer until Monday morning?”
Well, Monday mornings happen every 7 days, and so if you set ($ \lambda = 1/7 $), you should expect to have to wait 7 days until Monday.
But that’s silly! You definitely need to wait fewer than 7 days on average! In fact, most people would intuitively say that you need to wait 7/2 = 3.5 days on average: and they would be right. (The intuition is that on average, you’d wake up halfway between two Monday mornings.)
This is the so-called “Hitchhiker’s Paradox”: if cars on a highway through the desert appear roughly once an hour, how long does a hitchhiker who just woke up need to wait until he sees a car? It seems reasonable to say “half an hour”, since on average, you’d wake up halfway between two cars. On the other hand, with ($ \lambda = 1 $), you’d expect to wait an hour until you see a car.
So which one is right? And why are the answers different?
Well, the “Rip Van Winkle” interpretation assumes that cars on a desert highway — like Mondays — come at regular intervals. In reality, cars on a desert highway — like the salmon of Seattle — are usually independent. They might come in a cluster a few minutes after you wake up, or a lone car might come the next day. Crucially, the next car doesn’t “know” anything about previous cars, and so it doesn’t matter when you wake up: we call this property “memorylessness”.
It turns out that since there’s a nonzero probability of having to wait a very long time for a car, the average gets pulled up from half an hour. With that in mind, it’s really quite surprising that the true mean turns out to be exactly ($ 1/\lambda $).
And now, the aftermath.
Very little of the above discussion was fish-specific. The only properties of salmon that mattered here were that salmon jump randomly and independently of each other, at some rate ($ \lambda $). But our calculations work for any such process (let’s call such processes Poisson processes).
Poisson processes were studied as early as 1711 by de Moivre, who came up with the cool theorem about complex numbers. However, they’re named after Siméon Denis Poisson, who in 1837 studied (not fish, but) the number of wrongful convictions in court cases.
Today, Poisson processes model all sorts of things. Managers use them to model customers arriving at a grocery checkout. Programmers use them to model packets coming into a network. Both of these are examples of queueing theory, wherein Little’s Law relates ($ \lambda $) to how long things have to wait in queues. You could probably use a Poisson process to model how frequently bad things happen to good people, and use that to create a statistical model of how unfair the world is.
The upshot is this: even though I didn’t record any fish-jumping data back in Seattle, we can definitely try out these ideas on other “sporadic” processes. Wikipedia, it turns out, maintains a list of earthquakes that happened in the 21st century. Earthquakes are pretty sporadic, so let’s play with that dataset.
I scraped the date of each earthquake, and wrote a small script to count the number of earthquakes in each month-long interval. That is, ($ t $) is 2,592,000 seconds. By “binning” my data by month, I got lots of samples of ($ n $). This gives an easy way to compute ($ P(n, t) $) “empirically”.
On the other hand, taking the total number of earthquakes and dividing by the total time range (around 17 years, since we’re in 2017) gives us the rate ($ \lambda $), which in this case works out to about ($ 1.06\times10^{-6} $) earthquakes per second. This gives a way to compute ($ P(n, t) $) “theoretically” by using our fancy formula with the factorial and whatnot.
\[ P(n, t) = \frac{\left(\lambda t\right)^n}{e^{\lambda t} n!} \]
Comparing the results gives us this pretty plot!
They match up surprisingly well.
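For the curious, here is a sketch of the binning-and-comparing procedure. To be loud about it: this does not use the scraped Wikipedia data. It runs on synthetic stand-in event times generated with exponential gaps, so it only illustrates the method. The bin size and rate are the figures quoted above; the 204-month span (about 17 years), the seed, and all function names are my assumptions.

```python
import random
from collections import Counter
from math import exp, factorial

MONTH_SECONDS = 2_592_000  # the month-long bin, in seconds
RATE = 1.06e-6             # events per second, as quoted above
MONTHS = 204               # roughly 17 years of monthly bins (assumption)


def synthetic_event_times(rate, duration, rng):
    """Stand-in data: event times with exponential gaps (NOT the real earthquake list)."""
    times, t = [], rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def empirical_distribution(times, bin_seconds, bins):
    """Count events per bin, then return {n: fraction of bins containing exactly n events}."""
    counts = [0] * bins
    for t in times:
        counts[int(t // bin_seconds)] += 1
    tally = Counter(counts)
    return {n: tally[n] / bins for n in sorted(tally)}


def poisson_pmf(n, t, lam):
    """The theoretical P(n, t) from the formula above."""
    return (lam * t) ** n / (exp(lam * t) * factorial(n))


rng = random.Random(2017)
times = synthetic_event_times(RATE, MONTH_SECONDS * MONTHS, rng)
empirical = empirical_distribution(times, MONTH_SECONDS, MONTHS)
for n, freq in empirical.items():
    print(n, round(freq, 3), round(poisson_pmf(n, MONTH_SECONDS, RATE), 3))
```

Swapping `synthetic_event_times` for a list of real timestamps (in seconds since the start of the range) would reproduce the comparison on actual data.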
What else can we say? Well, the average inter-earthquake time works out to ($ 1/\lambda $), or around 940,000 seconds. That’s about eleven days. On average, a reader of this blog post can expect to wait eleven days until the next earthquake of magnitude 7 or above hits.
And for those of you who have been wondering, “can we do these calculations in reverse to approximate ($ e $)?” the answer is, yes! We just solve the above equation for ($ e $).
\[ e\approx\left(\frac{P(n, t)n!}{(\lambda t)^n}\right)^{-(\lambda t)^{-1}} \]
In my case, using earthquake data for ($ n = 1 $), I got ($ e \approx 2.75 $). I’d say that’s pretty good for an algorithm that relies on geology for accuracy (in reality, ($ e $) is around 2.718).
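Here is a small sketch of that inversion. The function name `estimate_e` is made up, and instead of earthquake data the example feeds in the exact theoretical probability, so it only demonstrates that the formula round-trips; a measured ($ P(n, t) $) will of course give a noisier answer.

```python
import math


def estimate_e(p_observed, n, lam, t):
    """Solve P(n, t) = (lam*t)^n / (e^(lam*t) * n!) for e."""
    x = lam * t
    return (p_observed * math.factorial(n) / x ** n) ** (-1.0 / x)


lam = 1.06e-6      # rate quoted in the text, events per second
t = 2_592_000      # one 30-day month in seconds

# Sanity check: plug in the exact theoretical P(1, t) and recover e.
exact_p1 = (lam * t) / math.exp(lam * t)
print(estimate_e(exact_p1, 1, lam, t), math.e)
```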
In many ways, it is quite incredible that the Poisson process conditions — randomness, independence, constant rate — are all you need to derive conclusions for any Poisson process. Knowing roughly that customers at a burger place are random, act independently, and arrive around once a minute at lunchtime — and knowing nothing else — we can predict the probability that four customers arrive in the next three minutes. And, magically, this probability will have ($ e $) and a factorial in it.
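As a worked instance of the burger example, take ($ \lambda = 1 $) customer per minute, ($ t = 3 $) minutes and ($ n = 4 $) (all read off the paragraph above):

\[ P(4, 3) = \frac{(1\cdot 3)^4}{e^{1\cdot 3}\,4!} = \frac{81}{24\,e^{3}} \approx 0.168 \]

So, under those assumptions, there is roughly a one-in-six chance of exactly four customers showing up.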
Humans don’t evaluate expressions involving ($ e $) and factorials when they decide when to get a burger. They are subject to the immense complexity of human life, much like how salmon are subject to the immense complexity of the fluid mechanics that govern Lake Washington, much like how earthquakes are subject to the immense complexity of plate tectonics.
And yet, somehow statistics unites these vastly different complexities, finding order and meaning in what is otherwise little more than chaos.
Isn’t that exciting?
~ Fin. ~
Assorted references below.
Postscript, two weeks later. This morning at the coffee shop I realized that the Poisson distribution is a lot like the binomial distribution with a lot of trials: the idea is that you have lots of little increments of time, and a fish either jumps or doesn’t jump in each increment — this is called a Bernoulli process. Presumably, over a long period of time, this should even out to a Poisson process…
Recall that the probability of a fish-jump happening in some small time period ($ dt $) turned out to be ($ \lambda dt $) for our definition of ($ \lambda $) as the rate of fish-jumps. Can we go the other way, and show that if the probability of something happening is ($ \lambda dt $) for a small period of time ($ dt $), then it happens at a rate of ($ \lambda $)?
Turns out, yes!
The binomial distribution is a way to figure out, say, what the probability is that if I flip 100 coins, then exactly 29 of them land “heads” (a coin toss is another example of a Bernoulli process). More abstractly, the binomial distribution gives you the probability ($ B(N, k) $) that if something has probability ($ p $) of happening, then it happens ($ k $) times out of ($ N $) trials.
The formula for ($ B(N, k) $) can be derived pretty easily, and you can find very good explanations in a lot of high-school textbooks. So, if you don’t mind, I’m just going to give it to you for the sake of brevity:
\[ B(N, k) = \binom{N}{k} p^k (1-p)^{N-k} \]
Now, can we apply this to a Poisson process? Well, let’s say ($ k = n $), the number of times our event happens in time ($ t $). Then we have
\[ \binom{N}{n} p^n (1-p)^{N-n} \]
What next? We know that ($ p = \lambda dt $). Also, for time period ($ t $), there are ($ t / dt $) intervals of ($ dt $), so ($ N = t / dt $). That means we can substitute ($ dt = t / N $), and thus ($ p = \lambda (t / N) $). This gives us
\[ \binom{N}{n} (\lambda t / N)^n (1-\lambda t / N)^{N-n} \]
Oh, and of course to approximate a Poisson process, this is the limit as ($ N $) approaches infinity:
\[ \lim_{N\to\infty} \binom{N}{n} (\lambda t / N)^n (1-\lambda t / N)^{N-n} \]
This isn’t a hard limit to take if we break apart the product.
\[ \lim_{N\to\infty} \frac{N! (\lambda t)^n}{n!(N-n)! N^n} \lim_{N\to\infty}(1-\lambda (t / N))^{N-n} \]
The right half is surprisingly enough the definition of ($ e^{-\lambda t} $), since the ($ - n $) in the exponent doesn’t really matter. The left half is trickier: it turns out that ($ N! / (N-n)! $) is the product ($ N(N-1)\ldots(N-n+1) $). As a polynomial, it is degree ($ n $), and the leading term is ($ N^n $). But look! In the denominator, we have an ($ N^n $) term as well, so in the limit, those both go away.
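For readers who want the two halves written out, one way to fill in the steps (same symbols as above, nothing new assumed) is:

\[ \frac{N!}{(N-n)!\,N^n} = \frac{N(N-1)\cdots(N-n+1)}{N^n} = \prod_{j=0}^{n-1}\left(1-\frac{j}{N}\right) \xrightarrow{\;N\to\infty\;} 1 \]

\[ \left(1-\frac{\lambda t}{N}\right)^{N-n} = \left(1-\frac{\lambda t}{N}\right)^{N}\left(1-\frac{\lambda t}{N}\right)^{-n} \xrightarrow{\;N\to\infty\;} e^{-\lambda t}\cdot 1 \]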
We’re left with what simplifies to our expression for the Poisson distribution.
\begin{align} \lim_{N\to\infty} B(N, n)\Big|_{p=\lambda t/N} &= \frac{(\lambda t)^n}{n!}e^{-\lambda t} \\ &= \frac{(\lambda t)^n}{e^{\lambda t}n!} \\ &= P(n, t) \end{align}
which I think is literally magic.
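If you would rather see the limit than trust it, here is a quick numerical sketch. The values of ($ \lambda $), ($ t $) and ($ n $) are arbitrary illustrative choices, and the function names are made up.

```python
import math


def binomial_pmf(N, k, p):
    """B(N, k): probability of exactly k successes in N trials."""
    return math.comb(N, k) * p ** k * (1 - p) ** (N - k)


def poisson_pmf(n, lam, t):
    """P(n, t) = (lam*t)^n / (e^(lam*t) * n!)."""
    return (lam * t) ** n / (math.exp(lam * t) * math.factorial(n))


lam, t, n = 2.0, 1.5, 4   # arbitrary values, chosen only for illustration
for N in (10, 100, 10_000, 1_000_000):
    # p = lam * t / N, i.e. dt = t / N
    print(N, binomial_pmf(N, n, lam * t / N), poisson_pmf(n, lam, t))
```

The binomial column should creep toward the Poisson column as ($ N $) grows.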
In a desperate attempt to learn a little more about the brilliant, strange, confusing, mystifying world we all live in, I have spent the past few weeks reading about it. About camelids and Cavendish, about 17th-century piracy and 21st-century photography, about Betsy Ross and clause learning, about metric structure and morse code.
Those last two subjects — metric structure and morse code — were Friday night’s reading, and as I read, I felt that familiar sensation you feel when you notice big ideas intersecting. It’s a tremendously exciting sensation, one that invites you to think just a little harder about each idea, to probe just a little more aggressively at their boundaries, until at last you can dig out the connection from your intuition.
The connection in this case turned out to be simple: many morse code signs correspond to metric feet. “A” or .- is an iamb (“To be or not to be”), “D” or -.. is a dactyl (“Merrily, merrily, merrily, merrily”), “U” or ..- is an anapest (“There was once an old man in Peru…”), “M” or -- is a spondee (“Rage, rage”), and so on.
A practical result of this fact is that we can come up with nice metric mnemonics for Morse code, and indeed Wikipedians have already done so. By remembering the syllable stress patterns for such mnemonics, you can remember the dot-and-dash pattern for the associated letter.
The real question, of course, is whether or not we can automate the search for such mnemonics.
Armed with the pronouncing Python library (which feeds off of the CMU Pronouncing Dictionary), I decided to make my own set of Morse code mnemonics. This set is optimized for CS students, and was generated with the assistance of a simple computer program I named versificator. Here it is — enjoy!
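To give a flavor of how such a search might work, here is a rough sketch. It is not the actual versificator code: the function names are invented, and it assumes that unstressed syllables ('0' in CMU notation) become dots while both primary and secondary stress ('1' and '2') become dashes. The `candidate_words` helper relies on `pronouncing.stresses_for_word`, so it needs the third-party pronouncing package installed; the rest runs on its own.

```python
# Morse patterns for the four letters discussed above (a subset, for brevity).
MORSE = {"A": ".-", "D": "-..", "U": "..-", "M": "--"}


def stresses_to_morse(stresses):
    """Map a stress string such as "010" to dots and dashes.

    Assumption: '0' (unstressed) is a dot; '1' and '2' (primary and
    secondary stress) are both treated as dashes.
    """
    return "".join("." if s == "0" else "-" for s in stresses)


def fits(letter, stresses):
    """True if the stress pattern spells out the letter's Morse code."""
    return stresses_to_morse(stresses) == MORSE[letter]


def candidate_words(letter, words):
    """Keep words that start with `letter` and have some pronunciation
    whose stress pattern fits. Requires the `pronouncing` package."""
    import pronouncing  # third-party; imported lazily on purpose
    return [w for w in words
            if w[:1].upper() == letter
            and any(fits(letter, s) for s in pronouncing.stresses_for_word(w))]


# The four feet from the text: iamb, dactyl, anapest, spondee.
print(fits("A", "01"), fits("D", "100"), fits("U", "001"), fits("M", "11"))
```

Multi-word mnemonics would need the stress strings of the individual words concatenated, which this sketch leaves out.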
Letter | Morse | Mnemonics |
---|---|---|
A | .- | assert / array |
B | -... | bcrypt hashing / base64 |
C | -.-. | computation / CS major |
D | -.. | digital / dithering |
E | . | eh? / err… |
F | ..-. | Futamura / fs_usage |
G | --. | GNU emacs / git checkout |
H | .... | hullabaloo / Hewlett-Packard |
I | .. | IP / id |
J | .--- | JS? How sad! / moar jQuery |
K | -.- | kilobyte / kangaroo |
L | .-.. | linguistics course / legitimate |
M | -- | malware / Markov |
N | -. | Nyquist / nearley |
O | --- | oh my zsh / output file |
P | .--. | prevent errors / prometheus |
Q | --.- | QuickTime has crashed / quad-core machine |
R | .-. | recursion / rotation |
S | ... | sudoer / scp |
T | - | type / tree |
U | ..- | unsubscribe / undeclared |
V | ...- | video game / visual mode |
W | .-- | wget it / Wilensky |
X | -..- | LaTeX by Knuth / Xcode is slow |
Y | -.-- | combinator / Yukihiro |
Z | --.. | Z3 solver / zip archiver |
Ivan Sutherland says, “It’s not an idea until you write it down.” With that in mind, here is a series of thoughts I have recently been thinking.
How do you feel when you walk through a library? Perhaps you feel relaxed by the silence, comforted by the presence of books at your side. Perhaps you feel awed by the pages that surround you, or inspired by the wonderful tales you know they contain.
I feel all these emotions, but I also feel a little despair. A library has thousands and thousands of books. Surely I will never make it through even a small fraction of them — no matter how hard I try, even if I read constantly for years, there will always be books remaining, books I might have loved and cherished if only I had decided to read them. It is more than despair: the state of mind is perhaps best described as preemptive regret.
Such is life. But as I walk between the bookshelves at my local library, it nevertheless seems natural — even prudent — to wonder: is there any end in sight? Will writers ever run out of stories to tell? Of distinctive plots, of clever endings, of engaging characters? Surely there are a finite number of words, a finite number of ways to meaningfully combine those words, and thus eventually one must reach a limit on the number of stories those words can tell. What happens if we tell them all, if we saturate our own Library of Babel? What then?
In 2004, Christopher Booker published a book called The Seven Basic Plots: Why We Tell Stories. He argued that all stories fall into one of seven categories, such as “Rags to Riches” (Cinderella and Jane Eyre) or “Overcoming the Monster” (Beowulf and Harry Potter). Though harshly received by critics, Booker’s idea serves well to introduce the thesis of this blog post: that it is not the story, but rather its telling, that matters.
What is the difference between a story and its telling? For that, let us borrow a pair of words from the Russian formalists, who began the modern tradition of narratology. The Russian formalists made a distinction between the fabula and the syuzhet of a story. The fabula is the actual story as it would be written in a history textbook: the real sequence of events, in their chronological order, the — to appeal to Kant — the noumenal, das-Ding-an-sich view of a tale. The syuzhet is the way the story is told: the organization of scenes, the development of characters, the surprise ending, the — to once more appeal to Kant — the phenomenal. The former is a story, the latter is a plot. If fabula is the theory of differential calculus, syuzhet is a well-delivered lecture on the subject. If fabula is a prism, syuzhet is a rainbow. If fabula is a galaxy, syuzhet is a telescope.
The thesis of this post is therefore that to the art of storytelling, the syuzhet matters much more than the fabula. Booker’s seven basic plots attempt to restrict the possible fabulae, but an infinitude of syuzhets leaves us with generations of thought-provoking literature. The Odyssey and The Wizard of Oz both narrate Booker’s “Voyage and Return” fabula, but the syuzhets are different enough for the stories to have vastly different meanings and impacts on our minds. The stories teach us different lessons, make us laugh and cry in different places, and make us empathize with different characters. Nevertheless, they both follow a very predictable order of events. It seems, therefore, that the work of a writer is to create syuzhet, not fabula.
And so, as computer scientists, we must ask ourselves, “can we automate the creation of fabula?”
I want to first clarify that I am not talking about creating some sort of Great Automatic Grammatizer. And I certainly am not trying to reduce literature to a series of theorems. Rather, I want to use computation as a lens to explore stories in a new way. The purpose, therefore, is not to generate a novel automatically — that sounds horrifying — but rather to generate the framework of a novel automatically. A writer can then focus on expressing herself without worrying about crafting a compelling storyline through which to do so. After all, a writer with a fascinating character, setting, or parable in mind still requires a medium for those elements to occupy. A readymade fabula provides a starting point, and above all, assurance that the story is going somewhere. A fabula protects you from the prospect of a contrived deus-ex-machina ending.
Perhaps the best work on the computational generation of fabula comes from Chris Martens’ thesis, in which she reveals a wonderful connection between stories and Girard’s linear logic. (For a taste of linear logic, a good reference paper is this set of lecture notes by Philip Wadler.) Briefly, Martens’ work allows you to create rules by which propositions change over time. These rules are much like algebraic manipulations; indeed, in a way, her idea equates stories with proofs. Ceptre, a proof-of-concept implementation of this work, lets you create rich interactive narratives as well as complete stories.
(Further related work is covered by this survey paper. Of particular interest is the incredibly-well-named Narrative Intelligence Lab at the University of New Orleans, which publishes papers like A Computational Model of Plan-Based Narrative Conflict at the Fabula Level.)
But while Martens’ work creates a logical story effectively (which is very useful for interactive fiction), what we seek here is a powerful story. We want stories that make us feel rather than think. For that, we need new tools.
Having said that, I must now admit that I have little to say about what such tools might actually be. I have been thinking about it for many months now, and have yet to make any progress.
Some ideas stand out. Surely, the heart of fabula is conflict, and in particular, conflict of morals. Hamlet conflicts between revenge and self-preservation; Brutus between friendship and freedom; Juliet between love and family. Perhaps there is a way to take an arbitrary pair of values, and, by some nondeterministic process, construct a situation that forces a character to choose between them. Rather than Ceptre’s forward-chaining proof strategy, we might instead appeal to a form of resolution-refutation to generate interesting situations.
Another part of fabula must be the asymmetry of information: there is a distinction between the knowledge of each character (not to mention the reader and even the narrator). The function of information asymmetry is usually obvious. Consider, for example, the classic murder mystery: Poirot’s stories rely on the murderer knowing what Poirot doesn’t, on Poirot knowing what the murderer doesn’t, and on Hastings (a proxy for the reader) not knowing much in general. Yet information asymmetry is present in far more than just murder mysteries. Even Romeo and Juliet relies on a sequence of misunderstandings at the end.
Fate, too, has its role in narrative, as do its counterparts, intention and motive. And what of irony? Does irony belong in fabula or syuzhet? The Gift of the Magi relies on irony for its fabula, whereas Shakespeare uses irony to enrich the syuzhet of Hamlet.
Searching for the fundamental building blocks of effective fabula — the narremes that build up the narratives — is an exercise left for you to ponder along with me. As you read novels and short stories over the next few weeks, I invite you to try to distill the fabula from the syuzhet, and then to decide what exactly makes the fabula compelling. As with all fascinating questions, the answers may not be to our liking — or they may be too complex for us to mold into tractable generalizations — or they may be disappointing in their simplicity.
But then, what’s an adventure without a little danger?
Look. I get it. Monad tutorials are a meme, memes about monad tutorials are a meme, and we’re almost at the point where memes about memes about monad tutorials are a meme. Before we go any farther down this infinite stack of turtles, though, I want to share one way to think about monads that doesn’t involve burritos or boxes or any Haskell code at all. I present to you, the Sugarloaf Transformation.
The overall idea makes sense to me because I’m a fan of visual metaphors. Your mileage may vary.
Consider the dataflow programming paradigm (think of Quartz Composer or
LabView-G). A dataflow language takes the idea of an abstract syntax tree and
extends it to the tree’s close relative, the directed acyclic graph. A dataflow
program is a DAG where the edges entering and leaving each node are ordered.
Each edge is associated with a value, and a node computes the values of its
out-edges based on the values of its in-edges. Nodes with no in-edges are
called sources, and nodes with no out-edges are called sinks. The program
below computes the difference between 3 and 1, and prints the number 2. The two
nodes on the left are sources that return constants; the print node is a sink.
+---+
| 3 |--+
+---+  |   +-------+
       +-->|       |   +-------+
           | minus |-->| print |
       +-->|       |   +-------+
+---+  |   +-------+
| 1 |--+
+---+
To interpret a dataflow program, we have a variety of strategies. We could begin at sinks and recursively descend into in-edges until we reach sources, which can be evaluated with no further work. This corresponds to a recursive-descent interpreter of an AST. Alternatively, we could begin by collecting the set of sources, evaluating them, assigning the results to the respective out-edges. Then, we iteratively find and evaluate nodes whose in-edges have all been assigned values, until at last we have evaluated the entire program.
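Both strategies can be sketched in a few lines. The representation below (a dict from node name to a function plus an ordered list of input nodes) is just one convenient encoding chosen for illustration, and the names `eval_from_sink` and `eval_from_sources` are made up.

```python
import operator


def emit(x):
    """Stand-in for the print sink: prints its input and passes it along."""
    print(x)
    return x


# node name -> (function, ordered list of nodes feeding its in-edges)
PROGRAM = {
    "three": (lambda: 3, []),
    "one": (lambda: 1, []),
    "minus": (operator.sub, ["three", "one"]),
    "print": (emit, ["minus"]),
}


def eval_from_sink(program, node, cache=None):
    """Strategy 1: start at a sink and recursively descend into in-edges."""
    if cache is None:
        cache = {}
    if node not in cache:
        fn, inputs = program[node]
        cache[node] = fn(*(eval_from_sink(program, i, cache) for i in inputs))
    return cache[node]


def eval_from_sources(program):
    """Strategy 2: repeatedly evaluate every node whose inputs are all known."""
    values = {}
    pending = dict(program)
    while pending:
        ready = [n for n, (_, inputs) in pending.items()
                 if all(i in values for i in inputs)]
        if not ready:
            raise ValueError("not a DAG: some nodes can never become ready")
        for n in ready:
            fn, inputs = pending.pop(n)
            values[n] = fn(*(values[i] for i in inputs))
    return values


eval_from_sink(PROGRAM, "print")   # prints 2
eval_from_sources(PROGRAM)         # prints 2
```

Neither strategy promises anything about the order in which the two sources run; the sketch just happens to pick one.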
The point I wish to make is that the order in which nodes are evaluated is
nondeterministic. Thus, the following “obvious” program to subtract two
numbers is not well-defined. If given inputs “3, 1” by a human, it may output
“2” or “-2” depending on the order in which the two read nodes were evaluated.
+------+
| read |--+
+------+  |   +-------+
          +-->|       |   +-------+
              | minus |-->| print |
          +-->|       |   +-------+
+------+  |   +-------+
| read |--+
+------+
The problem is that nondeterministic evaluation order for IO results in the evaluator potentially diverging. This is really bad! How can we force a deterministic order of execution for IO primitives? Well, notice that the only way to force node x to evaluate before node y is to make y depend on the output of x. We can have each node output an “ordering sentinel” that becomes the input to future nodes: this allows us to impose a partial ordering on node evaluation, as shown below.
+-------+   +------+     +------+
| start |-->|      |---->|      |
+-------+   | read |     | read |
            |      |-+   |      |-+
            +------+ |   +------+ |
                     |            |   +-------+
                     |            +-->|       |   +-------+
                     |                | minus |-->| print |
                     +--------------->|       |   +-------+
                                      +-------+
The start node simply emits an ordering sentinel, to kick off the computation.
This looks really good: this program is, in fact, well-defined in its execution order. Sadly, a partial ordering still allows divergent programs. Why? It’s because, in the words of Philip Wadler, “Truth is free.” Once you have computed a sentinel, you can use it wherever you want. What prevents us from doing this?
               +------+
           +-->|      |
           |   | read |      +-------+
           |   |      |----->|       |   +-------+
+-------+  |   +------+      | minus |-->| print |
| start |--+             +-->|       |   +-------+
+-------+  |   +------+  |   +-------+
           +-->|      |  |
               | read |  |
               |      |--+
               +------+
Nothing. Yikes.
If the problem is that “truth is free,” we might be able to solve the problem by simply restricting the flow of information created by IO primitives. One solution is to sandbox all computation that uses the output of an IO node. An easy way to do this is to “lock” the output of an IO node in a special pipe that can only be unlocked by entering a sandbox. Of course, sandboxes must only allow locked values to exit—otherwise you can trivially pass a value through the null sandbox to unlock it. Our program then becomes the following (the dotted boxes denote sandboxes, and fat arrows are locked).
            .................................................
+------+    :                                               :
| read |===>@---------------+                               :
+------+    :               |                               :
            :             ..|............................   :
            :             : |                           :   :
            :             : |   +-------+               :   :
            :             : +-->|       |   +-------+   :==>:==> ...
            : +------+    :     | minus |-->| print |==>:   :
            : | read |===>@---->|       |   +-------+   :   :
            : +------+    :     +-------+               :   :
            :             :                             :   :
            :             :.............................:   :
            :                                               :
            :...............................................:
Notice that since a sandbox only unlocks one pipe at a time, and does not allow
the unlocked value to escape, our previous divergence hack becomes impossible.
minus takes two unlocked inputs, so they must be computed in nested
sandboxes which ensure sequential execution. Indeed, by generalizing, it is not
hard to see that now there is only one way to compute the contents of a locked
pipe. This means that the result of a locked pipe is literally the same as the
computation that goes into evaluating its contents—and so our programs must
have a well-defined execution order.
In Haskell, locked pipes correspond to values “boxed” in a value of type IO a
(this is where the burrito metaphor stems from). Sandboxes are lambdas, and the
unlocking mechanism @ is the monadic binding operator >>=. The return function
in Haskell is simply a node that locks its input; this lets us return an
arbitrary value from a sandbox. Finally, do-notation is just syntactic sugar to
save programmers from reasoning about lots of nested sandboxes, in the same
spirit that C has in letting you write 1+2+3+4 instead of 1+(2+(3+4)).
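If it helps to see the locked-pipe picture as running code, here is a toy sketch in Python rather than Haskell. Everything in it is an illustrative assumption: `Locked` models a locked pipe as a deferred computation, `lock` plays the role of return, and `unlock` plays the role of @ (that is, >>=). It is a model of the diagrams above, not of how Haskell actually implements IO.

```python
class Locked:
    """A locked pipe: a computation that has not been run yet."""

    def __init__(self, run):
        self.run = run  # zero-argument function that performs the effect


def lock(value):
    """Plays the role of `return`: lock an ordinary value."""
    return Locked(lambda: value)


def unlock(locked, sandbox):
    """Plays the role of `@` / `>>=`.

    `sandbox` is a function that receives the unlocked value and must
    hand back another locked pipe, so nothing unlocked ever escapes.
    """
    def run():
        value = locked.run()          # the first effect happens here...
        return sandbox(value).run()   # ...strictly before anything inside
    return Locked(run)


def subtract_program(read, emit):
    """read @ (a -> read @ (b -> emit(a - b))), with nested sandboxes."""
    return unlock(read, lambda a:
           unlock(read, lambda b:
           emit(a - b)))


# Fake IO so the example is self-contained: "reads" come from a list,
# and "prints" are appended to another list.
inputs = iter([3, 1])
outputs = []
read = Locked(lambda: next(inputs))
emit = lambda v: Locked(lambda: outputs.append(v))

subtract_program(read, emit).run()
print(outputs)  # [2]
```

Because the second read sits inside the first read's sandbox, there is only one order in which the two reads can happen.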
P.S. The title of this post refers to the idea that this transformation in a
way inverts the traditional flow of control. Normally, minus would call read
twice to get the values of its inputs. Here, however, the reads are calling
minus to create their output. “Don’t call us, we’ll call you.”
I think the best way to begin this article is by quoting James Mickens:
[Cryptographers] are like smarmy teenagers who listen to goth music: they are full of morbid and detailed monologues about the pervasive catastrophes that surround us, but they are much less interested in the practical topic of what people should do before we’re inevitably killed by ravens or a shortage of black mascara.
Now, I don’t listen to goth music and I didn’t even know “smarmy” was a word. But I am a teenager and I do worry about the pervasive catastrophes that surround us, including, but not limited to, the following three scenarios that (for reasons beyond my control or understanding) I spent all winter break contemplating.
Scenario 1: The CEOs of ACME, Inc. and Ajax, Inc. both want to know whether or not the other one is willing to collude. But, they cannot ask the question outright: if ACME says “Ajax, I want to collude with you,” then Ajax can use that message to get ACME in trouble with antitrust laws, and thus eliminate competition. So, both companies end up making less money than they would in an ideal situation.
Scenario 2: Alice and Bob both want to go to prom with each other, but each is too afraid to ask the other. If Bob asks Alice to prom and Alice rejects Bob, then Bob is in trouble (socially). The same holds if Alice asks Bob. Game theoretically, each party has a higher expected payout by not playing the game. The only winning move is not to play: they never end up going to prom.
Scenario 3: Alice is the world’s best safecracker and Bob is the world’s best hacker. If Alice and Bob were to work together, they could pull off a really lucrative heist. But neither one would suggest such a thing, because the other would rat him or her out: it’s a round of potential-prisoners’ dilemma. As a result, the heist never occurs, and the only thing the world is robbed of is another cheap crime novel “based on a true story.”
Clearly, these scenarios are isomorphic in some way (no, not because they are all potential Ryan Gosling movies). The common thread is the following: two parties, each with a secret bit, wish to compute the logical AND of their bits without revealing their individual bits. Neither party trusts the other, so there is a deadlock and opportunities are missed. It’s worse than that time Jim and Della invited a bunch of philosophers for dinner. For that matter, this is awkwardly reminiscent of that time the two most powerful nations in the world decided to build lots of nuclear weapons.
Abstractly, what we have is a situation where mutually distrusting parties need to orchestrate a secure transfer of sensitive information. This sounds like a job for cryptography.
Apparently Scott Aaronson solved this problem before I did. See here. S. Dukhovni, J. Weisblat, and I. Chung improved upon his work. Their solution, the SENPAI protocol, can be found here.
The problem with both protocols is asymmetry: Bob learns the result of the protocol before Alice does. This means that Bob can carry out the protocol by inputting a “1”, and obtain a proof that Alice also input “1”, which, if Bob were malicious, could be catastrophic for Alice’s social life. Or, in the isomorphic colluding-businesses case, ACME could obtain a proof that Ajax intended to collude, and thus sabotage Ajax without completing the protocol.
In fact, if you think about it, any finite protocol should have this weakness, because if you exchange information in a finite number of turns, then at some point one person must have the proof before the other person (the act of sending never creates information for the sender).
How can we solve this? I spent this winter break thinking deeply about this problem. My result is a protocol that I named, following the tradition of SENPAI, the FACADE (Feelings Are Complicated And I Don’t Even) protocol. In case of a “1” result, FACADE gives both parties some proof of the other’s guilt, thus solving the problem with SENPAI. Much like nuclear disarmament and the formation of a friendship, both parties take turns making themselves a little bit more vulnerable, until at last they reach a mutually-agreed-upon level of nuclear and/or social détente.
Unfortunately, FACADE has some quirks that I haven’t been able to iron out yet; more on that in a moment.
The FACADE protocol is based on Rabin’s 1981 paper, How to Exchange Secrets with Oblivious Transfer.
The way the FACADE protocol works is by creating plausible deniability for MAYBE responses. If Bob has sent 10 MAYBE messages, then
In other words, each successive MAYBE makes it harder to believe that Bob’s bit is “0”. However, Bob could still deny having a “1” bit and claim to be extremely unlucky. Since he keeps a a secret, it is impossible for anyone to tell whether or not he was “forced” to send MAYBE. Moreover, since Alice and Bob alternate messages, if Alice decides to rat Bob out with her “evidence” of n consecutive MAYBE messages from Bob, then Bob can retaliate by showing n-1 consecutive MAYBE messages from Alice, which should be good enough to convince a jury that neither (or both) of them were up to no good.
For completeness, I want to briefly touch on the number theory behind Rabin’s oblivious transfer mechanism: why does this number-theoretic gymnastics make sense?
When Bob generates s, it is a quadratic residue mod N, and consequently mod both p and q. s has two square roots mod p and two square roots mod q because a quadratic polynomial can only have at most two roots in the ring of integers mod a prime, and we know that if r is a root then -r is also a root. Using the Chinese Remainder Theorem on each pair, we can reconstruct a total of four square roots mod N. (To find square roots mod the primes, Alice can select primes such that both are congruent to 3 mod 4. Then, raising s to the power (p+1)/4 yields a square root of s. Showing that this works should be easy if you know Fermat’s Little Theorem.)
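Here is a small sketch of the square-root computation, with toy primes (7 and 11, both congruent to 3 mod 4) chosen purely for illustration; real use would need large primes. The function names are made up, and `pow(x, -1, m)` for modular inverses needs Python 3.8 or newer.

```python
def sqrt_mod_prime(s, p):
    """A square root of the quadratic residue s modulo a prime p = 3 (mod 4)."""
    assert p % 4 == 3
    return pow(s, (p + 1) // 4, p)


def crt(rp, p, rq, q):
    """The x modulo p*q with x = rp (mod p) and x = rq (mod q)."""
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % (p * q)


def four_roots(s, p, q):
    """All four square roots of s modulo N = p*q, via CRT on each pair."""
    rp = sqrt_mod_prime(s % p, p)
    rq = sqrt_mod_prime(s % q, q)
    return sorted({crt(a, p, b, q)
                   for a in (rp, p - rp)
                   for b in (rq, q - rq)})


p, q = 7, 11            # toy primes, illustration only
N = p * q
a = 9                   # Bob's secret
s = a * a % N           # what Bob sends: s = 4
print(four_roots(s, p, q))  # [2, 9, 68, 75]
```

Note that 9 and 68 are ±a mod 77 (the trivial roots), while 2 and 75 are the nontrivial pair.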
Now, Alice has no idea which of the four square roots Bob chose when choosing a. So, she must report a random root. With probability 50%, that root r will be ±a. Bob already knows this root, it’s easy to compute! Let’s call this a trivial root: it means Bob can’t really do anything new with this information, because he already knew it.
However, if Alice chooses a nontrivial root r, then the difference of squares of a and r is zero mod N; that is, we have the equation (a+r)(a-r) = 0 (mod N). Since r is a nontrivial root, neither of the factors is individually equal to zero mod N: so, by Euclid’s lemma, we must have p divide one of them and q divide the other. That is, the GCD of N = pq and a ± r must be p or q, and thus Bob can factor N easily.
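The GCD step is short enough to sketch directly. The numbers reuse a toy modulus N = 77 = 7 × 11 with a = 9, chosen only for illustration, and `factor_from_root` is a made-up name.

```python
from math import gcd


def factor_from_root(a, r, N):
    """Given square roots a and r of the same s modulo N, return a
    nontrivial factor of N, or None if r is a trivial root (r = +-a)."""
    if r % N in (a % N, (-a) % N):
        return None          # Bob learns nothing new
    return gcd(a + r, N)     # p or q, since (a + r)(a - r) = 0 (mod N)


N, a = 77, 9                 # toy values: 9*9 = 81 = 4 (mod 77)
for r in (2, 9, 68, 75):     # the four square roots of 4 modulo 77
    print(r, factor_from_root(a, r, N))
```

For the two nontrivial roots the function returns 11 and 7 respectively; for 9 and 68 it returns None.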
In summary: Alice has no way of selecting a trivial (or nontrivial) root on purpose, and thus Bob has a fair 50% chance of factoring N. Crucially, Alice does not know whether or not Bob can factor N because she doesn’t know whether or not her root was nontrivial.
I think Rabin’s oblivious transfer trick is one of the cleverest cryptographic ideas ever, because of how elegantly it accomplishes something seemingly impossible (“try to tell me something without knowing whether or not you told it to me”).
An unfortunate “gotcha” of this protocol is that it is not reusable, because all previous MAYBE messages are fair game to be used as evidence against the sender. Alice can keep knocking at Bob’s door, getting a “1” response, sending a “0” response, and eventually conclude that Bob’s bit is “1”.
Solving this problem is tough, because FACADE by nature inherently does leak information. I experimented with strategies where you randomly choose to pretend your bit is “0” every once in a while to throw off an adversary: that turned out to be a fruitless endeavor since any such strategy will eventually be detected by statistically analyzing a large enough set of transcripts (convincing yourself of this is left as a fun exercise).
In any case, my opinion is that a one-time protocol is definitely better than no protocol at all, and in each of the potential use-cases listed above, the secret bits are unlikely to change in a short timeframe.
TOXIC stands for “Two Operators eXchange Information Concurrently,” and to be perfectly honest it is kind of a cop-out because it requires both parties to be present in the same physical space. But I think it’s kind of clever anyway, and it is definitely worth sharing with you, if only because it is fun to imagine.
As long as you discard the result immediately, there is no way for anyone to know what the contents of beaker A and beaker B were. I suppose in practice an adversary could use some high-tech forensics (e.g. licking the beaker with their tongue) to recover some information, but in practice you shouldn’t be doing sketchy protocols in a chemistry lab anyway.
One of my favorite blogs, GLL, closes each post with a set of open questions. I think that is a lovely tradition, and so I invite you to ponder the following with me:
What other applications are there for this type of protocol? What other flaws are there in FACADE, and are there ways to fix them? How, at a fundamental level, is the FACADE protocol related to the concept of a zero-knowledge proof, or to that of homomorphic encryption? What makes this problem hard?
I’ve been cleaning out my computer for the past few days, and late last night I came across this README. I have no memory of writing this, and to be honest I’m kind of frightened that it works (why are there regexes?).
Inferno is a JavaScript library that provides dynamic scoping for JavaScript.
First, install Inferno using our one-line cross-platform installer script:
eval("var $ = function(name) { var caller = arguments.callee.caller; while (caller !== null) { var names = caller.toString().replace(/\\/\\*.+\\*\\//g, ' ').replace(/\\/\\/.*\\n/g, ' ').match(/^function\\s+(?:\\w+\\s*)?\\((.*?)\\)/)[1].split(',').map(function(a) { return a.trim(); }); if (names.indexOf(name) !== -1) return caller.arguments[names.indexOf(name)]; caller = caller.arguments.callee.caller; } };")
Then, use $('name') to dynamically search for a name.
Suppose you want to write a function that prints a story, but you want to be
able to specify where the output goes. It would be clumsy to thread an argument
through every subroutine, and for some reason people always get mad at you when
you use global variables. With Inferno, you can simply write a wrapper function
that binds the name output dynamically.
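// Neither helper takes an `output` parameter; $('output') walks up the
// call stack until it finds a function that has one.
function tell_beginning() {
    $('output').write("Once upon a time...");
}

function tell_ending() {
    $('output').write("...and they all lived happily ever after.");
}

// Binds `output` dynamically for everything called from here.
function print_story(output) {
    tell_beginning();
    tell_ending();
}

if (typeof(window) !== 'undefined') {
    print_story(document);        // browser
} else {
    print_story(process.stdout);  // node
}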
By some miracle, Inferno works in both node and the browser.
Caveat: Inferno does not handle the var keyword, arrow functions, and pretty much anything that isn’t bound via function arguments.
I recently had the privilege of attending a talk by Guy Steele, and that reminded me of one of his most famous lectures, Growing a Language. I don’t want to spoil the talk for you, so I encourage you to watch it on YouTube, or at least to read a transcript. I promise that it’s relevant.
In the ring of integers, prime numbers can be enumerated using the “Sieve of Eratosthenes”. In linguistics, we define a semantic prime as an idea that can’t be expressed in simpler terms (for example, “there exists” or “because”). The question I ask in this blog post is, how do I enumerate semantic primes? In other words, does there exist an efficient semantic sieve?
The motivation here isn’t to build some sort of Basic English—or Newspeak—but rather to better understand the building blocks of our communication. However, this problem has more than just linguistic appeal. Consider the popular computer game Alchemy, which lets you combine elements like “fire” and “water” to produce “steam”. Given all the elements of alchemy, as well as the possible recipes, what are the possible starting elements? Similarly: approximately twice a year like clockwork, someone in the Scratch community notices that some Scratch blocks can be defined in terms of others, and then proceeds to ask what the minimal set of Scratch blocks is (you can see, for example, the October and April editions of this conversation). Clearly, any block that has no “workaround” (and, analogously, any word without a definition) must remain in a set of semantic primes, which I will call the “basis set” from now on.
Beyond that, though, it is not immediately obvious which blocks or words must be kept in order to minimize the basis. If you define a “puppy” as a “small dog” and a “dog” as a “large puppy”, then which should you keep in your basis set? Perhaps you already have “small”, in which case it makes sense to keep “dog” to get “puppy” for free. Clearly, there are many possible basis sets (trivially, all words together form a basis). We’re just searching for the smallest such set (which may not even be unique!).
All this is to say, this is not an easy problem.
If you’re impatient for a solution (or just want to get the answer from a more reliable source), David P. Dailey has the answer in his paper, The extraction of a minimum set of semantic primitives from a monolingual dictionary is NP-Complete, published in 1986.
Here, I want to present three subtly different ways to solve this problem. While the solutions I will present are perhaps evident to you already (and honestly not that profound), I would like to take this as an opportunity to explore the delightful subtlety that connects various areas of math—something that is getting dangerously close to becoming a theme of this blog.
A dictionary is something that takes knowledge and gives you more knowledge. This is immediately reminiscent of formal logic, where you can take two statements and derive a new statement using an inference rule (we explored much of this theory in my 4-part series on formal logic and automated theorem-proving, Meet the Robinson).
Let each word be associated with a proposition. We can now reason about the
provability of propositions equivalently with the definability of the
associated words. In particular, a definition corresponds to an implication,
where the conjunction of the definition’s propositions together imply the
defined word’s proposition. Thus, “a bird is a winged animal” turns into
winged ∧ animal → bird.
Now, to reason about provability, I used what I knew about resolution-refutation. We want to prove the conjunction of the propositions of all the words in the dictionary by adding as few fresh axioms as possible, where each added axiom is just a lone proposition. One way to do this is to resolution-refute against the negation of this conjunction, searching for clauses that only have terms in negative polarity. The smallest such clause represents the minimal set of axioms needed to make the negation disprovable, and thus the desired conjunction provable.
Notice, however, that we can’t use the full power of resolution-refutation
here, because we are limited to only one incomplete inference rule: modus
ponens. For instance, given P, P → (Q ∨ R), Q → Z, and R → Z, we cannot prove Z
via modus ponens even though it is “obviously” true.
So, rather than “resolving” two clauses, we need to do some other kind of operation, which I’m going to call “dissolve” for lack of a better name. Dissolving is kind of the opposite of modus ponens. To solve the problem at hand, you compute the closure of the set of axioms (along with the negation you want to disprove) under dissolution, and keep only the clauses whose terms are all negative. Once you’ve saturated the set, the smallest such clause gives you the basis.
import Data.List (nub, union, (\\))
import qualified Data.Set as Set
import Control.Exception.Base (assert)

type Clause = (Set.Set String, Set.Set String)
-- Polarity:   (positive,       negative)

isOnlyNegative :: Clause -> Bool
isOnlyNegative (a, _) = Set.null a

-- Only applies when dissolving (a & b & ...) => e against !x | !y | ...
dissolve :: Clause -> Clause -> Clause
dissolve (a, x) (b, y) =
  let u = Set.intersection x b in -- u is probably a singleton set
  assert (Set.size u <= 1) -- or so we hope!
    (Set.union a (Set.difference b u),
     Set.union (Set.difference x u) y)

dissolveAll :: [Clause] -> [Clause] -> [Clause]
dissolveAll x y =
  let apex =
        filter isOnlyNegative $ nub [dissolve b a | a <- x, b <- y] \\ x in
  if null apex then
    []
  else
    apex ++ (dissolveAll (x++apex) apex)

dissolveStream :: [Clause] -> [Clause]
dissolveStream cs =
  let bad = [(Set.empty, Set.unions $ map fst cs)] in
  dissolveAll (cs++bad) bad

solutions cs =
  map snd $ filter isOnlyNegative (dissolveStream cs)

-- You gotta know when to fold 'em.
bestSolutions [] n = []
bestSolutions (s:ss) n =
  if n<0 || Set.size s < n then
    s:(bestSolutions ss (Set.size s))
  else
    bestSolutions ss n

solve d = bestSolutions (solutions d) (-1)

mkdfn (a:as) = (Set.singleton a, Set.fromList as)

w = map mkdfn [
  "water":["hydrogen", "oxygen", "fire"],
  "oxygen":["plant", "water"],
  "fire":["oxygen", "heat", "hydrogen"],
  "wood":["plant", "soil"],
  "plant":["water", "time", "soil"],
  "soil":["nitrogen", "time"],
  "nitrogen":["hydrogen", "heat"],
  "heat":["time"]
  ]

main = putStrLn $ show $ solve w
Against all odds, this seems to work, and with a little thought you can probably convince yourself that it will always work.
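One way to sanity-check a candidate basis is to run the definitions forward: start from the basis words and keep applying modus ponens until nothing new becomes definable. Here is a minimal Python sketch of that check on the same toy dictionary. The name definable and the two candidate bases are illustrative assumptions, not output of the Haskell program.

```python
# Toy dictionary from the example above: (word, words used in its definition).
DEFINITIONS = [
    ("water", ["hydrogen", "oxygen", "fire"]),
    ("oxygen", ["plant", "water"]),
    ("fire", ["oxygen", "heat", "hydrogen"]),
    ("wood", ["plant", "soil"]),
    ("plant", ["water", "time", "soil"]),
    ("soil", ["nitrogen", "time"]),
    ("nitrogen", ["hydrogen", "heat"]),
    ("heat", ["time"]),
]

def definable(basis, definitions):
    """Return every word reachable from `basis` by repeatedly applying definitions."""
    known = set(basis)
    changed = True
    while changed:
        changed = False
        for word, parts in definitions:
            # Modus ponens: if every word in the definition is known, so is the word.
            if word not in known and all(p in known for p in parts):
                known.add(word)
                changed = True
    return known

all_words = {w for word, parts in DEFINITIONS for w in [word] + parts}

# Hypothetical candidate bases, chosen by hand for illustration.
print(definable({"hydrogen", "time", "oxygen"}, DEFINITIONS) == all_words)
print(sorted(definable({"hydrogen", "time"}, DEFINITIONS)))
```

A set of words is a basis exactly when this closure covers the whole dictionary, so the first candidate qualifies and the second gets stuck.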
What else lets us take things and make new things? We talked about statements in logic, so how about the other kind of mathematical statement: the equation? We can certainly combine equations to derive more knowledge, so it doesn’t sound far-fetched to model words with variables and definitions with equations, and then reason about the solvability of the equations (rather than the provability of the terms).
More formally: assign each word a unique positive integer and a variable. Construct linear equations using the sum of the definition-words on the left-hand-side, and the word-to-be-defined with the correct coefficient on the right-hand-side. So, if a bird=4 is a winged=2 animal=3, then construct the equation “winged + animal = (5/4) bird”, which is clearly true.
How does this help?
Well, solvability of linear equations is well-studied stuff. If you encode the equations in a matrix, and convert it to reduced row echelon form using Gauss-Jordan elimination, the result is a matrix in which each pivot column’s variable is expressed in terms of the variables in the remaining (non-pivot) columns.
Thus, if we supply “values” of all the non-pivot variables, then we can deduce values of the pivot variables. Since the Gauss-Jordan elimination procedure minimizes this set, we’re already done.
from __future__ import division
import sympy

n_cache = []

def numify(g):
    # Replace each word with its index in n_cache, adding new words as we go.
    def n(x):
        try:
            return n_cache.index(x)
        except ValueError:
            n_cache.append(x)
            return len(n_cache)-1
    return [[n(x) for x in r] for r in g]

def mktrix(g):
    # One row per definition: +1 for each definition-word, and a negative
    # coefficient on the defined word, chosen so that the equation holds
    # when word number i takes the value i+1.
    m = []
    for d in g:
        row = [0]*(len(n_cache)+1)
        for k in d[1:]:
            row[k] = 1
        row[d[0]] = -(sum(d[1:])+len(d[1:]))/(d[0]+1)
        m.append(row)
    return m

def solve(m):
    # rref() returns (reduced matrix, pivot column indices).
    return sympy.Matrix(m).rref()[1]

def basis(g):
    # Everything that is not a pivot variable has to be supplied by hand.
    return set(n_cache) - {n_cache[x] for x in solve(mktrix(numify(g)))}

test = [
    ["water", "hydrogen", "oxygen", "fire"],
    ["oxygen", "plant", "water"],
    ["fire", "oxygen", "heat", "hydrogen"],
    ["wood", "plant", "soil"],
    ["plant", "water", "soil"],
    ["soil", "nitrogen", "time"],
    ["nitrogen", "hydrogen", "heat"],
    ["heat", "time"]
]

print(basis(test))
But: Gauss-Jordan elimination is O(n^{3}) and Dailey claims this problem is NP-complete. Clearly, we haven’t just proven that P=NP. So what gives?
It turns out that we made a subtle extra assumption when encoding a “provability” problem in terms of a “solvability” problem. Logical implications are one-way: if P implies Q, then Q does not necessarily imply P. However, the linear equations we’re working with are not one-way! If you can solve for x in terms of y, then you can usually also solve for y in terms of x.
So, for example, given definitions of “bird” and “winged”, and the relationship “a bird is a winged animal”, the linear equation model assumes we can deduce what an “animal” is. And in many cases, this is a reasonable assumption: indeed, it’s kind of how the game Jeopardy! works: “if you add wings to this, you get a bird” is not the worst way to hint at the word “animal”. It relies, in the end, on what you mean by “define”.
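To make the two-way nature concrete, here is a tiny Python sketch using the bird example from earlier. The values 2, 3 and 4 are the ones assigned above, and the variable names are only for illustration.

```python
from fractions import Fraction

# Word values from the earlier example: bird=4 is a winged=2 animal=3.
winged, animal, bird = 2, 3, 4

# Coefficient chosen so that "winged + animal = coeff * bird" holds.
coeff = Fraction(winged + animal, bird)
assert winged + animal == coeff * bird

# The same equation read "backwards": recover animal from bird and winged.
recovered_animal = coeff * bird - winged
print(coeff, recovered_animal)
```

The implication winged ∧ animal → bird gives no way to recover animal, but the equation does, which is exactly the extra assumption the linear model smuggles in.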
That being said, I do want to note that this kind of reasoning can lead you straight to deriving Buckingham’s Π Theorem in dimensional analysis (we discussed this theorem in detail in a recent article). This fact shouldn’t come as a surprise to anyone: dimensional analysis, too, deals with getting more knowledge from existing knowledge. The reasoning shown here can help identify redundant units, and thus discover the “fundamental” kinds of quantities in the physical world. Do we necessarily need units for frequency and for time?
In our hearts, we always knew this was a graph theory problem, and so naturally I feel compelled to present a graph-theoretic solution. This is the one Dailey presents in his 1986 paper cited above.
Consider a graph where each word corresponds to a vertex, and a directed edge travels from a word to another if the former is used in the definition of the latter. Naively, we may imagine that all we need to do is find all the leaves of this graph. However, this doesn’t work in the presence of cycles (if “GNU” is defined as “GNU’s Not UNIX”, then you definitely want “GNU” in your basis set).
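What we really seek is the smallest possible set of vertices such that every cycle in the graph passes through at least one of them (that is, deleting them leaves the graph acyclic).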
In other words, what we want is the minimum feedback vertex set of the graph, which is one of Karp’s 21 NP-Complete problems. This is unfortunate, but on the bright side, computers are fast and so we can at least hope to find solutions.
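For a dictionary as small as the toy example, plain brute force is enough to see what a minimum feedback vertex set looks like. The sketch below tries every subset of words in increasing size and returns the first one whose removal leaves no cycles. It takes exponential time and is an illustration only, not Dailey’s construction; the function names are made up for this example.

```python
from itertools import combinations

# Toy dictionary: word -> words used in its definition (empty set = undefined word).
DEPS = {
    "water": {"hydrogen", "oxygen", "fire"},
    "oxygen": {"plant", "water"},
    "fire": {"oxygen", "heat", "hydrogen"},
    "wood": {"plant", "soil"},
    "plant": {"water", "time", "soil"},
    "soil": {"nitrogen", "time"},
    "nitrogen": {"hydrogen", "heat"},
    "heat": {"time"},
    "hydrogen": set(),
    "time": set(),
}

def is_acyclic(deps, removed):
    """True if the graph has no cycle once the `removed` vertices are deleted."""
    remaining = {w: ds - removed for w, ds in deps.items() if w not in removed}
    resolved = set()
    progress = True
    while progress:
        progress = False
        for word, ds in remaining.items():
            # A word can be placed in topological order once all its inputs are.
            if word not in resolved and ds <= resolved:
                resolved.add(word)
                progress = True
    return len(resolved) == len(remaining)

def min_feedback_vertex_set(deps):
    """Smallest set of vertices whose removal makes the graph acyclic (brute force)."""
    words = sorted(deps)
    for size in range(len(words) + 1):
        for candidate in combinations(words, size):
            if is_acyclic(deps, set(candidate)):
                return set(candidate)

fvs = min_feedback_vertex_set(DEPS)
leaves = {w for w, ds in DEPS.items() if not ds}
print(sorted(leaves | fvs))
```

The basis is then the undefined words together with the feedback vertex set. There can be several minimum sets; this sketch just returns the first one in alphabetical order.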
There exist a bunch of algorithms to compute the FVS, but I chose to stick with the theme of logic by using an SMT solver (inspired by this). That is, I encoded each vertex with (1) a “guard” bit that says “am I part of the basis”, and (2) an integer that corresponds to an index into a topological sort. Then, for each pair of vertices, I check that if neither vertex is part of the basis, then the topological indices obey any edges between the vertices. This gives me a cheap cyclic-ness check that is easy to execute symbolically because it is nonrecursive. Finally, using Rosette to run the algorithm symbolically, I can solve for guard bits and indices that make the graph cycle-free. All that’s left is to minimize the number of guard bits that are “true”, which is supported by Z3 via a “minimize” call.
#lang rosette
(define graph
  '((water hydrogen oxygen fire)
    (oxygen plant water)
    (fire oxygen heat hydrogen)
    (wood plant soil)
    (plant water time soil)
    (soil nitrogen time)
    (nitrogen hydrogen heat)
    (heat time)
    (hydrogen)
    (time)
    ))
(define (cycle-free? graph)
  (andmap
   (λ (vertex)
     (define v (car vertex))
     (define children (cdr vertex))
     (define topology (cdr (assoc v s-cache)))
     (andmap
      (λ (c)
        (define topology+ (cdr (assoc c s-cache)))
        (if (or (car topology) (car topology+))
            #t
            (> (cadr topology) (cadr topology+))))
      children))
   graph))
(displayln "Initializing...")
(define s-cache
  (map (λ (v)
         (define name (car v))
         (define-symbolic* guard boolean?) ; am I part of the initial spawn
         (define-symbolic* index integer?)
         (list name guard index))
       graph))
(define fvs-count (apply + (map (λ (x) (if (cadr x) 1 0)) s-cache)))
(displayln "Solving...")
(define sol
  (optimize #:minimize (list fvs-count)
            #:guarantee (assert (cycle-free? graph))))
(if (unsat? sol)
    (displayln "dude you failed")
    (append
     ; leaves
     (map car (filter (λ (x) (= (length x) 1)) graph))
     ; feedback vertex set
     (map car
          (filter
           (λ (x)
             (define u (evaluate (cadr x) sol))
             (and (not (constant? u)) u))
           s-cache))))
Tim scraped the Scratch Wiki’s workaround database for me, and though the scraping is dodgy at best, my analysis showed that of the 103 blockspecs mentioned, only about half are needed (Brian, of course, pointed out that in Snap! all you need is lambda). So that answers that question. As for the English language, well, I would almost certainly need a bigger computer to deal with millions of words. I guess I could scrape from GNU’s free dictionary, but that’s a project for another day.
Meanwhile, I want to note that this procedure doesn’t allow one word to have multiple equivalent definitions, unlike both the propositional-provability approach and the linear-algebra approach.
If nothing else, I loved this little question because it forced me to think deeply about how to model something new and unfamiliar in terms of math I already knew.
My chemistry teacher used to put at least one “toolbox” question on his tests. A toolbox question is one that doesn’t imply what tool you should use to solve it: unlike standard textbook stoichiometry problems that can all be solved with the same trick, a toolbox problem forces you to look at your toolbox and pick the tools yourself.
Well, in the same spirit—and I’ve used this metaphor before—I feel that high school mathematics has given me an incredible set of tools, but not nearly enough opportunities to use them. Sure, I’ve often had to solve linear equations and prove theorems in propositional logic: but I have rarely chosen to do so in order to solve a problem.
Maybe that’s why I care so much about finding math in the world around me.
]]>I’m taking AP Music Theory this year, and I’ve been spending some time exploring the number theory of music. This post is just a quick check-in on my thoughts so far, and an explanation for why I’m trying to build a 19-note marimba in my spare time.
A piano has 12 notes between consecutive C’s: C, C#, D, D#, E, F, F#, G, G#, A, A#, B, and C again. A good question to ask is, “why?”
Well, it comes down to how your ear works. There is plenty of literature on this (I recommend this blog post or this video), but the main takeaway is that for a variety of reasons related to the structure of your cochlea, your brain likes to hear frequencies that have nice whole-number ratios. For example, a C is 256 Hertz while a G is 384 Hertz, and 384/256 = 3/2 which is a nice whole-number ratio. Indeed, if you play a C and a G together, it sounds pretty good. Musicians call this a “perfect fifth”, and one musician in particular liked it so much that he built a tuning system around it.
Pythagoras (yes, the one with the triangle theorem) figured that using the ratios 2:1 and 3:2, you can build a whole bunch of notes that sound good together (it’s easy to do this kind of tuning for string instruments, because the required string lengths are simple fractions of one another). You do this by going up by the 3:2 ratio, and then going down by a factor of 2 when it gets higher than 2. So, you get 1, 3/2, (9/4)/2 = 9/8, and so on. As a musician, you might recognize this as “going up the circle of fifths”.
We can then ask, “will going up the circle of fifths ever get us back to our root note?” Obviously, the answer is no: to get back to the root note you need to get a ratio of 1 again. But, every time you multiply by 3/2 you add a power of 3 to the numerator, which will never cancel with the denominator. That is, we are searching for a positive integer solution to the equation ($ (3/2)^x / 2^y = 1 $) which clearly does not exist.
However, we can get close! Recall from A Balance of Powers that we can compute really good approximate solutions for these equations by taking logarithms and using Diophantine approximation techniques. The first convergent yields x=5 and y=3, which gives us a ratio of 243/256, or around 0.95. This is pretty bad: the difference is clearly audible (in fact, it’s approximately the difference between a C and a C#). Nevertheless, the notes you get by this process (C, G, D, A, E) form something very familiar: the pentatonic scale.
In any case: the second convergent yields x=12 and y=7, which gives us a ratio of 531441/524288, or around 1.01. This is much better—it’s much harder to hear the difference!
So, twelve fifths gets us reasonably close to the root note: and that’s where our 12-note chromatic scale comes from.
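If you want to reproduce those numbers, here is a short Python sketch of the continued-fraction computation, assuming ordinary floating-point precision is good enough for the first few terms (it is, here). Since the equation asks for y/x = log2(3/2), we expand that number and read off the convergents. The function name convergents is my own, and the list also includes the early trivial convergents, so the counting differs from the “first” and “second” used above.

```python
import math

def convergents(x, count):
    """Return up to `count` continued-fraction convergents of x as (numerator, denominator)."""
    a = math.floor(x)
    frac = x - a
    h_prev, h = 1, a      # numerators
    k_prev, k = 0, 1      # denominators
    result = [(h, k)]
    while len(result) < count and frac != 0:
        x = 1.0 / frac
        a = math.floor(x)
        frac = x - a
        # Standard recurrence: h_n = a_n * h_(n-1) + h_(n-2), same for k.
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append((h, k))
    return result

# (3/2)^x / 2^y = 1 means y/x = log2(3/2), so approximate that number.
for y, x in convergents(math.log2(3 / 2), 6):
    print("x=%d y=%d ratio=%.4f" % (x, y, 1.5 ** x / 2 ** y))
```

The entries 3/5 and 7/12 correspond to the five-fifths and twelve-fifths approximations quoted above.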
Unfortunately, there were problems with this scheme. The most urgent problem was that it lacked symmetry: you can only play in “C”. If you start a song on “D”, everything sounds awful because the ratios don’t work the same relative to “D”. In theory we could start all our songs on “C”, but that makes it hard for singers with different ranges, and also prevents composers from modulating to a different key.
With this issue in mind, people tried to come up with alternative tunings. One such alternative was “well-tempering”, which is what Bach really loved. He wrote The Well-Tempered Clavier to show off well-tempering: he has a prelude and fugue in each key, and I’m told that each piece takes advantage of the subtle tuning tendencies of its respective key.
The big winner turned out to be “even tempering”. Even tempering spaces out the 12 notes of the scale evenly. Since the top note is twice the frequency of the root note, each consecutive pair of notes in the even-tempered scale differs by a ratio of ($ 2^{(1/12)} : 1$), which is around 1.06. Crucially, the difference between “C” and “D” is the same as that between “D” and “E”, so if you start a song on “D” instead of “C”, everything still fits. Furthermore, even tempering frequencies don’t stray too far from the Pythagorean frequencies, so everything still sounds good! Well, except for the occasional connoisseurs who complain that they can hear the difference.
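Here is a quick Python 3 sketch of those claims, assuming the 256 Hz C used earlier as the root. The variable names are only for illustration.

```python
# Ratio between adjacent notes in 12-note even tempering.
semitone = 2 ** (1 / 12)

# Frequencies of C up to the next C, assuming C = 256 Hz as above.
scale = [256 * semitone ** i for i in range(13)]

# The tempered "fifth" is 7 steps up; Pythagoras wanted exactly 3/2.
tempered_fifth = scale[7] / scale[0]

# C to D and D to E are both two steps, so the ratios agree.
c_to_d = scale[2] / scale[0]
d_to_e = scale[4] / scale[2]

print(round(semitone, 4), round(tempered_fifth, 4), round(c_to_d, 4), round(d_to_e, 4))
```

The tempered fifth lands within about a tenth of a percent of 3/2, which is why most people can’t hear the compromise.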
But, we have strayed from the original goal: we wanted lots of simple whole-number ratios. Does the 12-note even tempering achieve that?
Well, I wrote a Python program that calculates a “pleasantness score” for each n-note even tempering. The pleasantness score takes a set of whole-number ratios (Pythagorean ratios, mostly), and returns the total error a tempering achieves with respect to those whole-number ratios. Clearly, a million-note even tempering would have a very low total error, because it is off by at most a millionth for each ratio.
>>> ratios = [3/2, 4/3, 5/3, 7/5] # for example
>>> def score(n):
...     total = 0
...     for r in ratios:
...         total += min([(2**(1.0*i/n) - r)**2 for i in range(n)])
...     return total
...
>>> for i in range(2, 32):
...     print(i, '\t', '%10.2f' % (1/score(i)))
n 1/score
--------------
2 12.85
3 25.68
4 69.78
5 80.27
6 51.99
7 252.74
8 282.14
9 182.01
10 413.60
11 249.02
12 2293.78
13 267.00
14 811.75
15 1277.37
16 711.44
17 785.59
18 550.84
19 5030.16
20 1150.78
21 901.15
22 2531.40
23 1238.07
24 2293.78
25 1182.83
26 2645.21
27 6829.60
28 1274.03
29 4398.47
30 1520.35
31 14003.10
It turns out that for almost any set of simple ratios you pick, 12 notes has a significantly better score than anything before it! Note that after 12, you can do better with a 19-note scale, and even better with a 31-note scale. I do not know why 12, 19, and 31 work out so well: but they do. 19 and 31 in particular are very nice because they are prime: stepping around the scale by any fixed interval visits every note before returning to the start. So, you can do a “circle of fifths” with just about any ratio: a “circle of thirds”, for example. Both 19- and 31-note even temperings were explored by Joel Mandelbaum in the 1900s (so, I’m not the first person to have figured this out).
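The point about primes is easy to check with a few lines of Python. This is a small sketch, with circle being a made-up helper that walks around an n-note scale in fixed steps until it returns to a note it has already seen.

```python
def circle(n, step):
    """Notes visited by repeatedly moving `step` notes up an n-note scale."""
    seen = []
    note = 0
    while note not in seen:
        seen.append(note)
        note = (note + step) % n
    return seen

# In 12 notes, fifths (7 steps) reach everything, but major thirds (4 steps) do not.
print(len(circle(12, 7)), circle(12, 4))

# In 19 notes, every nonzero step size reaches all 19 notes.
print(all(len(circle(19, k)) == 19 for k in range(1, 19)))
```

With 12 notes only step sizes coprime to 12 (1, 5, 7 and 11) give a full circle, whereas a prime number of notes has no such restriction.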
If you’re curious, you can find a piece of 19-note music, Truth on a Bus, here and a piece of 31-note music, “Music fur die 31-Stufige Orgel”, here, which is played on a 31-note Fokker organ.
And that is why we’re currently building a 19-note marimba for our jazz band.
]]>Editor's note: This post was automatically converted
from a LaTeX document. While some attempts have been made to preserve the
original layout, it is impossible to faithfully reproduce the original
formatting. If you would like a copy of the PDF version, please contact
me.
In school, many of us learn about dimensional analysis as a way to convert between various units. There is, however, far more to this humble technique. In this paper, I would like to present a smörgåsbord of dimensional analysis pearls that I have found over the past few months.
Many people argue that the metric system is better than the imperial system. This argument, however, is usually predicated on the relationship between various units: powers of ten are easier to manipulate because we are taught arithmetic in base ten; additionally, the uniformity makes it easier to remember how to convert from centimeters to meters than from inches to furlongs. If we only consider units in absolutes, however, then is a meter really ‘better’ than a yard?
The theme of this paper is that there is no obvious quantity to use as a fundamental unit of length, mass or even time. In fact, this poses problems in certain domains where we cannot appeal to convention to establish units. For example: on the Voyager 1 probe, NASA placed a golden record designed to be found by extraterrestrial life. Inscribed on this record, NASA provided Sagan-esque instructions for playback. The time taken for one rotation of the record is specified in terms of the period associated with the fundamental transition of the hydrogen atom [10].
“The fundamental transition of the hydrogen atom” is certainly more reasonable to explain to intelligent life than, say, a second (which would force the extraterrestrial life forms to somehow measure the time Earth takes to complete its orbit). However, this choice still seems a little arbitrary^{∗}.
Why aren’t there obvious fundamental units? We can answer this question with a philosophical observation: nobody knows the absolute size of anything. For all we know, the universe could be really small, and us smaller still. All is not lost, however. Clearly, some properties of the world remain constant regardless of how we choose to measure them. I cannot become richer by measuring my income in cents rather than dollars. Percy Bridgman, a physicist who studied the properties of materials under extremely high pressure, stated this property eloquently in his 1931 treatise [3]:
[T]he ratio of the numbers measuring any two concrete examples of a secondary quantity shall be independent of the size of the fundamental units used in making the required primary measurements.
This statement, known as Bridgman’s Principle of absolute significance of relative magnitude, is really just an expression of humility: nature is indifferent to our choice of units [11, 8]. That is, the period associated with the fundamental transition of the hydrogen atom will always be $2.23\times 10^{-17}$ times the period taken by Sol 3 to orbit the sun, regardless of which system of units is used to measure the two quantities. Units are arbitrary; ratios are invariant.
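As a quick numerical illustration, the Python sketch below measures both periods in three different time units and checks that the ratio does not move. It rests on two assumptions not stated above: that the “fundamental transition” is the 21 cm hyperfine line at about 1420.4 MHz, and that a year is a Julian year of 365.25 days.

```python
# Assumption: the hydrogen transition is the 21 cm hyperfine line.
HYDROGEN_LINE_HZ = 1420405751.768
# Assumption: one orbit of the Earth taken as a Julian year, in seconds.
YEAR_S = 365.25 * 24 * 3600

def ratio_in_unit(seconds_per_unit):
    """Ratio of the two periods when both are measured in the given time unit."""
    hydrogen_period = (1 / HYDROGEN_LINE_HZ) / seconds_per_unit
    year = YEAR_S / seconds_per_unit
    return hydrogen_period / year

for name, seconds in [("second", 1.0), ("minute", 60.0), ("fortnight", 14 * 86400.0)]:
    print(name, ratio_in_unit(seconds))
```

Up to floating-point rounding, every line prints the same number, because the unit cancels out of the ratio.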
Recall the formula for measuring the area of a rectangle. Assuming that the $x$ and $y$ axes represent values with units, do the units of area in this system correspond to the units of integration you found in the previous question? To you, does this feel like compelling evidence that the Fundamental Theorem of Calculus is true?
It seems obvious that representing a number in a different set of units leads to a different answer. An interesting question to ask, however, is: are there meaningful calculations whose results do not change when evaluated with different units?
Surprisingly, Buckingham’s $\Pi $ Theorem can be used to deduce the fine structure constant. This constant is approximately $1/137.036$, and has no units (like $\pi $). It can be computed using a formula involving the charge of an electron, the speed of light, Planck’s constant, and the capability of vacuum to permit electric field lines. Surprisingly, the calculation works regardless of which units are used. Richard Feynman called it [6] “one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man.”
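To see the unit-independence for yourself, here is a small Python sketch that evaluates the constant twice: once in SI units and once in Gaussian (CGS) units, where the vacuum permittivity is absorbed into the definition of charge. The numerical values are standard published constants (CODATA 2018) typed in by hand, so treat them as assumptions of this example.

```python
import math

# SI units.
e_si = 1.602176634e-19       # elementary charge, coulombs
c_si = 299792458.0           # speed of light, m/s
h_si = 6.62607015e-34        # Planck's constant, J s
eps0 = 8.8541878128e-12      # vacuum permittivity, F/m
alpha_si = e_si ** 2 / (2 * eps0 * h_si * c_si)

# Gaussian (CGS) units: the formula becomes e^2 / (hbar * c).
e_cgs = 4.8032047e-10        # elementary charge, statcoulombs
c_cgs = 2.99792458e10        # speed of light, cm/s
hbar_cgs = 1.054571817e-27   # reduced Planck's constant, erg s
alpha_cgs = e_cgs ** 2 / (hbar_cgs * c_cgs)

print(1 / alpha_si, 1 / alpha_cgs)
```

Both lines of arithmetic use completely different numbers and yet agree on roughly 137.036, because every unit cancels.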
To make his estimate, Taylor used two photographs of the explosion, which he anecdotally took from the cover of LIFE magazine. Taylor asked, how does the radius of the blast grow with time? The other relevant factors are energy and the density of the surrounding medium (i.e. air). Find images of the Trinity blast that have timestamps (in milliseconds!) and a scale in meters^{∗∗}. Using the Buckingham $\Pi $ Theorem, estimate the amount of energy released by Trinity. Even getting this value right to within an order of magnitude is quite impressive.
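If you would like to check your exponents before plugging in measurements, the Python sketch below sets up the dimensional bookkeeping as a small linear system and solves it exactly. It assumes the radius has the form R = C · E^a · ρ^b · t^c with a dimensionless constant C, which dimensional analysis alone cannot determine. The helper solve_linear is a made-up name for a bare-bones Gauss-Jordan routine.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination over exact fractions (square, nonsingular a)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

# Rows: mass, length, time. Columns: exponents of E, rho, t.
# E has dimensions M L^2 T^-2, rho has M L^-3, and t has T.
dims = [[1, 1, 0],
        [2, -3, 0],
        [-2, 0, 1]]
target = [0, 1, 0]  # the radius has dimensions of length

a, b, c = solve_linear(dims, target)
print(a, b, c)
```

Rearranging R ∝ E^(1/5) ρ^(−1/5) t^(2/5) gives E ∝ ρ R^5 / t^2, which is the relation you need in order to turn a photograph into an energy estimate.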
…there is to be one measure of wine throughout our kingdom, and one measure of ale, and one measure of corn, namely the quarter of London, and one breadth of dyed, russet and haberget cloths, that is, two ells within the borders; and let weights be dealt with as with measures.
One of the reasons for this clause was that due to the proliferation of too many systems of units, merchants could cheat their customers into paying more money for less goods.
Consider the following fictitious system, consisting of the units wizard and elf, with the conversion factor
A merchant announces that for convenience, he will introduce the unit hobbit. He provides the following conversion factors to his customers:
You want to trade your $8\,{\text{wizard}}^{4}$ for some ${\text{elf}}^{2}$ in exchange. You perform the following calculation to determine the conversion:
The merchant disagrees, presenting his own calculation:
[1] Mars climate orbiter mishap investigation board phase I report, 1999.
[2] Batchelor, G. The Life and Legacy of G. I. Taylor. 1996.
[3] Bridgman, P. Dimensional Analysis. 1931.
[4] Buckingham, E. On physically similar systems; illustrations of the use of dimensional equations. Phys. Rev. 4 (Oct 1914), 345–376.
[6] Feynman, R. QED: The Strange Theory of Light and Matter. 1985.
[8] Pienaar, J. A meditation on physical units, 2016.
[9] Rayleigh, J. W. S. On the question of the stability of the flow of fluids. Philosophical Magazine 4 (1892), 59–70.
[10] Sagan, C. Murmurs of Earth. 1978.
[11] Sonin, A. A. The Physical Basis of Dimensional Analysis. 2001.
[12] Summerson, H. The 1215 magna carta: Clause 35.
[13] Taylor, G. I. The formation of a blast wave by a very intense explosion. II. the atomic explosion of 1945. Proceedings of the Royal Society of London 201, 1065 (1950), 175–186.
[14] Torczynski, J. R. Dimensional analysis and calculus identities. The American Mathematical Monthly 95, 8 (1988), 746–754.
^{∗}There have in fact been efforts to standardize units based on fundamental physical constants like the speed of light in vacuum: such units are called ‘natural units’.
^{†}This problem was suggested by my friend David.
^{‡}To prevent such software errors, programming languages such as [5] ‘remember’ the units associated with a number and enforce that certain operations only happen on commensurable values. This generalizes to non-unit-like data types as well (for example, preventing you from dividing a number by an image, which makes no sense).
^{§}You might have noticed that the number 6.28 is roughly $2\times \pi $. Using some basic physics, we can explain why $\pi $ is involved.
^{¶}The idea had actually been around for almost 50 years before Buckingham published his 1914 paper about it. In particular, Rayleigh’s use of dimensional analysis to calculate the Reynolds number—an important constant used to study the motion of fluid in a pipe—was published in 1892 [9] and is now a classic textbook example. Buckingham’s use of the $\Pi $ symbol in this paper is what gives the theorem its name.
^{∥}Such a model is said to have similitude with the real version.
^{∗∗}Hunting for pictures is part of the fun, but if you get stuck, here are two hints:
All these images are in the public domain since they were taken as part of a federal government program.
I just realized that TJCTF happened a really long time ago, and so it’s probably okay for me to share these writeups. Enjoy.
A CTF is a rough road
When you play on hard mode
My tactics are frugal
My weapon is Google
All I do is search for code
The time is 2:00am. In the distance, you can hear the city breathe softly. It is the sound of night: when the hackers like you rise from their sleeping bags to reclaim what is rightly theirs.
You unholster your keyboard and turn off the safety.
You’re trying to take a discrete log in an elliptic curve’s group: a gnarly beast so powerful, it is used by the Browsers to protect User Privacy. You wonder briefly who put you up to this gig, and make a mental note to charge them overtime. Discrete log? What are you, some sort of lumberjack?
There are two kinds of hackers: those that hack in style and those that get the job done. You are the latter.
You rev up the ol’ browser and go on a wild-Google-hunt for attacks on elliptic curve discrete logs. Cryptographic protocols come and go, you say to nobody in particular, but Google is here to stay. You swerve past some lecture notes and duck a Wikipedia article; before anyone can say “Pollard Rho”, you find Matthew Musson’s paper[0] which mentions a whole variety of attacks on the elliptic curve discrete log. “Sweetheart,” you say, pointing to the bean-shooter in your holster, “me and my friend here think you’d better start talking.”
The pages read like molasses; progress is slower than Windows XP. But just as the rays of dawn begin to reach over the netscape to reveal golden cloud computers, your luck changes. Turns out that there’s an attack on the elliptic curve discrete log problem which works when the size of the elliptic group is equal to the size of the field over which the elliptic curve lives (a so-called “anomalous” curve). The attack’s by someone who thinks they’re Smart—and you ain’t complaining.
You call up your friend Sage; you need to pull a favor. Sage confirms that your curve is in fact anomalous. You tell him you owe him a drink and hang up. What are friends for?
You don’t mind a reasonable amount of trouble. But soon you realize that this is not a reasonable amount of trouble. This is a downright unreasonable amount of trouble, and you ain’t takin’ it. You’re way too lazy to implement this attack on your own. But you know someone down at Github’s probably knows how to whip it up. You order an “elliptic anomalous curve attack”, and Github obliges[1]. You tip the guy a star and drive off in the boiler before anyone can ask any questions.
Back at the office, you substitute in your curve’s parameters, and turn the crank. Out comes a hex-encoded string shaped suspiciously like a flag.
“What is it?” asks Sage.
“Oh,” you say, “just the stuff that dreams are made of.”
You’re good. You’re very good.
[1] https://gist.github.com/elliptic-shiho/e76e7c2a2aff228d7807
Flag à la Grandma’s Cookies
Preparation time: 3 hours Serves: A team of 4
Ingredients:
Step 1. Prepare a fresh exploit payload in JavaScript for the user to execute. You want the flag’s address. Poisoning the user’s cookie to be yours would work because the “recent” beacon would update when the user navigates away from the flag page.
Step 2. Insert the exploit payload in a script tag, knowing that showdown doesn’t sanitize HTML. Report it so that the simulated user views it. See nothing. Realize that there’s a very strict CSP in place that prevents inline JavaScript.
Step 3. Realize that the only way to inject the payload is by having it loaded by existing code on the same origin. Realize that the “raw” url allows you to serve an XSS payload on the same origin.
Step 4. Submit your payload to the markdown renderer, then save the URL. Create a fresh submission that creates a script element whose source is that URL. Submit and report. See nothing. Realize that inserting an element counts as inline JavaScript, which the CSP hates.
Step 5. Notice that the client-side viewer code loads the markdown renderer dynamically based on the value of a DOM element. Create a doppelganger element with the same ID as the renderer-selection dropdown. Since this element is spliced in before the actual dropdown, it gets selected by the viewer code. Set this element’s value property to the URL of your payload from (4). Submit and report.
Step 6. Copy the URL that appears in the list of recently viewed pages. Visit the URL to obtain the flag.
Step 7. Serve flag while hot (do not hoard). Bon appétit.
Once upon a time, there was a piece of Java bytecode. That Java bytecode felt incomplete and unloved, but then she met a CTF team willing to help her. Though busy seeking flags in foreign lands and CPU architectures, the CTF team felt a moral obligation to help, and so it tried to help the Java bytecode discover her true identity. However, it was too difficult to understand her in her current form. So, the CTF team converted her to a format recognized by the Jasmin assembler [0] and found her source.
Looking at the source, the CTF team knew exactly where she came from: even without the four magic numbers she had forgotten, the team recognized the undeniable signs of her being an md5 implementation which chunked input matching [a-z0-9] in blocks of 5 and concatenated the checksums.
With the Java bytecode happy at last in her new identity, the team sent her off to find new adventures in an exciting, beautiful world. But one task remained.
In under an hour, a parallelized md5 cracker revealed each of the twenty cleartext blocks that produced the hash embedded within the Java bytecode’s soul. It was, unmistakably, the Flag the team had sought for so long.
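For the technically inclined, the cracking step might look something like this minimal Python sketch. It assumes standard md5 with hex digests simply concatenated, which the story implies but does not spell out; the function names are made up for illustration, and the demo uses 2-character blocks so it finishes instantly. At the real block size of 5 there are 36^5 (about 60 million) candidates per block, which is why the team's cracker was parallelized.

```python
import hashlib
import string
from itertools import product

ALPHABET = string.ascii_lowercase + string.digits  # the [a-z0-9] charset


def crack_block(target_hex, length=5, alphabet=ALPHABET):
    """Brute-force one block: return the cleartext whose md5 hex digest matches, or None."""
    for candidate in product(alphabet, repeat=length):
        text = "".join(candidate)
        if hashlib.md5(text.encode()).hexdigest() == target_hex:
            return text
    return None


def crack_all(concatenated_hex, length=5, alphabet=ALPHABET):
    """Split the concatenated digests into 32-hex-char pieces and crack each one."""
    digests = [concatenated_hex[i:i + 32] for i in range(0, len(concatenated_hex), 32)]
    return "".join(crack_block(d, length, alphabet) or "?" * length for d in digests)


# Demo with two 2-character blocks (hypothetical data, not the real flag).
demo = "".join(hashlib.md5(s.encode()).hexdigest() for s in ("ct", "f7"))
print(crack_all(demo, length=2))  # prints "ctf7"
```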
Moral: Kindness is always repaid.
Welcome to the 134th Carnival of Mathematics, a monthly roundup of mathematics-related blog posts organized by The Aperiodical. The previous edition was hosted by Matthew at Chalkdust Magazine.
Tradition obliges me to begin with some facts about the number 134.
One of my best friends works backstage at my high school’s theater. He has a grueling and often thankless job, which involves staying backstage till midnight painting sets only to watch the actors get the applause the next evening. But he does it anyway, because he honestly loves it.
He reminds me of the number 134: the stage tech of numbers, enabling primes to shine without ever being recognized as one.
134 is the average of two consecutive odd primes, and both its double and its square are one less than a prime. The number 134^{4}+134^{3}+134^{2}+134^{1}+134^{0} is prime, as is 10^{134}+7. The sum of the first 134 primes divides the product of the first 134 primes, and the sum of squares of 134’s prime factors… is prime.
Yet a number does not have to be prime to be special. 134 redeems itself by having the beautiful property,
\[ 134 = \binom{1+3+4}{1} + \binom{1+3+4}{3} + \binom{1+3+4}{4} \]
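If you would rather not take my word for it, the digit identity is easy to check. Here is a quick sketch in Python (the variable names are mine): the three binomial coefficients are 8, 56, and 70.

```python
from math import comb

n = 134
digits = [int(d) for d in str(n)]   # [1, 3, 4]
digit_sum = sum(digits)             # 1 + 3 + 4 = 8

# Sum of C(8, d) over the digits d of 134.
total = sum(comb(digit_sum, d) for d in digits)
print(total)  # 8 + 56 + 70 = 134
```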
And now, the Carnival.
Writing this edition of the Carnival felt like writing a series of book reviews, but instead of dull Victorian literature, I got to review exciting mathematics written by fellow bloggers around the world. To celebrate this, I’d like to begin this roundup with a rather thought-provoking book review—of an actual book.
I received a book Really Big Numbers by Richard Schwartz for review. I was supposed to write the review a long time ago, but I’ve been procrastinating. Usually, if I like a book, I write a review very fast. If I hate a book, I do not write a review at all. With this book I developed a love-hate relationship.
The same blogger also writes about her feelings towards math, and I would like to mention one such article.
I remember I once bought a metal brainteaser that needed untangling. The solution wasn’t included. Instead, there was a postcard that I needed to sign and send to get a solution. The text that needed my signature was, “I am an idiot. I can’t solve this puzzle.” I struggled with the puzzle for a while, but there was no way I would have signed such a postcard, so I solved it. In a long run I am glad that the brainteaser didn’t provide a solution. …[T]here are no longer any discoveries. There is no joy. People consume the solution, without realizing why this puzzle is beautiful and counterintuitive.
Perhaps appropriately, the next entry is a puzzle, and its creator (who describes himself as a “stand-up mathematician”!) admits to not knowing the answer.
For completeness, I also include an article about solving a puzzle—about the joy of discovery and the frustration of exploring large search spaces.
I’m a software developer myself and so the most interesting part for me was finding shortcuts to the answer. While a computer can find the solution by brute force fairly quickly, it still takes a bit of time as there are 3,628,800 possible answers to check. … I read a little further, and found out that indeed there were shortcuts that could be made—this is the power of mathematical thinking in problem-solving: a monumental task can become considerably easier by simply applying a little bit of thought.
As we all know, mathematics isn’t just about solving brain-teasers. With that in mind, here are some practical applications of math.
What’s the most effective strategy for loading an airplane? Most airlines tend to work from the back to the front, accepting first the passengers who will sit in high-numbered rows (say, rows 25-30), waiting for them to find their seats, and then accepting the next five rows, and so on. Both the airline and the passengers would be glad to know that this is the most effective strategy. Is it?
On a less serious but equally practical note, we can apply math to almost any game you can think of.
From Risk to tic-tac-toe, popular games involve tons of strategic decisions, probability and math. So one happy consequence of being a data nerd is that you may have an advantage at something even non-data nerds understand: winning.
While applying math to games is practical (and perhaps even lucrative!) we all know that the proof of the pudding lies… in proofs. Proofs are there to reassure us that a statement really is true.
Pythagoras’s Theorem is perhaps the most famous theorem in maths. It is also very old, and for over 2500 years mathematicians have been explaining why it is true.
Thinking about it, perhaps I lied to you. Often, the most beautiful proofs are there, not to reassure us, but to explain why a seemingly ridiculous statement might actually be true. Here is one such outrageous statement and its proof.
This code, somewhat surprisingly, generates Fibonacci numbers.
def fib(n): return (4 << n*(3+n)) // ((4 << 2*n) - (2 << n) - 1) & ((2 << n) - 1)
In this blog post, I’ll explain where it comes from and how it works.
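Skeptical? A small sanity check, sketched in Python, compares the one-liner with an ordinary loop. The reference function `fib_loop` is my own addition and assumes the indexing in which the sequence runs 1, 1, 2, 3, 5, … starting at n = 0. The two agree from n = 1 onward; at n = 0 the one-liner evaluates to 0, so the comparison starts at 1.

```python
def fib(n):
    # The one-liner quoted above, unchanged.
    return (4 << n*(3+n)) // ((4 << 2*n) - (2 << n) - 1) & ((2 << n) - 1)


def fib_loop(n):
    """Reference implementation: 1, 1, 2, 3, 5, ... with the first 1 at n = 0."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib(n) for n in range(1, 11)])  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(all(fib(n) == fib_loop(n) for n in range(1, 200)))  # True
```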
However, proofs are often written in scary, technical language with no intuition behind them. In reality, a proof is a narrative, and what better way to present a narrative than a story? The next entry takes on the challenge.
This may seem much harder than just counting the crossings, but it works well on an abstract level—that is, when we don’t know what the drawing looks like, but still have to prove things about it. This is what makes the theorem and its proof beautiful—we will prove there is something common to infinitely many different drawings of K_5. … This presentation is a first attempt at a response to the criticism: “Why don’t mathematicians or scientists ever write the way they think?” In fact, this “unnecessarily” long exposition does not mean that it is difficult; my hope is quite the opposite.
While an appreciation for proofs is healthy for any mathematician, some fascinating mathematics can emerge from seat-of-the-pants conjectures, calculations, and manipulations.
For most people it’d be bad news to end up with some complicated expression or long seemingly random number — because it wouldn’t tell them anything. But Ramanujan was different. Littlewood once said of Ramanujan that “every positive integer was one of his personal friends.” And between a good memory and good ability to notice patterns, I suspect Ramanujan could conclude a lot from a complicated expression or a long number. For him, just the object itself would tell a story.
While we’re on the topic of biographies, here’s another interesting character in the history of math.
…Shannon built a machine that did arithmetic with Roman numerals, naming it THROBAC I, for Thrifty Roman-Numeral Backward-Looking Computer. He built a flame-throwing trumpet and a rocket-powered Frisbee. He built a chess-playing automaton that, after its opponent moved, made witty remarks. Inspired by the late artificial-intelligence pioneer Marvin Minsky, he designed what was dubbed the Ultimate Machine: flick the switch to “On” and a box opens up; out comes a mechanical hand, which flicks the switch back to “Off” and retreats inside the box.
Often, the most brilliant mathematicians are not the most adept at solving equations, but the most insightful in finding math in places you may never have dreamed of.
Several students came to speak with me about the essay, including the second author. He told me that he just didn’t see any symmetry around him. At the time, we were standing in a building (the cafeteria) with sine-curve shaped ceiling beams arranged in parallel, supported by arms with 4-fold rotational symmetry. The light fixtures were hemispherical, with slits cut out in an 8-fold rotational pattern. Parallel sliced kiwi fruits were being served, with their near rotational symmetry. I merely pointed these things out, saying that we are surrounded by symmetry, all we need to do is look and listen.
And, for a completely different perspective, here’s a lighthearted webcomic to bring a humorous end to this edition of the Carnival of Mathematics.
That’s all for this edition of the Carnival of Mathematics! Join us next time for the 135th edition, hosted by Gaurish4Math.
I needed a fun little project to tinker with while exploring tosh, and I settled on implementing regexes in Scratch. Because that’s clearly a very practical idea with no problems at all.
Regular expressions are like magic. If you haven’t come across them yet, you soon will. They’re a way to match text against patterns, and you can use a regex to teach a computer what, for example, an email address “looks like”. I’m going to assume that if you’re reading this, you have some idea of how to use regexes; but if you don’t, perhaps the aforementioned link could help you get up to speed.
Since regexes are like magic, they’re also one of those things that many people know how to use, but fewer people know how to implement, and even fewer people know how to implement efficiently. Hopefully by the end of this article, we’ll be part of that last group.
But first, something fun for you to play with: check out this Scratch project for a live demo.
Let’s talk about finite-state machines. Finite-state machines are kind of like a subway system. Suppose you’re heading back home after visiting the Scratch Team at MIT, and so you’re at Kendall Station on the MBTA. Maybe you want to get to Haymarket.
(I promise this gets relevant to regexes soon.)
You could head north towards Alewife or south towards Braintree. Let’s head south. You pass Charles/MGH and then you’re at Park St. Now you have a choice: you can either continue on the Red Line towards Braintree, or you can switch over to the Green Line and go west to Boylston or east to Haymarket (which is your destination).
Now imagine each little stretch of subway between two stations has a letter associated with it. So, maybe Kendall to Charles is “B” and Charles to Park St. is “A”, and so on. But, to be clear, the letters belong to the tracks joining two stations, not to the stations themselves. You go take a ride on the subway and read off the letter at each stretch.
You have now re-enacted what we call a “finite-state machine”. Instead of subway stations, we have states, and instead of stretches of tracks, we have edges.
Some details: the edges need to be directed, so you can’t go both ways on an edge. This makes sense with the subway analogy—each individual track only heads in one direction (otherwise the trains would collide!). To keep track of this, we draw them with arrows rather than lines.
Here’s a simpler picture of a small part of the T that we care about, then:
A few things to notice here: “letters” don’t need to be from the alphabet (“!” is a valid “letter”). You can use the same letter on multiple edges. And, finally, placement doesn’t matter. Haymarket isn’t actually west of Downtown.
Haymarket is double-circled because it’s the final destination.
Now, the question is, if you get from Kendall (start) to Haymarket (finish), what words can you form by reading off the letters in order? Well, you need to read off “B” and “A” to get to Park. But then you have a choice: you could go directly to Haymarket and read off a “!”, or you could go to Downtown and back and read off “N” followed by “A”. You can’t go to Boylston because there’s no way to get back to Haymarket from there.
And so some sequences you could read off are “BA!”, “BANA!”, “BANANA!”, and… you might have already guessed that the regex BA(NA)*! matches exactly these strings.
Interesting.
So it turns out that you can turn any finite state machine into a regular expression, and, more importantly, vice-versa. If you think of finite state machines in terms of the set of words they admit and regular expressions in terms of the set of words they match, then each FSM can be paired with an equivalent regex (but that regex isn’t necessarily unique!).
So to match a string against a regex, you really just need to match it against an equivalent FSM, which seems like a much easier thing to do since they’re so much more visual.
The way to match an FSM is to start with a coin on the start state (Kendall). Then, you read off the letters from the string you’re trying to match. For each letter, you look at all the coins on the FSM. For each coin, you either move it to an adjacent state if they’re connected by an edge with that letter, or you remove it from the FSM if there is no such edge (if there are multiple edges with the same letter, then you have to “clone” the coin and put a copy on each edge’s target—this is called a nondeterministic FSM, and more on this later).
If, when you’re done reading the string, a coin ends up in your final (“accepting”, double-circled) state then you win. That coin represents a way to travel the subway so that you read off that exact string, so your string must clearly be accepted by the FSM (and therefore the regex it’s equivalent to).
I suggest trying this process yourself with the “BANANA!” example.
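If you want to check your work, here is the coin-shunting procedure as a minimal Python sketch. The dictionary layout and the names `EDGES` and `matches` are my own choices for illustration, and the Boylston branch is left out because its letter isn't given in the text.

```python
# Edges of the subway FSM described above: (state, letter) -> set of next states.
EDGES = {
    ("Kendall", "B"): {"Charles"},
    ("Charles", "A"): {"Park"},
    ("Park", "!"): {"Haymarket"},
    ("Park", "N"): {"Downtown"},
    ("Downtown", "A"): {"Park"},
}
START, ACCEPTING = "Kendall", {"Haymarket"}


def matches(string, edges=EDGES, start=START, accepting=ACCEPTING):
    coins = {start}  # one coin on the start state
    for letter in string:
        # Move (or clone) every coin along each edge labelled with this letter;
        # coins with no such edge simply disappear.
        coins = {nxt for state in coins for nxt in edges.get((state, letter), set())}
    # You win if any coin sits on an accepting state at the end.
    return bool(coins & accepting)


for word in ["BA!", "BANANA!", "BANAN!", "BANANA"]:
    print(word, matches(word))  # True, True, False, False
```

Keeping the coins in a set means two coins landing on the same state merge into one, which is what keeps the amount of work per letter bounded by the number of states.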
And so this leaves one more question: how do you turn a regex into an FSM? This turns out to be pretty easy.
Thompson gives an algorithm to do this, but it’s really simple and I urge you to try and figure it out yourself. The best way is to take each of the regex primitives like *, |, and () and figure out how you can compile each one down individually.
It’s a fun exercise, though there are some annoying subtleties with the empty string that you might have to deal with.
You should also think about how you can turn other regex features like + and charsets into simpler regexes that just involve the above-mentioned 3 things. Can you compile down backreferences? Lookaheads?
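Spoiler warning: if you'd rather do the exercise yourself, skip this block. What follows is one possible answer, sketched in Python in the spirit of Thompson's construction. It is not necessarily how my Scratch project does it. The class and method names are made up for illustration, there is no parser (fragments are glued together by hand), and the empty-string subtlety is handled with edges labelled `EPS` that consume no input.

```python
from itertools import count

EPS = None  # label for edges that consume no input


class NFA:
    def __init__(self):
        self.edges = {}      # (state, label) -> set of target states
        self._ids = count()  # fresh state numbers

    def state(self):
        return next(self._ids)

    def edge(self, src, label, dst):
        self.edges.setdefault((src, label), set()).add(dst)

    # Each builder returns a (start, accept) pair; use each fragment only once.
    def literal(self, ch):
        s, a = self.state(), self.state()
        self.edge(s, ch, a)
        return s, a

    def concat(self, first, second):
        self.edge(first[1], EPS, second[0])
        return first[0], second[1]

    def alternate(self, left, right):
        s, a = self.state(), self.state()
        for frag in (left, right):
            self.edge(s, EPS, frag[0])
            self.edge(frag[1], EPS, a)
        return s, a

    def star(self, inner):
        s, a = self.state(), self.state()
        self.edge(s, EPS, inner[0])
        self.edge(s, EPS, a)                # zero repetitions
        self.edge(inner[1], EPS, inner[0])  # go around again
        self.edge(inner[1], EPS, a)
        return s, a

    def closure(self, states):
        """Every state reachable from `states` through EPS edges alone."""
        stack, seen = list(states), set(states)
        while stack:
            for nxt in self.edges.get((stack.pop(), EPS), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def matches(self, frag, string):
        coins = self.closure({frag[0]})
        for ch in string:
            moved = {n for s in coins for n in self.edges.get((s, ch), ())}
            coins = self.closure(moved)
        return frag[1] in coins


# BA(NA)*! assembled by hand from the primitives.
nfa = NFA()
lit = nfa.literal
na = nfa.concat(lit("N"), lit("A"))
regex = nfa.concat(nfa.concat(nfa.concat(lit("B"), lit("A")), nfa.star(na)), lit("!"))
print(nfa.matches(regex, "BANANA!"), nfa.matches(regex, "BAN!"))  # True False
```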
Anyway, there’s also a way to turn the resulting nondeterministic FSM into a deterministic one (i.e. no coin-cloning) using what’s called the powerset construction. This creates a whole lot of new states, though, so it’s usually better to just make peace with coin-cloning. And there’s a host of algorithms to make deterministic FSMs smaller.
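To make the powerset construction concrete, here is a minimal Python sketch. Each state of the new deterministic machine is a whole pile of coins, stored as a frozenset of old states. The toy NFA (strings of a's and b's ending in "ab") and all the names are hypothetical examples of mine, and the sketch assumes an NFA without empty-string edges.

```python
def determinize(edges, start, accepting):
    """Powerset construction: every DFA state is a frozenset of NFA states."""
    alphabet = {letter for (_, letter) in edges}
    start_set = frozenset({start})
    dfa_edges, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for letter in alphabet:
            target = frozenset(n for s in current for n in edges.get((s, letter), ()))
            if not target:
                continue  # no coins left: leave the edge out (implicit dead state)
            dfa_edges[(current, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_edges, start_set, dfa_accepting


def run_dfa(dfa_edges, start_set, dfa_accepting, string):
    state = start_set
    for letter in string:
        state = dfa_edges.get((state, letter))
        if state is None:
            return False
    return state in dfa_accepting


# Toy NFA: on "a", the coin in q0 is cloned onto q0 and q1.
NFA_EDGES = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dfa_edges, dfa_start, dfa_accepting = determinize(NFA_EDGES, "q0", {"q2"})
print(len({dfa_start} | set(dfa_edges.values())))  # 3 reachable DFA states
print(run_dfa(dfa_edges, dfa_start, dfa_accepting, "aab"))  # True
print(run_dfa(dfa_edges, dfa_start, dfa_accepting, "aba"))  # False
```

This example stays small, but in the worst case the number of coin piles grows exponentially with the number of original states, which is the blow-up mentioned above.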
And that’s basically how my Scratch project works! There’s a small JavaScript program that takes a regex and outputs an FSM as a data structure. A second program then “flattens” the FSM into something that looks kind of like assembly (you could call this the “link” phase of the compilation). The assembly-ish stuff is just a flat, linear representation of the FSM in a Scratch list, where instead of arrows, I have the address of the state the arrow points to. Then a simple Scratch program written with Tosh goes and interprets that assembly using the coin-shunting technique. The whole process is very much like compiling a C program and running it on a processor, in fact.
Now that we know a fair bit about how to implement regexes, I want to talk about a cool application of FSM theory: in particular, a nice little result that says that you cannot write a regular expression to match nested parentheses (so [], [[]], [[[]]] all match but ][[] doesn’t).
First, go ahead and try to write such a regex to convince yourself that you can’t. No “cheating” using fancy stuff like backreferences and whatnot: I’m talking about “pure” regexes involving only *, |, and ().
Convinced? Good.
We’re going to prove that you can’t do this.
Recall that any regex can be turned into an FSM. So let’s suppose we had a regex for matched parentheses, and we turned it into an FSM. Now, remember how FSM stands for “finite-state machine”? That means it, uh, has a finite number of states. Which means, if it has 100 states and you give that FSM the sequence of 51 “(”s and 51 “)”s (for a total of 102 characters), then you must visit some state twice. It’s the same argument as if you have 102 pigeons and 100 holes, you must have a hole with more than one pigeon in it.
And so if you loop back to a state, you can clearly loop back to that state again! So there must be some subsequence in your string that can be repeated while still matching the FSM. For example, in “BANANA!”, you can repeat the “NA” as many times as you want. This fact is called the pumping lemma (not to be confused with the pumping llama, which is a fuzzy weightlifting camelid).
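Here is the pumping idea as a minimal Python sketch. It replays a string on the subway FSM from the text, finds the first state visited twice, and repeats the letters read in between. The helper names are mine, the Boylston branch is again left out, and the sketch assumes a deterministic machine.

```python
# The subway FSM described in the text: (state, letter) -> next state.
EDGES = {
    ("Kendall", "B"): "Charles",
    ("Charles", "A"): "Park",
    ("Park", "!"): "Haymarket",
    ("Park", "N"): "Downtown",
    ("Downtown", "A"): "Park",
}


def trace(string, start="Kendall"):
    """States visited while reading `string`, or None if the machine gets stuck."""
    states = [start]
    for letter in string:
        nxt = EDGES.get((states[-1], letter))
        if nxt is None:
            return None
        states.append(nxt)
    return states


def accepts(string):
    states = trace(string)
    return states is not None and states[-1] == "Haymarket"


def split_at_loop(string):
    """Split into (x, y, z) where reading y starts and ends in the same state."""
    states = trace(string)
    if states is None:
        return None
    first_seen = {}
    for i, state in enumerate(states):
        if state in first_seen:
            j = first_seen[state]
            return string[:j], string[j:i], string[i:]
        first_seen[state] = i
    return None  # no state repeated: the string is too short to pump


x, y, z = split_at_loop("BANANA!")
print(x, y, z)  # BA NA NA!
print(all(accepts(x + y * k + z) for k in range(6)))  # True
```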
Image by Kimberly Do.
Now, let’s think about the () language (called the Dyck language). What part of it can you repeat? If you repeat the “(”s then you have too many “(”s. If you repeat the “)”s you have too many “)”s. And if you repeat something with both “(”s and “)”s, you clearly don’t follow the (...(())...) structure anymore.
So… no such regex can exist.
Isn’t that a great argument? I think it’s very cool.
So, let’s recap. Regular expressions correspond to finite-state machines, which are souped-up subway systems. You can implement finite-state machines as a kind of automatic board game which also somehow involves cloning currency. Finally, the finite-ness of finite-state machines lets us prove a llama lemma which lets you show all sorts of interesting facts about regexes.
Perhaps a deeper lesson here is that regexes, and to a lesser extent the rest of computer science, might seem like magic. But it’s magic that you’re entirely capable of learning. In the words of my hero Basil Smockwhitener,
You don’t understand how powerful you are. You can learn anything. Anything at all. But you have to be willing to balance the scale with effort.
Scheme, they say, is an idea rather than a language. But what does that mean? Is Java an idea or a language? C?
When you write code in Java or C, you expect it to work, more or less, even if you use a different compiler. gcc and clang are generally compatible. Similarly, you can run a Java program in Hotspot, JRockit, Kaffe, or J9 and it should be fine. You can compile the Java program with javac or gcj and it should work the same (one might be slower than the other, though).
This is because C and Java have specifications. There are long documents describing exactly what the languages should do, and someone writing a compiler needs to follow those specifications. The implementation details (how a feature works) are up to them, but not any of the actual language design.
Scheme, on the other hand, is very loosely specified. This is a good thing: the core language is so small that the entire standard fits on about 50 pages (Java’s specification fits on 644). Additional helpful features are included in “Scheme Requests for Implementation” or SRFIs, which are not a part of the standard but are useful for programmers—things like common list operations.
It’s not nebulous, it’s simply minimalist. This is why Scheme is more of an idea than a language.
As a result, implementing a Scheme compiler gives you a lot more freedom in how you want your language to look, while still calling it a “Scheme dialect”. You can choose to make square brackets legal delimiters, or you could choose not to. You almost always have to supply your own I/O primitives like “prompt for input” or “open a socket”. Module systems and importing are up to you.
And so Scheme programs are usually not compatible across implementations. With so many hundreds of Scheme dialects out there, it’s kind of overwhelming for a first-time Scheme programmer to pick a dialect and actually get started.
I’m a Scheme dialect nerd. I probably have more Scheme dialects installed than most people have games on their phone. Here are my opinions on which Scheme dialect to use. Rather than grouping by use-case, as guides such as this one do, I grouped by language.
Racket is, in a word, academic. It started off as PLT Scheme, which was essentially a research group that happened to produce a really good pedagogical Scheme dialect which they used for a lot of their research.
Racket has an impressive standard library and a decent module system. Its POSIX interface is a bit wanting, though. For example, you can’t send a signal to a process yet (technically, I have a PR open for this here). This is partly because Racket aims to be generally platform-independent, and so doesn’t necessarily want to implement a whole bunch of Unix-specific features in order to appeal to Windows users. This is either a good thing or a bad thing, depending on your use-case.
Racket’s FFI is iffy. Writing C extensions is kind of tricky.
There are a couple ways to distribute your Racket program: you can have users install Racket and download the source, or you can use raco to compile an executable that bundles Racket’s runtime system with compiled bytecode. Neither way seems perfect to me, but, well, they work.
Racket comes with a lot of frills. There’s a GUI engine, an IDE (which isn’t great, but isn’t awful either), and primitives for manipulating images and web servers and whatnot. It also has copious amounts of documentation, and it ships with its own documentation tool called Scribble (which is documented in Scribble). Racket is also a “language lab”: it gives you the tools to create your own programming languages built on top of Racket infrastructure (for example, Typed Racket, the type-safe dialect of Racket, is written in Racket).
Racket has a nice community and mailing list archives/StackOverflow answers for help. It’s also the recommended Scheme dialect to use if you’re learning or teaching Scheme/SICP.
If you’re still lost at the end of this article, stop thinking and go with Racket and you will be fine.
Chicken is sort of the opposite of Racket. It’s small (R5RS only), but it has a fantastic POSIX API. Chicken implements the R5RS Scheme standard, which stands for the Revised Revised Revised Revised Revised Report on the Algorithmic Language Scheme.
Its website is call-cc.org because the Chicken devs are super-proud of how Chicken handles continuations: it uses the amazingly-named Cheney on the MTA strategy.
Continuations, by the way, are one of the coolest Scheme features.
Chicken has a decent number of libraries, and it’s really easy to turn a C module into a Chicken module.
Chicken compiles to C, so to distribute, you can either distribute binaries or C sources which the user can build without installing Chicken. Super-easy.
ClojureScript is a compiler from Clojure to JavaScript (Clojure originally targeted the JVM). It is not a Scheme dialect, nor does it feel like one. It’s a LISP dialect. Here are some differences between LISP and Scheme, if that interests you.
Personally, I don’t like it as a matter of principle: I don’t like things that compile to JavaScript. It doesn’t seem like anything you really need to write better code. They say JavaScript is Scheme in C’s clothing, anyway, and I do agree with that to some extent.
Same holds for BiwaScheme, Spock (written by the Chicken developer), the (unmaintained) WhaleSong Racket-to-JS compiler, and anything on this list.
The Chez Scheme compiler isn’t free, and its website lives on a COM domain. Take from that what you will.
That being said, it’s reputed to be ridiculously fast. I have never met a Chez Scheme user, but maybe I’m in the wrong crowd.
Guile is a GNU project, which means it has all sorts of ideals and mission statements and declarations that make it seem like the paragon of freedom.
In short, Guile is the Scheme you want to use in “embedded” form. That is, you can stick a Guile interpreter in a C application in order to control it without having to deal with C. One of the canonical examples of this is the turtle graphics example, in which you write a graphics display in C, and then let the user control it with Scheme, which is easier. Similarly, the WeeChat IRC chat client lets you write extensions in Guile (for example, an extension that automatically URL-shortens long URLs before you send them).
Guile can both interpret and compile Scheme, it has good documentation, and it’s backed by GNU so you can expect maintained code (Guile is used by other mature GNU projects such as gdb, so they’re invested). Also, Guile is written in Guile, and the LilyPond music formatting program is written in Guile.
You only use Emacs Lisp if you, well, use Emacs, just like how you only use VimScript if you use Vim.
Before choosing any other Scheme implementation, it’s probably worth your time to make sure it really does offer something more than the dialects I listed above. Make sure it has an active community that can help you when you run into problems, and make sure you can find code written by other people in that dialect of Scheme. A good quality test is to see if it implements any SRFIs.
Then, hack away.
And now, a brief word on the proliferation of Scheme dialects.
Part of the problem is that once you pick a Scheme implementation, you’re almost always “locked in”: you can’t easily migrate to another. So people tend to just build their own Scheme environment that they can control.
Another part is that every CS student and their pet dog has probably written some approximation of a Scheme interpreter (possibly as a course project), so the Babel effect kicks in and creates a plethora of implementations to choose from. I’m guilty of this.
It’s just so easy to write a Scheme implementation. In a way, Scheme is a virus… but that’s the subject of a future post (or you could just read Snow Crash by Neal Stephenson and stick “programming” before every instance of “language”).
An upside of this, though, is that as the Scheme virus infects the programming language community, it injects the Functional Programming DNA everywhere. Once you’re immune to Scheme (by reading SICP), you’ll thrive in any such environment, be it JavaScript, Scala, Haskell, or Ruby.
Oh, and one last thing. I feel it’s obligatory at this point for me to say, please don’t spend too much time researching Scheme dialects. Just pick Racket and start coding.
The IKEA Poäng is perhaps the company’s most comfortable and best-named product: a chic, springy twist to the classic light armchair. The Poäng comes in five or six different color schemes: generally variations on white, beige, red, and coffee.
But what if it didn’t?
Let’s imagine an alternate universe, where the Poäng is advertised as a medium of expression. Let’s imagine a world where the Poäng seat covers are made of dye-able canvas. A world where customers are encouraged to decorate their armchairs to reflect their own personalities.
Sounds like fun, doesn’t it? Well, uh, let’s see what happens. I present to you an allegory in twelve parts.
January. The concept is first revealed during the keynote at the IKEA Worldwide Developers Conference. The Twitterverse explodes. The New York Times says, “What a time to be alive!”.
February. IKEA sells out within the first 24 hours of sales; customers waiting in line report being “disappointed, but contently stuffed with meatballs”. Television commercials begin to feature contemporary artists decorating their Poängs. There are rumors of AMC Theaters planning to license Poängs for their cinemas. BuzzFeed publishes ten of their best Poäng-assembling tips and tricks (you won’t believe #4).
March. Almost everyone now owns a Poäng. A dark blue Poäng with the Presidential Seal is spotted in the White House.
April. One’s Poäng-decoration becomes a profound statement of his or her identity. After all, an armchair is where you spend some of your most important hours. Reading, chatting, watching TV: these are all best done from a familiar environment that should be optimized for your lifestyle.
A Berkeley establishment begins to sell tie-dyed Poäng covers.
May. Genres emerge.
There are the loud, skeuomorphic Poängs with too much color and design. These generally belong to young children who decorate their Poängs in Crayola colors.
Then there are the average adults, who choose the most suburban colors they can find. Navy blue? Perfect. Olive green? Sounds like home.
Finally, there are the artistic adults, who go for a more refined look. They pick neutral but subtle color schemes with tasteful accents.
June. The Average Adults realize that their Poängs look outmoded compared to the beautiful Poängs of the Artistic Adults. Pastel colors are the “in” thing, according to several popular Poäng-centered Instagram accounts.
July. The development of Poäng plugins spawns a new industry. Embedded hardware for Poäng covers becomes cheap, resulting in increasingly sophisticated Poängs.
August. The genres begin to homogenize into something the Chair Gurus call the “material design revolution”. A combination of color palettes and design guidelines assembled by experienced superstar designers guides every new Poäng design.
An NPR survey reveals that while over 40% of the US population owns a Poäng, only 12% of Poäng-owners report sitting in their armchairs regularly.
September. IKEA begins selling readymade Poängs designed painstakingly by expert designers and artists. They even deliver it—assembled—to your doorstep. Most people choose to buy the readymade Poängs because they are low-maintenance and don’t require as much effort to set up. They are also stunningly beautiful, and the experienced designers probably took care of a lot of corner-cases that you, as an amateur, wouldn’t really think of.
October. Hand-decorated Poängs begin to look passé. Many of them lack essential armchair features such as cupholders and localization settings. They also ignore common best practices in the industry. Marketing professionals say that hand-decorated Poängs are a poor business choice for furnishing your waiting room because they “project an outdated look to potential customers”.
“Don’t roll your own paint,” preaches one blog post that tops Hacker News.
Google publishes a framework to develop apps for the front end of Poängs. They call it PoAngularJS. The average chair now weighs significantly more than the average American.
November. IKEA sells one kind of Poäng now. Customers have occasional problems with them, but you can find workarounds online. Besides, everything else is so user-friendly. It’s really just a couple little things that bother you, like the Wi-Fi crashing every once in a while.
Very few hand-decorated Poängs exist, mostly in educational institutions. Old people complain that “see, them chairs had character in them”, but they’ve been saying that for centuries.
December. IKEA discontinues the Poäng. Usage of armchairs is deprecated in favor of the “one-person couch”, which is a remarkable new piece of technology destined to revolutionize the way we think about sitting.
Nobody really remembers how to put together an old-fashioned armchair (just like they don’t remember how to build a gramophone). Some engineers work together to build their own version of the Poäng called the LibreChair. However, it is only used by hardcore carpentry enthusiasts since the manual is twelve pages long and building it requires you to weave your own cloth.
Epilogue. Let’s talk about customization. The etymology of the word custom can be traced to the Latin consuetudo, which means “habit”. But it means more than “habit”. It means “experience”, “tradition”, “convention”, “familiarity”, “companionship”, “conversation”… even “love affair”.
And it’s this dichotomy between the individual and the communal that makes the idea of “customization” (which is so central to hackerdom) paradoxical. Our identity is as much our own as not; we forfeit our identity to others.
There’s something to be said about having a fortress of solitude. A world which you control, which you make your own with endless tweaks towards your ideals of perfection. Programmers don’t need to carve their fortresses out of rocky cliffs; they can find solace in editors, shells, browsers, and personal websites.
The key is in customization.
Yet even though we spend hours making our tools “our own” with color schemes, macros, and key bindings, we still choose to publish our dotfiles as open-source “projects” on GitHub. We scarcely bother to read the original documentation of our software, choosing instead to search for solutions already written on Stack Overflow. We happily hand over our content to the corporate Cerberus that calls itself Medium. We choose to adhere to style guides written by people who are not us. We foist upon others screenshots of artistically themed editors that are no better than gilded toothbrushes. We steal boilerplate and eye-candy from others, believing somehow that we’re doing ourselves favors.
It’s foreign, it’s homogeneous, it’s both beautiful and sickening: like a fortress made of cotton candy.
I’m not old, but I’m not young either. There are some little details about the last decade’s websites that bring me a moiety of reminiscence. Some things have gotten better, some have gotten worse, and some are just different: but all of them bring back memories of being a curious middle schooler exploring the secrets of the Web.
Here are fifty things that remind me how far we’ve come and how far we still have to go.
When I was young…
<FONT> element and the BGCOLOR attribute.
@media queries.
jsc for command-line JS was a nifty hack.
LANGUAGE attribute on their <script> tags.
# than after.
www in their canonical URLs.
Welcome to the final installment of Meet the Robinson.
We left off last time with a complete but slow theorem-proving algorithm for first-order logic, as well as a promise of a faster algorithm. The faster algorithm depends on a concept called “unification”, so let’s talk about that first.
Propositional resolution involved finding a pair of identical propositions in opposite polarities (one in “positive” and one in “negated” polarity). In first-order logic, though, we can do better. We can find pairs that have the same “shape”.
Wrote[X, hamlet()] and Wrote[shakespeare(), Y] are the same “shape”.
The process of substituting variables to make two predicates identical is called unification. If you’ve worked with Hindley-Milner type inference, you know what unification is: it’s the stage where you figure out what the type variables you spawned expand to.
The unification algorithm isn’t hard to implement at all. It tells you whether two predicates unify or not, and if they do unify, it comes up with a substitution for each variable that can be applied to make them identical. Substitutions replace a variable with either a function or another variable. People talk about unification in terms of “solving equations” of functions and variables, if that makes more sense to you.
In particular, unification algorithms come up with the most general such unifier, so variables that don’t need to be substituted are left as-is.
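To make the description above concrete, here is a minimal sketch of unification with an occurs check. It is not the Martelli-Montanari algorithm or any particular published version, just a plain recursive unifier. The encoding is an assumption made for this example: variables are capitalized strings, and functions, constants, and predicates are tuples whose first element is the name, so Wrote[X, hamlet()] becomes ("Wrote", "X", ("hamlet",)).

```python
def is_var(t):
    # Assumed convention: a variable is a string starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until we reach an unbound variable or a compound term.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # True if variable v appears anywhere inside term t (under the current bindings).
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    # Return a substitution (dict) that makes a and b identical, or None if none exists.
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(("Wrote", "X", ("hamlet",)), ("Wrote", ("shakespeare",), "Y")))
print(unify(("P", "X"), ("P", ("f", "X"))))  # fails the occurs check
```

Variables that never need a binding simply never show up in the returned dictionary, which is how this sketch leaves them "as-is".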
P[X, Y] and P[X, f(Z)]?
Unification algorithms are covered in detail in SICP. Most people use the algorithm by Martelli and Montanari. Peter Norvig has published a correction to the algorithm. His paper actually has a very succinct and clear description of the algorithm, and can be found on his site here.
P[X] and P[f(X)] don’t unify (why do we need the “occurs check”?). Come up with a way to relate this to (1) the Y combinator, and (2) finding the roots of a polynomial.
And now back to theorem-proving.
Unification allowed Robinson to prove the lifting lemma, which says that if we have a valid resolution step at the propositional level (with ground clauses), then we must have a valid resolution at the first-order level. For example, we can unify P[a(), b()] and ¬P[a(), b()]. Since the former is a ground instance of P[X, Y] and the latter is an instance of ¬P[a(), Y], we deduce that there must be a resolvent of P[X, Y] and ¬P[a(), Y]. One such resolvent is P[a(), Y].
The lifting lemma also guarantees that the resolvent of the ground instances is an instance of the resolvent of the first-order clauses. In this example, note that P[a(), b()] is an instance of P[a(), Y] because if you substitute b() for Y in the latter you get the former.
P[a(), b(X), Z] and ¬P[X, b(Y), c(X)] have a resolvent. Find the ground instance and the first-order resolvent and show that the ground instance is an instance of the first-order resolvent.
The lifting lemma lets us “lift” propositional resolution to first-order resolution. Instead of checking if two terms are equal as in propositional resolution, we check if they unify, and if they do, we apply the substitution to the resolvent. Thus, we end up “iteratively” building up the ground instance at each resolution step. This is much more efficient than the Davis-Putnam algorithm, which had to guess the ground instance out of the blue.
Here’s an example. Suppose we had (P[a(), Y] ∨ ¬A[X]) and (A[X] ∨ ¬P[X, b()]). First, we note that P[a(), Y] and P[X, b()] appear in opposite polarities in the two clauses, and that they unify. The substitution is X becomes a() and Y becomes b(). Applying this substitution yields (P[a(), b()] ∨ ¬A[a()]) and (A[a()] ∨ ¬P[a(), b()]). Resolving out that term, we have A[a()] and ¬A[a()] which clearly resolve to the empty clause, which completes the proof.
X) P[X] then (∃ Y) P[Y]. Then, translate your proof to English.
Hilbert’s Entscheidungsproblem, posed in 1928, asked whether there was an algorithm that would tell you whether a first-order set of sentences was valid or not (he believed that there was!). In 1936, Alonzo Church used the lambda calculus to prove that there was not, in fact, such an algorithm. That same year, Turing used Turing machines to prove the same thing by reducing the halting problem to the Entscheidungsproblem. That is, they found a way to encode programs as statements in first-order logic such that asking whether the statements are provable is the same as asking whether the programs terminate.
A classic logic puzzle goes as follows:
Anyone who owns a dog is an animal lover. No animal lovers kill cats. Either Jack (who owns a dog) or Curiosity killed the cat. Who killed the cat?
This isn’t a “proof” as such, it’s a question. It turns out that the same tricks work for answering questions (or “querying”).
Suppose we are asking for an X such that Killed[X, cat()]. If we were trying to prove something, we would negate our goal and add it to the knowledge base. Since we’re querying, we need to make a small modification. We add this sentence to the knowledge base: Answer[X] ∨ ¬Killed[X, cat()]. Now, rather than looking for empty clauses, we look for clauses which only contain one predicate, which is Answer[*].
Once we extract the X from the Answer[*] predicate, it’s easy to see why it must be the answer. Simply re-run the theorem prover asking it to prove that Killed[X, cat()] (but substitute in the actual value of X you got). Since the proof is basically the same as above (ignoring the Answer[*] predicate), it must succeed. So, we know that our answer must be “correct” (in the sense that it is consistent with the knowledge base).
Let’s work through a small example. Suppose we have Wet[water()] and we want to query for an X such that Wet[X]. We construct the answer clause Answer[X] ∨ ¬Wet[X]. Then, we resolve against Wet[water()], unifying so that X is water() to get Answer[water()].
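The Wet[water()] query can be replayed in a few lines of code. This is only a sketch of a single first-order resolution step, not a full prover: it assumes the two clauses already use distinct variable names, encodes a literal as a hypothetical (is_positive, atom) pair, and writes terms as tuples with capitalized strings for variables.

```python
def is_var(t):
    # Assumed convention: variables are capitalized strings.
    return isinstance(t, str) and t[:1].isupper()

def substitute(t, s):
    # Apply substitution s to term t, following chains of bindings.
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    return (t[0],) + tuple(substitute(arg, s) for arg in t[1:])

def occurs(v, t):
    # True if variable v appears inside the (already substituted) term t.
    return t == v or (isinstance(t, tuple) and any(occurs(v, arg) for arg in t[1:]))

def unify(a, b, s):
    # Extend substitution s so that a and b become identical, or return None.
    a, b = substitute(a, s), substitute(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a) else {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(c1, c2):
    # Yield each resolvent obtained by cancelling one complementary, unifiable pair.
    for i, (pos1, atom1) in enumerate(c1):
        for j, (pos2, atom2) in enumerate(c2):
            if pos1 == pos2:
                continue
            s = unify(atom1, atom2, {})
            if s is None:
                continue
            rest = c1[:i] + c1[i + 1:] + c2[:j] + c2[j + 1:]
            yield [(pos, substitute(atom, s)) for pos, atom in rest]

fact = [(True, ("Wet", ("water",)))]
query = [(True, ("Answer", "X")), (False, ("Wet", "X"))]
print(list(resolve(fact, query)))  # a single clause containing only Answer[water()]
```

A prover built on this would keep resolving until it derives a clause whose only literal is an Answer predicate, then read the answer out of its argument.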
Animal lovers love all animals. At least one person loves every animal lover. Nobody loves a person who has killed an animal. Either Jack (who loves all animals) or Curiosity killed Tuna, who is an animal. Who killed Tuna?
All texts on resolution theorem proving talk about heuristics, so I guess I will too. But I won’t spend too much time on it. There are a few ways to be “clever” about how to pick which clauses to try to resolve. The first one is unit preference, which simply says clauses that have a single predicate are a good choice because if the resolution does work, you’re done. You probably use this heuristic without even knowing it: you’re likelier to try to resolve shorter clauses because it “feels” like a reasonable choice.
The second one is the set of support, which says that you can divide up your clauses into the axioms (which are supposed to be consistent within themselves) and the stuff to be proved (which should have a contradiction with the axioms). Then, you make sure you always use a sentence from the latter set when you resolve, because if you use two statements from the set of axioms, you won’t get a contradiction because they’re supposed to be consistent among themselves.
In other words, this is the heuristic form of “if you’re stuck, check to see if you have used all the information in the problem”. If you’re too aggressive with the set-of-support strategy, you might miss an important resolution and so the algorithm might become incomplete. Use responsibly at your own peril.
The last is called subsumption, which is basically spring cleaning. Every once in a while, clean out duplicate clauses. Be clever, so if one clause “subsumes” another (i.e. one is a ground instance of a more general clause) then delete the more specific one. Fewer clauses means faster resolution, but subsumption itself can get kind of slow.
And that’s it. I don’t know why this is such a big deal, but these three things always show up on every piece of literature on resolution-refutation theorem proving. Maybe it’s because Russell and Norvig covered them in their textbook and everyone else thought they were really important.
One last thing we need to talk about: equality.
Euclid’s first common notion is this: Things which are equal to the same things are equal to each other. That’s a rule of mathematical reasoning and its true because it works - has done and always will do. In his book Euclid says this is self evident. You see there it is even in that 2000 year old book of mechanical law it is the self evident truth that things which are equal to the same things are equal to each other. – Lincoln (2012)
Our theorem prover doesn’t support equality out-of-the-box. That is, we can’t tell it that father(father(X)) is the same as grandfather(X), and so those two functions are interchangeable.
We can, of course, write our own equality axioms (as we did for the Peano arithmetic above). The issue is that we then need to also define the “replacement” axiom for every single predicate: Equal[A, B] ∧ P[A] ⇒ P[B].
The solution is to use the paramodulation rule, which is an additional inference rule just like resolution is. It says that if you have a clause with a term that contains some subterm t and you also have a clause that contains T=U where T and t unify, then you can replace t with U, apply the substitution from the unification to both clauses, and then join them together, taking out the equality statement.
For example, given P[g(f(X))] ∨ Q[X] and f(g(b()))=a() ∨ R[g(c())], we can derive P[g(a())] ∨ Q[g(b())] ∨ R[g(c())].
In his thesis, Herbrand showed that you don’t need equality axioms to prove theorems if your knowledge base doesn’t have any equality statements in it.
…and that’s it. That’s actually all there is. Combining resolution, unification, and paramodulation lets us build the kind of theorem prover that was used to prove the Robbins conjecture (William McCune’s EQP prover found that proof in 1996). You can check out my own implementation here. It’s lovingly named Eddie, after the shipboard computer aboard the Heart of Gold which froze when asked for a cup of tea by Arthur Dent.
Epilogue: If you’ve stayed with me on this journey, you’ve learned the basics of formal logic, model theory, and proof theory. You’ve explored several famous theorems in each field and seen (human-generated!) proofs of them. You’ve discovered how math is rigorized. And, finally, you’ve seen some of the rich history of logic and how it connects not just to various branches of math, but also to subjects as abstract as philosophy and as practical as computer science.
Yet, in a way, this isn’t about having a machine that can prove theorems. Like many things in life—marathons, pie-eating contests, and bank robberies—I think the pleasure is more in knowing that you can do it than in actually doing it.
Why? Because contrary to Rényi, mathematics is not about turning coffee into theorems. An oracle that just tells you whether or not a statement is true is useless; the real beauty is in understanding why it’s true. A world where math is an endless stream of abstract, intuition-less symbol-shunting is bleak. Resolution-refutation proofs have no insight or motivation. They are completely mechanical.
But then again, maybe that’s exactly what we were going for.
I’ve admittedly been extremely lazy about citing my sources when writing these articles. I have, however, diligently kept a list of links to resources I found helpful. It feels appropriate to give them the last word here, so, in no particular order, here they are:
Welcome to today’s edition of Meet the Robinson.
We left off the previous article wondering if there’s any way to handle an infinite number of propositions with our resolution-refutation scheme. It turns out that there is—we can “tame the infinite” using first-order logic.
First-order logic is based on two ideas: predicates and quantifiers.
Predicates are just what your English teacher said they were, but (like most things) they make more sense when you think about them in terms of computer science. Predicates are questions you can ask about the subject of a sentence. They are, in a way, functions that return boolean values.
Here’s an example. The predicate of “Eeyore is feeling blue.” is “is feeling blue”. We can use this to ask the question “Is Eeyore feeling blue?”. The boolean function version is the function that takes an input (such as “Eeyore”) and tells you whether that input is feeling blue or not.
The standard notation for predicates is, unfortunately, similar to that for functions: we would write isFeelingBlue(Eeyore) to denote that predicate. This turns out to cause some confusion, because first-order logic also has real functions (more on that later). In this article, I’m going to use square brackets for predicates and round ones for functions: isFeelingBlue[Eeyore]. Nobody else does this, so don’t blame me if this causes you any issues later on in life. You have been warned.
Predicates are the propositions of first-order logic. So, we can join them just like we joined propositions earlier: smart[Alice] ∧ funny[Alice]. You can have “empty” predicates such as maryHadALittleLamb[], which correspond directly to propositions. Predicates can also have multiple inputs, such as killed[Macbeth, Duncan].
Predicates can operate on either “concrete” inputs like “Eeyore” (which we call “constants”) or “variable” inputs. Variable inputs are quantified generalizations, which means that when you use a variable, you say that that variable can be replaced by any constant and the statement would hold.
For example, the sentence “¬ (∀ X) (gold[X] ⇒ glitters[X])” is read as “it is not true that all that is gold must glitter”. The symbol “∀” is read as “for all”, and it binds the variable X. Why do we need it? Depending on where the binding quantifier is placed, the sentence can actually have a different meaning.
X) ¬ (gold[X] ⇒ glitters[X])” has a subtly different meaning from what we had above.
The best way to think about quantifiers is not in terms of variables and substitutions. Think about quantifiers as a way to select a subset of predicates from an infinite set of predicates, and then apply some operation on them. For example, “(∀ X) Foo[X]” selects all predicates that “look like” Foo[_] and then “ands” them together (we’ll revisit the idea of “looks like” in more detail later).
This isn’t a rigorous definition, really, mainly because it’s kind of tricky to talk about “and-ing” together an infinite number of statements (why infinite?). You also need to introduce the concept of a “domain of discourse”, which basically means “what can I fill into the hole?”.
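Over a finite domain of discourse, the “and-ing together” picture is exactly what Python’s all() does. The sketch below uses a made-up three-object domain and made-up truth tables for gold and glitters (both are assumptions for illustration only) to show that where the negation sits relative to the quantifier changes the meaning.

```python
# Hypothetical finite domain of discourse and truth tables.
domain = ["ring", "nugget", "pyrite"]
gold = {"ring": True, "nugget": True, "pyrite": False}
glitters = {"ring": True, "nugget": False, "pyrite": True}

def implies(p, q):
    # Material implication: p => q is false only when p is true and q is false.
    return (not p) or q

# (forall X) (gold[X] => glitters[X]): "and" the statement over every object.
all_gold_glitters = all(implies(gold[x], glitters[x]) for x in domain)

# not (forall X) (...): at least one golden thing fails to glitter.
not_all = not all_gold_glitters

# (forall X) not (...): every single object is gold and fails to glitter.
all_not = all(not implies(gold[x], glitters[x]) for x in domain)

print(all_gold_glitters, not_all, all_not)  # False True False
```

With an infinite domain there is no loop to run, which is part of why the “and-ing” picture is an intuition rather than a definition.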
First-order logic also has functions, which have a misleading name because you don’t want to think of them as functions. Functions in first-order logic are really more like prepositional phrases. For instance, father(Luke) means “the father of Luke”. You don’t have to “define” these functions. They are just ways of transforming data by adding structure.
Functions can be used anywhere variables and concrete values can. Together, functions, variables, and constants are called terms.
An example using functions is “(∀ X) winner[X] ⇒ proud[parents(X)].”
proud[parents(parents(Amy))].
First-order logic is pretty powerful. We can express a great deal in it. To let it sink in, we’re going to quickly describe arithmetic in first-order logic, using the Dedekind-Peano axioms:
“isNaturalNumber[zero()]” says that 0 is a natural number.
∀ X Equal[X, X]
∀ X ∀ Y ∀ Z (Equal[X, Y] ∧ Equal[Y, Z]) ⇒ Equal[X, Z]
∀ X ∀ Y Equal[X, Y] ⇒ Equal[Y, X]
isNaturalNumber. But this version is simpler.)
“∀ X isNaturalNumber[X] ⇒ isNaturalNumber[Successor(X)]” says that the successor of all natural numbers is also a natural number.
“∀ X ∀ Y Equal[X, Y] ⇔ Equal[Successor(X), Successor(Y)]” says that the successor function is injective.
“∀ X ¬Equal[Successor(X), zero()]” says that no natural number is before zero.
Peano had one more axiom, which represents induction.
Perhaps you’re unimpressed with this. Another powerful result of first-order logic is Tarski’s axiomatization of plane geometry (elementary Euclidean geometry). Using a bit of magic (called “quantifier elimination” which does exactly what you guessed), he showed that there exists an algorithm that can prove any statement about plane geometry.
This should be impressive, because humanity has been tinkering with geometry for at least two thousand years now. Suddenly being given a magic algorithm to answer any question you’d like about geometry is amazing.
(What’s the catch, you ask? The algorithm is slow. Impractically slow. As in, two-raised-to-two-raised-to-n slow, also known as will-not-terminate-in-your-lifetime.)
If you’ve read SICP (The Structure and Interpretation of Computer Programs by Abelson and Sussman, free online and often called the “Wizard Book”), you might be having flashbacks to their section on logic programming: section 4.4. This section describes a logic programming language, like Prolog. Prolog-like languages operate on first-order logic and allow you to ask questions.
Here’s an example of a rule in Prolog:
male(charles).
parent(charles, martha).
parent(martha, agnes).
grandfather(X,Y) :- male(X),
parent(X,Somebody),
parent(Somebody,Y).
Prolog allows you to then make queries such as grandfather(charles, X), and Prolog would go along and discover that X = agnes is a valid solution. This should remind you of database querying and nondeterministic programming and a whole host of exciting ideas which are fun to explore.
Now that you’re a first-order logic expert…
Remember Gödel’s Completeness Theorem? It said that all true statements are provable in propositional logic. Turns out I lied. It doesn’t just hold for propositional logic; it also holds for first-order logic. The rest of this post will explain how that works.
“But hang on!” you say, “We just saw some arithmetic modeled in first-order logic though, and arithmetic implies Gödel’s Incompleteness Theorem. How can the Completeness and Incompleteness theorems live together peacefully?”
Good question. Turns out there are models besides the natural numbers that satisfy the Peano Axioms, and so there are statements that are undecidable because their truth value depends on which model is being considered. In other words, the Completeness theorem applies only to sentences that are necessarily true, while the Incompleteness theorem applies to sentences that could be either true or false. Don’t let the related names confuse you.
We haven’t talked about proofs in first-order logic yet. For propositional logic, our proofs had two components: reducing to CNF and resolution. It turns out that we can extend each of these components to first-order logic.
The conversion of first-order logic sentences to CNF should be simple enough. The only real complication comes from quantifiers.
X P[X] really mean? What does “not everybody” went to the dance mean?
X P[X] is equivalent to saying that there exists an X such that ¬ P[X].
X such that P[X] holds” as “(∃ X) P[X]”. Think back to above when we thought of ∀ as “and-ing” together everything that matches a pattern. Explain how ∃ is like “or-ing” them together.
X) P[X] in terms of De Morgan’s Laws?
A[] ∧ (∀ X) B[X]” is equivalent to “(∀ X) A[] ∧ B[X]”.
There’s a nice little trick that lets us get rid of all the existential quantifiers (∃). Once the quantifiers have been moved outside, you can replace all instances of existentially quantified variables with a constant!
X P[X]” is equivalent to “P[x()]”, assuming the function x() isn’t used anywhere else.
X ∃Y P[Y]”? Show that the right thing to do is “∀X P[y(X)]” instead of “∀X P[y()]”. What is the difference between these two sentences?
This process is called Skolemization (or, sometimes, Skolemnization). The functions are called Skolem functions (some textbooks also say “Skolem constants”, but we know that constants are just special functions!).
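Here is a minimal sketch of Skolemization for a sentence whose quantifiers have already been moved to the front. The representation is an assumption for this example: the prefix is a list of ("forall" | "exists", variable) pairs, and the body is a tuple-encoded predicate. Each existential variable is replaced by a function of the universal variables that precede it; naming that function after the lowercased variable is just a convenience here and assumes the name is not used anywhere else.

```python
def skolemize(prefix, body):
    # prefix: list of ("forall" | "exists", variable name) pairs, outermost first.
    # body:   tuple-encoded predicate or term, e.g. ("P", "Y").
    universals = []
    skolem = {}
    for quantifier, var in prefix:
        if quantifier == "forall":
            universals.append(var)
        else:
            # The Skolem function depends on every universal variable seen so far.
            skolem[var] = (var.lower(),) + tuple(universals)

    def replace(t):
        if isinstance(t, str):
            return skolem.get(t, t)
        return (t[0],) + tuple(replace(arg) for arg in t[1:])

    # Only universal quantifiers remain, so returning their names is enough.
    return universals, replace(body)

print(skolemize([("exists", "X")], ("P", "X")))                   # ([], ('P', ('x',)))
print(skolemize([("forall", "X"), ("exists", "Y")], ("P", "Y")))  # (['X'], ('P', ('y', 'X')))
```

The second call produces y(X) rather than y(): the witness for Y is allowed to depend on which X was chosen.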
P[X, f()] is a predicate applied to a universally-quantified variable X and a constant f.
Sentences in this final form are said to be in prenex form.
Let’s talk about Herbrand’s Theorem, which states that a sentence is unsatisfiable in first-order logic if and only if there is a finite subset of ground instances that is unsatisfiable in propositional logic.
A ground instance is simply a version of a sentence where all variables have been substituted so there are no variables left. For example, P[a()] is a ground instance of P[X].
P[a(), b(X, Y), Z]?
In other words, if you replace all variables with valid substitutions (“valid” as in X has the same substitution everywhere) and a finite subset of the resulting propositional logic statements are unsatisfiable, then the first-order logic statements are unsatisfiable as well. This is perhaps unsurprising, but, more excitingly, Herbrand’s theorem guarantees that the same holds in reverse: if it’s unsatisfiable, then you must be able to find such a finite set of substitutions. This shouldn’t sound too trivial, since there are an infinite number of substitutions and so guaranteeing that one exists is something “interesting”.
P[X] ∧ ¬ P[Y]? Use this substitution to show that the first-order sentence is unsatisfiable.
P[X, Y] ∧ ¬P[X, Z] is unsatisfiable. Herbrand’s theorem guarantees an unsatisfiable ground instance. Find such a ground instance.
One way of thinking about why this is true is by looking at the “saturation” of the sentences, which is what you get when you take all predicates and apply all possible concrete inputs to them. Each predicate in the saturation is practically a proposition because it has no quantified variables (as we discussed above), and is a logical consequence of the first-order sentences that were saturated (why?).
The argument then goes something like this:
Suppose the sentences of the saturation were satisfiable. Then we can assign a truth value to each predicate in the first-order world by finding the truth value of the corresponding ground instances. For example, if we had a model for the saturation where P[a()] was true and P[b()] was false, then in the first-order case, P[X] is true if X=a() and false if X=b().
It turns out that this is the contrapositive of the “unobvious” direction of Herbrand’s theorem, that is, that if the first-order sentences are unsatisfiable then the saturation is unsatisfiable in propositional logic. A satisfiable saturation in propositional logic implies, almost “by definition”, that the first-order sentences are satisfiable.
The “finiteness” guarantee that Herbrand’s theorem makes comes from a theorem called the compactness theorem.
The compactness theorem says that in propositional logic, if all finite subsets of sentences are satisfiable, then the entire set of sentences is also satisfiable. Equivalently, if a (potentially infinite) set of sentences is unsatisfiable, then there must be a finite unsatisfiable subset.
Just for fun, here’s a proof sketch:
Suppose you have a finitely satisfiable set of sentences. First, you extract all of the propositions and list them out. Number all your propositions from 1 onwards (axiom of choice alert!). Now, we do an inductive proof, where at each step we assign the next proposition a truth-value. By showing that each assignment preserves the “finitely satisfiable” property, we basically describe an algorithm that gives you the truth-value of any particular proposition, which is practically a model. Since we can find a model, the set of sentences must be satisfiable.
The base case of the inductive proof is to show that if you assign no propositions any truth-values, then the set of sentences is finitely satisfiable. This was the assumption of the theorem, so we’re good.
For the inductive step, assume that you have truth-values of the first k propositions, and the sentences are finitely satisfiable under these truth-values.
Now, let’s look at the (k+1)th proposition. If the set of sentences is finitely satisfiable when that proposition is false, then simply assign that proposition false and move on. Otherwise, we will show that you must be able to assign that proposition true and maintain the finite-satisfiability of the set of sentences.
If you are forced to assign the (k+1)th proposition true, then there must be a finite subset of sentences that is unsatisfiable if the (k+1)th proposition is false (and all the previous k propositions are assigned their respective truth-values as well!). Let’s call this set of sentences A. Now, we will show that any finite subset of sentences B is satisfiable if the (k+1)th proposition is true. Thus, the set of sentences is still finitely satisfiable and we can move on.
The idea is to look at the union of A and B. Since the union of two finite sets is still finite, it is satisfiable by the inductive hypothesis. In any model of the union, the (k+1)th proposition is either true or false. If it is false, then A is unsatisfiable, and so the union of A and B is also unsatisfiable (why?). Thus, the (k+1)th proposition has to be true. Since this holds for all finite subsets B, setting that proposition to true maintains finite satisfiability.
This completes the inductive proof of the compactness theorem.
Another proof of Herbrand’s theorem relies on so-called semantic trees, which are trees where each node is a ground instance of a predicate and the left and right branches represent the world if that predicate were true or false, respectively. You end up making some simple arguments related to whether or not you can find an infinitely long path by traversing the tree.
With Herbrand’s theorem, we can construct a first-order theorem-proving algorithm! This algorithm does resolution by generating all ground instances of the first-order sentences (i.e. the “saturation”). Ground instances are “recursively enumerable”, which means you can list them out one by one and eventually list each one (the real numbers, for example, are not recursively enumerable because you can’t list them because they have a higher cardinality than the rationals).
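To see what “listing them out one by one” looks like, here is a small sketch that enumerates ground terms level by level. The constant a and the one-argument function f in the usage line are hypothetical; ground instances of a sentence are then obtained by plugging these terms in for its variables.

```python
from itertools import product

def ground_terms(constants, functions, depth):
    # constants: list of names, e.g. ["a"]
    # functions: list of (name, arity) pairs, e.g. [("f", 1)]
    # Returns every ground term with at most `depth` levels of function nesting.
    terms = [(c,) for c in constants]
    for _ in range(depth):
        new_terms = list(terms)
        for name, arity in functions:
            # Apply each function to every combination of terms built so far.
            for args in product(terms, repeat=arity):
                term = (name,) + args
                if term not in new_terms:
                    new_terms.append(term)
        terms = new_terms
    return terms

print(ground_terms(["a"], [("f", 1)], 2))
# [('a',), ('f', ('a',)), ('f', ('f', ('a',)))]
```

Raising the depth forever visits every ground term eventually, which is the enumerability property the algorithm relies on. It also hints at the cost: with several functions and variables, each extra level multiplies the number of terms.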
Since each ground instance in the list is a propositional logic formula, you can simply run resolution-refutation on it. So, the algorithm is:
Davis and Putnam came up with this algorithm in 1960… and their work was an improvement on Gilmore’s method which was published even earlier. But we associate Robinson with the magical resolution-refutation stuff. Why? Robinson was the first one to do it practically.
Listing out all the ground instances of the sentences is slow! There’s a sort of combinatorial “explosion” where every time you have a new variable it makes things many times slower, because you need to generate substitutions for that variable as well. While the algorithm works, it’s too slow to be practical.
To talk about Robinson’s optimization, we need to discuss a whole new kind of algorithm. But more about that in the next installment of this, uh, ex-trilogy. For now, rejoice in the knowledge of a complete—albeit slow—theorem-proving algorithm that “tames the infinite”.
Wikipedia has an amusing article on “mathematical coincidence”, where they say that it’s a “coincidence” that ($ 2^{10} $) is very close to 1000 (it’s actually 1024). This is why it’s occasionally confusing whether you mean 1000 or 1024 bytes when you say “kilobyte”.
I’m not sure whether this is something to get excited about, but you know what we say about coincidence…
Here are some fun facts to inspire today’s post:
($ 2^{12864326} \approx 10^{3872548} $) to within 0.0001%, which means it begins with the digits “10000000…”.
($ 1337^{47168026} \approx \pi\cdot10^{147453447}$) to within 0.00000001%. It begins with the digits “31415926…”.
The hex expansion of ($ e^{19709930078} $) is around 10 billion digits long, and it begins with the digits deadbeef....
There is definitely something going on here. It’s time to investigate!
Let’s go back to powers of two. It really comes down to the fact that we’re trying to solve equations that look somewhat like this one
\[ 2^{\alpha} = \delta10^{\beta} \]
for integers ($ \alpha $) and ($ \beta $), where we can make ($ \delta $) as close to 1 as we want.
It should feel intuitive to take logs of both sides at this point. So let’s go ahead and do that:
\[ \alpha \ln(2) = \ln(\delta) + \beta\ln(10) \]
Since ($ \delta $) is close to 1, its natural log is close to 0. So this equation reduces to finding very close rational approximations of the ratio of the natural logs of 10 and 2.
Rational approximation, also called Diophantine approximation, is the “art” of finding rational numbers very close to real numbers. Since the rationals are dense in the reals, we can find a rational number arbitrarily close to any real number. The naïve way to do this is to simply take the decimal expansion to as many digits as we want. So, for example, we can find rational approximations of pi such as 3/1, 31/10, 314/100, etc.
So, it follows that we can find arbitrarily precise rational approximations of ($ \ln(10) / \ln(2) $), which is what we’re looking for! The numerator gives the power of 2 and the denominator gives the power of 10.
That ratio is around 3.321928094, so ($ 2^{3321928094} $) should be really close to a power of 10, right?
…wrong. The power of 10 is spot-on, but our first digit is completely off. This is tragic! We’re close, but not close enough.
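You don’t need to compute the whole number to check its leading digits. Only the fractional part of ($ \alpha \log_{10}(2) $) matters, because ($ 2^\alpha = 10^{\alpha \log_{10}(2)} $). Here is a sketch using Python’s decimal module; the helper name mantissa and the 50-digit precision are arbitrary choices for this example.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50          # plenty of digits for exponents around 10**9
LOG10_2 = Decimal(2).log10()    # log base 10 of 2, to 50 significant digits

def mantissa(alpha):
    # Write 2**alpha as m * 10**k with 1 <= m < 10 and return m.
    x = alpha * LOG10_2
    frac = x - int(x)           # fractional part of alpha * log10(2)
    return Decimal(10) ** frac

print(mantissa(10))             # 2**10 = 1.024 * 10**3
print(mantissa(3321928094))     # leading digits of the truncated attempt
```

The closer the result is to 1, the closer the power of two is to a power of ten, so this makes it easy to compare candidates for ($ \alpha $).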
How can we fix this?
We could add more digits, but eventually WolframAlpha stops doing those calculations for us. (There’s a nice online calculator here that seems to handle much bigger problems, but loses precision eventually.)
The problem is that even though we’re close, we’re not close enough. Remember that our worst-case scenario with the decimal-truncation strategy is that we’re off by ($1/\beta$). That is, we have
\[ \left| \frac{\alpha}{\beta} - \frac{\ln(10)}{\ln(2)} \right| = \frac{1}{\beta} \]
Rearranging this a little bit, we have:
\[ \alpha \ln(2) = \ln(2) + \beta\ln(10) \]
In other words, we have:
\[ 2^\alpha = 2\times10^\beta \]
We could be off by up to a factor of two! That means that even though our rational approximation is getting closer, our first digit could still vary pretty randomly.
What’s an easy fix here? We could start by rounding rather than truncating. This means our worst-case scenario drops to ($ 1/(2\beta) $) (why?), which corresponds to being off by up to a factor of the square root of two (around 1.4).
If we round the example above, we get ($ 2^{3321928095} $), which is better. But percent-error wise, we’re still doing worse than ($ 2^{10} $). We need to take more drastic measures.
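If you want to check these leading digits yourself without WolframAlpha, here is a minimal sketch using only Python’s standard `decimal` module. The helper name `leading_digits` and the 60-digit precision are my own choices for illustration, not anything from the program discussed later in this post.

```python
from decimal import Decimal, getcontext

# 60 significant digits is an assumed setting; it is far more than
# exponents in the billions need.
getcontext().prec = 60


def leading_digits(base: int, exponent: int) -> tuple[Decimal, int]:
    """Return (mantissa, power_of_ten) with base**exponent ~ mantissa * 10**power_of_ten."""
    log10_value = Decimal(exponent) * Decimal(base).log10()
    power_of_ten = int(log10_value)  # truncation equals floor for positive values
    # The fractional part of the base-10 log decides the leading digits.
    mantissa = Decimal(10) ** (log10_value - power_of_ten)
    return mantissa, power_of_ten


for exponent in (10, 3321928094, 3321928095):
    mantissa, power_of_ten = leading_digits(2, exponent)
    print(f"2^{exponent} ~ {mantissa:.4f} x 10^{power_of_ten}")
```

The truncated exponent should come out with a mantissa of roughly 5.4, and the rounded one roughly 1.08, which is further from 1 than the 1.024 we get from ($ 2^{10} $).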
It turns out that there is a way to find the best rational approximation of a number for a given denominator. This is a beautiful field of number theory that relates images like the one below to computing GCDs efficiently.
I’ll leave it to you to discover the math on your own, but the result we seek is Dirichlet’s approximation theorem, which states that we can always find a rational approximation which is within ($ 1/(\beta^2) $) of the target. In fact, there are an infinite number of such rational approximations, which means ($ \beta $) can get as large as we want (why?).
Since we have a ($ \beta^2 $) term in the denominator, the error decreases faster than the denominator. This means we can get within ($ 2^{(1/\beta)} $) of a power of 10. Since there’s a factor of ($ \beta $) in that expression, we can make it as large as we want to get as close to a power of 10 as we want! Win!
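Spelled out, the step from Dirichlet’s bound to “within ($ 2^{(1/\beta)} $)” goes roughly like this: multiply the bound through by ($ \beta\ln(2) $), then exponentiate.

\[ \left| \frac{\alpha}{\beta} - \frac{\ln(10)}{\ln(2)} \right| < \frac{1}{\beta^2} \;\implies\; \left| \alpha\ln(2) - \beta\ln(10) \right| < \frac{\ln(2)}{\beta} \;\implies\; 2^{-1/\beta} < \frac{2^{\alpha}}{10^{\beta}} < 2^{1/\beta} \]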
How do we compute these best rational approximations? The trick is to express our target number as a continued fraction, and then to simplify those continued fractions.
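Here is a minimal sketch of that trick in plain Python, assuming we only want the first few approximations. The function name `convergents` is mine, and because the target comes from a floating-point `math.log2(10)`, only the first dozen or so results can be trusted; going further needs arbitrary precision.

```python
import math
from fractions import Fraction


def convergents(x: Fraction, count: int) -> list[tuple[int, int]]:
    """Return up to `count` continued-fraction convergents of x as (numerator, denominator)."""
    h_prev2, h_prev1 = 0, 1  # numerators h_{n-2}, h_{n-1}
    k_prev2, k_prev1 = 1, 0  # denominators k_{n-2}, k_{n-1}
    result = []
    for _ in range(count):
        a = math.floor(x)  # next continued-fraction term
        h = a * h_prev1 + h_prev2
        k = a * k_prev1 + k_prev2
        result.append((h, k))
        h_prev2, h_prev1 = h_prev1, h
        k_prev2, k_prev1 = k_prev1, k
        remainder = x - a
        if remainder == 0:  # x was rational and is now fully expanded
            break
        x = 1 / remainder
    return result


# Fraction(float) is exact, so all the arithmetic below is exact too.
target = Fraction(math.log2(10))
for power_of_two, power_of_ten in convergents(target, 6):
    print(f"2^{power_of_two} ~ 10^{power_of_ten}")
```

The second line it prints is the familiar ($ 2^{10} \approx 10^{3} $); each later line is a closer match with a larger denominator.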
It’s not hard to write code to do this quickly. WolframAlpha and Mathematica come with a built-in function Rationalize that does exactly what we want. With a little twiddling of the “delta” parameter, we can get approximations within whatever interval we want, and they work!
Pushing this gives us lovely results, like ($ 2^{44699994} $), which is around ($ 9.9999997\times10^{13456038} $), within 0.000004% of a power of ten. Wonderful.
The natural question to ask now is whether we can do even better. Can we get an arbitrary sequence of digits at the beginning of the result? It turns out we can. By manipulating the Euclidean algorithm a bit, we can generate any remainder, not necessarily one that is close to zero. Since the remainder controls the first few digits, we need to find an approximation with error ($ \ln(\delta) $).
The trick is to use a “secondary” version of the Euclidean Algorithm where we approximate ($ \ln(\delta) $) by adding together the errors of successively more precise approximations.
Here’s an example. Suppose we compute a series of rational approximations of a number and we get the following two rows:
Numerator | Denominator | Error |
---|---|---|
2 | 1 | 0.0131 |
175 | 87 | 0.0028 |
Adding these two rows gives us a new row:
Numerator | Denominator | Error |
---|---|---|
177 | 88 | 0.0159 |
(Why does this work?)
This gives us an approximation with error 0.0159. We can keep doing this in a method that resembles a cross between Gaussian elimination in matrices and the Euclidean algorithm for integers, and get as close as we want to any target “error”.
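One way to see why rows can be added, assuming we measure the “error” of a row as ($ \alpha\ln(2) - \beta\ln(10) $): that quantity is linear in the numerator and denominator, so errors add when rows add. The sketch below checks this on two real approximations of our ratio. The name `residual` is just for illustration, and the rows are not the made-up ones in the table above.

```python
import math


def residual(alpha: int, beta: int) -> float:
    """ln(2**alpha / 10**beta); this is the "error" column, measured in log space."""
    return alpha * math.log(2) - beta * math.log(10)


row_a = (10, 3)   # 2^10 ~ 10^3, slightly too big
row_b = (93, 28)  # 2^93 ~ 10^28, slightly too small
row_sum = (row_a[0] + row_b[0], row_a[1] + row_b[1])

print(residual(*row_a), residual(*row_b), residual(*row_sum))
# exp(residual) recovers the leading digits of the combined power.
print(math.exp(residual(*row_sum)))
```

Because one row overshoots and the other undershoots, the combined row lands on a new error in between, which is what lets us steer toward a chosen target.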
You can download a Python program I wrote to generate these expressions here. It uses the lovely mpmath library. A sample session with it, used to compute one of the examples above:
$ !!
Prefix? --> 0xdeadbeef
Base? ----> e
Radix? ---> 16
Accurate to 4 digits:
2.71828182845905^16311 ~~ 3735928559*16e+5875
Accurate to 5 digits:
2.71828182845905^4407903 ~~ 3735928559*16e+1589807
Accurate to 7 digits:
2.71828182845905^1044698524 ~~ 3735928559*16e+376795337
Accurate to 7 digits:
2.71828182845905^1044698524 ~~ 3735928559*16e+376795337
Accurate to 7 digits:
2.71828182845905^5021368668 ~~ 3735928559*16e+1811075911
Accurate to 7 digits:
2.71828182845905^5021368668 ~~ 3735928559*16e+1811075911
Accurate to 8 digits:
2.71828182845905^19709930078 ~~ 3735928559*16e+7108854587
If you enjoyed that journey, here are some exploration questions:
Welcome back.
In the previous article, we learned how rules of inference allow us to prove theorems. We also learned that a rule of inference is not necessarily complete: it might be impossible to prove a true sentence with a valid rule of inference.
Today’s article is about a complete rule of inference.
Before you get excited, though, let’s talk about what we really mean by “complete” in this case.
If a statement could either be true or false given some axioms, then the rule of inference will not construct a proof. It is not guaranteed to even terminate in this case.
Prove that “Robert is a frog” given “It rained on Monday” and “Seven is a prime number”.
For the purposes of this article, we can assume that the algorithm will terminate. Later, though, we will generalize to higher forms of logic where termination is not guaranteed.
The rule of inference we are going to use is called resolution, first used practically by John Alan Robinson in 1965. It goes like this:
\[ \frac{(P \vee \neg Q), (Q \vee R)}{P \vee R} \]
Resolution proofs rely on the idea of refutation or reductio ad absurdum, which is more commonly known as proof by contradiction.
Refutation relies on the idea that if not-S causes a contradiction with your axioms and rules of inference, then not-S is invalid, so S must be valid.
To use the resolution-refutation scheme to prove S given some axioms, we can repeatedly apply resolution to pairs of sentences taken from the axioms and not-S. If you end up deriving a contradiction, you have proven S.
The rest of this post will be dedicated to proving that this scheme actually works.
Resolution operates on clauses. A clause is just a set of propositions that are joined with “or”s. These propositions can be negated.
Sentences are in conjunctive normal form (CNF) if they are a set of clauses that are joined with “and”s.
It turns out that any sentence can be written in CNF. To do this, you can repeatedly apply the following rules:
You can use induction to show that applying these rules again and again will eventually turn your sentence into CNF.
Why do we care so much about sentences in CNF? We can now extend the resolution rule to sentences in CNF:
\[ \frac{(P_1 \vee \dots \vee P_{m} \vee \neg X), (Q_1 \vee \dots \vee Q_{n} \vee X)}{P_1 \vee \dots \vee P_{m} \vee Q_1 \vee \dots \vee Q_{n}} \]
In short, if a proposition appears in both positive and negative polarities in two clauses, you can join the clauses and remove that proposition.
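To make the rule concrete, here is a small sketch in Python. The representation is my own assumption for illustration: a clause is a `frozenset` of strings, and a leading `~` marks a negated proposition.

```python
def negate(literal: str) -> str:
    """Flip the polarity of a literal: 'P' <-> '~P'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(clause_a: frozenset, clause_b: frozenset) -> list:
    """Return every clause obtainable by resolving clause_a against clause_b."""
    resolvents = []
    for literal in clause_a:
        if negate(literal) in clause_b:
            # Join the two clauses and drop the proposition that appeared
            # with both polarities.
            resolvents.append((clause_a - {literal}) | (clause_b - {negate(literal)}))
    return resolvents


# (P or not Q) and (Q or R) resolve to (P or R).
print(resolve(frozenset({"P", "~Q"}), frozenset({"Q", "R"})))
# X and not-X resolve to the empty clause, i.e. a contradiction.
print(resolve(frozenset({"X"}), frozenset({"~X"})))
```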
We’re now going to prove that if you have an unsatisfiable set of clauses, then you can repeatedly resolve them to derive the empty clause. This provides a model for us to have a terminating theorem-proving algorithm!
This proof is taken almost directly from Russell and Norvig’s textbook, though I tried to cut down on the notation.
Imagine we’ve already applied resolution to each pair of clauses, and recursively resolved the results, and so on to obtain an (infinite?) set of clauses. We call this the resolution closure of these clauses.
Suppose also that the set of clauses was unsatisfiable, but this set of resolved clauses does not contain the empty clause. We’re going to show that you can, in fact, satisfy the clauses, leading to a contradiction.
To satisfy the clauses, follow the following algorithm:
The claim is that this procedure will always work, that is, none of these assignments will make a clause false. To see this, suppose assigning P[i] did, in fact, cause a clause to be false. Specifically, no clauses were falsified by any of the previous assignments. Then the following could happen:
If the latter case occurs, then we must also have (false ∨ false ∨ … ∨ ¬ P[i]), because that’s the only situation in which P[i] is assigned false.
Now, here’s the crucial bit: these two clauses resolve! Since they contain P[i] and ¬P[i], respectively, we resolve them to obtain a clause that has only false values in it. This contradicts our assumption that no clauses were falsified by any of the previous assignments.
So, we know that if the empty clause is not part of the set of resolved clauses, then the clauses are satisfiable. By contrapositive, it follows that you can always resolve some set of them to get the empty clause. Q.E.D. ■
There’s another argument that explains this: an inductive one which you can read here. In short, it inducts on the total number of propositions, where the base case is that if you have only one literal per clause, then either it’s satisfiable, or it’s unsatisfiable and you can resolve to get the empty clause.
So where are we? We have an algorithm that will definitely terminate on a valid statement, though we have not yet said anything about what happens when it is given anything else.
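As a rough sketch of the whole resolution-refutation scheme (not an efficient prover, and the names `entails` and `resolvents` are hypothetical), we can keep resolving pairs of clauses until either the empty clause appears or nothing new can be derived. Clauses are sets of strings, with `~` marking negation; that encoding is my assumption.

```python
from itertools import combinations


def negate(literal: str) -> str:
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(a: frozenset, b: frozenset) -> set:
    """All clauses produced by one resolution step between a and b."""
    return {(a - {lit}) | (b - {negate(lit)}) for lit in a if negate(lit) in b}


def entails(axioms, goal: str) -> bool:
    """Try to prove `goal` by refutation: add not-goal and look for the empty clause."""
    clauses = {frozenset(c) for c in axioms} | {frozenset({negate(goal)})}
    while True:
        new = set()
        for a, b in combinations(clauses, 2):
            new |= resolvents(a, b)
        if frozenset() in new:
            return True   # contradiction found, so the goal follows
        if new <= clauses:
            return False  # closure reached without a contradiction
        clauses |= new


# "If Rover is a dog then Rover is an animal" is the clause (~Dog or Animal).
print(entails([{"~Dog", "Animal"}, {"Dog"}], "Animal"))
# The frog exercise: nothing resolves, so no proof is found.
print(entails([{"Rained"}, {"SevenIsPrime"}], "Frog"))
```

The loop stops because a finite set of propositions only allows finitely many distinct clauses.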
The resolution method should feel extremely powerful. Of course, as the number of propositions increases, it will get exponentially slower (boolean satisfiability was actually one of the first problems to be proved as NP-complete—and, in fact, constraint solving algorithms like DPLL correspond pretty much to resolution-refutation).
But that’s okay! It works!
One question you should have is, “What can I prove with propositional logic?”
Propositional logic is bound to a finite number of propositions. That means you can’t say things like “1 is an integer”, “2 is an integer”, “3 is an integer”, etc.
In the next article, we will extend propositional logic to a much richer form that supports an infinite number of propositions, and show that resolution-refutation still works.
On October 10, 1996, a program called EQP solved a problem that had bothered mathematicians since 1933, over 60 years earlier. Using a powerful algorithm, EQP constructed a proof of a theorem on its own. EQP ran for around 8 days and used about 30 MB of memory.
The theorem was known as Robbins’ conjecture: a simple question about whether a set of equations is equivalent to the axioms of boolean algebra.
You can read more background here or in the New York Times archives. You can also view the computer-generated proof here.
This article (or series of articles) will shed some light on how EQP worked: the fascinating world of automatic theorem proving.
Along the way, we’ll explore ideas that have spanned millennia: formal logic, Gödel’s (In)completeness theorem, parsing, Chapter 4.4 of The Structure and Interpretation of Computer Programs, P=NP?, type theory, middle school geometry, Prolog, and philosophy. Over time, we’ll build up the tools needed to express a powerful theorem-proving algorithm: Robinson’s Resolution.
This isn’t meant to be very notation-heavy. I’ve avoided using dense notation in favor of sentences, hoping it will be easier for people to jump right in.
Exercises are provided for you to think about interesting questions. Most of them can be answered without a pencil or paper, just some thought. Some of them are introductions to big ideas you can go explore on your own. So really, don’t think of the exercises as exercises, but as the Socratic part of the Socratic dialogue.
If you want a formal education in all this, consult Russell and Norvig’s textbook, Artificial Intelligence: A Modern Approach. All I’m writing here I learned on my adventures writing my own theorem prover, lovingly named Eddie.
To prove theorems, we really need to nail down what words like “prove” and “theorems” mean. This turns out to be kind of tricky, so the first post will be devoted to building up some formal logic: mainly vocabulary and big ideas.
Logic begins with propositions. Propositions are sentences that can be true or false, like “Charlie is a basketball player.” or “Two plus two is five.”.
Except, we don’t necessarily know whether a proposition is true or false. Do you know whether Charlie is a basketball player? I don’t.
Then again, there are certain propositions that everyone knows are true. Propositions like “Either it is raining or it is not raining.”.
How do we know that that’s always true? One way is to list out all possibilities and see what happens:
It is raining. | It is not raining. | Either it is raining or it is not raining. |
---|---|---|
True | False | True |
False | True | True |
This sort of listing is called a truth table, and it assigns a truth value for each proposition. Each row is called a model.
Statements like these are called tautologies, and they seem to be pretty meaningless (like “no news is no news” or “if you don’t get it, you don’t get it”). We’re also going to refer to them as valid statements.
A statement could also be never true. These are called invalid statements (you will also see them called unsatisfiable).
Finally, a statement could possibly be true: these are called satisfiable. A satisfiable sentence is a sentence that could be true if certain propositions are true and others are false. Here’s an example:
It is raining. | Mark is strong. | It is raining and Mark is strong. |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | False |
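If you would rather let a computer list the models, here is a minimal sketch in Python. The function name `classify` is my own, and it labels sentences with the three words used in this post.

```python
from itertools import product


def classify(sentence, num_propositions: int) -> str:
    """Evaluate `sentence` in every model and report how often it is true."""
    results = [
        sentence(*model)
        for model in product([True, False], repeat=num_propositions)
    ]
    if all(results):
        return "valid"        # true in every row of the truth table
    if any(results):
        return "satisfiable"  # true in at least one row
    return "invalid"          # never true


print(classify(lambda raining: raining or not raining, 1))
print(classify(lambda raining, strong: raining and strong, 2))
print(classify(lambda raining: raining and not raining, 1))
```

With n propositions this checks 2^n rows, which is exactly why truth tables stop being fun quickly.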
As we get into more complicated stuff, it’ll be annoying to write out these propositions in full. Also, sometimes we will want to talk about an arbitrary proposition, like a variable. So, we’re going to use capital letters to denote propositions. For example, I might say “let P be the proposition ‘It is raining.’”. You can then replace all instances of P with that proposition.
So I cheated a bit above; I made up sentences like “It is raining and Mark is strong.” without talking about what “and” really means.
Logicians generally use three main ‘operators’: “and”, “or”, “not”. You can use them to modify or join propositions and form bigger propositions (called sentences or formulas or expressions).
We have symbols to denote these: we use ¬A for “not A”, A∧B for “A and B”, and A∨B for “A or B”. It’s easy to get those last two confused at first; a nice mnemonic is that ∨ looks like a branching tree, which relates to choice (“or”).
In practice, this lets you turn sentences like “Either I am dreaming, or the world is ending and we do not have much time left.” into something like:
D ∨ (W ∧ ¬ T)
Another operator is implication. We say “A implies B” if B is true whenever A is true. We denote this in symbols with “A ⇒ B”.
That last exercise was a little tricky. How did you know your answer was correct? The foolproof way is to write out a truth table for A and B. But, as you can imagine, that gets tedious as you add more and more propositions.
And it gets worse. What if you have an infinite number of propositions? Like “1 is a positive number” and “2 is a positive number” and so on ad infinitum? Infinitely long truth tables sound gnarly. Clearly, we need a better way to deal with this.
The better way is to think in terms of rules of inference. Rules of inference are ways to transform expressions.
A rule of inference you’ve probably used is modus ponens, which states that if you have “P is true, and P implies Q” then you can deduce that “Q is true”.
For example, if Rover is a dog, then Rover is an animal. Since Rover is a dog, we can deduce that Rover is an animal.
Rules of inference are often written as fractions where the preconditions (also called antecedents or premises) are written as numerators and the results (also called consequents or conclusions) are written as denominators:
\[ \frac{P,\; P\implies Q}{Q} \]
Note that sometimes we elide the “and” from a series of premises.
A rule of inference is valid if its conclusions are true in all models where its premises are true.
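That definition can be checked by brute force for small rules. Here is a sketch, where `rule_is_valid` and `implies` are names I made up for illustration: it tries every model and looks for one where all premises hold but the conclusion fails.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def rule_is_valid(premises, conclusion, num_propositions: int) -> bool:
    """True if the conclusion holds in every model where all premises hold."""
    for model in product([True, False], repeat=num_propositions):
        if all(premise(*model) for premise in premises) and not conclusion(*model):
            return False  # found a counterexample model
    return True


# Modus ponens: from P and (P implies Q), deduce Q.
modus_ponens = rule_is_valid(
    [lambda p, q: p, lambda p, q: implies(p, q)], lambda p, q: q, 2
)
# A tempting but broken rule: from Q and (P implies Q), deduce P.
affirming_the_consequent = rule_is_valid(
    [lambda p, q: q, lambda p, q: implies(p, q)], lambda p, q: p, 2
)
print(modus_ponens, affirming_the_consequent)
```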
A rule of inference should also be consistent, which means you shouldn’t ever be able to use that rule of inference to prove both A and not-A.
Now, finally, we have the tools to talk about proofs. In a “logical system”, you pick certain axioms, which are propositions that seem true. Then you use your rules of inference to show that those axioms imply other exciting things. Theorems are propositions that you can prove in this way, and proofs are chains of these rules of inference. A statement that has a proof is a theorem.
One of the first logical systems was Euclid’s postulates. With a handful of simple axioms that anyone would agree with, Euclid built up all of geometry. While other philosophers (like Thales) had come up with the same results centuries before Euclid, Euclid put all that machinery on a solid, rigorous foundation. In a future episode, we might even go ahead and encode Euclid’s axioms in formal logic.
Meanwhile, in the early 20th century, a crisis was brewing. People were coming up with all sorts of messy paradoxical results. The biggest one was Russell’s Paradox, which went somewhat like this:
Imagine the set of all sets that don’t contain themselves. So, for example, the set {1} is in that set, while the set of all sets is not in that set (because it contains itself). Does this weird set contain itself?
With these messy loopholes popping up all over the place, Alfred North Whitehead and Bertrand Russell decided it was a good idea to take matters into their own hands and put math on solid foundations like Euclid did centuries before.
The result was the Principia Mathematica*, a treatise that stated a set of axioms and rules of inference for set theory, and then built up arithmetic and the rest of math from that.
The Principia was careful to avoid any sets that contained themselves, and so Russell’s Paradox and the Liar’s Paradox could be avoided.
It seemed like a sweet deal, until Kurt Gödel came along and broke everything.
(*We stopped using the Principia, by the way. Most mathematicians use the axioms of Zermelo–Fraenkel set theory, abbreviated ZF. But it still has the problems Gödel discovered.)
Modus ponens isn’t a magic bullet.
This is really bad, if you think about it. It means that there are statements that can be true according to our axioms, but impossible to prove in our logical system. This is what is referred to as incompleteness—the subject of Gödel’s Incompleteness Theorem.
The Incompleteness Theorem says that if your logical system can be used to represent arithmetic, then you can come up with a statement that you can neither prove nor disprove.
This is horrible. It means that there are statements we can’t prove in math. There are questions without answers!
Here’s a question without an answer: those of you who have read Harry Potter and the Diagon(alization) Alley might remember that there are more real numbers than integers. The Continuum Hypothesis states that there isn’t a set of numbers that is “bigger” than the integers but “smaller” than the real numbers. It turns out that we can’t prove that. The Continuum Hypothesis is independent of the ZF axioms.
If that sounds a bit abstract, here’s another one: Euclid’s parallel postulate, which says something really obvious about parallel lines, turns out to be independent of his other axioms.
If a line segment intersects two straight lines forming two interior angles on the same side that sum to less than two right angles, then the two lines, if extended indefinitely, meet on that side on which the angles sum to less than two right angles.
Finally, the axiom of choice is pretty controversial. Most mathematicians accept it as true, even though it leads to all sorts of weird results. The weirdest one is the Banach-Tarski Paradox, which shows how you can take a sphere, cut it up into five pieces, and reassemble them to get two spheres.
When we talk about ZF with the axiom of choice, we call it ZFC.
As exciting as the Incompleteness Theorems are, there’s a much less celebrated result of Gödel’s called, oddly enough, the Completeness Theorem, published in 1929. It says that there are rules of inference that are complete (every valid statement is provable) for first-order logic, which includes propositional logic.
Gödel did not, however, show what this rule was: we had to wait until 1965 for it. In the next post, we will discover this magical rule of inference, prove its completeness, and show how to use it to write an automatic theorem prover. And we’re going to find out why the title of this post makes sense.
I’m going to pick on a math problem I find very annoying. It’s completely arbitrary, it has no beauty or elegance, and it is tedious and unenlightening to solve.
A long thin strip of paper is 1024 units in length, 1 unit in width, and is divided into 1024 unit squares. The paper is folded in half repeatedly. For the first fold, the right end of the paper is folded over to coincide with and lie on top of the left end. The result is a 512 by 1 strip of double thickness. Next, the right end of this strip is folded over to coincide with and lie on top of the left end, resulting in a 256 by 1 strip of quadruple thickness. This process is repeated 8 more times. After the last fold, the strip has become a stack of 1024 unit squares. How many of these squares lie below the square that was originally the 942nd square counting from the left?
— 2004 AIME II, #15
(The AIME is a prestigious invite-only mathematics exam used to select the US team for the International Mathematical Olympiad.)
Imagine this problem is on your homework. Your math teacher explained how to do it on the blackboard. She worked it out in detail, and the steps made sense at the time, but when you get home, you can’t remember what they were.
And then this problem shows up on your final exam, and you remember it’s that one problem you didn’t memorize how to do. You do your best, but you can’t figure it out. Maybe you made a silly mistake earlier in the problem and that messed everything up. You don’t have enough time to check your work, and your heart starts to beat faster.
Your teacher takes time out to talk to you after class, and explains that you really should concentrate more. She tells you that these simple “reasoning and problem-solving skills” are really important in most careers. You’ll need a good grade in math if you ever want to take AP Origami or Advanced Paper Folding Honors.
So you start going to a tutor after school. Even though it’s kind of expensive, your parents agree that it’s important for you to catch up to your peers.
Your tutor does the same things your teacher did in class, only you’re much more tired after school and you can’t focus as well. But your friends say they go to tutors, and they get stellar grades, so you stay with it.
Next year, you take an easier math class. You only need to survive until the end of high school, right? Then you can forget all this nonsense and spend your time learning what’s useful.
There are two very different ways we could look at math.
The first way to look at math is “math-is-a-hammer”. Trying to measure the height of this building? Trigonometry! Computing your odds at a casino? Probability! Math gives you a huge set of tools you can throw at problems.
A couple of my posts on here have been about math-hammering. You math-hammer whenever you take a system in the real world and model it in a logical way. Usually, you do it in order to predict something about your system.
Our education system, of course, focuses entirely on math-hammering. We use math-hammers in elementary school arithmetic to find out how many apples Joe should give Bob, and we use math-hammers in high school calculus to figure out whether a particle is speeding up or slowing down at time t.
And that’s where math-hammering fails. How many second graders care about the price of apples? How many high schoolers really care about the speed of a theoretical particle in a frictionless room?
(Image: Calvin and Hobbes, Bill Watterson.)
We tell them that all these hammers will be useful in careers.
And that’s true. Economists use math-hammers all the time. So do physicists and chemists and statisticians. The lady at the checkout at your local grocery store uses the hammer labeled “subtraction” every time she gives you change.
But when you’re a kid, you don’t care. Why should you? You don’t need to count your change or do your taxes. The only way math ever helps you is in getting good grades.
So let’s talk about the second way to look at math.
The second way to look at math is in terms of building hammers, rather than using them. Math isn’t about “reasoning and problem-solving”. Math is about design. Mathematicians are just like architects who decide where the bathroom should be.
Let’s take my AP Calculus class as an example. It’s taught, justifiably, in a “this-is-the-kind-of-problem-that-will-be-on-the-AP-test” fashion. So when you are first introduced to limits, you’re asked to calculate, by hand, dozens of deltas for different functions with increasingly tiny epsilon values.
Hardly anything is said about the intuition behind the definition of a limit: it seems like an arbitrary set of rules. Or, for that matter, hardly anything is said about why the notion of a limit is useful. Why is continuity such a tricky thing to pin down? Why do we need such a crazy formal definition of this simple notion?
A fun example of why this is a big deal is Thomae’s Function, which is zero for all irrational numbers, and otherwise depends on the denominator of the rational number. Does this function look continuous? How can you classify it without a solid, rigorous definition of continuity? Continuity gives rise to all sorts of interesting questions and, in fact, if you keep asking tricky questions and generalizing, you end up with a whole field of math: topology.
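For reference, the usual way to write Thomae’s Function down is:

\[ f(x) = \begin{cases} 1/q & \text{if } x = p/q \text{ in lowest terms with } q > 0 \\ 0 & \text{if } x \text{ is irrational} \end{cases} \]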
Speaking of fields: why do mathematicians care so much about these strange algebraic objects like groups and rings and fields (not to mention monoids and vector spaces and lattices)? They’re all generalizations of familiar structures! Fields are like the numbers we know and love, except, there are other fields (like the rational functions). Theorems we prove about a generic field can be used for anything that you can show is a field. Proving a theorem about a field is like building a new hammer.
Similarly, Haskell programmers are familiar with how generic functions defined on things like Monads and Monoids turn out to be useful in all sorts of settings, from managing config files to safe I/O. Every time you program a polymorphic function so that you can reuse it for different kinds of data types, you’re thinking like a mathematician does.
It’s the same deal with continuity. If you can prove exciting things about any generic continuous function, then that proof works for all continuous functions. The fundamental theorem of calculus holds for all continuous and differentiable functions. It’s a very versatile hammer. That’s why we care about continuity and give it a formal, rigorous description.
We could have chosen a different definition of continuity, of course, and maybe had subtly different theorems as a result. Maybe there’s a better definition that nobody has thought of yet. And that’s okay! Contrary to what just about all of K-12 math education tells you, you get to make up your own rules in math.
We need to purge “word problems” from as much curriculum as we can. Word problems turn math into a boring, utilitarian tool with very little practical value outside of contrived examples where Alice wants to buy apples from Bob.
We need to purge the idea that math is about “clever” things like realizing you should square both sides of the equation.
We need to purge the idea that algebra is “solving equations” and geometry is “calculating lengths”.
And we need to purge the idea that there are rules in math.
Everyone’s talking about an addicting new webgame that lets you try and land SpaceX’s Falcon 9 Lander.
Gizmodo says the game “will frustrate you no end”. GameSpot, TechCrunch, EnGadget, Popular Mechanics, The Verge and Popular Science seem to agree. The Twitterverse is forming conspiracy theories about SpaceX’s recruitment strategy.
But, as usual, everyone is missing the point:
Popular Science parenthetically remarks that the game is not affiliated with SpaceX. The Verge mentions that it was made with “software born out of the MIT Media Lab”, whereas GameSpot gets it completely wrong and says that the game was “developed and published for educational purposes by the MIT Media Lab”.
In their rush to sensationalize every little bit of news, all of these websites missed perhaps the most exciting bit of information. SpaceX Falcon 9 Lander was probably written by a kid.
Scratch, you see, is a programming language designed by researchers at MIT, intended to teach children how to program. I wrote some of my first lines of code in Scratch, when I was nine. I still frequent the site.
Here’s something you might not know: unlike just about any other popular Flash game, SpaceX Falcon 9 Lander is open-source. That means you can read the source code. Right now.
In fact, if you sign up for your free Scratch account, you can also change the source code, and publish your changes. You can make Mars Rover Lander by changing a few of the images that the game uses—Scratch has a built-in paint editor. You can twiddle some of the physics to make it easier or harder to win. You can embed a cheat code, and then take screenshots to post proudly on the Internet.
You can do all of these things right now, for free, with absolutely no technical knowledge. Assembling a program in Scratch is similar to putting together LEGOs: easy and intuitive. Left alone for an hour or so, your kids will figure it out on their own. In the process they will learn fundamental skills in math and logic that will show up again and again throughout their schooling and career. They will impress you.
Not convinced? Here’s what the code to play the sounds looks like.
Someone who has never seen a computer before should be able to understand what that’s doing. That’s the beauty of Scratch.
Scratch is the future of computer science education. Scratch has inspired beginners’ computer science classes at Berkeley and Harvard. Scratchers go on to build amazing things, go to amazing universities, and lead amazing lives.
In a society increasingly dependent on technology, it’s scary how few of us know how it all works. We’re breeding armies of muggles afraid of the handful who try to discover how things work. We’re imprisoning inquisitive schoolchildren and trying to “protect” the public with a war against “hackers”.
And it’s not going to change anytime soon. Of the over twenty thousand views the game has gotten, only nineteen people have bothered to look inside at what makes it tick.
Wake up, sheeple.
Your hosts for the evening are Hardmath123 and technoboy10 from Scratch. Together, we have over 13 years of experience with Scratch and the computer science world beyond.
So you think you’re a Scratch expert. You know the ins and outs of Scratch like the back of your hand. You may even have hacked around with the Scratch source code.
But you want something new. You want to learn more, explore, and discover. And you don’t know where to start. Everyone you ask gives you their own advice.
Here’s ours.
The eternal question. A lot of people define themselves by the programming languages they use. People have very strong, emotional opinions about these things. The truth is, in the big picture, you can get most things done with most languages.
So before you pick a language, here’s a piece of advice: don’t collect languages, collect paradigms (that’s CS for “big ideas”). Once you know the big ideas, learning a new language should take you at most a weekend. Paradigms stay the same; they just show up in different languages hidden in a new syntax.
Having said that, here are the big ideas we think you should look at.
Functional programming is programming with functions. That means, in a way, that you’re more focused on reporter blocks than stack blocks. You don’t assign to variables much; instead of changing data in-place, you create copies that have been modified.
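To make the "copies instead of in-place changes" idea concrete, here is a minimal sketch in Python (picked only for illustration; the names scores, add_bonus, curved, and top are made up). It contrasts changing a list in place with building a new one from a function that only reports a value.

```python
# In-place style: we overwrite the entries of a list one by one.
scores = [70, 85, 90]
scores_in_place = list(scores)
for i in range(len(scores_in_place)):
    scores_in_place[i] += 5

# Functional style: build a new list and leave the original alone.
def add_bonus(score):
    """A pure function: it only reports a value, like a reporter block."""
    return score + 5

curved = list(map(add_bonus, scores))
# sorted() returns a new list; list.sort() would change the list in place.
top = sorted(curved, reverse=True)
```

After this runs, scores is still the original list; curved and top are new lists derived from it.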
A lot of people think functional programming is impractical when they first get started: they don’t think you can get useful things done, or they think functional languages are too slow. Paul Graham, founder of Y Combinator (a Silicon Valley start-up funder which has more money than you want to know about) wrote this piece on how his earlier company, Viaweb, very successfully used Lisp, the language family Scheme belongs to, to “beat the averages”.
Scheme is one of the oldest languages around. It started off as an academic language used in MIT’s AI labs but over decades has evolved into a more mainstream language.
Many people have written their own versions of Scheme. The most popular one, and the one we recommend, is Racket, which was built by an academic research group but used by everyone–even publishers of books. It comes with a lot of built-in features.
The best book to learn Scheme (and the rest of computer science) is The Structure and Interpretation of Computer Programs, aka SICP. It’s free to read online. Another good one is The Little Schemer and its sequel The Seasoned Schemer.
If you want to learn functional programming but still want Scratch-ey things like blocks and sprites, learn to use Snap!. Snap! is Scratch with the advanced Scheme features thrown in.
(Bonus: Here’s Brian Harvey, the creator of Snap!, on why Scheme should be taught.)
If you want to know more about Snap!, ask us on the forum thread.
Object-Oriented Programming is based on the idea of grouping data and functions under structures called “objects”. A lot of languages provide object-oriented features, and so you’re likely to run into the ideas no matter what you choose to do.
The classic object-oriented language is Smalltalk—if you recall playing with the Scratch 1.4 source code, you were writing in Smalltalk. Smalltalk is an old language and isn’t used by anyone anymore. However, its legacy lives on: much of its syntax and some of its ideas are evident in Objective-C, the language in which you write iOS and OSX apps.
Another OOP language, Self, preached prototyping: a special kind of object-oriented programming where objects come out of “factories” called prototypes. You literally make a copy of the prototype when you make a new object. Self let you treat prototypes as objects themselves. It was very meta. This tradition lives on in JavaScript’s object-oriented style. JavaScript is the programming language of the web: originally designed for making webpages interactive, but now used for desktop software as well using Node.js.
The other kind of object-oriented programming is based on classes, which are pre-defined kinds of objects. Your program is a list of definitions for classes, and right at the end you create some instances of the classes to get things done. This pattern is the focus of the AP Computer Science course, which teaches you Java (more on this later). Java was widely used in industry for many years, and is still popular (though less so than before).
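Here is a rough sketch of both flavors in Python, assuming a made-up Sprite class. Python itself is class-based, so the second half only imitates the "copy an existing object" feel of prototyping; Self and JavaScript implement prototypes differently under the hood.

```python
import copy

class Sprite:
    """A class: a pre-defined kind of object that groups data and functions."""

    def __init__(self, name, x=0, y=0):
        self.name = name
        self.x = x
        self.y = y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

# Class-based: create an instance from the class definition.
cat = Sprite("cat")
cat.move(10, 0)

# Prototype-flavored: copy an existing object, then tweak the copy.
dog = copy.copy(cat)
dog.name = "dog"
dog.move(0, 5)
```

The copy starts with the cat's position and then goes its own way; the cat is unaffected by what happens to the dog.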
Statically Typed Languages are languages which care a lot about what kind of thing your variables are. In Scratch, you can put a string into the addition reporter block and not have horrible things happen to you. Scratch isn’t statically typed.
In a statically typed language, you will be warned that you can’t add a string and a number even before you try to run the program. It will refuse to let you run it. For huge codebases managed by hundreds of programmers in a big company, this helps prevent silly errors. Though static typing probably won’t help you much on a day-to-day basis, the ideas are worth learning about and you should eventually get familiar and comfortable with the paradigm.
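For contrast, here is what the same mistake looks like in Python, which is not statically typed. This is only a sketch: the add function is made up, and the type hints on it are not enforced when the program runs. A separate checker such as mypy can read those hints and complain before you run anything, which is roughly the experience a statically typed language gives you by default.

```python
def add(a: int, b: int) -> int:
    # The hints document the intent, but plain Python does not check them.
    return a + b

# Python only notices the mistake when this line actually runs.
try:
    add(1, "2")
    outcome = "ran fine"
except TypeError:
    outcome = "TypeError at run time"
```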
Java, as mentioned before, is statically typed. Other such languages are Mozilla’s Rust and Google’s Go. A recent trend is to add static typing onto JavaScript, because plain old JavaScript is not very type-safe. You may have heard of TypeScript by Microsoft or Flow by Facebook.
The C programming language is also statically typed. C is extremely low-level. It gives you a lot of control over things like memory use, the operating system, hardware, networking, and processes. This lets you write very efficient programs, but also makes it difficult to learn. It’s useful to have a working knowledge of what the C compiler does, and how assembly languages work. As such, learning C won’t teach you as much of the mathematical side of computer science as the practical side.
Finally, Haskell is an old academic language that is making a serious comeback. Haskell is statically typed with a very advanced type system. It is also functional. It has a lot of neat language features, but is not very beginner-friendly for a variety of reasons.
Logic programming or declarative programming is a completely different outlook on programming. Rather than telling the computer how to do something, you tell it what to compute and the computer tries all possible inputs until something works—in a clever and efficient way, of course. You can use it to solve a Sudoku by explaining the rules of the puzzle without giving any hints on how to solve it.
It looks like the computer is reasoning on its own, and in fact logic programming is closely related to automatic proof generation.
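The "state the rules, let the computer search" idea can be imitated in Python on a toy puzzle: fill a 3×3 grid so that every row and column contains 1, 2, and 3 exactly once. This is a sketch under loud assumptions: the puzzle, the clues, and the function names are invented for illustration, and the search is plain brute force, whereas a real logic language like Prolog searches far more cleverly.

```python
from itertools import permutations, product

def is_valid(grid):
    """The rules, and nothing else: every column holds 1, 2, 3 exactly once."""
    return all(sorted(col) == [1, 2, 3] for col in zip(*grid))

def matches_clues(grid, clues):
    """clues maps (row, column) to the value that must appear there."""
    return all(grid[r][c] == v for (r, c), v in clues.items())

def solve(clues):
    # Each candidate row is a permutation, so rows are valid by construction.
    rows = list(permutations([1, 2, 3]))
    # Try every combination of three rows and keep the ones the rules accept.
    return [grid for grid in product(rows, repeat=3)
            if is_valid(grid) and matches_clues(grid, clues)]

solutions = solve({(0, 0): 1, (0, 1): 2, (1, 1): 1})
```

Nothing in solve says how to deduce a cell; it only checks candidate grids against the rules.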
Logic programming is mainly an academic thing with not too many practical applications in the Real World™. One application is database querying with languages like SQL, which try to find all elements in a database which satisfy some criteria.
The popular languages of this paradigm are Prolog and Mercury.
We mention logic programming here only to give you some idea of what other paradigms are out there. If you learn functional programming with SICP (above), you’ll learn the basic ideas. As such, don’t worry too much about learning logic programming unless you’re really interested in this stuff. You won’t find yourself writing any “practical” code in Prolog.
Other well-known books on this material are The Reasoned Schemer and The Art of Prolog.
Having said all that, here’s a recap of the languages you might care about, what they’re good for doing, what paradigms they try to embrace, and what we want to say about them.
Snap! is just a step up from Scratch. The motto was “add as few new things as possible that let you do as many new things as possible”. It’s a great program with a nice community behind it, and can teach you a lot of CS.
Scheme is like a text-based Snap!. Though there isn’t too much “real-world” software written in Scheme, it’s certainly not “impractical”. There exist popular social networks (Hacker News) and music notation software (LilyPond) running on Scheme. Scheme will teach you to think in a new way.
Python is a relatively easy language to learn after Scratch for most people. Its syntax is supposed to resemble English. Python comes with a lot of batteries included: there are modules that let you do many cool things. It’s one of the most popular languages out there for programming websites, and along with IPython and sage/scipy/numpy/matplotlib, it’s used in the scientific community. Python is good for automating some quick tasks (“I want to save all Wikipedia articles in this category as HTML files in this folder on my computer”). However, Python has its share of issues: it’s not easy to distribute your code, and the language itself isn’t as “pure” or “clean” as Scheme: it’s object-oriented (class-based) but also tries to be functional.
JavaScript is the programming language of the Web: almost every website you visit has some JavaScript on it. It’s functional, and it has prototyping OOP. It resembles Scheme in a lot of ways. However, it is often criticized for a really weak type system (you can add two lists and get a string as a result). Though it has flaws as a language, it is worth learning for its versatility: it runs on anything that has a web browser (phones, laptops, televisions, watches, toasters). You can make games with the <canvas> element, and even use Node.js to run servers on your local computer (like Python). The blog you are reading this post on is compiled by JavaScript. There are tons of libraries out there, but don’t bother to learn jQuery or React or any other “framework” when you’re getting started. Learn to manipulate the DOM manually, and you’ll discover that you don’t need a framework for most things.
C is a low-level language, which makes it kind of messy to use. You need to manage memory on your own (people say it’s the difference between driving an automatic and a stickshift). However, it’s incredibly fast, and has some surprisingly elegant features. C teaches you how the insides of your computer really work. It also gives you access to a lot of low-level details like hardware and networking. Variations on C include C++ and C#, which add various features to C (for better or for worse). We recommend learning pure C to get started.
Java is an object-oriented language which was originally loved because, like JavaScript today, it ran everywhere. The language itself is very verbose—it takes you a lot of code to get simple things done. We recommend only learning Java if you’re writing an Android app, or if you’re taking the AP Computer Science course (which, for a smart Scratch programmer, is pretty straightforward).
Should I take AP CS?
Yes. Take it if you have a free spot in your schedule, because you will probably breeze through the course. Don’t stress out too much about it. Use it as an opportunity to make friends with fellow CS students at your school and to get to know the CS teacher there.
If you don’t have a free spot, you could probably self-study it, but we see little educational value (it might look good on a college application?). You have better things to do.
Haskell is Scheme with a very powerful, “formal” type system. It is much easier to understand the concepts once you’ve used Scheme and Java a bit. We list it here only to warn how unintuitive it is at first and how hard it is to get practical things done with it (like printing to the screen), and to recommend Learn you a Haskell as a learning resource.
Here are some languages you shouldn’t care about:
CoffeeScript, TypeScript, anything that compiles to JavaScript: Don’t use one of these unless you really feel the need for a particular language feature. Certainly make sure you know JavaScript really well before, because all these languages tend to be thinly-veiled JavaScript once you really get into them.
Blockly, App Inventor, Greenfoot, Alice, LOGO, GameMaker, other teaching languages: unless you have any particular reason to use one of these languages, you will probably find it too easy: they are designed for beginners, and won’t give you the flexibility or “real-world” experience you want.
Rust, Go, anything owned by a company: not because the language has any technical flaws, but because of “vendor lock-in”: you don’t want your code to be at the mercy of whatever a private company decides to do with the language. Languages that have been around for a while tend to stick around. Go for a language that has more than one “implementation”, i.e. more than one person has written a competitive interpreter or compiler for it. This helps ensure that the standards are followed.
PHP: PHP is not a great language. The only time the authors use PHP these days is when trying to exploit badly-written PHP code in computer security competitions.
Perl, Ruby, Lua: Perl has essentially been replaced with Python in most places: a good knowledge of the command line and Python are far more useful to you. We haven’t seen any new exciting software written in Perl in a while. Ruby also resembles Python in many ways (many say it’s much prettier), but it’s known to be slower. It’s not as widely used, but there exist big projects that use it (Homebrew and GitHub are both Ruby-based). Same for Lua: it’s a nice language (Ruby’s syntax meets JavaScript’s prototyping OOP), but it isn’t used by enough people. For languages like this, it tends to be harder to find documentation, help, and working examples.
Esoteric languages: these are written as jokes. You should be creating new ones, not programming in existing ones!
Still confused? Here’s a flowchart to help you out (click for a large SVG, or try Tim’s interactive Scratch project):
Finally: your personal experience and preference is far more important than anything we say. We cannot recommend a practical language any better than one you are already productive in (pedagogical languages, however, are a different deal).
So we’ve come a full circle: collect paradigms, not languages.
Programmers in the Real World™ are very dependent on their tools. In fact, though we don’t recommend it, most people define themselves by their tools.
This is largely because many of the more established tools in the CS world have a very steep learning curve (because they were written by lazy hackers who want to minimize keystrokes).
Here are our recommendations for what to invest in.
A text editor: Text editors are sacred, because in theory they’re the tool we spend the most time using. We can’t really recommend one because whatever we say, we’ll face lots of heat from supporters of some other one. Instead, here’s a short list of editors you might consider. All of these are free and available for Mac/Linux/Windows.
Atom: This editor was created by the GitHub team. It’s hugely customizable, with hundreds of themes and packages written by people. Atom is notoriously slow, though, because it essentially loads an entire web browser for its GUI. Cousins of Atom are Brackets by Adobe (just as slow, fewer packages) and Sublime Text (faster and prettier).
Tip from technoboy10: Oodles of packages are available, but don’t try to use all of them. Find a few that really improve your coding workflow and use those.
Vim: Vim is one of the oldest and most well-known text editors—if you’re on a UNIX, you already have it installed. It runs in the command line, and requires you to learn several key-combinations to get started. It really is worth it, though, because Vim is amazing for productivity. In addition, several other applications use Vim-style keybindings. It’s an informal standard.
Tip from hardmath123: trying to learn Vim? Write a blog post entirely in Vim. Maybe just stay in insert mode for the first hour. Use vimtutor to get started.
Emacs: Like Vim, it runs in the command line and has a nontrivial learning curve. Unlike Vim, it is anything but lightweight. You can use Emacs to check your email, play tetris, and get psychiatric counseling (no joke). Vim and Emacs users are old rivals. See this comic for details.
The command line: learn to use bash from your terminal. It is extremely empowering. The command line is the key to the insides of your computer, and turns out to be surprisingly easy to get started with. Start with simple operations like creating (touch) and moving (mv) files. Use less to read them, and use pico or nano (or vim!) to edit them. Along the way, learn important components of the UNIX philosophy by piping programs to each other. Learn how to use regular expressions with grep; this is life-changing because regular expressions show up in every language and give you a lot of power over strings. Figure out how to use man pages and apropos to get help. Soon you won’t be able to live without the command line.
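As a small illustration of regular expressions showing up outside grep, here is a sketch using Python’s built-in re module. The log lines are made up; the pattern picks out lines that start with "error: " and keeps the message after it, roughly what running grep with the pattern '^error: ' would select.

```python
import re

lines = [
    "error: disk full",
    "warning: low battery",
    "error: file not found",
    "all good",
]

# ^ anchors the match to the start of the line; (.+) captures the message.
pattern = re.compile(r"^error: (.+)$")

messages = []
for line in lines:
    match = pattern.match(line)
    if match:
        messages.append(match.group(1))
```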
Git: For better or for worse, the most popular way for you to share code these days is using a website called Github. Github is a web interface for a tool called git, which is a version control software (another one is hg, a.k.a. Mercurial). git lets you keep track of your code as it changes, and lets other people contribute to it without having to email different versions of code around. It’s not that hard to learn, and you’ll need to learn it if you want to contribute to any projects these days.
Tip from hardmath123: Don’t worry if you don’t truly grok git. It’s my personal hypothesis that nobody really understands it. Just have enough working knowledge to get stuff done.
IRC: IRC or Internet Relay Chat is a decentralized chat protocol, which means it’s like Skype except not controlled by any one company. It’s been around for a while: it was used to get news out during the 1991 Soviet coup attempt. You want to learn to use IRC because it’s not very intuitive at first look. But many communities in the tech world communicate through IRC chatrooms, called “channels”; it’s a great way to reach out and get help if you need it. The best way to get into IRC is to just dig in: use Freenode’s webchat client at first, then experiment with others (Weechat, IRSSI, IRCCloud, etc). Feel free to say hi to us: we’re hardmath123 and tb10 on Freenode.
Find (good) documentation: know about StackOverflow, Github, MDN, etc. We won’t drone on about these sites. Just know that they exist.
Tip from technoboy10: Google is a coder’s best friend. Everybody has a different way of finding solutions to programming problems, but learning to search and find answers online is an immensely valuable skill. If you’re not a fan of Google, I recommend the DuckDuckGo search engine.
The CS world, like any community, has its own set of traditions. It’s said that UNIX is more an oral history than an operating system. With that in mind, here are some books, articles, and websites for you to peruse at your leisure.
Don’t take any of them seriously.
(Disclaimer—some of these may have PG-13-rated content.)
You’re about to start on a wonderful journey. Enjoy it. Make friends. Make mistakes. These choices are all meaningless; you’re smart and you’re going to be alright no matter what text editor you use or which language you learn.
Hack on!
He [Norton] had a suspicion of plausible answers; they were so often wrong.
— Rendezvous with Rama, Arthur C. Clarke
Clarke’s Rendezvous with Rama describes the exploration of a giant spaceship called “Rama”. If you haven’t read the book yet, go read it and come back, because the rest of this post is a really big spoiler. I’ll be waiting; don’t worry.
Welcome back.
To me, the charm of Rendezvous with Rama is the way Clarke introduces parts of the spaceship, lets you guess what they’re for, and then reveals their purpose in a series of intertwined narratives. The book is a guessing game.
One of the central mysteries, of course, is the Southern Cliff of the Cylindrical Sea: why is it so much higher than the Northern Cliff?
In a flash of inspiration, the exobiologist Dr. Perera realizes that when Rama accelerates (to the north), the Sea would rise against the Southern shore; the Cliff is a barrier to prevent a great flood. In his own words:
“The Cylindrical Sea is enclosed between two cliffs, which completely circle the interior of Rama. The one on the north is only fifty meters high. The southern one, on the other hand, is almost half a kilometer high. Why the big difference? No one’s been able to think of a sensible reason.
“But suppose Rama is able to propel itself—accelerating so that the northern end is forward. The water in the sea would tend to move back; the level at the south would rise, perhaps hundreds of meters. Hence the cliff.”
— Rendezvous with Rama, Arthur C. Clarke
On a roll now, Perera goes on to predict—with not more than twenty seconds of thought and scribbling—the maximum possible acceleration of Rama based on the height of the Cliff (500 meters). His result, 0.02g (2% of Earth’s gravitational acceleration), is confirmed at the end of the book.
How did he do it? Let’s investigate.
Here’s a diagram of Rama’s cross-section to help you follow along. It’s taken from the cover art of a video game based on the novel; the image was posted to a forum thread by someone named DELTA.
The first page of this paper contains a simpler schematic as well as some pretty pictures.
First, some raw data scraped by scouring the novel.
We have a couple of ways of determining the unaffected “gravity” (centrifugal effect) on Rama. The centripetal acceleration equals the square of the angular velocity times the radius. Knowing that the Plains are 8km from the axis about which Rama rotates, and that it rotates at 0.25rpm (a rotation every 4 minutes), we calculate that Rama’s gravity is about 0.6 times Earth’s:
\[ g_\text{rama} = \omega^2r = \left(\frac{2\pi}{4\times60\;\text{s}}\right)^2\times(8000\;\text{m}) = 5.483\;\text{m}/\text{s}^2 \approx 0.6\, g_\text{earth} \]
For those without MathJax, that said
g_rama = w^2 r
= (2pi / (4*60s))^2 * 8000m
~= 0.6 g_earth
This result agrees with a statement by one of the explorers, Mercer: when he was less than 2km down the stairway, he said his weight was around a tenth of what it would be on Earth.
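The arithmetic is easy to check numerically. The sketch below assumes a value of 9.81 m/s² for Earth’s gravity (not stated in the novel) and also works out how far from the axis you would weigh a tenth of your Earth weight. That is a radial distance, which is not the same thing as distance along the stairway, so it is only a rough consistency check on Mercer’s remark.

```python
import math

G_EARTH = 9.81          # m/s^2, assumed standard value
PERIOD_S = 4 * 60       # one rotation every 4 minutes
PLAIN_RADIUS_M = 8000   # distance from the axis to the Plains

omega = 2 * math.pi / PERIOD_S          # angular velocity in rad/s
g_rama = omega ** 2 * PLAIN_RADIUS_M    # centripetal acceleration at the Plains
ratio = g_rama / G_EARTH                # Rama's gravity as a fraction of Earth's

# Centripetal acceleration grows linearly with distance from the axis, so
# this is the radius at which it equals a tenth of Earth's gravity.
radius_tenth_g = 0.1 * G_EARTH / omega ** 2
```

With these inputs g_rama comes out near 5.48 m/s², ratio near 0.56, and radius_tenth_g a little over 1.4 km from the axis.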
We know the width of the sea: it’s 10km across. Unfortunately, we know very little about the depth of the sea. We do know that the seafloor is not uniform; it’s ridged to disperse large waves. We even have a lower bound on the deepest portion: at one point, an anchor is lowered 30 meters into the sea. However, these facts were discovered after Perera’s calculation, so we could be justified in assuming that the seafloor is uniformly “flat”. We also have an upper bound of 2km because that’s the difference between Rama’s inner and outer radii.
Let’s assume that the sea is at least 0.5 kilometers deep*. As long as it’s at least that deep, we don’t need to know exactly how deep it is, because the sea surface will never intersect with the seafloor.
Here’s what it looks like^{†}:
Based on the force diagram, a bit of similar triangles magic shows that for the water to reach a height of 500 meters, the ship must accelerate northwards at 6% of Earth’s gravity. This is arguably pretty close to 0.02 g, considering all the approximations and eyeballed measurements involved. (At least, it’s the correct order of magnitude, which is supposedly all astronomers really care about.)
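One way to reproduce that 6% figure is sketched below. It assumes that the sea surface stays a flat plane pivoting about the middle of the sea, that Rama’s spin gravity is uniform across the sea, and that the surface tilts until it is perpendicular to the combined “gravity plus acceleration” direction. These are simplifications chosen for the sketch, and 9.81 m/s² for Earth’s gravity is likewise an assumed value.

```python
import math

G_EARTH = 9.81                                   # m/s^2, assumed standard value
g_rama = (2 * math.pi / (4 * 60)) ** 2 * 8000    # spin gravity at the Plains

SEA_WIDTH_M = 10_000     # the sea is 10 km across
CLIFF_HEIGHT_M = 500     # height the water must reach at the southern shore

# The surface pivots about mid-sea, so it rises 500 m over half the width.
slope = CLIFF_HEIGHT_M / (SEA_WIDTH_M / 2)

# Similar triangles: rise / run = acceleration / spin gravity.
accel = slope * g_rama
accel_in_g = accel / G_EARTH
```

Here slope is 0.1, so accel_in_g is a tenth of Rama’s 0.56 g, or about 0.056.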
*Interestingly, if you assume that the sea is shallower than 0.5 km, you can try to reverse-engineer the exact depth Perera must have assumed. It turns out that there are no real solutions for depth which yield 0.02g as the acceleration. Here’s the diagram that corresponds to the scenario—the key insight is that the area of the rectangle and the triangle must be the same since the volume of liquid is the same.
[^{†}Apologies for the Paleolithic-era hand-drawn diagrams which look like they were scanned in the ‘80s. I would appreciate it if someone could whip up a nice computerized image in a vector format…]
Overall, I’m impressed with the accuracy of Clarke’s physics. I suppose attention to detail like that is what makes these novels so fascinating.
I would love to see other explanations of Perera’s result. Maybe I missed something. Let me know if you make some discoveries.
To all the friends I made this summer. Thanks for all the memories. Amicabilibus!
As with many exciting things, this one began with the prospect of bubble tea.
In case you’ve been missing out, bubble tea contains large tapioca pearls, so it comes with extra-special straws that have twice the diameter of normal straws.
Have you ever really thought about those straws? Of course you haven’t. I applaud you and your sanity.
I, of course, have really thought about those straws. I occasionally say things like “what if you tried to drink mercury through a meter-long straw?”, which causes people to make excuses and leave, and TSA agents to make you take your shoes off.*
But this being my blog, I’m now going to impose some thoughts about those straws upon you. Brace yourself.
See, the trouble is that you have to poke these straws through the top of your drink. They have pointy tips to facilitate this, as kindly illustrated by the very legit-looking website free-stock-illustration.com:
Actually, anyone who’s ever poked a bendy straw through a juice box knows exactly what I’m talking about. They have pointy tips, right? Right.
It turns out that you should be very concerned about this. Or not, if you’re more of the “big picture” kind of person. Personally, I find this very disturbing.
What happens when you cut a straw? Well, a straw is a cylinder, so you should get a cylindrical section. Wolfram Mathworld reminds us that planar slices of cylinders are ellipses:
But ellipses are not pointy. This is bad. Where does the pointiness come from?
If you ask someone, they’ll probably say things like “maybe it’s just a really thin ellipse” or “maybe it’s actually not pointy and only cuts because the plastic is very sharp”. So I guess it’s worth mentioning that (1) a thinner ellipse would be just as blunt, and (2) if the plastic was that sharp, your tongue would bleed each time you drank a Capri-Sun.
I guess a more fundamental question to ask is, how are bubble tea straws cut? This turns out to be surprisingly hard to find useful information on. This site suggests some sort of knifing mechanism. I envision a large-scale straw guillotine that chops up hundreds of straws a minute and leaves a pile of straw-rubble on the factory floor. (It turns out that straw guillotine is a thing, and happens to be a thing in my Google search history now. Please don’t judge me?)
In any case, the point (ha!) is that when you use a blade to cut a straw, you momentarily flatten it. This is probably easiest to see if you pick up a straw and try to cut it with scissors: the part right at the blade gets flattened the way a garden hose does if you step on it.
So really, what we’re thinking about is, “what happens when you squish a straw?”. I whipped up some images to help think about what’s happening (ping me if you want the code).
This is what happens when you unsquish a cut straw (squishedness is on the Y axis, different profiles are along the X axis):
Yay, the rounded cylinder is pointy!
Depending on how traumatic your trigonometry class was, that curve in the bottom right corner should look vaguely familiar. It’s half a period of a sinusoid.
Can you convince yourself why this makes sense? Think about what the side profile of a spring looks like…
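Here is one way to check the sinusoid claim numerically. It is a sketch of a simple model, not the code behind the images: it assumes the flattened straw is a two-layer strip whose width is half the circumference, that flattening preserves distance along the plastic, and that the blade makes a straight diagonal cut across the strip. The function name and parameters are made up.

```python
import math

def unsquish_cut(radius, slope, samples=5):
    """Side-view outline (y, z) of a straw cut straight across while flat.

    theta is the angle around the straw, from one fold (0) to the other (pi).
    """
    points = []
    for i in range(samples):
        theta = math.pi * i / (samples - 1)
        s = radius * theta              # distance across the flattened strip
        z = slope * s                   # height of the straight diagonal cut
        y = radius * math.cos(theta)    # sideways position once round again
        points.append((y, z))
    return points

profile = unsquish_cut(radius=1.0, slope=1.0)
```

In this model y = radius · cos(z / (slope · radius)), and z only runs from 0 to slope · π · radius, which is exactly half a period of a cosine.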
We knew that a planar slice wouldn’t form a straight-line cut because ellipses aren’t pointy. So a natural question to ask is, what shape does a planar cut correspond to?
Do you have a guess?
Yep, it’s the inverse sine function (why?). Can you come up with a way to map “squished” functions to “cylindrical” functions? Is this function invertible? So many questions!
To be honest, this post isn’t about straws—as interesting as they are. It’s about looking around and finding exciting things. It’s about ignoring the grown-ups who think straws are unexciting. Because we’re surrounded by exciting things. Everywhere. Like happiness and magic and diatomic nitrogen. It’s your job to go seek them out.
(*But really, could you drink mercury out of a meter-high straw? Even if you managed to suck out all the air in the straw, the vacuum would only manage to hold up 76 centimeters of mercury… Gasparo Berti thought about this stuff in the 1600s, and sent a letter to Galileo which got the ball rolling and led to Torricelli proving the existence of a vacuum and inventing the barometer. Read more in this delightful article by Karl Dahlke.)
I have published over 40,000 words of writing now, more than The Lion, the Witch, and the Wardrobe and Hamlet. Not that word count matters, of course. Words, like people, are meaningless on their own.
And every time I sit down to write a new post, I wonder why I do it.
Readership in itself is hardly a goal. Unread words are just as meaningful.
Communication, on the other hand, is a goal. Every person who agrees—or disagrees—with my ideas; every person who is inspired to write something of their own; every person who shares a discovery with me or (dare I hope) learns something from me: each one of you inspires me to write. You know who you are.
And as much as you, dear reader, may have learned from my writing, I’m confident that I learned more. In attempting to publish at least twice per month for the past year and a half, I discovered that it is not an unattainable target. I have 2,000 words worth of material worth sharing every month and I’m proud of every sentence.
That, of course, is not easy for me to say. If you have been with me from the very beginning—and some of you have—you have seen pieces with questionable claims and controversial ideas. You have seen opinions any self-respecting person would disagree with. You have seen typography that may have left you with permanently impaired vision.
In most cases, I agree with you wholeheartedly. I am thankful for both the feedback you offered and the criticism you kept to yourself.
But—you can check this yourself—these pieces can still be seen in their full, unmodified glory.
Their persistence is not a reflection of any dogmatic sentiments regarding censorship or free speech. Rather, it is a reflection of my pride in every piece I have written.
The truth is, if asked “Are you proud of your writing?”, I would say, “no”. Like a recording of your voice, your own writing always has a slightly nauseating quality.
What am I proud of, then?
I’m proud of the fact that I can look down on my old writing. Because it means that somehow, over the years, I’ve risen. You can only look down from up above.
That’s what it means to be embarrassed about past writing; it means you’ve grown, both as a writer and as a person.
Respect and pride: they’re measured on a relative scale. All you have is the derivative; all you have is whether or not you’re a better person than you were yesterday.
And all you have is the indefinite integral. Indefinite.
We don’t know the constant, the reference point, the absolute scale. We will never know the constant. The constant, as always, represents conditions. Things you can’t change. Things you don’t feel you deserve and things that aren’t your fault. Things that, for better or for worse, are constant.
Maybe that’s why I write.
This is an idealized transcript of a talk I gave a couple weeks ago at our school’s “Engineering Night”, an event where students are invited to speak and get others excited about engineering. I’m putting this up here because it makes a great blog post, and because I want to be able to come back and find these ideas in the future.
When I was five, my parents took me to go see Finding Nemo. It’s an amazing movie, except when you’re five, all you really process are terrifying scenes like this one.
So we didn’t really go out for many movies after that.
The next one I remember seeing was in the fifth grade, actually, with my best friend. We went to see Cloudy With a Chance of Meatballs.
I loved it. Not just the story: the pictures. I loved how everything was so realistic. I mean, I was told that these movies are computer-generated, but I felt that I could reach out and touch that hair, or poke the huge mound of Jell-O.
So when I got back, I wondered “how do you do it?” and like any fifth grader, I came up with an explanation. I’d been watching NOVA, so I had heard of these things called “photons” which are like tiny golf balls. And so my explanation was that computers shoot these golf balls into a 3D model just like how you can shoot bullets in Halo and Counter-Strike. And by knowing how these golf balls bounced around, you could figure out what things looked like.
Yeah, it’s silly. But when I was in fifth grade, I also thought I could stick out my hand under the sun and catch some photons like rain, and if I collected enough I could drink them like any other liquid. I tried this, and it didn’t work, and when I asked my teachers they told me I was watching too much NOVA.
And I don’t mind. I had an idea, I tested it, it didn’t work, and I wondered why. As far as I’m concerned, that’s as close as I’ve ever gotten to the “scientific method” that they teach you at school.
In any case, I ended up getting really lucky. Our science teacher used to order these Scholastic magazines. They were mostly advertisements for books, but some of them had short articles about new science.
That week’s magazine actually had an article about Cloudy With a Chance of Meatballs. And it explained… nothing, actually. It was a thinly veiled advertisement. But it did have a link to a website called Scratch.
Scratch might be one of the best things to ever happen to me. It’s a website that’s sort of analogous to YouTube, except instead of videos, you can upload small programming projects, like Flash games. Scratch comes with its own really simple programming language, and some of you in the audience might have already used it if you took a CS class here.
I loved Scratch. Over the next two years, I made games, stories, and, most importantly, friends I still talk to today.
I, of course, forgot this whole 3D movie thing completely.
…until one day, Ratatouille was on, and I remembered again.
Look at the water splashing next to Remy’s head. You can see the wall through it. But it distorts the wall a bit. It’s called “refraction”.
If you asked me to draw this, I would have no idea how the water distorted the wall. But when I look at this, it looks right. Somehow, the computers know how this all works.
At this point, I was old enough to find things out for myself. So I Googled it.
It turns out that to make these movies, computers fire these tiny little balls, like golf balls, into a 3D model. By seeing how the balls bounce, they can figure out how the model looks.
It’s called “raytracing”, and it’s a serious academic subject, not a silly idea a fifth grader came up with in the shower.
So of course I wanted to know more. The first thing I learned was that when they say “computer-generated movie”, what they mean is “supercomputer-generated movie”. This is the supercomputer that rendered Cars. Cars has a running time just short of two hours and was rendered at 25 frames per second.
Guess how long it took to render a frame?
Seven hours.
As technology improves, it takes us longer to render movies because we’re getting so focused on detail. We’re simulating individual particles in an ocean, and individual strands of hair.
So I was entranced. I wanted to know more. And at this point, the only way to learn more was to do it myself. So I did.
I’m so proud of this picture. It represents so much to me.
Of course, my story didn’t end there. It ended in math class this year, battling this monster.
This is the “cross product”. If you haven’t touched it yet, you’re lucky. It’s the determinant of a matrix, except, some of its entries aren’t even numbers, they’re vectors. And if you’re really lucky, you get two possible answers.
And to guess which one is right, you need to make gang signs. This is the “right-hand rule”, and basically, everyone looks really funny doing this when taking a test.
Something about this bothered me. And when I got home, I finally realized it.
These lines of code are—unaltered—from my raytracer.
Looks familiar? It’s the cross product!
I wrote those lines of code in seventh grade. I had no idea what a vector was. I don’t know how I got those equations—I just remember doing lots of algebra on paper and somehow getting them.
And suddenly it all made sense to me. All this math we’re learning? It’s useful. You know why the cross product has two answers? Well, the cross product essentially says “if the ground is sloped this way, which direction points away from the ground?”. You use it to say “which way should a photon bounce when it’s bouncing away from a surface”.
Well, there are two directions away from the ground! You can go up into the sky or you can drill down into the Earth. And so the two answers just point in opposite directions. It makes sense!
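To make the "two answers" point concrete, here is a minimal Python sketch (not the raytracer code from the slide; the function names `cross` and `dot` and the example vectors are made up for illustration). Swapping the order of the two inputs flips the result to the opposite direction, and either result is perpendicular to both inputs.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)


def dot(a, b):
    """Dot product; zero means the two vectors are perpendicular."""
    return sum(x * y for x, y in zip(a, b))


# Two directions lying flat on the "ground" (hypothetical example values).
east = (1, 0, 0)
north = (0, 1, 0)

up = cross(east, north)    # points into the sky
down = cross(north, east)  # same line, opposite direction

print(up, down)
print(dot(up, east), dot(up, north))
```

Which of the two directions you get depends only on the order of the arguments, which is what the right-hand rule keeps track of.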
What I’m trying to say is, you don’t need to take 10th grade math to make something cool. Every single one of you in the audience here is completely capable of creating something incredible. So what are you waiting for? Taking that Stanford course on fluid dynamics someday? You don’t need it.
Every single one of you has so much knowledge right in front of you: the Internet, libraries, and brilliant teachers everywhere. So stop waiting until you’re “ready”, and go build something.
]]>Apparently it has recently become fashionable on GitHub READMEs to put in a screenshot of a tty rather than explain the usage in actual words. There’s nothing wrong with them for the most part, but what bothers me is that all these screenshots are ugly. The links above are in increasing order of beauty.
It turns out that taking nice screenshots is filled with icky pitfalls and undocumented secrets. Here’s how you do it right, on a Mac running a relatively recent OS X.
First of all, make sure you really want to use a screenshot. Plaintext is generally enough to convey what your project does, and all the text is copyable.
You should only be using a screenshot if your project has some curses-esque behavior—messing with colors and drawing and raw mode and all that jazz.
When taking a screenshot, make sure your terminal profile (custom colors, dark background, etc.) doesn’t interfere with anything. Also, make sure your terminal prompt ($PS1) is sufficiently normal. Yes, a plain > is minimalistic and pretty. But it’s confusing: are you running bash, or is the prompt part of your program’s interface?
Finally: use small windows. It’s hard to read text if you take a screenshot of an enormous terminal window and shrink it down.
Most people know that you can use cmd-shift-4 to enter screenshot mode and select part of the screen. A lesser-known trick is that you can press “space” to enter window selection mode. This lets you click on a window to take a screenshot of that window, and you end up with this:
Why is this better than taking a normal screenshot and cropping? Because this method has the underlying code actually draw a fresh, high-resolution copy of your window—even in full-screen mode. It also includes that pretty shadow (which, by the way, is rendered with a translucent PNG alpha channel so it looks good on every background).
It’s also, to be honest, much easier for the lazy.
Surprisingly, this screen capture mechanism has a command-line API. Unsurprisingly, it has terrible documentation.
The command itself is called screencapture(1). You want to feed it the undocumented-in-the-man-page -l option to specify the window ID.
To get the window ID, you can use AppleScript’s tell app "$APPLICATION" to id of window $N command, where $N is the index of the window from “top” to “bottom”. You want to set $APPLICATION to Terminal and $N to “1” to get the focused window. Putting it together, we have:
$ screencapture -o -l $(osascript -e 'tell app "Terminal" to id of window 1') screenie.png
This is, of course, terribly un-useful because it’ll take a screenshot of the window the moment you type this in. I would suggest prefixing it with sleep 5; and running it in a separate window (not a separate tab, since that would get captured in your output!). This gives you five seconds to switch to your target window and get ready for the screenshot.
Sometimes, you can only really show how your project works with an animated GIF.
In general, it’s good to keep GIFs short and small. They should loop cleanly: the easy way to accomplish this is by running “clear” at the end of your program so that the terminal state is restored to what it was before you ran it.
To make a GIF, you need to take multiple screenshots by looping screencapture(1) (you don’t really need a delay, since the process of capturing is so slow):
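# Make sure the output directory exists and holds no stale frames.
mkdir -p screenies
rm -f screenies/screenie-*.png
N=0
# Capture frames until interrupted with ctrl-C.
while true; do
    screencapture -l $(osascript -e 'tell app "Terminal" to id of window 1') screenies/screenie-$(printf "%05d" $N).png
    printf "Created %05d\n" $N
    N=$((N+1))
done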
Then, you can use ImageMagick (brew install imagemagick) to stitch them into a GIF using the convert(1) command. Make sure you specify -alpha remove for each frame so that the shadows get rasterized properly:
convert $(for a in screenies/*; do printf -- "-delay 1x60 -alpha remove %s " $a; done; ) result.gif
This step will take a while and output a monstrously huge GIF, because the PNGs are pretty big. You can use ImageMagick’s -quality and -resize options to shrink them substantially:
mkdir screenies-compressed
cd screenies
for a in *; do
    convert -quality 80 -resize 60% "$a" "../screenies-compressed/$a"
done
cd ..
If you take screenshots often, it’s probably worth aliasing these commands in your ~/.profile.
This is The Awesome Elements Problem. I wrote it for my AP Computer Science class, but I decided to put it up here because I think it’s pretty, uh, elementary.
Perhaps more than the actual problem, I love the bonus problems at the bottom. They show how all these “boring” Scheme exercises can be used to do all sorts of neat things. They’re written to introduce a new idea with lots of questions; the solver is expected to explore them on their own, with external resources, and, of course, with friends.
Finally, it’s worth noting that (as the first bonus problem should make amply clear) this problem is an absolute pain to do in an imperative language. I think it lets beginners see a rare outside-the-textbook example of functional programming rocking out in the wild, rather than silly contrived scenarios involving bank accounts or store inventories or parking meters.
Teachers are welcome to steal this for their classes.
Some names are inherently different from others. For instance, the name Casey can be written as a list of element symbols, as Ca-Se-Y (Calcium-Selenium-Yttrium). However, Josh cannot be written this way. In this project, you get to write a Scheme program to break up a word into its—ahem—constituent elements.
We begin by defining a list of all the elements. An element’s symbol is described by a list of Scheme symbols, so Helium is '(h e). As a sanity check, you can run (length elements) and get 118.
(define elements '((a c) (a g) (a l) (a m) (a r) (a s) (a t) (a u) (b) (b
a) (b e) (b h) (b i) (b k) (b r) (c) (c a) (c d) (c e) (c f) (c l) (c m) (c
o) (c r) (c s) (c u) (d b) (d s) (d y) (e r) (e s) (e u) (f) (f e) (f m) (f
r) (g a) (g d) (g e) (h) (h e) (h f) (h g) (h o) (h s) (i) (i n) (i r) (k)
(k r) (l a) (l i) (l r) (l u) (m d) (m g) (m n) (m o) (m t) (n) (n a) (n b)
(n d) (n e) (n i) (n o) (n p) (o) (o s) (p) (p a) (p b) (p d) (p m) (p o)
(p r) (p t) (p u) (r a) (r b) (r e) (r f) (r g) (r h) (r n) (r u) (s) (s b)
(s c) (s e) (s g) (s i) (s m) (s n) (s r) (t a) (t b) (t c) (t e) (t h) (t
i) (t l) (t m) (u) (u u b) (u u h) (u u o) (u u p) (u u q) (u u s) (u u t)
(v) (w) (x e) (y) (y b) (z n) (z r)))
Whew. For extra credit, come up with a way to make that list automatically from some table you find on the Internet. Here’s a nice one.
Let’s warm up with some easy helper functions. Write (get-rest-of-string str len). It should return the list str after the first len elements have been removed.
If you’ve written compose before, think of a super-elegant way to do this.
Now, write (begins-with-element str el), where str and el are lists of symbols. The function should return true if el is exactly the beginning of str, and false otherwise. Think about what should happen if either string is empty.
(begins-with-element '(d o c t o r w h o) '(d o c))
--> #t
(begins-with-element '(a m e l i a p o n d) '(a m y))
--> #f
It turns out that these are all the helpers we need to write elementize.
elementize is our main function. It breaks up the word str into elements in the list els, and returns all possible results. Fill in the blanks to complete elementize.
Or, if you can think of a better way to write it that doesn’t fit in the blanks, do that instead.
(define (elementize str els)
  (cond ((null? els) ___)
        ((null? str) ___) ; Hint: this is not the same as above.
        ((begins-with-element ___ (___ ___))
         (append
          (elementize ___ (___ ___)) ; Remember, `append` concatenates
                                     ; two lists into one bigger list.
          (map
           (lambda (list-of-subsolutions)
             (cons (___ ___) ___))
           (___
            (___
             ___
             (length (___ ___)))
            ___))))
        (else (elementize ___ (___ ___)))))
You can use these tests to try out elementize. I’ve provided the solutions at the bottom of this page.
(write (elementize '(j a v a) elements))
(newline)
(write (elementize '(i s) elements))
(newline)
(write (elementize '(u n n e c e s s a r y) elements))
(newline)
Great job! Now for the fun part. Try solving each of the bonus problems below. They’re in no particular order of difficulty. Each one is meant to introduce you to a new, exciting CS topic.
Bonus problem 0! Rewrite your solution in C or Java. Time yourself. Then realize how much you love Scheme.
Bonus problem 1! UNIX computers come with a built-in dictionary of English words in the file /usr/share/dict/words. Each word is on its own line. Spend some time hacking Scheme to see if you can find the single English word that can be elementized the most ways. How about the longest elementizable word? Are there any “unnecessary” elements which can be removed from the list without making any words un-elementizable?
Bonus problem 2! You’ve just discovered a new element! With all your modesty, you decide not to name it after yourself. Instead, you decide to name it so that the addition of the new element will maximize the number of new elementizable words in English (based on the list above). What do you name it? (This one is hard because the program needs to be fast! See if memoization can be useful.)
Bonus problem 3! On a small planet in the vicinity of Betelgeuse, only two elements can exist in a stable state: Zaphodium and Zemzine. Their symbols were carelessly named Z and Zz by the Chief Chemist (who was later thrown into a vat of zaphodous zemzide (ZZz) by his angry confused chem students).
How many different ways are there to elementize the word zzzzzzzzzzzzzzzz with Z and Zz? How is this related to the question “how many ways are there to cover a two-by-ten grid with dominoes”? (Hint: this is purely a math problem. You can use a computer to help find the answer, but try to use math to prove it.)
Bonus problem 4! Read about lazy lists. Figure out how to implement them in Scheme, and then use them to solve this problem. Is your solution faster? More space-efficient? Does it look prettier?
Once you’ve done that, read about call-with-current-continuation. Figure out how to use it cleverly to solve this problem (if you’re confused, read about backtracking, or consult the Python program linked at the bottom of this page). Is your solution faster? More space-efficient? Does it look prettier? Think about how the above two implementations are the same, and how they are different. Can you use call-with-current-continuation to implement lazy lists?
Bonus problem 5! Read about regular expressions. Which regex do names like “Casey” match? Which regex do names like “Josh” match? Which of the previous two questions is easier, and why?
You can use a program called grep to test your solutions.
(Non-aqueous) Solutions to the tests above:
java (0)
() --- Java is *clearly* not an awesome name. Try Lisp instead.
is (1)
(((i) (s)))
unnecessary (2)
(((u) (n) (n e) (c e) (s) (s) (a r) (y))
((u) (n) (n e) (c) (e s) (s) (a r) (y)))
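If you want to double-check the counts above without a Scheme interpreter handy, here is a rough Python counterpart. It is only a sketch of the same recursive idea, not the solution file or the Python program mentioned elsewhere on this page: the names `SYMBOLS` and `elementize` are chosen for illustration, words are plain strings rather than lists of symbols, and the symbol list simply mirrors the Scheme list defined earlier. Be warned that it gives away the shape of the answer.

```python
# Element symbols, lowercased; mirrors the Scheme `elements` list above.
SYMBOLS = """
ac ag al am ar as at au b ba be bh bi bk br c ca cd ce cf cl cm co cr cs cu
db ds dy er es eu f fe fm fr ga gd ge h he hf hg ho hs i in ir k kr la li lr
lu md mg mn mo mt n na nb nd ne ni no np o os p pa pb pd pm po pr pt pu ra rb
re rf rg rh rn ru s sb sc se sg si sm sn sr ta tb tc te th ti tl tm u uub uuh
uuo uup uuq uus uut v w xe y yb zn zr
""".split()


def elementize(word, symbols=SYMBOLS):
    """Return every way to split `word` into element symbols."""
    if not word:
        # There is exactly one way to split the empty word: use no elements.
        return [[]]
    results = []
    for sym in symbols:
        if word.startswith(sym):
            # Use `sym` first, then split whatever is left in every possible way.
            for rest in elementize(word[len(sym):], symbols):
                results.append([sym] + rest)
    return results


for name in ["java", "is", "unnecessary"]:
    print(name, len(elementize(name)))
```

This version recomputes the same suffixes over and over, so it gets slow on long words; that is the place where the memoization hint in bonus problem 2 starts to matter.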
(This problem was written in November 2014. It is based on a bad Python program I wrote in December 2013. The only modification is that it was originally distributed as a Scheme source file where all the text was commented out, and method stubs were left for students to fill in. I have also added a couple of bonus problems—only 1-3 were in the original source. Please contact me directly if you’d like the original file, along with its solution file.)
]]>Ever since I was a little kid, I loved magic.
I didn’t love rabbits coming out of top hats, or various deceptions with colorful pieces of cloth. I loved real magic: magnetic levitation, periscopes, and predicting the weather.
I owned a few of those books that are filled with fun elementary school science experiments like baking soda and vinegar volcanoes. The only experiments I cared for were the ones where the result looked like a miracle unless you understood the science.
As I got older, I started giving up on the real world’s reliability. But I still found magic everywhere. I found magic in very clever puns (the kind which are so perfect, they can only exist due to willful human interference in the evolution of the English language). I found magic in music, in pieces that seem impossible to play based on a cursory examination of the anatomy of the human hand. I found magic in perfect murder mysteries, with delicate plots hinging on countless trivial details placed harmoniously throughout the chapters.
Most of all, I found magic in math.
I don’t find much pleasure in pretty proofs and groups and all those other things that mathematicians find beautiful. I appreciate them for what they are, but I can’t look at a prime number and feel a rush of joy or tranquility like Christopher Boone does. I don’t like doing math for the sake of math. In that sense, I haven’t seen the true divine light that many successful mathematicians have, and I doubt I ever will. I’m not that kind of person.
But I still find magic in math.
I’m entranced by the fact that you can draw a regular 17-gon with just a compass and a straightedge. I’m amazed by the fact that you can find the sum of the factors of a hundred million million million fast, without a calculator.
I like the math that lets you do magic—real magic—with just a bit of thought. I like proofs whose results I don’t believe (c’mon, there’s no way the ant reaches the end of the rubber rope). Proofs that give you an edge over everyone else, even though everyone else could have come up with them on their own (because believe it or not, switching doors makes a difference).
I like math that beats intuition (how is this the closed form for the Fibonacci numbers? Is this even an integer all the time?!). I like math that gives us superpowers (how can you possibly communicate with someone without agreeing on a shared key ahead of time?).
In short, I don’t find fractals beautiful but I have deep spiritual moments when I realize that you can compute their areas.
And I’m content with this appreciation, because it gives me something to wonder and marvel at, which is all you really need to be happy. Perhaps I’m incapable of appreciating the intrinsic beauty of the existence of certain mathematical truths. But I am certainly capable of appreciating the wizardry required to discover—rather, create—them.
To me, magic is the ability to take something impossible and make it possible. And I believe in magic.
Luckily for us, it turns out that the world is a huge, mysterious, magical place. The moment you realize that, you want to become a wizard. You want to learn.
It’s not just math. History is magic when you realize that you can explain why certain villages in India speak fluent Portuguese. Chemistry is magic when you can predict the outcome of a reaction just by looking at the inputs. And on and on and on and on.
The best incentive to learn is a prospect of wizardry. When we see miracles, we want to be capable of causing them.
So please show people miracles. Show your kids, students and friends impossible things and maybe if we’re lucky some of them will teach themselves how to make them possible.
It’s the only way to instill a real love for knowledge.
]]>One of the reasons I love computer science is that in a good community, everyone is a teacher. You don’t need to attain some degree of credibility before you can go around helping others—you just need to have information and the will to share it. That’s why I can write informative blog posts and help people debug Python issues. It’s why StackOverflow exists and thrives.
It’s like we’re all trying to climb a mountain, and we do so by lending a hand to whoever’s below us while getting a boost from whoever’s above us. The people at the very top are researchers, people blazing new trails and discovering new computer science.
In a way, this is better than the classical model of those at the top of the mountain pulling us all up. Someone who has just understood an idea—recursion, the Y combinator, type theory, whatever—is better-equipped to teach it than someone who takes it for granted having worked with that idea for ages.
…except it doesn’t quite work out like that. The problem is, somewhere near the middle of the mountain, people stop helping you up. After the first few cliffs, you suddenly lose sight of the smarter people, the people who were always a step ahead of you and ready to help. You have to pull yourself up, and that’s hard and lonely. And you wonder where they all went.
Then you look down, and you realize that all those people are back at the base of the mountain running “Learn Python in Two Weeks” workshops.
Why do we put so much time, money, and energy into teaching newbies computer science?
Part of it—and it’s hard to admit—is that it looks really good. Teaching intro CS looks like community service. For institutions, it’s a great selling point. For students, it’s résumé material.
It’s also easy. You know how to teach someone the basics of JavaScript because you know those things so well. It’s trivial to find students, because everyone wants to learn web development and start the next Facebook.
On a less cynical note, it’s fun. Teaching is inherently fun, and CS is something that we have the opportunity to teach to an excited audience without any formal barriers.
(Just to disclaim myself: I totally believe in getting more people into CS and providing young students with opportunities to learn. I just believe in keeping a very high standard for these opportunities, and (more importantly) leaving them as just that: opportunities.)
Anyhow. I see two major problems with this.
Firstly, there are now a million different ways a beginner can learn CS, and so there is naturally a huge variance in quality. I think one of the saddest things is when someone is promised to be taught “web development” and is instead force-fed a PHP crash-course that confuses them, potentially turning them away from CS. Pedagogy is hard; it should be left to the experts.
Of course, “expert” doesn’t necessarily mean “has a teaching degree”. I would trust an excited teenager who just HTML’d his first blog to do a perfectly good job teaching a newbie web development.
As much as I support entreating beginners to learn CS, I believe that the best learning is self-driven. The people who will go on to have happy, successful careers in technology don’t need to be led by the hand. They will figure things out for themselves without a curriculum, and they will probably end up learning more from the process.
All this attention towards teaching beginners is superfluous and only serves to dilute the brilliant resources that amazing teachers spend time creating.
Secondly, focusing so intently on teaching beginners takes attention away from the “intermediates”. People who have conquered the first few cliffs, and are now looking up at the next few. The number of resources for such people is strikingly lower. Once you’ve graduated a beginning course, you don’t know which way to proceed.
Everyone in the CS world has a responsibility to guide those below them.
Paying attention here would catalyze progress, and give hundreds of smart high schoolers (and middle schoolers) around the world goals and ways to achieve them.
Remember when I said CTFs are better than hackathons? This is another reason why. CTFs cater to both beginners and experts alike, giving both challenges and helping both find ways to learn new things.
So what can we do?
We can write blog posts about advanced topics. We can spend less time creating beginners’ tutorials, and more time telling post-beginners the way forward. Creating maps on how to go from “experienced” to “expert”.
Finally, we can try to bring smart, like-minded programmers of similar experience levels together, because these interactions result in a great deal of learning.
]]>Apparently Randall Munroe gets a lot of messages saying that the “random” button on xkcd is biased.
2015-03-19 16:47:00 Hobz also, Randall, the random button on the xkcd frontpage is frustratingly un-random
2015-03-19 18:50:52 ~Randall it’s random.
2015-03-19 18:50:59 ~Randall people contact me constantly to tell me that it’s not
2015-03-19 18:51:17 ~Randall which is a nice illustration of that mental bias we have
I thought I would do a little investigating to see just how random xkcd is.
Making consistently random numbers (yes, that sounds weird) is really important in things like cryptography. Unrandom random numbers can cripple an otherwise secure network. So there’s a surprisingly large amount of work dedicated to randomness.
There are services like random.org which pride themselves on randomness, and HotBits, which lets you order random bytes that are generated from radioactive decay. A lot of applications use /dev/urandom, which is an OS-level random generator that uses all sorts of sources of entropy such as network noise, CPU heat, and the current weather in Kansas.
Unfortunately, it’s really hard to tell whether numbers are random or not. Of course, patterns can creep into random numbers. But more annoyingly, a glaringly obvious pattern might just be accidental. My favorite example of this is the Feynman Point, a run of six consecutive 9s that begins at the 762nd decimal place of the (very unpredictable) decimal expansion of pi.
There are a bunch of established ways to test the randomness of a random number generator (such as the excitingly-named Diehard tests). They all test for features that ostensibly random data should have. For example, a random stream of bits should have almost as many ones as zeros. Not all tests are that obvious, though, and statistics can be very slippery and unintuitive when it feels like it.
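As a taste of what such a test looks like, here is a minimal Python sketch of the simplest one, the frequency (or "monobit") test: count ones as +1 and zeros as -1, add them up, and ask how surprising that total would be for a fair coin. The function name `monobit_p_value` is made up for this example, and this is an illustration rather than the NIST software itself. The usual convention is to call a sequence suspicious when the p-value drops below 0.01.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test p-value for a string of '0' and '1' characters."""
    n = len(bits)
    # Map each 1 to +1 and each 0 to -1, then add everything up.
    total = sum(1 if b == "1" else -1 for b in bits)
    # Normalize so that, for truly random bits, the statistic is roughly
    # the absolute value of a standard normal variable.
    s_obs = abs(total) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


print(monobit_p_value("1011010101"))  # slightly more ones than zeros
print(monobit_p_value("01" * 500))    # perfectly balanced
print(monobit_p_value("1" * 1000))    # all ones
```

Note that the perfectly alternating string passes with flying colors even though it is obviously not random, which is why a whole battery of different tests is needed.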
NIST (the National Institute of Standards and Technology, who deal with things like how long an inch is and how to backdoor elliptic curves) publishes a standard for randomness based on such tests, and distributes software that runs these tests on datasets.
I wrote a Python program to download 10,000 xkcd-random numbers (yay requests!), and converted them into bitstrings. Then, I fed them to the NIST Statistical Test Suite.
The results are below:
------------------------------------------------------------------------------
RESULTS FOR THE UNIFORMITY OF P-VALUES AND THE PROPORTION OF PASSING SEQUENCES
------------------------------------------------------------------------------
generator is <data/data.xkcd.long>
------------------------------------------------------------------------------
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 P-VALUE PROPORTION STATISTICAL TEST
------------------------------------------------------------------------------
5 8 7 13 9 8 13 12 15 10 0.437274 100/100 Frequency
10 10 11 9 10 16 10 6 8 10 0.759756 99/100 BlockFrequency
5 12 13 10 7 10 10 10 9 14 0.699313 100/100 CumulativeSums
8 3 13 9 9 14 13 8 12 11 0.366918 98/100 CumulativeSums
8 12 11 3 9 8 17 12 9 11 0.224821 100/100 Runs
7 8 8 6 15 9 12 9 15 11 0.437274 99/100 LongestRun
7 8 7 16 0 25 0 25 0 12 0.000000 * 100/100 FFT
3 10 4 19 15 0 18 6 10 15 0.000009 * 100/100 Serial
9 14 10 2 14 8 6 10 16 11 0.080519 100/100 Serial
16 1 5 9 6 0 6 0 10 47 0.000000 * 93/100 * LinearComplexity
The important column here is “Proportion”, which shows the pass rate. They’re all stellar.
If that isn’t convincing, I ran an obviously nonrandom sample for comparison. This is what NIST’s STS thinks of the first 100,000 bits of Project Gutenberg’s plaintext version of Romeo and Juliet:
------------------------------------------------------------------------------
RESULTS FOR THE UNIFORMITY OF P-VALUES AND THE PROPORTION OF PASSING SEQUENCES
------------------------------------------------------------------------------
generator is <data/data.rnj>
------------------------------------------------------------------------------
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 P-VALUE PROPORTION STATISTICAL TEST
------------------------------------------------------------------------------
95 3 0 1 0 0 1 0 0 0 0.000000 * 25/100 * Frequency
55 14 10 6 6 3 1 1 3 1 0.000000 * 64/100 * BlockFrequency
94 3 1 0 0 1 1 0 0 0 0.000000 * 28/100 * CumulativeSums
93 4 1 1 0 0 0 0 1 0 0.000000 * 30/100 * CumulativeSums
51 7 10 10 4 4 2 5 2 5 0.000000 * 61/100 * Runs
90 8 1 1 0 0 0 0 0 0 0.000000 * 44/100 * LongestRun
92 2 2 2 0 1 0 1 0 0 0.000000 * 23/100 * FFT
100 0 0 0 0 0 0 0 0 0 0.000000 * 0/100 * Serial
100 0 0 0 0 0 0 0 0 0 0.000000 * 0/100 * Serial
14 2 2 7 11 0 4 0 11 49 0.000000 * 95/100 * LinearComplexity
Much worse.
I encourage you to play with the STS code. It lets you do all sorts of other neat things, like testing bitstrings for common “templates” and reporting if too many are found. It also segfaults all over the place, which is actually very disturbing considering that it’s technically part of the US government’s computer security project.
In any case, we’ve established that xkcd’s random generator is reasonably unpredictable and unbiased. As it happens, they’re using the Mersenne Twister, which is a well-established pseudorandom generation algorithm.
So why does the random number generation appear so biased when we’re idly refreshing on lazy Sunday nights? Part of it is, of course, human nature. We like to see patterns everywhere.
But here’s a more concrete, mathematical explanation. The conceptual idea is that in the beginning, hitting “random” is likelier to hit an unread comic, but once you’ve seen more and more of them, you get repeats. Let’s try to quantify this: we’re going to calculate the expected value of the number of times you need to hit “random” until you have seen every single comic. You may have seen this problem in the context of “how many times do you need to roll a die until you have rolled all six faces at least once?”.
Expected value is the average value of some random variable if you do an experiment lots of times. For example, if you roll a die gazillions of times, the average number you’ll get is ($ (1+2+3+4+5+6)/6 = 3.5 $), so that’s the expected value.
We’re going to calculate the expected number of times you hit “random” by calculating the number of times you need to hit it to get the first, second, third, and (in general) nth unique comic. Then, because of a useful property of expected values, we can just add them together until ($ n = 1500 $) (there are 1500 comics published as of right now) to see how long, as of today, this process would take.
If you’re looking for your ($ n $)th unread comic, you have already seen ($ n - 1 $) of them, so each time you hit “random” you have a ($ 1 - (n-1)/1500 $) chance of getting a fresh one. This is a geometric probability distribution, which is Math for “you keep trying something with a constant probability until it succeeds”. For geometric probability distributions, the expected value is one over the probability (though I’m not going to prove it here, this intuitively makes sense: you would expect to have to roll a die around 6 times until you get your first 1, or to flip a coin twice until you get your first heads).
Anyhow, for the ($ n $)th comic, the expected number of clicks is ($ 1500/(1501-n) $). Adding these up for each ($ n $), we have this monstrosity:
\[ \sum_{n=1}^{1500} \frac{1500}{1501-n} = \frac{1500}{1500} + \frac{1500}{1499} + \dots + \frac{1500}{2} + \frac{1500}{1} \]
This works out to, on average, 11836 clicks. That’s a lot of clicks.
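If you would rather let a computer do the adding, the sum is a one-liner. This is a small Python sketch; the function name `expected_clicks` is made up, and the die is included only as a sanity check against the six-faces version of the problem.

```python
def expected_clicks(total):
    """Expected number of draws needed to see all `total` items at least once."""
    # When `remaining` items are still unseen, each draw finds a new one with
    # probability remaining / total, so it takes total / remaining draws on average.
    return sum(total / remaining for remaining in range(1, total + 1))


print(expected_clicks(6))     # rolling a die until every face has shown up
print(expected_clicks(1500))  # hitting "random" until every comic has shown up
```

For the die this gives 14.7 rolls, and for 1500 comics it lands between 11836 and 11837.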
As common sense dictates, the more times you have clicked “random”, the less likely it is for you to hit a new comic. And that’s why Randall’s random button seems biased.
One more bit of statistics: if you’ve taken a probability class, you might have heard of the birthday problem. That is, say you have a party with ($ n $) people. What is the probability that some pair of people at the party share a birthday?
It turns out that if you have just 23 people, the probability is already 50-50. This is somewhat counterintuitive; most birthday parties only have one birthday boy! The fallacy is that the problem isn’t asking if some particular person shares a birthday with someone else. It’s asking if any two people share a birthday.
The birthday “paradox” turns out to be important in cryptography, especially when looking for hash collisions. The number of hashes you need to generate before you hit a collision is similar to the number of people you need at a party before some pair shares a birthday—much smaller than what you would expect.
In terms of xkcd-surfing, this helps answer the question “how many times will I hit random before I see a repeat?”.
There are plenty of good explanations for the math behind the birthday problem online (Wolfram Mathworld and Wikipedia)—but if you don’t believe the number 23 quoted above, it’s worth spending some time trying to solve it yourself just to understand what’s really going on (it’s not hard). I’m just going to dump the formula here without any explanation.
For 1500 comics, the probability that you get a repeat after ($ k $) clicks is:
\[ 1 - \frac{1500!}{(1500-k)!1500^k} \]
Throwing this at WolframAlpha, we see that after only 45 clicks, you have a 50-50 chance of seeing a duplicate comic. Put a different way, there are even odds that the last 45 comics you have seen contain a duplicate pair somewhere in there.
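If you don’t have WolframAlpha handy, the same formula can be sketched in a few lines of Python. This multiplies out the “no repeat so far” probability one click at a time instead of computing huge factorials; the function name is my own, and it again assumes a uniform random button.

```python
def repeat_probability(total_comics, clicks):
    """Chance that `clicks` uniform random picks contain at least one repeat."""
    no_repeat = 1.0
    for already_seen in range(clicks):
        # The next click must avoid everything seen so far.
        no_repeat *= (total_comics - already_seen) / total_comics
    return 1.0 - no_repeat

print(repeat_probability(365, 23))   # the classic birthday party: just over 0.5
print(repeat_probability(1500, 45))  # xkcd: close to 0.5
```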
So we’ve empirically validated that xkcd’s RNG is as close as we can expect for something statistically random. We’ve also seen two reasons why it feels biased.
But on a deeper and much more important level, we’ve seen how counterintuitive and messy the random-number business is, and how statistical facts can trick us into seeing patterns that aren’t there.
P.S. My methodology for these experiments is probably not the best, since I have no formal statistics background. If you want to check out the code used or a dump of my dataset, leave a comment below and I’ll send it to you.
As I look back on programs I’ve written, I almost always notice that the most successful are the hacks. The ones whose source code is spaghetti, the ones which consist of patches upon patches, and not the ones where I painstakingly mapped out a beautiful API.
I speak of such monstrosities as alchemy, an IRC bot to play Little Alchemy collaboratively in IRC. alchemy.py is almost fully contained in a single while loop, and important game functions are nested so deep that they must be indented more than 80 characters to be valid Python. alchemy might be one of my most successful software projects if measured by total amount of happiness brought to users.
Speaking of IRC bots, the jokebot skeleton code introduced in a previous post is probably the most-forked of my projects, giving rise to 3 derivative bots within a week of its first commit. jokebot was written with no heed for efficiency, and so the unpatched jokebot was a big fat memory hog.
I also speak of bouncy, a screensaver I wrote over the course of a couple of hours for no good reason, which has been adorning my mom’s Mac every 10 minutes for the past two years. I have no idea what bouncy’s source looks like, but I do recall abusing nested NSArrays because Xcode didn’t let me create a new class.
I speak of Snapin8r, which began as a Pythonic mess, was then ported to JavaScript line-by-line, and then heavily patched as bug reports rolled in. Most things are handled by one giant if/elif chain in a for loop. Again, Snapin8r is used by a disproportionately large number of people considering how shaky the entire thing is.
And I speak of miscellaneous hacked-together Scratch projects that have, over the years, accumulated dozens of “remixes” as people ran into them and used their code for amazing derivative works.
I do not speak of things like nearley, which despite hundreds of commits by several authors, has only one legitimate dependency on npm. Or, for that matter, any of my other well-organized projects on GitHub.
Perhaps there’s a reason for this. As logical people, programmers instinctively build cathedrals, not bazaars. “If we get it right the first time,” we say, “we’ll have a good foundation and won’t have any showstopper bugs.” And then we spend a few weeks building up these “foundations” and then release monoliths.
But, as I’ve observed, that’s not how effective software gets created. The best software gets used as it’s being developed. The way it’s used affects its development, so that instead of just a theoretical curiosity designed to be perfect and modular, the program solves real-world problems from day 1. The best software begins as a funny IRC bot or hacky Bash script.
And so the lesson I want to teach myself by posting this post is that the most important thing isn’t to pick the perfect APIs, tools, libraries, frameworks, algorithms, and languages. The most important thing is to just start writing code.
This is really hard.
Even selecting a language for a new project is hard. I want my project to be fast, but C is too low-level. Scheme would probably be the best choice algorithmically. Should I use Racket or Chicken? Racket seems more documented, but it doesn’t compile to native code cleanly. Chicken has yucky documentation, and I don’t like Chicken’s package manager. Maybe I should do it in Python? But I hate Python. Hang on, I’ve been meaning to learn Haskell for a while, now. Wait, no, I/O in Haskell is hard, and I’ll need a good POSIX interface for this project.
Speaking of which, how should I accept input? Stdin? File name? In what format? JSON? YAML? I don’t know how to parse JSON in Scala yet. Actually, maybe my program would be better as a service hosted over HTTP?
How about command-line arguments? Does Rust have a good option parser? Will I have to write it myself? I probably won’t be able to write an option parser that’s as efficient as industry-standards. Parsing is hard. Maybe I should use JavaScript; I already know how to use nomnom.
But if it’s in JavaScript, I can bypass all this and create it as a sleek, intuitive, and beautiful GUI in the browser. Then again, I want my project to be fast…
I think I’ve made my point: you can spend as long as you wish trying to choose the right tool. The truth, of course, is that there is no “right tool”. In fact, whatever tool you pick, you’re going to hit a limitation at some point. It’s reality. So instead of spending ages choosing the right tool, just pick the one that feels right and get some code written. Imperfect code is always more useful than no code. You should think about interfacing and efficiency second.
Ok, fine, I’m going to regret saying that. There’s a very good argument to be made for thinking about the architecture of a project before starting on it. It leads to clean, maintainable code. It minimizes the amount of refactoring drudgery.
But I think all those things are secondary to actually having a working prototype. Yes, you’re going to run into painful distribution issues—it’s happened to me, and it was annoying enough to make me give up on the project. But the open-source world came to the rescue, and someone forked and continued it. What matters is that I put my idea into code and gave it to the world.
So my promise to myself—as of publishing this post—is to spend less time debating tools and more time using them. This means forcing myself to consciously use imperfect tools, which might cause mild internal bleeding. Let’s see how long I survive.
If you’ve been following my blog, you know that I’m not a fan of competitions. One of the few exceptions to this is CTFs. CTFs, or Capture-The-Flag competitions, are computer security challenges where the organizers host a few vulnerable servers and you need to “hack” them to find secret keys. CTFs are, to be very honest, a lot of fun.
But I’m not here to advertise CTFs—I’ve already done that in another post—I’m here to talk about how CTFs should be replacing Hackathons.
Hackathons are great. You hang out all night long, eat junk food, listen to music, and get all sorts of swag from the richest companies in Silicon Valley. You get to meet successful people who have their own start-ups.
But that’s all.
Respectable hackathons advertise themselves as educational: if you know how to code, you get a chunk of time to work on a project and learn a new skill. If you don’t know how to code, we’ll teach you.
The truth is, neither actually works out in the real world.
I know of exactly zero pieces of hackathon-code that the authors are legitimately proud of. “Authors are legitimately proud of” differs strongly from “authors are capable of making an exciting, buzzword-heavy pitch about,” the latter generally being the sole criterion in determining winners. Hackathon veterans largely agree that hackathon-code is rarely touched after the demo, because it’s usually written as just enough to create a demo. You aren’t supposed to pay heed to petty things like maintainability—you’re hackers! And so your efforts over 24 sleep-deprived zombie hours all go to /dev/null.
At the other extreme are the brave souls who walk into a hackathon cold: no programming experience whatsoever. From what I can tell, they’re generally persuaded to come by their nerdfriends, with the prospect of free food. In any case, I have yet to meet someone who actually fell in love with programming at a hackathon. Yes, the rich Silicon Valley startup culture is appetizing, but you can’t just show up at a hackathon and found a company. You need to put in the effort to learn the basics and fall in love with programming, and that is not something you can do in 24 continuous hours. You can’t put on a concert in 24 hours having never touched an instrument.
So, what are hackathons good for? They’re great for networking: making friends, and even getting internships or jobs. They’re great for getting T-shirts, playing games, and having awesome conversations. They’re great for hearing talks that make you feel good about yourself for being a “hacker”, “innovator”, “creator”, “developer”, etc. They are, in short, what the humble party has evolved into to suit Silicon Valley.
Enter CTFs. CTFs are hardcore. To do well in one, you need to be really clever. You need oodles of esoteric knowledge about all sorts of computer science. You need to be good at math, algorithms, general-purpose problem solving, and of course programming. A single CTF problem can teach you far more than an entire hackathon can ever hope to.
(You do not, by the way, need to be good at making a pitch in front of a panel of judges.)
I posit that a CTF is also as social as a hackathon: perhaps a majority of my best friends are folks I’ve met CTFing, either because they were my teammates, or because I ran into them on a CTF’s IRC channel. The conversations you have with your teammates while trying to solve a CTF problem are fantastic: you share large wealths of information, refute or build upon each other’s ideas, ask hard questions, and do all of that while making puns and inside jokes.
As far as I’m concerned, a CTF is much more valuable than a hackathon for a casual programmer—mildly experienced script kiddies and ninja rockstars alike.
But more importantly, I can envision CTFs for beginners, too. Solving the problems in an introductory CS textbook is far more appealing when put in the context of hacking into someone’s server. I’m sure an elegantly crafted sequence of problems can Socratically teach someone enough basic Python to, for instance, compute large Fibonacci numbers recursively—for the right people, that should be enough to make them fall in love with computer science.
CTFs have a lot of potential: it’s a shame we’re instead promoting hackathon-parties as “CS education events”.
P.S. It would be an even bigger shame if this post comes true in the wrong way, and CTFs go the same route as competition math. As much as I love the idea, some part of me wants to keep the CTF world a secret so that the people who ruined competition math can’t get their hands on CTFs as well. I hope that’s not selfish—but I think that’s a topic for a whole new post.
It is a well-known fact among wizarding circles that the Gringotts Wizarding Bank has an infinite number of vaults, numbered from 1 onwards. Equally well-known is the fact that in the name of what goblins call efficiency and everyone else calls parsimony, each of the vaults is currently occupied.
As a result, there has formed a situation which Muggle economists describe as “scarcity”. Gringotts vaults, like heirloom-quality furniture, are prized in families. They only change hands at readings of wills of dead great-aunts.
With that in mind, it was, of course, reasonable for Mr. Hill Bertok to scoff at the old man who demanded of him a vault one chilly February morning.
Mr. Hill Bertok was a run-of-the-mill businessgoblin and teller at the Gringotts Wizarding Bank. The old man was Professor Albus Dumbledore, headmaster of the Hogwarts School of Witchcraft and Wizardry.
Mr. Bertok did not, of course, notice this at first. Had he realized who the old man was, he would perhaps have shown a little more respect before scoffing. The scoff, however, had been scoffed already, and a scoff once scoffed cannot be recalled.
Having conceded this, Mr. Bertok attempted a mildly apologetic gesture with his eyebrows and presently returned to his paperwork. When he looked up a few minutes later, the old man was still there.
“Dumbledore,” he said, “as a goblin I hold you in much higher regard than most wizards. But I still think you’re crackers.”
“Very well, Mr. Bertok, but many millions of lives depend on the security of the object I have with me. Perhaps you can make an exception?”
“Exception! What object could possibly require so much security?”
“It’s a rock, Mr. Bertok. A very important rock.”
“…you are crazy, Dumbledore. I’ll tell you what: if you can find an empty vault, you can have it.”
“Mr. Bertok, I believe I have this problem sorted out already. If you’ll lead me down, perhaps I can demonstrate.”
“Crazy, completely and utterly bonkers,” thought Mr. Hill Bertok as he led the old man through the labyrinthine passages of Gringotts.
“713 sounds like a nice round number,” said Dumbledore, “Let’s stop here. Do I have your permission to use this vault? It’s a harmless procedure, really.”
“You are out of your mind.”
“Need I remind you that I am the Chief Warlock of the Wizengamot?”
“You need not,” seethed a very reluctant goblin, adding (under his breath) “Albus Dumbledore pulling rank—what is the world coming to?”
“Very well then. Presto-incremento!” For a sliver of a second, Vault 713 of the Gringotts Bank glowed a pale electric blue. Then, with a gentle rumble, the door swung open, revealing absolutely nothing whatsoever.
“You asinine old man! That vault contained more galleons than your demented old brain can count!”
“Fear not, Mr. Bertok, you will find all of your galleons. They have simply been transported to Vault 714.”
“Impossible! Vault 714 contains statues of gold; you cannot fit a mountain of galleons in there!”
“Patience. The statues of Vault 714 have been moved to Vault 715. And—before you get started again—the contents of Vault 715 have been moved to Vault 716.”
“Excuse me?”
“Each vault’s contents have been moved into the next vault. Ad infinitum. That left Vault 713 open for me.”
“Wait, but… how can… our clients! How do you propose to notify them of the change?”
“I had the foresight to dispatch some Ministry owls ahead of my visit.”
“You cannot send owls to an infinite number of clients, Dumbledore.”
“It may interest you to know, Mr. Bertok, that the Ministry happens to have a fleet consisting of an infinite number of owls. They’ve been quite useful: just last week we used them to—ahem—persuade a Muggle family to send their nephew to Hogwarts.”
And with those words, Professor Albus Dumbledore dropped a small grubby bag inside the vault and closed it.
The news spread faster than the plague. The next morning, the tellers of Gringotts Bank were greeted at the front desk by Stan Stunpike. Unfortunately, they were also greeted by an infinite number of passengers on his infinitely long bus. These were restless passengers who wanted their magically-created Gringotts vaults and would not leave without them.
Owing to the lack of space and general stuffiness that was developing in the Gringotts lobby—and the inevitable threat being posed to his employment—Mr. Bertok decided to do something about it. After the first 50 new clients, though, he began coming to the realization that making all of these accounts would take an infinite amount of time.
He didn’t have an infinite amount of patience, but he knew someone who did. He sent an owl to Dumbledore.
“Hush, calm down, everybody.” The lobby was beginning to smell like the inside of a used coffin, so Dumbledore decided, for the first time in his life, to quit the whole pedagogical spiel and solve the problem.
“Presto-doublinato!” he cried, and the lobby began to rumble as an infinite number of doors began to open.
“Now what have you done?”
“Vault 1’s contents have been moved to Vault 2. Vault 2’s contents have been moved to Vault 4. Vault 3’s contents have been moved to Vault 6. In general, each vault has been moved to the vault with double the number.”
“How does that help?”
“All the odd numbered vaults are empty. Since each of your angry bus passengers has a seat number, you assign them a vault based on those. The person in the first seat is assigned the first odd number (1), the person in the second seat is assigned the second odd number (3), and the person in the hundredth seat is assigned the hundredth odd number (199).”
Mr. Hill Bertok briefly considered hiring Dumbledore in the HR department.
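For the Muggles in the audience, Dumbledore’s two rules from this scene fit in a few lines of Python. This is just a sketch with function names I made up; vaults and seats are both numbered from 1, as in the story.

```python
def new_vault_for_existing(old_vault):
    # Presto-doublinato: vault n's contents move to vault 2n.
    return 2 * old_vault

def vault_for_passenger(seat):
    # The seat-th odd number is 2*seat - 1.
    return 2 * seat - 1

print([new_vault_for_existing(n) for n in (1, 2, 3)])  # [2, 4, 6]
print([vault_for_passenger(s) for s in (1, 2, 100)])   # [1, 3, 199]
```

Existing clients only ever land in even vaults and passengers only in odd ones, so nobody has to share.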
Wizards, it has been found, have friends. As a result, after the miracles of the second day, news spread like proverbial wildfire and the parking lot behind Gringotts was quickly filled with an infinite number of infinitely long buses.
Mr. Bertok knew what to do this time, and before you could yell “cardinality”, Dumbledore had apparated to the lobby, holding a big bucket of black paint.
Without a word, he walked out to the parking lot and began painting large numbers on the buses’ windows.
“Hey, Mister, what do you think you’re doing?” cried an understandably distressed bus-driver. He was quickly and efficiently turned into a frog by Gringotts’ security team, and Dumbledore continued with his painting, undeterred.
After a few tense minutes, the parking lot resembled this (pardon my badly-illustrated row of buses):
+----+----+----+----+----
< 01 | 02 | 04 | 07 | ...
+----+----+----+----+----
< 03 | 05 | 08 | ...
+----+----+----+----
< 06 | 09 | ...
+----+----+----
< 10 | ...
+----+----
|... |
“As you can see,” he began (the outdoor parking lot was airy enough for his pedagogical side to shine), “I’ve numbered each bus window diagonally. If I keep this up, each window will get a number.
“Since I’ve already shown you how to deal with that with the use-all-odd-numbers trick, I think I’ll take my leave now. It’s almost lunchtime.”
He disappeared with a pop, leaving one frog and an infinite number of very confused wizards.
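Dumbledore never wrote down a formula, but the diagonal paint job in the picture above can be reproduced with a little arithmetic. Here is a Python sketch; the function name and the bus/window coordinates (both counted from 1) are my own labeling of the drawing.

```python
def window_number(bus, window):
    """Number painted on a window; buses and windows both count from 1."""
    diagonal = bus + window - 1
    # All earlier diagonals together hold 1 + 2 + ... + (diagonal - 1) windows.
    windows_before = diagonal * (diagonal - 1) // 2
    return windows_before + bus

# Reprint the corner of the parking lot shown in the drawing.
for bus in range(1, 5):
    print([window_number(bus, window) for window in range(1, 6 - bus)])
```

Every (bus, window) pair sits on exactly one diagonal, so every window gets exactly one number and no number is painted twice.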
Like all good things, this one had to come to an end. The end came when an inhabitant of a Hogwarts painting overheard a conversation about Gringotts’ new policy. And so, the next morning, the portrait of Gringott the Goblin was inhabited by an infinite number of painting-people. In front of them was a dashing, athletic-looking young gentleman. A caption floated above his head. It read “C. Antor. Tennis player.”
“Greetings, Mr. Bertok. I’m Charles Antor, representing the paintings.”
“Welcome to Gringotts, Mr. Antor.”
“Well, we came to ask: can you work your magic and give us vaults, too?”
“What does a painting need a vault for?”
“If you’re going to denigrate us, we’ll take our business elsewhere.”
Mr. Bertok hesitated. “Alright, we will need you all to line up so that we can number you.”
“I’m afraid we can’t do that.”
“What?”
“We aren’t numbered with counting numbers like 1, 2, 3… We’re numbered with real numbers. Decimals and fractions. Each of us identifies himself or herself with a number between zero and one.”
“So? Why can’t you all stand in order?”
“Well, suppose we did order ourselves in a line, and suppose you assigned us all vaults.”
“Alright, then what?”
“What is the number of the painting at the front of the queue?”
“What? How can that possibly matter?”
“Please bear with me. Just invent a number between zero and one. It doesn’t matter which one.”
“If you say so. The painting at the front of the queue is painting 0.234567.”
“Well, the first digit of my number is a 1, not a 2 like his. So can we agree that I’m not at the front of the queue?”
“I suppose so.”
“Good. Now, who’s second?”
“Should I make up another number?”
“Sure.”
“0.1111111.”
“Well, my number’s second digit is a 9, not a 1. So I’m not second in line either.”
“I don’t see where you’re going with this.”
“Mr. Bertok, no matter how you number us paintings, there will always be someone whose first digit is different from the first person’s first digit. And whose second digit is different from the second person’s second digit. And whose third digit is different from the third person’s third digit. And so on, and so forth.
“In other words, he cannot have any position in your queue, and so he can’t be in the queue.”
Mr. Bertok scratched his head. “But if there’s no way to number you folks, there’s no way to assign you vaults.”
“Exactly. In a weird way, there are ‘more’ of us than there are of wizards in Stan’s bus. Even though both numbers are ‘infinity’.”
“Gosh, who knew paintings could be so complicated? You aren’t even real.”
“On the contrary, Mr. Bertok, we are as real as can be.”
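Mr. Antor’s trick works on any queue Mr. Bertok cares to invent. Here is a small Python sketch of it for a finite queue. The function name and the sample queue are made up, and the rule for choosing each digit (5, or 4 if there is already a 5) is my own choice rather than the one in the story; any digit that differs would do.

```python
def missing_painting(queue):
    """Build a decimal that differs from the n-th queued number in its n-th digit.

    `queue` is a list of digit strings (the part after "0."), each at least
    as long as the list itself.
    """
    digits = []
    for position, number in enumerate(queue):
        # Pick 5 unless the queued number already has a 5 there; avoiding
        # 0 and 9 sidesteps the 0.0999... = 0.1000... ambiguity.
        digits.append("4" if number[position] == "5" else "5")
    return "0." + "".join(digits)

queue = ["234567", "111111", "500000", "555555", "141592", "999995"]
print(missing_painting(queue))
```

Whatever comes out disagrees with the first number in its first digit, the second in its second digit, and so on, so it cannot be anywhere in the queue.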
Epilogue. Mr. Hill Bertok, now inspired, went on to study infinities. After an interesting encounter with two old wizards named Banach and Tarski, he discovered a means of mining an infinite amount of gold. Surprisingly enough, he ended up living happily forever after.
Charles Antor went on to become a tennis star among the paintings. Unfortunately, he met his match when a certain B. Russell proved to the referee that none of his sets could exist.
The events of this story thoroughly confused Stan Stunpike, who decided to take an early retirement from bus-driving and instead perhaps go herd a finite number of goats in Mongolia. Fortunately, a pair of psychiatrists, Calkin and Wilf, convinced him that he can fit all the rational people in the world onto his bus. Though he tried hard, he never really got the hang of it, and one day his bus was found—destroyed—with an infinitely large tree growing out of the windshield.
This post was inspired by something my computer science teacher said. I forgot what exactly it was that he said. Most ideas were shamelessly stolen from a chapter in Ian Stewart’s delightful book, Professor Stewart’s Cabinet of Mathematical Curiosities.
I wrote this post to inaugurate a friend’s blog a while back and decided to preserve it here. Though dated February 2015, I added it to ComfortablyNumbered in early October.
Communication. It’s what separates a painter from an artist and a performer from a musician. It turns a mob into an army and a fight into a debate. It’s what separates coexistence and civilization.
In the world of computer science, we have a fictitious creation called the Nondeterministic Computer. The Nondeterministic Computer, given a problem, tests every possible solution of the problem instantaneously, and reports the correct answers. The Nondeterministic Computer, were it to be realized, would revolutionize computer science, data science, protein folding research, and of course cryptography.
Guess what? You have the functional equivalent of a Nondeterministic Computer right now. You have, to put it lightly, millions of brilliant minds at your disposal. Combined, you have millions of years of experience, instinct, opinion, and innovation at your command.
Because just as wonderful as Nondeterministic Computing is Nondeterministic Communication. Of perhaps the 2,000 people that will read this post in the next couple of months, some will agree. Some will disagree. Some will be affected by what I say, and some will make it a mission to prove me wrong. The vast majority will ignore it, spending less than 5 seconds on the page, and only skimming a few words.
For all practical purposes, I am running my thoughts through a vast supercomputer and getting a decent representation of humanity’s views on them. I can do this anonymously, and I can do this for free.
In the scientific and academic world, communication happens through papers. Progress happens when Darwin must publish his research before Wallace, when Einstein refutes Newton, when Watson and Crick race Franklin’s lab, when Shamir writes a paper breaking a cryptosystem Merkle and Hellman thought was secure.
In the tech world, progress happens when someone—a high schooler, an employee at a startup, or the creator of Linux—leaves a vitriolic comeback on a blog post.
Blog posts are what truly reflect us: our opinions, our rants, our tutorials, and our reviews document, piece by piece, the world we have created. And the comments document what we think of it.
In other words: you are responsible for the canon in this world. For perpetuating knowledge. For inciting discussion. For starting arguments. For causing change.
So write! Write controversial things! Express unpopular opinions, and do so vehemently! Hate on something everyone adores! Use strongly-worded phrases. Use exclamation points. Make noise, be mean. Get harsh feedback, it’s what you want.
Be wrong once in a while. Say stuff you’ll cringe at in a year (because, to be honest, you’ll cringe at everything you wrote a year back). Do what it takes to put your opinions out there, because they matter. As a culture, we’re fallible, and someone needs to call us out on it.
Your words are elegant weapons; use them to create a more civilized age.
And don’t be afraid to put your thoughts through a nondeterministic computer just because it’ll reject 99.99% of them. That’s what nondeterministic computers do.
This post first appeared on Pdgn.
(Speaking of which, I hope they write more on their seemingly-dead blog. They have good things to say. Everyone does.)
A lot of script kiddies, myself included, take a lot of pride in loathing statically typed languages and being “purists”. But I’ve been doing some reading about static typing (after realizing that many of the hackers I respect are type-safety-fanatics) and I’ve realized that a lot of the reasons that we have for hating type-safe languages aren’t truly valid.
Not that I’m converted. I’m still a firm believer that types belong to objects, not to variables. But for the benefit of people who are still making a choice, here are some of my misconceptions about type safety.
This post isn’t meant to persuade you one way or the other, because if you’re going to join the Dark Side, it’s probably too late already.
This probably stems from the huge popularity of Java, which has linked static typing to aggressively object-oriented imperative programming. When I took AP Computer Science, the distinction between Types and Objects was not made clearly enough.
You can have OOP without static typing. There’s Pythonic duck typing:
In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.
(Alex Martelli, source)
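In case the quote is too abstract, here is a tiny Python sketch of what it means in practice. The class and function names are invented for illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"

class Robot:
    # Not a Duck subclass, but it has the one method we need.
    def quack(self):
        return "BEEP-quack"

def make_it_quack(thing):
    # No isinstance() check: anything with a quack() method is good enough.
    return thing.quack()

print(make_it_quack(Duck()))
print(make_it_quack(Robot()))
```

Nothing declares what type `thing` has; passing in something that can’t quack only blows up when the call actually happens.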
There’s Self’s prototypical inheritance, where all you have are objects (which are pretty close to JavaScript Objects and Lua Tables when it comes to implementing an OOPey system). There’s even Scheme OOP, with impure dispatch functions as described in SICP.
But you can also have type-safe functional languages without an OOP framework around it. The best examples of these are Haskell and ML, though I’m sure there exist others. Haskell is as functional as it gets (arguably more so than Scheme, because it absolutely prohibits side-effects, i.e. doesn’t have set-car!).
So you don’t need to give up functional code to embrace type safety.
Another one that I attribute to Java. Though it’s completely possible to be obnoxious about your types, for the most part it’s also completely possible to write reasonable-looking flat-is-better-than-nested code. Just like you can write C programs, for loops and all, in Scheme, and they’ll work, but that’s a blatant abuse of recursion.
Looking at lots of Haskell code, it’s pretty clear that the levels of abstraction you choose to implement are not directly correlated with the way you use or misuse the type system.
We’re generally told that aside from some cases where runtime typechecks are needed (or C, where types determine memory usage), most of the time typechecking information is just discarded when you actually compile the code. In that sense, they seem to add little more value than well-placed, meaningful comments.
And while that’s an acceptable way to look at it, it’s certainly worth realizing that type theory is an established branch of computer science that comes dangerously close to math. It is nontrivial to come up with a type system that is “provably correct”, that is, a type system that is liberal enough to accept programs that get stuff done, but conservative enough to reject programs that do bad stuff (like access fields that don’t exist).
For example, consider functions that accept an Animal as input and return a Truck as output. Is this a superclass of functions that accept Parrots as input and return Vehicles as output? Or a subclass? How about classifying recursive tuple types (such as LinkedList<T>, which could potentially be of type Tuple(T, LinkedList<T>))?
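For the Animal-and-Truck question, the usual answer is “subclass”: function types are contravariant in their arguments and covariant in their results. Here is a sketch using Python’s optional type hints, assuming the obvious hierarchy (Parrot is an Animal, Truck is a Vehicle). All of the names are hypothetical, and the hints are only enforced if you run a static checker such as mypy over the file.

```python
from typing import Callable

class Vehicle: ...
class Truck(Vehicle): ...
class Animal: ...
class Parrot(Animal): ...

def animal_to_truck(animal: Animal) -> Truck:
    return Truck()

def drive_a_parrot(convert: Callable[[Parrot], Vehicle]) -> Vehicle:
    # The caller only promises to pass in a Parrot and expects some Vehicle back.
    return convert(Parrot())

# Accepted by a static checker such as mypy: animal_to_truck takes any Animal
# (so certainly a Parrot) and returns a Truck (which is a Vehicle).
vehicle = drive_a_parrot(animal_to_truck)
print(type(vehicle).__name__)
```

Going the other way (handing a Parrot-only function to something that might feed it an arbitrary Animal) is exactly what a checker should reject.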
Even though type-safety is like a lint, it’s a very advanced, deep lint that occasionally catches subtle bugs. It’s a sanity check that prevents you from coding if it deems you insane.
…because even though type safety is the enemy, it’s important to realize why it’s so popular. In the world of CS, it’s important to Hoover up as many new ideas as you can, even if you don’t agree with all of them.
If you know yourself but not the enemy, for every victory gained you will also suffer a defeat.
— Sun Tzu, The Art of War, III-18.
Scenario: I’ve written a simple Python command-line tool. It consists of more than one file, because separation of code is important. It only needs to run on UNIX systems and shouldn’t require sudo to install. How do I distribute this code in a safe, user-friendly way? How do I teach my intro CS class how to proudly share their creations with the world?
Answer: You don’t.
Distributing Python code should be a solved problem. It’s not. Or, rather, it’s a standards problem. There are too many tools out there, and there’s no clear roadmap that explains which subset does what. I’m talking about Pip, Virtualenv, PyPi, wheels, eggs, distutils, distribute, setuptools, easy_install, and twine. Some of these have been incorporated into others, and are therefore obsolete. Some of these are endorsed by a PEP. (What’s a PEP? Do I care?)
There is certainly not one obvious way to do it. Even if you’re Dutch.
Assuming you have picked out a subset of tools to use, though, there’s no guarantee that they will cooperate. A Pip flag, for instance, might work. It might not work. It might do stuff that you didn’t expect. It might create files without telling you (correction: it’ll be noted cryptically in the Pip log). It might pass it on to setup.py, or it might not, or it might reject a valid setup.py argument because it doesn’t understand it, or it might pass it on to Python erroneously.
My Pip log contains monstrosities such as:
Running command /usr/bin/python -c "import setuptools, tokenize;
__file__='/path/to/setup.py'; exec(compile(getattr(tokenize, 'open',
open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" develop
--no-deps --user
(Enumeration of all the things here that make me cry is left as an exercise to the reader.)
In short, it might eat your laundry depending on local atmospheric conditions, and then it’ll flip a coin to decide whether or not to tell you.
Let’s talk about publishing. PyPI is the “standard” repository. But it makes you host your own tarballs, so essentially all it does is match package names to URLs (all other information is duplicated in the tarball itself).
Uploading to PyPI happens over an insecure connection. They recognize this in the docs, and recommend installing a separate client that lets you publish over SSL.
They also keep telling users to use OpenID. The OpenID that Google is deprecating in a couple of months. Of course, trying to log in with Google throws an internal server error.
Apparently, the whole system is so confusing that every time you upload, you are encouraged to do a dry run on the test repository just to make sure you understand it.
Now, my Python program was not a module, but a command-line tool. The way you specify a command-line tool in Python looks like this:
# ...
entry_points={
'console_scripts': ['name_to_link = packagename.submodule:function_name'],
},
# ...
Here’s what setuptools does. It parses that monstrous string and extracts information. It then creates a new Python script in a $PATH’d directory. That script imports your script as a module, and then calls the function you specify. It’s anyone’s guess whether important scriptey things such as environment variables or command-line arguments are preserved.
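To make the "parse the string, import the module, call the function" step concrete, here is a minimal sketch in Python. It is not setuptools' actual code; the helper names `parse_entry_point` and `load_entry_point` are made up for illustration, and the demo loads `json:dumps` from the standard library simply because it is always importable.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:function' into its three parts."""
    name, _, target = spec.partition("=")
    module_name, _, attr = target.strip().partition(":")
    return name.strip(), module_name.strip(), attr.strip()


def load_entry_point(spec):
    """Import the module named in the spec and return the function it points at."""
    _, module_name, attr = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    # Hypothetical entry point: a command called "to_json" backed by json.dumps.
    dumps = load_entry_point("to_json = json:dumps")
    print(dumps({"ok": True}))
```

In a wrapper of this shape, the function runs in the same process as the wrapper, so `sys.argv` and `os.environ` are still visible to it. Whether a particular tool's generated script behaves the same way is something you would have to check for that tool.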
This is all in complete ignorance of the whole if __name__ == '__main__' infrastructure, for whatever reason. Of course, you can also make things runnable with python -m packagename.submodule by including a __main__ file.
Yum.
Even if it did the reasonable thing and symlinked your script directly, it would actually modify your file. That’s right. It replaces shebang lines with the path to the currently-running Python interpreter. Because Explicit Is Better Than Implicit (tm).
There is no reason to do this. UNIX provides a very helpful idiom: a first line of the form #!/usr/bin/env python arguments... searches the $PATH for the right interpreter and uses it. No black magic needed, and you can provide flags to the python interpreter if you so desire (though how extra arguments on a shebang line get split varies between systems).
Only God (make that Guido) knows what’ll happen if you’re trying to install a Python 3 script using Python 2.7. Or want to distribute something runnable with ipython.
By the way, if you think installing is bad: you would think uninstalling should be a standardized, well-thought-out, documented process, right? You wish. Programs installed with setup.py have no fixed uninstallation procedure.
If you Google around, you are led to this highly-upvoted StackOverflow answer, which suggests using an option that lists all generated files, and piping the output to xargs rm -rf.
Let’s hope they don’t have a file with a space in it, or bad things are going to happen. Keep in mind that Python is recommended for newbies, the kinds who will gladly copy-and-paste shell commands from the Internet.
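For what it’s worth, the filename-with-a-space problem goes away if the list is read line by line instead of being split on whitespace. Here is a minimal sketch, assuming the list of generated files has already been saved to a text file with one path per line. The function name `remove_recorded_files` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def remove_recorded_files(record_path):
    """Delete every regular file listed in record_path (one path per line)."""
    removed = []
    with open(record_path, encoding="utf-8") as record:
        for line in record:
            path = line.rstrip("\n")  # keep inner spaces, drop only the newline
            if path and os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo in a throwaway directory, using a file name that contains a space.
    workdir = tempfile.mkdtemp()
    victim = os.path.join(workdir, "my module.py")
    open(victim, "w").close()
    record = os.path.join(workdir, "installed_files.txt")
    with open(record, "w", encoding="utf-8") as handle:
        handle.write(victim + "\n")
    print(remove_recorded_files(record))
```

This only removes regular files and deliberately leaves directories alone, which is a more cautious choice than rm -rf.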
Here’s the deal. Flat is not better than nested. Nested is better than flat. There’s a reason we use parsers to convert strings to abstract syntax trees. There’s a reason LISP is easier to write than x86 assembly. There’s a reason for the existence of the term “dependency tree”. They are literally nested structures. Importing from a sibling directory should not be the harrowing experience it is in Python.
And that’s why I love npm. Not because it’s written in a beautiful language or because it pretends to be easy-to-use, but because it’s simple and elegant and it doesn’t do too much or too little. Modules are nested prettily in their directories, and the only file npm touches is package.json.
Scripts are simply linked into npm’s bin folder. Your “script” could be a Bash program, for all it cares.
Installing is one command, uninstalling is one command, publishing is one command, and everything is one tool operating on one repository.
npm is transparent. Python’s plethora of packaging plakavacs are anything but.
Bower. RequireJS. JamJS. Browserify. JQuery, Underscore, Lodash, and did I mention JQuery? Angular, D3, Polymer, Flux, Ember.js, Backbone, React, Dojo, Mootools, Bootstrap, Foundation, Meteor, Socket.io, Aurelia, Express.
What do all of these have in common?
For one, the fact that even though I’ve been using JavaScript for several years, having written thousands of lines of code used by hundreds of people, I have never used any of them. I don’t know any of their APIs. To date, every website I’ve written was hand-coded, starting from a blank HTML file. And a lot of people frown upon that.
“Computers are all about automation,” you say, “There’s no good reason to impose the drudgery of boilerplate upon yourself. You’ll end up writing lots of duplicate code.”
Which is partly true. What you fail to mention is that I’d be writing plenty of duplicate code anyway. Consider, for instance, the XMLHttpRequest API. Each reimplementation of XHR is basically the same thing, with slightly different method names or argument conventions. As a developer, I would rather know the Real API—I can use it anywhere I want, and I have unrestricted access to the entire API (so I’m not at the mercy of someone who doesn’t think PUT requests are worth implementing).
Each new “web technology” has its own Wiki, API doc and “getting started” page which you need to somehow absorb ideas from. They have their own strange installation rituals, their own vocabulary, their own “best practices”, and their own encyclopedia of StackOverflow answers that you must read if you have any hope of getting stuff done.
Worse, though: they try to influence how you design.
All these “platforms” advertise themselves as “frameworks”. They force you to structure a project so that it conforms with the architecture that they want. They’re monoliths, and they don’t like cooperating with other monoliths.
That’s a horrible way to do web design. If PHP hammers have a claw on each end, then JS hammers are actually disguised combine harvesters that accept callbacks.
The first part of the problem is pedagogical in nature. To someone who is just beginning to learn web development, being introduced to a monstrous framework can lead to all sorts of misconceptions. jQuery is not a programming language. JavaScript can change the color of text on its own.
Students begin to learn from a higher level of abstraction than is necessary. Filling in the blanks by copy-pasting lots of boilerplate code is not computer science. You should know why you’re using a tool. Experience should come before abstraction.
The other part of the problem is more practical.
These architectures look shiny in contrived demo situations (name a modern-day language that does not boast of a beautiful “hello-world” scenario), but in the real world, their abstractions almost immediately begin leaking. You end up writing patches to tide over important features marked “TODO” on Github. You end up writing glue code, which is far worse than “the drudgery of boilerplate”.
Which isn’t to say you shouldn’t be using—or writing—JavaScript libraries. But you should be writing small, self-contained modules that provide a clean interface that is optimized for communicating with other programs. You should be using conventional vocabulary and idioms everywhere (even if those idioms smell like the dead fish that is JavaScript semantics). Don’t build a cathedral if you can get things done with a stall at the bazaar.
Your programs should do one thing, and do it well. It could be a small thing or a big thing. It could be a color picker widget or a library to encode PDF files. It doesn’t matter. It should be self-contained and present itself as a tool. Programmers should control code; code shouldn’t control programmers.
How?
Avoid side effects. The vast majority of your functions should take inputs and return an output. Things that change state should be limited as much as possible, and should always do so because the end-user explicitly mandated it. The moment you start messing about with global prototypes and settings, or pushing to arrays you didn’t create, you’re going to end up taking over the entire application.
Namespace. Your library should expose one name. One. window.something = {}. That’s it. Everything you expose should be a property of the window.something object. (No, window.something cannot be a function, that’s cute but annoying in practice.) npm enforces this, and if you’re clever, you can write code that’s a valid npm module and browser-worthy module. Read about IIFE if you’re confused.
Don’t force callbacks. If it doesn’t actually do anything asynchronous, don’t add a callback. Just return the answer. That’s what the return keyword is for.
Put thought into your argument convention. If your functions are generally monadic or dyadic, just accept positional arguments. Yes, it’s ok to write code like:
function (a, b) {
    doSomethingWith(a || "default", b || "default");
}
and yes, it’s ok to have users call a function with function(null, "cow") to default a positional argument that isn’t the last.
Avoid the arguments keyword like the plague. If you need variadicity, accept an array as an argument. Variadicity causes confusion. And people who get addicted end up writing functions that do different things depending on how many arguments were passed. Scary, scary, scary.
It’s also perfectly ok to accept just one argument, an object, if you need a dozen keyword arguments.
Be quiet. When your library runs in production, nobody should notice it. No console messages or “warnings”, no twiddling with the DOM to include a little banner.
Documentation is not advertisement. Once I’ve committed to using your library, you don’t need to continue explaining how it’s a revolution in generative fluid modern flat reactive magical material-inspired skeuomorphic silky-smooth user interface. Just tell us how to use the primitives. If there are concepts to be learned before using your library, explain them outside the API reference.
If you’re strictly on nodejs, prefer the provided Streams to whatever homegrown thing you’re inventing right now. Streams are a tempting wheel to reinvent, because they’re a pretty idea which isn’t terribly difficult to implement with your own shiny interface. Don’t do it. This extends to other things, too: XHR wrappers, querySelector reimplementations, and event-emitting architectures are just a few of the major offenders.
Document bugs. No, I won’t lose faith in your module if there’s a one-in-a-million corner case as long as it’s documented. I will lose faith if there’s a one-in-a-zillion corner case whose only documentation is a comment saying “ill fix thiss l8r”. Similarly, don’t introduce undocumented features. At the very least, say “this feature is experimental and SHOULD NOT be used in production”.
Don’t force dot-chaining. Yes, it’s useful in some places. Dot-chaining is great for transforming values in a sequence. It makes no sense when there are sequential actions that have side-effects. JavaScript has built-in support for that kind of thing: it’s called the semicolon (;).
Bad:
element
    .turnRed()
    .then()
    .ifYouFeelLikeIt()
    .and()
    .theStarsAlign()
    .changeSize()
    .do()
Good:
element.turnRed();
if (element.youFeelLikeIt && element.theStarsAlign) {
    element.changeSize();
}
Good:
data
    .map(transformation)
    .filter(selection)
    .reduce(action)
The problem with dot-chaining is that it locks a lot of important functionality behind weird objects.
Stop creating boilerplate creators. If you’re embarking on a new million-dollar company, you want to start from scratch and do things right, making sure you understand everything on your website. If you’re writing a toy demo for a 24 hour hackathon, you don’t need a full-featured MVC framework. You need to learn CSS.
Corollary: throwing Bootstrap at a problem doesn’t fix it. Bootstrap causes slow, janky websites. There was a time when it was shiny. Now it’s dull and mundane. Put some effort into making your website look like “yours”. And remember, there’s absolutely nothing wrong with the default OS-provided buttons that designers at Apple have spent ages perfecting. For that matter, I feel there’s nothing wrong with Times New Roman. It’s just that everyone feels the need to show that they know how to change the font face.
Don’t write generic CSS. Are you creating a set of pretty text box widgets? Don’t touch anything that doesn’t have class pretty-text-box. Things that go around setting global properties (even ones that should be set, like box-sizing) are evil. Don’t be that guy.
Write generic CSS. I might want to change the width of your syntax-highlighting gutter. Don’t make me edit your source code for that to happen.
Don’t build plugin infrastructures. Your code should be organized enough for people to write helpers themselves. You shouldn’t need to provide methods to “register” an “extension”. It almost always implies you’re building a cathedral.
The TL;DR version of this is that your code should be designed to cooperate with others. People don’t want frameworks to lock up all their hopes and dreams in. They want small, useful tools so that they don’t have to think about too many details.
UNIX wouldn’t work if it wasn’t made of hundreds of awesome tiny programs. The Internet should take a hint from that.
I wrote this post a long time ago. I had become obsessed with the Mandelbrot Set after reading Professor Stewart’s Cabinet of Mathematical Curiosities, and had spent the better part of a weekend scouring the Internet for information on how to plot it. That is, information I could understand at that age. Watching the correct Mandelbrot Set appear line-by-line over the course of three hours on my mom’s old Mac was one of the more exhilarating computer-science experiences I have had.
The post first appeared on the Scratch forums on April 20, 2011 along with its accompanying Scratch implementation. In the interests of documentation and preservation, I decided to post a copy of this on my blog. Despite many temptations to change things—grammar, spelling, wording, and even some technical details—the text is identical to that posted on the forums. The purpose of posting this is not to convey the actual content to an audience, but to remind myself of how I sounded in the past and to reflect on how I sound now.
Since I do not, of course, retain a copy of the original BBCode, the text has been reformatted in Markdown. Code is left unchanged, though I was tempted to rewrite scripts using the wonderful Scratchblocks2 renderer (this post predates even the original Scratchblocks). I have also made an effort to convert the equations contained herein to MathJax/LaTeX-worthy formats to facilitate reading. This is, of course, a tradeoff: I lose the original formatting of the equations (which was delightful in itself) and make the text significantly less legible in its Markdown source. I hope I have made the right decision here. The original equations can always be viewed at the archive linked above.
This is a guide on plotting the Mandelbrot Set. It’s divided into 3 parts: What is the Mandelbrot Set?, Understanding the Algorithm, and Programming the algorithm. Here is a project on plotting it, if you don’t get it.
The Mandelbrot Set (M-Set in short) is a fractal. It is plotted on the complex plane. It is an example of how intricate patterns can be formed from a simple math equation. It is entirely self-similar. Within the fractal, there are mini-Mandelbrot Sets, which have their own M-Sets, which have their own M-Sets, which have their own M-sets, etc.
Though most representations of the M-Set have color, only the black bit is part of the set. The color is to basically show how long it took to prove that that point wasn’t in the set. However, these form cool patterns, too.
Here are some pictures:
Only the set:
With color:
The M-Set is generated using the algorithm:
\[ Z_{n+1}=z_{n}^2 + C \]
Here, both ($ Z $) and ($ C $) are complex numbers. What are complex numbers? They’re, put simply, square roots of negative numbers. Since negative numbers can’t have square roots, we created ‘complex’ or ‘imaginary’ numbers to deal with it. ‘($ i $)’ (pronounced iota) is the symbol for ($ \sqrt{-1} $). Complex numbers are expressed as multiples of ($ i $), like ($ 3i $). They are graphed on a number line perpendicular to the number line we all know. The resultant plane is called the complex plane, and is where we will graph the Mandelbrot Set.
The complex plane:
       1i
-1     0     +1
      -1i
Complex numbers are defined as the sum of a real number and an imaginary number. Examples are ($ 3i + 1 $) or ($ 4i - 2 $).
In this expression, C is the complex number for which you are testing whether or not it’s in the M-Set (it will define a single point on the complex plane — C is a real number plus an imaginary one, remember?). Here’s how you use it: You set Z to 0. Then set ($ Z $) to ($ Z^2 + C $). We call this action iteration. For example, if C was 3 (I’m using a real number for simplicity), ($ z $) would be:
\[ 0 \] \[ 0^2 + 3 = 3 \] \[ 3^2 + 3 = 12 \] \[ 12^2 + 3 = 147 \]
etc.
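The same arithmetic can be checked with a few lines of Python. This is only a sketch for illustration (the project itself is in Scratch), and the function name `iterate` is made up here.

```python
def iterate(c, steps):
    """Return the values Z takes, starting at 0, when Z -> Z^2 + C is applied `steps` times."""
    z = 0
    values = [z]
    for _ in range(steps):
        z = z * z + c
        values.append(z)
    return values


if __name__ == "__main__":
    print(iterate(3, 3))  # [0, 3, 12, 147]
```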
If you did this many, many times, there are two possibilities for ($ Z $) — it escapes to infinity, or it doesn’t. If it doesn’t escape, it is in the set. This looks hard to calculate—how can we know whether it reaches infinity? For all we know at 1000000000 iterations it’ll be a normal, but after 1000000001 iterations it starts constantly doubling. Fortunately, we know 2 other things:
It has been proved that if Z ever gets higher than 2, it will escape to infinity
If it does escape, it’ll do so normally within 50 iterations. More will make a more accurate picture, but it will slow the script down considerably. 50 is a good number.
So now, all we need to do is repeat ($ Z^2 + C $) 50 times and see how high it is. Great!
That’s all very nice, but there’s a catch (isn’t there always?)—this uses complex numbers, and Scratch—make that any programming language—doesn’t allow square roots of negative numbers. Try it yourself. You’ll get a red Error!. So how do we avoid this? Well, remember how a complex number is a real number plus an imaginary number, and an imaginary number is just ($ \sqrt{-1} $)? Well, that means we can split any variable, say ($ Q $), into two variables: q-Real and q-Complex, or abbreviated, ($ qR $) and ($ qX $). We also know that ($ qX $) squared is real, because ($ i^2 $) is ($ -1 $) which is real, and the coefficient is real anyway (here, coefficient is the 3 in ($ 3i $)). This means now we know how to square complex numbers to get real numbers. So, let’s take a look at the algorithm:
\[ Z_{n+1} = Z_{n}^2 + C \] \[ =(zR + zX)^2 + cR + cX \] \[ =zR^2 + zX^2 + 2zR*zX \left\{\text{by opening the brackets}\right\} + cR + cX \]
If you think carefully, ($ zX $) can only be represented by its coefficient. This is because ($ zX^2 $) is real, and ($ 2*zR*zX + cX $) (all the complex terms) can be represented by the coefficient, which is again because ($ zX^2 $) is real. So, we never ‘need’ the value of ($ i $), which is a huge relief. Now we can start coding!
We need a sprite to move through every point on the stage. That’s easy:
set x to -240
set y to 180
repeat 360
    set x to -240
    change y by -1
    repeat 480
        change x by 1
        . . . Insert coding here . . .
    end repeat
end repeat
See where I put . . . Insert coding here . . . ? That’s where we need to code our algorithm. From now on, all the coding I show you will be in that segment.
Starting off,
set cR to x position/120 {because real numbers are on the horizontal number line. '/120' is to set the magnification.}
set cX to y position/120 {because imaginary numbers are on the vertical number line. '/120' is to set the magnification.}
set zR to 0
set zX to 0
set r to 0 {r will be the number of repetitions. You'll se why this will be helpful pretty soon}
This’ll set up all our variables. Great. Now for the hard part.
For the calculation, we need to set ($ zR $) to ($ zR^2 + (zX^2*-1 \{\text{because } i^2 \text{ is } -1\}) + cR $), and set ($ zX $) to ($ 2*zR*zX + cX $). We get this just by grouping them on the basis of their complexness. Complex values get added into zX, the rest into zR.
set old_zR to zR {this is because zR will change, and when evaluating zX we'll get the wrong value of zR}
set zR to (zR*zR) + -1*(zX*zX) + cR
set zX to 2*old_zR*zX + cX
change r by 1 {'cause r counts the repeats}
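For readers following along outside Scratch, here is the same update step as a small Python sketch. The function name `mandelbrot_step` is hypothetical. Returning both new values at once sidesteps the `old_zR` problem, because both are computed from the old values before anything is overwritten. Python happens to have a built-in complex type, so the demo compares the split version against it.

```python
def mandelbrot_step(z_real, z_imag, c_real, c_imag):
    """One iteration of Z -> Z^2 + C, written with separate real and imaginary parts."""
    new_real = z_real * z_real - z_imag * z_imag + c_real  # zR^2 - zX^2 + cR
    new_imag = 2 * z_real * z_imag + c_imag                # 2*zR*zX + cX
    return new_real, new_imag


if __name__ == "__main__":
    # Compare against Python's complex arithmetic for one sample point.
    z, c = complex(1, 2), complex(0.5, -0.25)
    expected = z * z + c
    print(mandelbrot_step(1, 2, 0.5, -0.25), (expected.real, expected.imag))
```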
But, before we put that in, we need to know how many times to iterate. Iterate means repeat, so we need to put this into a repeat. We know that by 50 iterations, we can be pretty sure whether ($ zR $) will ever exceed 2 or not. The problem is that it may exceed 2 very early, and approach infinity, causing Scratch to freeze because the numbers get out of hand. So, we need a ‘if I’m over 2, stop me right away’ repeat. This is exactly why we needed ($ r $).
repeat until [r > 50] or [zR > 2]
will do the job. [r > 50] says it repeats 50 times, [zR > 2] says if it's over 2, stop repeating.
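Put together, the "repeat until" loop looks roughly like this in Python. This is a sketch under a couple of assumptions of its own: the name `escape_count` is made up, and it tests the size of Z as a whole (real and imaginary parts together) against 2, which is the usual form of the escape test, rather than only zR as in the Scratch script.

```python
def escape_count(c_real, c_imag, max_iter=50):
    """Return the iteration at which |Z| first exceeds 2, or None if it never does."""
    z_real, z_imag = 0.0, 0.0
    for r in range(1, max_iter + 1):
        # Tuple assignment computes both new parts from the old values.
        z_real, z_imag = (z_real * z_real - z_imag * z_imag + c_real,
                          2 * z_real * z_imag + c_imag)
        # |Z| > 2 is the same as zR^2 + zX^2 > 4, which avoids a square root.
        if z_real * z_real + z_imag * z_imag > 4:
            return r
    return None


if __name__ == "__main__":
    for point in [(0, 0), (-1, 0), (1, 0), (3, 0)]:
        print(point, escape_count(*point))
```

A point whose result is `None` is treated as being in the set; any other result can be used to pick a color.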
Now, we can finally tell whether the point you chose is in the Mandelbrot Set or not. Whew! This part is really simple, so I’m not going to explain it (much).
if zR > 2
set pen color to [black]
pen down
pen up
else
set pen color to [r] {because non-set points are colored based on how long it took to establish C wasn't part of the M-Set}
set pen shade to [50] {because black has a shade of 0, and so all colors will end up black}
And we’re done! Congratulations!
You can get the whole project here.
So a notorious Scratch user, TheLogFather, posted a project where he compared two different ways to prepend to a list. Here they are, as both an image and as pseudocode.
mylist = [1,2,3,4,5,6,7,...,20000]
def prepend1(n):
    repeat n times:
        insert random number at beginning of mylist
def prepend2(n):
    temp = new list
    for each item in mylist:
        append item at end of temp
    delete all of mylist
    repeat n times:
        append random number at end of mylist
    for each item in temp:
        append item at end of mylist
It should be pretty clear that the two are semantically equivalent, that is, that they will have the same net effect. However, notice that prepend2 uses only append instructions, while prepend1 uses the insert instruction.
Now, from a cursory look, prepend2 looks much slower: you are iterating over the initial contents of mylist twice when copying it in and out of temp.
Surprisingly, though, prepend2 is significantly faster. A quick speedtest on my MacBook Pro says that prepend1 takes 4.616 seconds while prepend2 takes a mere 0.724 seconds for 20,000 items.
Hopefully the reason why will be clear at the end of this blog post.
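If you want to play with the two strategies outside Scratch, here is a rough Python translation. It is a sketch rather than a port of TheLogFather's project: the functions take the new items as an argument instead of generating random numbers, so the two results can be compared directly. CPython's list is array-backed, so the same trade-off between inserting at the front and appending at the end applies, though the timings will not match the Scratch numbers.

```python
def prepend1(mylist, new_items):
    """Insert each new item at the beginning, one at a time."""
    for item in new_items:
        mylist.insert(0, item)


def prepend2(mylist, new_items):
    """Copy the list aside, then rebuild it using only appends."""
    temp = []
    for item in mylist:
        temp.append(item)
    mylist.clear()
    # Repeated front-insertion reverses the order of the new items,
    # so append them reversed to end up with exactly the same list.
    for item in reversed(new_items):
        mylist.append(item)
    for item in temp:
        mylist.append(item)


if __name__ == "__main__":
    a, b = [1, 2, 3], [1, 2, 3]
    prepend1(a, [7, 8, 9])
    prepend2(b, [7, 8, 9])
    print(a == b, a)
```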
The first thing to do here is dig into the source and find the implementation of the insert and add blocks in Scratch. Scratch 2.0 is open-source and written in ActionScript. We find it all on Github. A little digging reveals the file src/primitives/ListPrims.as. This is where the list blocks are implemented. Looking around, we find the important sections:
protected function listAppend(list:ListWatcher, item:*):void {
    list.contents.push(item);
}
and
protected function listInsert(list:ListWatcher, i:int, item:*):void {
    list.contents.splice(i - 1, 0, item);
}
Now, ListWatcher is defined in src/watchers/ListWatcher.as. We see that contents is an Array:
public var contents:Array = [];
Let me say that again: in Scratch, a List is actually an Array. If this doesn’t bother you now, don’t worry, it will bother you soon.
Anyhow, we’re looking at the difference between Array.splice and Array.push. From Adobe’s documentation, we know that Array.splice inserts and deletes elements of an array at an arbitrary index, and Array.push tacks elements on to the end of an array. Why is this important? To see that, you need to know a bit about how memory works.
ActionScript is an incarnation of ECMAScript, much like JavaScript. There isn’t much documentation about ActionScript’s VM, but there’s tons about how JS handles Arrays. So, though it’s not a completely legitimate assumption, I’m going to explain what’s going on based on how JavaScript does things.
In a computer’s RAM, you can access any memory address in constant-time. That means it takes just as long to get the contents of the billionth address as it does to get the contents of the first.
When you run a computer program, at a sufficiently low level, you’re allocating some segment of the RAM for the program to access. The program basically treats this segment as a really long list, where each element has an upper limit on size.
It’s easy to store an integer (within limits) or a boolean value in memory, because it takes up just one cell of memory. But anything larger—a string, a list, an image, or a sprite—takes up multiple cells, and so you need a scheme by which those cells are allocated in an organized way.
In low-level programming languages like C, you have the malloc function that allocates the given amount of memory as a sequence of contiguous cells and returns the address of the first one. However, the catch is that once you’re done using that memory, you are responsible for freeing it (with the free function). Otherwise future calls to malloc will assume that you’re still using those addresses and pick some other addresses to allocate. Eventually you’ll run out of memory and things will start dying.
In higher-level languages (like ActionScript and JavaScript), the interpreter manages memory for you. You’re free to run around creating sprites and lists and mammoths and cows and whatever, and the interpreter destroys them when it notices that you’re done with them. This is called Garbage Collection, and is beyond the scope of this article (seasoned CS students will get that joke).
When you create an Array in JavaScript, the interpreter allocates a block of memory for you. If you want to access the 5th element, the interpreter says “malloc told me the array starts at address 900, so I want to return the item at address 900+5=905”.
Since RAM has constant lookup time, your Array will also have constant lookup time (it’ll just take a hair longer because you need to find the address).
Now that you know how Arrays work, it should be obvious that inserting an element into an array is hard. Why? Well, there’s no space! Think of the RAM as people sitting on a bus. The first few seats are full. If you make a fuss and insist on sitting in the front seat, you need to first move everyone else down one seat.
If you’re trying to put an item at the beginning of an array with 5000 elements, you need to move down each of those 5000 elements one cell. This takes 5000 operations. This is bad.
Similarly, deleting is hard because you’ll leave an empty seat, and then you need to move everyone up a seat to fill the hole (remember, the fast index-lookup only works if the array is contiguous!). If you delete the first element of an array with 5000 elements, the remaining 4999 need to be moved up a spot, which takes 4999 operations. This is also bad.
But this is often necessary, and is done all the time in JS or ActionScript programs with the splice function.
You might have already realized how push is better. If you sit at the back of the bus, you don’t need to relocate everyone in front of you. So it’s actually a constant-time operation.
Of course, in a computer’s RAM, that cell might be occupied by the beginning of another array. If that’s the case, the interpreter needs to move things around. It’s unpleasant, but it only happens once in a while. So push occasionally takes a long time. But most of the time, it’s blazingly fast.
And that’s the reason why insert is slower than add in Scratch.
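The bus analogy can be acted out in code. The sketch below models a block of memory as a fixed-size Python list and counts how many elements have to move for one insertion. It is a toy model, not how any real interpreter is implemented, and the name `insert_with_shift` is made up.

```python
def insert_with_shift(cells, length, index, value):
    """Insert value at index in a fixed-capacity block; return (new_length, moves).

    cells  -- list standing in for contiguous memory (unused slots hold None)
    length -- how many slots are currently in use
    """
    if length >= len(cells):
        raise OverflowError("no free cell left in this block")
    moves = 0
    # Walk backwards from the end, moving each element one seat down the bus.
    for pos in range(length, index, -1):
        cells[pos] = cells[pos - 1]
        moves += 1
    cells[index] = value
    return length + 1, moves


if __name__ == "__main__":
    seats = [10, 20, 30, None, None]
    length, moves = insert_with_shift(seats, 3, 0, 5)        # sit in the front seat
    print(seats, moves)
    length, moves = insert_with_shift(seats, length, length, 99)  # sit at the back
    print(seats, moves)
```

Inserting at the front moves every existing element, while inserting at the back moves none, which is the whole difference between the two Scratch blocks.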
As an afterword, it’s worth mentioning that if you do lots of inserting and deleting, there’s a better way to store a list than an Array. It’s called a List. A List in CS is not the same thing as a List in Scratch.
A List consists of lots of pairs. A pair is a 2-element Array. The first element contains some data, and the second element is the address of the next pair (we can call this the “address pointer” because it points to the next pair).
For historic reasons, the “value” cell and the “address pointer” cell are called the “address register” and “decrement register”, and so old languages will provide functions like “contents of address register” (car) and “contents of decrement register” (cdr, rhymes with udder). Lisp programmers use car and cdr all the time to manipulate pairs. In fact, Lisp’s name comes from List Processing.
The order in which the pairs are stored in memory doesn’t matter. So, when you want to insert something, you create a new pair anywhere in the RAM, and then change the address pointers to incorporate the new pair.
To delete something, you free the pair and fix the pointers. Fixing pointers is easy (just an assignment), so Lists are easy to mutate.
Of course, finding the 5000th element of a list would require you to follow 5000 address pointers (Lispers call it cdring down the list), so lookups are not constant-time anymore. Choosing the right data structure is important.
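Here is a minimal sketch of such a List in Python, to show how little work inserting and deleting take compared with looking something up. The class and function names (`Pair`, `insert_after`, `delete_after`, `nth`) are my own for this example, and Python's garbage collector stands in for the explicit free.

```python
class Pair:
    """One cell of a linked list: a value plus a reference to the next pair."""

    def __init__(self, value, next_pair=None):
        self.value = value
        self.next = next_pair


def from_values(values):
    """Build a linked list from a Python sequence and return its first pair."""
    head = None
    for value in reversed(values):
        head = Pair(value, head)
    return head


def to_values(head):
    """Collect the values of a linked list into a Python list."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


def insert_after(pair, value):
    """Insert a new pair right after `pair`: one allocation, one pointer fix."""
    pair.next = Pair(value, pair.next)


def delete_after(pair):
    """Unlink the pair following `pair`, if there is one."""
    if pair.next is not None:
        pair.next = pair.next.next


def nth(head, n):
    """Return the n-th value (0-based) by following n pointers."""
    for _ in range(n):
        head = head.next
    return head.value


if __name__ == "__main__":
    numbers = from_values([1, 2, 4])
    insert_after(numbers.next, 3)
    print(to_values(numbers))
    delete_after(numbers)
    print(to_values(numbers), nth(numbers, 2))
```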
By the way, Snap! is extra-clever: it stores your lists as Lists or Arrays depending on how they were created. If you use Array-ey operations on them, they get converted to Arrays, and if you use List-ey operations on them, they get converted to Lists. If you want to be fast, you need to pick a style and stick with it. But that’s good programming practice anyway.
‘Twas the night after Christmas in the Clause house,
And all you could hear was the click of a mouse,
Or the drop of a pin, or the sneeze of a louse,
Until he called out for his dear spouse.
“Mrs. Clause,” he cried, “I think it’s broken!”
“What?” she bellowed, annoyed to be woken.
“I’m locked out,” he said, regretting having spoken,
“I seem to have lost my login token.”
“Well, I did tell you your site’s a mess.”
“I have a weak password,” Santa confessed.
“I bet some crackers just made a good guess.”
“Perhaps we can track down their IP address?”
So Mrs. Clause began looking through logs,
While Santa whipped up some nice thick grog,
And soon enough, she found the clog:
“Who writes a database in Prolog?!”
“It looks like a simple DoS to me”
“Crudely done, from what I can see.”
“Seems like they stole your API keys.”
“Tracking these guys should be a breeze.”
So Santa called up an Elf for assistance,
(A big, macho creature known for his persistence)
He had ten years of UNIX experience,
And agreed to help on Santa’s insistence.
He fixed the firewall and flushed the cache,
And began to explore the system with Bash.
“Gosh, Santa, your code is trash,
Why are you using an md5 hash?”
So he patched the server (and ran git commit)
and did some digging and found the culprit.
Then he accessed the Naughty List with a rootkit,
And typed the name and hit “submit”.
He shut down the computer and turned off the light,
Knowing that two wrongs might just make a right,
Yet he could not resist shouting, as he fled out of sight:
“Happy Christmas to all, and to all a good night!”
Happy holidays from Comfortably Numbered.
A quick recap: in Part I, we learned about how protocols are really awesome, and how they stack onto each other to build abstractions. We learned to use netcat to create TCP connections, and then played with HTTP and HTTPS.
Let’s write some protocol code (finished product available on Github). We’re going to use Python to build a bot for IRC, a beautiful and historic protocol for chatting over the Internet.
IRC bots are all over the place. Some do routine tasks like moderate channels.
Others let you play games like Mafia, or perhaps provide simple services like
spelling correction (or “make me a haiku”). Yet others simply keep logs of what
is said and then try to say relevant things at the right times. Perhaps the
most famous one is Bucket, who manages #xkcd.
Our IRC bot is going to provide a utility which tells a joke when someone says !joke in a particular channel.
The IRC protocol is specified in this document (that document updates this one, as listed in the header). That document is called an RFC, or Request For Comments.
I think the RFC system is beautiful. RFCs are documents that standardize the important messy details that hold the Internet together. They were ‘invented’ by Stephen D. Crocker when he was given the task of documenting ideas in the early days of the Internet. An RFC is supposed to be a memo: a technical note or idea that is published for anyone to read, review, and (ha!) comment on. Anyone can write an RFC.
This is how standards come about. Once I have published a sensible RFC that standardizes some means of sharing files (for instance, the File Transfer Protocol or FTP, which people still use from time to time (RFC)), I can put it out there for smart people all over the world to review. If it’s really exciting, someone may write an implementation. Future implementations would abide by the rules, and so your FTP server and my FTP client would cooperate.
Notice how this is decentralized. It’s not Google saying “Ok, folks, this is how we’re going to transfer files. Deal with it.” RFCs specify the consensus of many experts—when you see an RFC describing a protocol that has many implementations, you know that many people agree that that’s the best way of doing things (and even if it isn’t, it’s a reasonable compromise).
Of course, if you have a completely different way of transferring files which would use a brand new super-secure hyper-compressed protocol, you can write your own RFC and implementations, and hope that it catches on.
There are a lot of RFCs around, some fascinating, some funny. To us, however, they are simply documentation for the protocol we will use to build our IRC bot.
But I digress.
The RFC isn’t hard to read, but I’ll just tell you the important parts here.
IRC works over the standard port 6667, or port 6697 if it’s secure (SSL-wrapped).
Let’s experiment on Freenode. I’m going to assume you have some idea of how IRC works (that is, you know what I mean when I say “channel” and “nick”). You may want to log into Freenode with another IRC client (an online one like KiwiIRC would suffice), just to see stuff happen.
Now, start with nc irc.freenode.net 6667 in a shell.
You might get some messages that look sort of like this:
:wilhelm.freenode.net NOTICE * :*** Looking up your hostname...
:wilhelm.freenode.net NOTICE * :*** Couldn't look up your hostname
That just means that Freenode’s servers hoped you were logging in from an IP address with a registered domain, so that they can display your username as “user@hostname” in a whois query (and some other uses as well). But you’re at home, and your home’s IP probably doesn’t have a domain name pointing to it (unless you’re running a server at home!). So the server just complains.
Anyway.
Type NICK an-irc-explorer and hit enter. Then type USER an-irc-explorer * * :Mr. IRC and hit enter again. You should be greeted with a huge block of text that starts with something like:
:wilhelm.freenode.net 001 an-irc-explorer :Welcome to the freenode Internet Relay Chat Network an-irc-explorer
:wilhelm.freenode.net 002 an-irc-explorer :Your host is wilhelm.freenode.net[37.48.83.75/6667], running version ircd-seven-1.1.3
:wilhelm.freenode.net 003 an-irc-explorer :This server was created Sat Mar 8 2014 at 15:57:41 CET
Basically, it’s telling you that you’ve successfully connected to the server, and identified yourself with the nickname an-irc-explorer.
You can now converse with the server by sending it more commands (type HELP to list commands). Perhaps the most exciting one is JOIN: type JOIN #bots to join Freenode’s channel for bots.
Your normal IRC client should show that an-irc-explorer joined.
To say things, use PRIVMSG #bots :Hello, world. (The colon is a separator character, not part of the actual text that will be displayed on a client. Its purpose is to allow the last argument to contain spaces, which, unsurprisingly, show up quite a lot in chat messages.)
If you say something from your client, you should get some scary text like this:
:hardmath123!hardmath123@gateway/web/cgi-irc/kiwiirc.com/ PRIVMSG #bots :Hi
The first bit tells you who said something, and is of the form nickname!username@hostname (remember the hostname lecture above?). PRIVMSG means you’re getting a message, #bots is the channel, and Hi is the message.
Not too hard.
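To make that anatomy concrete, here is a minimal sketch of pulling a raw line apart into its prefix, command, and parameters. The names parse_irc_line and nick_of are made up for this illustration (they are not from the RFC or from the bot below), and the sketch assumes the line is non-empty and has already had its trailing CR-LF removed.

```python
def parse_irc_line(line):
    """Split one IRC line (without CRLF) into (prefix, command, params)."""
    prefix = None
    if line.startswith(":"):
        # The optional prefix runs from the leading colon up to the first space.
        prefix, _, line = line[1:].partition(" ")
    # Everything after " :" is one trailing parameter that may contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    command, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)
    return prefix, command, params


def nick_of(prefix):
    """Extract the nickname from a nickname!username@hostname prefix."""
    return prefix.split("!", 1)[0]


prefix, command, params = parse_irc_line(
    ":hardmath123!hardmath123@gateway/web/cgi-irc/kiwiirc.com/ PRIVMSG #bots :Hi"
)
```

Here command comes out as PRIVMSG, params holds the channel and the message text, and nick_of(prefix) gives the speaker's nickname.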
There’s one last thing you need to know. IRC checks if you’re there periodically by sending out a PING message. All you need to do is send the server PONG and it’ll be happy. If you don’t, the server assumes something bad happened, and kills the connection based on a “Ping Timeout” (generally around 120 seconds). It’s the protocol version of “if I’m not back in an hour, call the police.”
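As a rough sketch of that exchange: a PING usually arrives with a token after it (something like PING :wilhelm.freenode.net), and echoing that token back in the PONG is the safe thing to do. The helper name pong_reply is hypothetical, and this version only looks at lines that begin with PING.

```python
def pong_reply(line):
    """Return the PONG line answering a PING, or None for any other message."""
    if not line.startswith("PING"):
        return None
    # Echo back whatever token followed PING.
    token = line[len("PING"):].strip()
    return "PONG " + token + "\r\n"


reply = pong_reply("PING :wilhelm.freenode.net")
```

Checking that the line starts with PING (rather than merely contains it) avoids answering a chat message in which someone happens to type the word.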
And now you’re ready for some code.
Fire up Python.
Python gives you sockets with the socket module, which corresponds rather well with C’s sockets.
import socket
Let’s create a socket and connect it to Freenode.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("irc.freenode.net", 6667))
The first line has two magic constants. AF_INET is best explained by Brian “Beej” Hall:
In some documentation, you’ll see mention of a mystical “PF_INET”. This is a weird etherial beast that is rarely seen in nature, but I might as well clarify it a bit here. Once a long time ago, it was thought that maybe a address family (what the “AF” in “AF_INET” stands for) might support several protocols that were referenced by their protocol family (what the “PF” in “PF_INET” stands for). That didn’t happen. Oh well. So the correct thing to do is to use AF_INET in your struct sockaddr_in and PF_INET in your call to socket(). But practically speaking, you can use AF_INET everywhere. And, since that’s what W. Richard Stevens does in his book, that’s what I’ll do here.
SOCK_STREAM means TCP. The alternative is SOCK_DGRAM, which means UDP (User Datagram Protocol). There’s also SOCK_RAW, which requires root privileges and makes (you guessed it) a raw IP socket. We discussed these in the previous installment.
The connect line, of course, connects to a remote socket somewhere in the Freenode network. It’s important to realize that that remote socket is the same kind of thing as the one you just created. The server/client-ness is an abstraction. Your own socket has a host and port, too, which you can find with s.getsockname(). You’ll get something like ('192.168.0.4', 60694).
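You can see that symmetry without touching the network. The sketch below is an illustration, assuming Python 3 and a machine with a loopback interface: it opens a throwaway listening socket on 127.0.0.1, connects to it, and compares what each end reports. The variable names are made up for the example.

```python
import socket

# A listening socket on the loopback interface; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
accepted, _ = server.accept()

# Each end sees the other's (host, port) pair as its peer.
client_addr = client.getsockname()
same_pair = accepted.getpeername() == client_addr

for sock in (client, accepted, server):
    sock.close()
```

The "client" got a host and port of its own the moment it connected, and the "server" side simply sees that pair as its peer.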
Getting data from a socket is kind of messy, because of how TCP works. You don’t really ever know if the server wants to send more data or not. So, UNIX sockets work like this: you specify how many bytes you want to read, and the process will be paused (“blocked”) until the server sends you stuff (it could, in theory, just sit there forever). When something is sent, at most that number of bytes is given to you as a string.
In practice, this looks like:
data = s.recv(1024)
to read at most 1024 bytes from the server.
There is a problem here. A chat client doesn’t want to just freeze until a new message is sent; it wants to do other things and occasionally carry out actions if there’s a new message.
The hard solution is to use Python threads. You have multiple bits of code running around doing stuff at the same time, and you’re very careful about the socket’s state. If you’re not, you might end up reading and writing at the same time and bad things will ensue.
That will end in a mess.
The easy solution is the select module. It’s used like this:
import select
readables, writables, exceptionals = select.select([s], [s], [s])
select.select will return three lists: a list of sockets that are readable, a list of sockets that are writable, and a list of sockets that are in a bad situation and erroring/broken/eating your laundry.
Now we can check:
if len(readables) == 1:
    data = s.recv(1024) # won't block
Note that we still don’t know how much data there is. In fact, we have no way of knowing how much data there is, because the server may have sent another 20 bytes which TCP hasn’t reassembled yet.
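If you want to watch select change its mind, here is a small local experiment. It is only a sketch, assuming Python 3 (where sockets carry bytes) and using socket.socketpair() to stand in for a real server; the optional fourth argument to select.select is a timeout in seconds.

```python
import select
import socket

# socketpair() gives two already-connected sockets, handy for local experiments.
ours, theirs = socket.socketpair()

# Nothing has been sent yet, so with a zero timeout select reports nothing to read.
readable_before, _, _ = select.select([ours], [], [], 0)

theirs.sendall(b"hello\r\n")
# Wait up to one second for the data to become readable.
readable_after, _, _ = select.select([ours], [], [], 1.0)
data = ours.recv(1024) if readable_after else b""

ours.close()
theirs.close()
```

Passing a timeout is also a gentler alternative to sleeping in a loop: select itself does the waiting and returns as soon as something arrives.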
So, it’s generally advisable for protocols to specify a maximum message length and some signal that a message has been terminated. Section 2.3 of RFC 2812 very helpfully tells us how IRC handles this:
IRC messages are always lines of characters terminated with a CR-LF (Carriage Return - Line Feed) pair, and these messages SHALL NOT exceed 512 characters in length, counting all characters including the trailing CR-LF. Thus, there are 510 characters maximum allowed for the command and its parameters. There is no provision for continuation of message lines.
We can put that all together as follows:
import select
import time
def read_loop(callback):
    data = ""
    CRLF = '\r\n'
    while True:
        time.sleep(1) # prevent CPU hogging :)
        readables, writables, exceptionals = select.select([s], [s], [s])
        if len(readables) == 1:
            chunk = s.recv(512)
            if not chunk:
                break # an empty read means the server closed the connection
            data += chunk
            while CRLF in data:
                message = data[:data.index(CRLF)]
                data = data[data.index(CRLF)+2:]
                callback(message)
callback is called every time a complete message has been received. Also, when it’s being called, no other socket operations are happening (reads or writes). As we discussed above, this is a major win.
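The buffering step is the part that is easiest to get wrong, so here it is pulled out into a pure function that can be poked at without a socket. This is a sketch assuming Python 3 bytes; split_messages is a name invented for this example.

```python
CRLF = b"\r\n"


def split_messages(buffer):
    """Return (complete_messages, leftover) for a buffer of received bytes."""
    # Everything before the last CRLF is complete; the final piece is a partial line.
    *messages, leftover = buffer.split(CRLF)
    return messages, leftover


messages, leftover = split_messages(b"PING :a\r\nNOTICE * :hi\r\nPRIV")
```

The leftover bytes are the start of a message whose ending hasn't arrived yet, so they get carried over and prepended to whatever the next read returns.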
Writing to the socket is much easier: it’s literally s.sendall('data'). (There’s also s.send('data'), which isn’t guaranteed to send all of the data, but returns the number of bytes that the operating system actually accepted for sending. sendall is an abstraction on top of this.)
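To see what that abstraction amounts to, here is roughly what a hand-rolled version might look like. This is an illustrative sketch rather than Python's actual implementation; send_all is a hypothetical name, and the code assumes Python 3 with a local socket pair standing in for the server.

```python
import socket


def send_all(sock, payload):
    """Keep calling send() until every byte of payload has been accepted."""
    sent = 0
    while sent < len(payload):
        # send() may take only part of the payload; it reports how many bytes it took.
        sent += sock.send(payload[sent:])
    return sent


left, right = socket.socketpair()
message = b"PRIVMSG #bots :Hello, world.\r\n"
count = send_all(left, message)
received = right.recv(1024)
left.close()
right.close()
```

In real code you would just call sendall; the loop is only here to show why a bare send needs its return value checked.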
Guess what? You know enough to write a bot now!
import random
jokes = [
# populate me with some good ones!
# this might be the hardest part of writing the bot.
"You kill vegetarian vampires with a steak to the heart.",
"Is it solipsistic in here, or is it just me?",
"What do you call two crows on a branch? Attempted murder."
]
s.sendall("NICK funnybot\r\n")
s.sendall("USER funnybot * * :hardmath123's bot\r\n")
connected = False
def got_message(message):
    global connected # yes, bad Python style. but it works to explain the concept, right?
    words = message.split(' ')
    if 'PING' in message:
        s.sendall('PONG\r\n') # it never hurts to do this :)
    if words[1] == '001' and not connected:
        # As per section 5.1 of the RFC, 001 is the numeric response for
        # a successful connection/welcome message.
        connected = True
        s.sendall("JOIN #bots\r\n")
    elif words[1] == 'PRIVMSG' and words[2] == '#bots' and '!joke' in words[3] and connected:
        # Someone probably said `!joke` in #bots.
        s.sendall("PRIVMSG #bots :" + random.choice(jokes) + "\r\n")
read_loop(got_message)
This is actually all you need. If you concatenate all the snippets in this blog post, you will have a working bot. It’s surprisingly terse (yay Python?).
We can actually make this secure with just two more lines of code. Remember how I said SSL was ‘wrapped’ around a normal protocol?
import ssl
secure_socket = ssl.wrap_socket(plain_old_socket)
secure_socket has all the methods of a normal socket, but the ssl module handles the SSL negotiations and encryption behind the scenes. Abstraction at its finest.
You can play around with reading the documentation and integrating this into your bot (remember to use the new secure port, 6697).
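One caveat if you try this on a recent Python: the module-level ssl.wrap_socket function was deprecated and then removed in Python 3.12, and the wrapping is done through an SSLContext instead. The sketch below assumes Python 3; build_tls_context and connect_tls are hypothetical helper names, and connect_tls is defined but not called here because it needs a live server.

```python
import socket
import ssl


def build_tls_context():
    """Create a client-side context that verifies the server's certificate."""
    return ssl.create_default_context()


def connect_tls(host, port=6697):
    """Open a TCP connection and wrap it in TLS (hypothetical helper)."""
    plain = socket.create_connection((host, port))
    # server_hostname lets the ssl module check the certificate against the host.
    return build_tls_context().wrap_socket(plain, server_hostname=host)


context = build_tls_context()
```

The socket returned by connect_tls can be used with recv and sendall exactly like the plain one.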
If you end up deploying a bot, make sure you read the channel or server’s bot policy. For instance, Foonetic provides these instructions:
Bot Policy: Well behaved bots are allowed. Annoying bots or bots which are insecure or poorly behaved are not allowed. Channel owners may have their own policy for public bots; it is advised you check with a channel operator before bringing a bot into a channel. Excessive bots from a single network address may exceed the session limit and/or be considered cloning. Please mark your bot with user mode +B and your nick in the “Real Name” field so that an oper can locate you if your bot malfunctions. Absolutely no “botnets” are allowed and any illegal activity will be reported to your ISP!
If you’re lazy, you can get a slightly refined version of this code from my Github.
So. We learned about how RFCs work, and we read the RFC on IRC. Then we used that knowledge to build an IRC bot on top of Python’s low-level socket library.
The techniques you learned in this blog post are useful for all sorts of things. For instance, SMTP is a simple email protocol (port 25), and it’s an easy way to send email from a script (Python ships with smtplib, a module that wraps around the protocol). Similarly, telnet is a very lightweight protocol that adds some terminal-specific frills to netcat (screen size, raw mode, etc.). Even bitcoin needs documented protocols to work. It’s definitely worth learning how these things work.
Enjoy, and happy botwriting.
]]>Dear famous producers, scriptwriters, authors, and publishers:
On behalf of the programming community, I would like to bring up a rather sore
point among us. Whenever you have a scene involving “hacking”, you seem to make
it a point to write scripts by mashing complex-sounding buzzwords. We all
cringed when someone declared “I’ll create a GUI interface using Visual Basic. See if I can track an IP address” in CSI. We facepalmed when Randy used telnet to make a secure connection in Cryptonomicon. We spawned a subreddit when Lex used a UNIX system. And some of us broke out in hives at Numb3rs’ description of IRC.
But we cheered when Trinity used nmap in The Matrix Reloaded.
When you depict ‘hacking’ as an esoteric dark art, you tell the public that ‘hackers’ are a breed of sorcerers who know the right incantations to make the Internet bow to their will.
This is a lot like claiming pharmacists are brilliant potion-makers who pass down the secrets to make mystic brews that control the human body.
But I don’t see any whizz kids saying “Hang on, I bet I can cook up a quick truth serum by distilling the monorubidium dibenzene crystal. Could you hand me the Bunsen burner?” (Followed by one of the most annoying lines in all of cinema: “In English, Doc?”)
This strange caricature that popular culture has drawn is what makes people regard ‘hackers’ with a blend of suspicion and fear. It leads to a vast misperception of what ‘hacking’ really is. As a more tangible effect, it also indirectly leads to the government not knowing how to handle computer security cases as well as it handles other cases.
You’re stereotyping an entire community: a community with history and values.
Of course, there are the bad guys who steal bitcoin and leak Sony employees’ personal emails. The least pop culture could do about them is to stop glorifying them as tech savants and point out that almost all such ‘victories’ are simply cases of a big company not installing the latest updates to their software (this is not a joke).
On the other hand, there are the heroes of computer security: people who dedicate their time and resources to finding and fixing critical issues in open source software to keep us safe. These are the real geniuses; they are brilliant folks with an immense knowledge of how everything works. It’s unfair to represent them as the same people as above.
Public opinion is really important in things like this, and movies and books are huge influences on it.
So here’s a request. Next time you have a scene with hacking, consult with an expert. Or even a geeky high school student (myself included). Ask them to tell you about a plausible real-world attack, and take the time to understand it at a conceptual level.
Learn about its history: when was it discovered? Did anyone get in trouble by using it? Was it embargoed, allowing big companies to patch their systems before the general public was told about it? Or was it leaked? What might show up on a computer screen when you’re carrying out the attack?
I promise it’s going to be much cooler than anything fictional. We regularly talk about things like the BEAST attack, the Heartbleed exploit and the Shellshock vulnerability. We have tools called Metasploit. We even use the phrase ‘poisoned cookies’ in research papers.
In the world of ‘hackers’, truth is way cooler than fiction.
P.S. You might have noticed that I put ‘hackers’ and ‘hacking’ under scare quotes throughout this article. There is a reason for this. In the CS culture, a ‘hacker’ is not a criminal. A ‘hack’ is simply an appropriate application of ingenuity. Eric Steven Raymond explains this perfectly in his excellent document how to be a hacker:
There is another group of people who loudly call themselves hackers, but aren’t. These are people (mainly adolescent males) who get a kick out of breaking into computers and phreaking the phone system. Real hackers call these people ‘crackers’ and want nothing to do with them. Real hackers mostly think crackers are lazy, irresponsible, and not very bright, and object that being able to break security doesn’t make you a hacker any more than being able to hotwire cars makes you an automotive engineer. Unfortunately, many journalists and writers have been fooled into using the word ‘hacker’ to describe crackers; this irritates real hackers no end.
The basic difference is this: hackers build things, crackers break them.
If you want to be a hacker, keep reading. If you want to be a cracker, go read the alt.2600 newsgroup and get ready to do five to ten in the slammer after finding out you aren’t as smart as you think you are. And that’s all I’m going to say about crackers.
Someday, I’d like to watch a movie where the FBI imprisons a cracker, not a hacker.
Until then,
Yours Truly.
]]>tmux is a terminal multiplexer, which is nerdspeak for a program that runs multiple processes simultaneously within a single parent process. You might have heard of screen; it’s similar (and, in fact, a lot of tmux quickstarts assume that you’re transitioning from screen). This lets you have, for example, a text editor and a test server running in the same physical terminal window. Instead of opening multiple ssh connections to your server, tmux allows you to maintain a single connection and divide your screen up virtually into multiple panes.
Another nice thing about tmux is that the virtual panes are independent of the processes running, so you can “detach” a process and leave it running in the background without any terminal displaying the output. In fact, a detached tmux session lives on even if you disconnect the ssh session. When you log back on, you can reattach to that process again.
Anyway, let’s get started. You make a tmux session by typing tmux in bash. Your screen should get a pretty green ribbon under it, saying 0:bash. This means you’re currently in window 0, running bash. You can do normal bashey things here (ls, vim, irssi, whatever): tmux simply feeds your user input along to the bash process.
Well, almost. tmux listens in and intercepts any input that begins with a special keypress, ^B. You type this by holding the control key and pressing ‘B’, the way you would type ^C to kill a bash process. We call ^B the “prefix”.
Let’s detach from tmux! For the dramatic effect, feel free to leave some process running, perhaps a Python session or even your IRC client. Type ^B D (that is, the prefix followed by the D character).
You should be back to the old bash. But the process you started is still running in the background: just not getting any input from you (or showing you any output). To reconnect to it, type tmux attach and you should get your process again. The easiest way to kill a session is to simply exit all the processes in it; if only bash is running, then type exit.
You can use and manipulate multiple different named sessions by specifying different command line arguments to tmux, such as tmux new -s name_of_new_session to make a new session, tmux attach -t name to attach to a named session, and tmux kill-session -t name to kill a session. tmux ls lists sessions.
But the more interesting stuff is multiplexing. Open up a session and type ^B %. Your pane should split into two columns. You now essentially have two virtual terminals. Use ^B arrow-keys to switch between panes. To close a pane you exit the process that was running in that pane (exit in bash).
You can use ^B " to split the other way (horizontally, so the new pane is below the old one). And there are a bunch of commands to resize and swap panes.
Instead of saying them all over again, I’m going to point you to this gist, which has all the information you need.
In general, I use tmux as a way to keep my session as I left it when I log out (for example, this post was written across a couple of days, but I didn’t close vim at all). Also, it’s an easy way to leave a server or a bot running perpetually.
Enjoy tmuxing!
]]>I like protocols. The Internet is like being in a party, and trying to have a conversation with the person across the room by passing post-it notes. Except you can only fit a couple of words onto a post-it note (of which you have, of course, a limited supply). And people take as long as they want to pass along the note. Or they could just forget about it. Some of them might read the notes, others may replace your notes with their own. And the person across the room only speaks Finnish.
Despite these hostile conditions, the Internet works. It works because we have protocols—rules that computers in a network obey so that they can all get along.
And you can understand these protocols. It’s not rocket science: it’s socket science! (I promise that was the only pun in this post.)
Protocols fit onto other protocols. The lowest-level protocol you should really care about is TCP: the Transmission Control Protocol. TCP handles taking a large message, dividing it among many post-it notes, and then reassembling the message at the other end. If some notes get lost along the way, TCP sends replacements. Each post-it is called a “packet”.
Of course, TCP fits on top of another protocol, the Internet Protocol (IP, as in IP Address), which handles even messier things like ensuring a packet gets passed on from its source to its destination. There are other protocols that live on IP: UDP is like TCP, except it doesn’t care whether packets get there. If you’re writing a video conferencing service, you don’t need to ensure that each packet makes it, because they’ll be out of date. So you use UDP.
TCP is handled at the kernel level, so when you send out a message, it’s wrapped in TCP automatically. In fact, you need administrator privileges on UNIX to send out “raw” packets (there are occasionally reasons to do this).
To create a TCP connection, we use sockets. Most languages provide socket bindings: I’ll use Python’s API (which is very similar to C’s), but Node.js’s net module does the same thing.
A quick way to get a socket working is to use netcat (it’s called nc on most UNIX shells). There’s also telnet, but telnet listens for its own protocol (for instance, if your connection sends a specific string, telnet will automatically send back your screen width).
Alternatively, you can use ncat, which comes bundled with nmap. I prefer ncat; I’ll explain why in a little bit. This command-line utility is pretty much a UNIX stream that sends out stdin by TCP, and writes incoming messages to stdout.
Once you have TCP working, you can do all sorts of stuff, because now message length and integrity have been abstracted away. For example, you explore the web by abiding by the HyperText Transfer Protocol (HTTP, as in http://something).
In fact, it’s worth trying right now. HTTP has the concept of a “request” and a “response”. HTTP requests look sort of like GET /index.html HTTP/1.1. GET is the “method”; you can also POST, PUT, or DELETE (or even BREW). /index.html is the path (the stuff you would type after www.google.com in the address bar), and HTTP/1.1 is the protocol (you could, in theory, have another protocol running; HTTP 2.0 is being drafted as I write this).
Let’s do it. Open up a shell and try nc google.com 80. You’re now connected to Google. Try sending it GET /index.html HTTP/1.1. You’ll need to hit “enter” twice (it’s part of the protocol!).
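If you would rather build the request in code than type it, the sketch below spells out exactly which bytes go over the wire. It assumes Python 3, and build_request is a name invented for this example. It also adds a Host header, which the request typed above leaves out: HTTP/1.1 formally requires one, even though some servers will answer without it.

```python
def build_request(method, path, host):
    """Assemble a minimal HTTP/1.1 request as bytes."""
    lines = [
        method + " " + path + " HTTP/1.1",
        "Host: " + host,  # required by HTTP/1.1
        "",  # the blank line (the second "enter") that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request("HEAD", "/index.html", "google.com")
```

The two empty strings at the end are what produce the trailing blank line; without it the server keeps waiting for more headers.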
You’ll be greeted by a huge mess of symbols, which is the HTML code that makes up the Google homepage. Note that the connection doesn’t close, so you can send another request if you want. In fact, let’s do that: if you only want to check out the protocol, you can send a HEAD request, which is identical to GET except that the server doesn’t send back the message body. In the real world, HEAD is useful to efficiently check if a file exists on a server. If you try HEAD /index.html HTTP/1.1, you get:
HTTP/1.1 200 OK
Date: Sat, 29 Nov 2014 19:40:02 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=67a496862b9f3c29:FF=0:TM=1417290002:LM=1417290002:S=8UjQDBRWYSa1y9tA; expires=Mon, 28-Nov-2016 19:40:02 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=NnwRLRx4JVz-x3lWFTSxzV_ZxLi_TLVmbw8oDifyhzT2iuWwQ0mVveS15bE8jI28kI-p8cMIEXmmwDmwlxojTY07azz6XzcmeRD7mHerDLuVjPwjV180AxNqWBHqJrfp; expires=Sun, 31-May-2015 19:40:02 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic,p=0.02
Transfer-Encoding: chunked
This looks messy, but it really isn’t. You can see how the protocol works: you start with the protocol name and 200 OK, which is the response code. You are probably familiar with another response code, 404 NOT FOUND.
Then each line begins with some header, a colon, and then information. For instance, you get the date, you get the content type (text/html), etc.
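That structure is regular enough to parse in a few lines. Here is a minimal sketch, assuming Python 3 and a response head that has already been decoded to text; parse_response_head is a hypothetical name, and the sample is a trimmed-down version of the output above.

```python
def parse_response_head(text):
    """Split the head of an HTTP response into (version, code, reason, headers)."""
    lines = text.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = []
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return version, int(code), reason, headers


sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "Server: gws\r\n"
    "\r\n"
)
version, code, reason, headers = parse_response_head(sample)
```

The headers are kept as a list of pairs rather than a dictionary because a header such as Set-Cookie can legitimately appear more than once.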
The Set-Cookie headers instruct the browser to save those values in a local file. When the web browser sends further requests, the protocol instructs it to send the saved cookies as a part of the request. This lets websites track you, and is the reason Gmail keeps you logged in even when you close the window.
So far, so good. One thing that may have bothered you was the 80 you typed as an argument for nc. That’s the port number. The idea is that a computer can serve multiple websites by having multiple active sockets. To allow this, TCP has a port argument: your computer has 65,536 ports and it delivers packets to the right one.
As I said, ncat comes bundled with nmap. nmap is a port scanner, a script that checks every port of a computer to see if anything is listening (this is one of those places where raw sockets make things much more efficient). Running port scans lets an attacker find vulnerable programs running, and then exploit them (for instance, test servers or outdated services that have known security issues are easy targets).
Don’t run port scans on computers you don’t own. nmap is designed to be used by professional network security people, who keep huge sites like Google up and running safely.
80 is the conventional port for HTTP, but you can serve a website on any port. To access it from a web browser, you append the port after the domain name, like http://example.com:81/index.html.
The other thing that may have bothered you was how the computer knew who google.com was. The answer to that is another protocol: the DNS protocol, which is used to ask a DNS server to resolve a domain name (like Google.com) into an IP address. You can try this with the host command:
$ host google.com
google.com has address 74.125.239.135
google.com has address 74.125.239.129
google.com has address 74.125.239.134
google.com has address 74.125.239.131
google.com has address 74.125.239.133
google.com has address 74.125.239.142
google.com has address 74.125.239.128
google.com has address 74.125.239.137
google.com has address 74.125.239.136
google.com has address 74.125.239.130
google.com has address 74.125.239.132
google.com has IPv6 address 2607:f8b0:4005:800::1009
google.com mail is handled by 30 alt2.aspmx.l.google.com.
google.com mail is handled by 40 alt3.aspmx.l.google.com.
google.com mail is handled by 10 aspmx.l.google.com.
google.com mail is handled by 20 alt1.aspmx.l.google.com.
google.com mail is handled by 50 alt4.aspmx.l.google.com.
Once you choose an IP address, the protocol lets you track down that computer and establish a connection.
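The same lookup is available from Python through the system resolver. This is a sketch assuming Python 3; resolve is a hypothetical helper built on socket.getaddrinfo, and the addresses you get back for a real domain will differ from the listing above. The usage line passes an IP literal so that it runs without a network connection.

```python
import socket


def resolve(hostname, port=80):
    """Return the distinct IP addresses the system resolver finds for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        # sockaddr starts with the address for both IPv4 and IPv6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


loopback = resolve("127.0.0.1")
```

Calling resolve("google.com") on a connected machine should return a list much like the host output, minus the mail records.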
Now, you’re often told to always use HTTPS, because it’s secure. You can probably already tell how insecure HTTP is: any guest at the party can read your post-it packets and know everything.
A fun thing to try is to run tcpdump: it’ll dump packets from your computer as they’re sent out or received (you may like the -X option). Mess around with the options a bit, and you can read the raw contents of HTTP packets as you surf the web. You’ll need to be an administrator to run it, but if you think about it, that’s probably a good thing.
Anyhow, back to HTTPS: it’s just HTTP, except sitting on top of another protocol called SSL (or TLS—it’s sort of complicated). SSL handles finding an encryption that both you and your connection agree is secure, negotiating a shared secret key, and then sending encrypted messages. It also lets you authenticate people by passing around certificates that are cryptographically signed by authorities.
HTTPS runs on port 443, which is the other default port that your browser doesn’t need to be told. You can try the above HTTP fun on port 443: most websites will get mad at you and kill the connection.
This is the reason I like ncat: the --ssl option wraps your connection in the SSL negotiations and encrypts what you send (you can’t viably do this manually). Try ncat --ssl google.com 443: things should work as normal now, but tcpdump will show you gibberish.
At this point we’ve cleared almost all the hurdles in our initial party analogy, so I’m going to take a break.
In Part II of this series, we’ll explore some more protocols and write clients using Python. We’ll talk about how protocols are established, and why it’s important that it works the way it does right now.
In Part III (yet-to-be-written), we’ll explore four recent showstopper exploits, all of which make plenty of sense once you understand protocols: Goto Fail, Heartbleed, Shellshock, and POODLE.
]]>One of the things I had to do for PicoCTF was learn how to wrangle binary strings in various languages. The idea is that you think of a string as an array of numbers instead of an array of characters. It’s only coincidental that some of those numbers have alternate representations, such as “A”. The alphabet-number correspondence is an established table. Look up ASCII.
Each number is a byte (aka an unsigned char), so it ranges from 0 to 255. This means it’s convenient to express them in hex notation: each number is two hex digits, so 0xff is 255.
Using this, we can turn strings into hex sequences (by doubling the number of printed characters), and then turn the hex sequence into a decimal number. This is great for crypto, because many algorithms (including RSA) can encrypt a single number.
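Here is that chain in miniature. The sketch assumes Python 3, where byte strings have their own hex helpers (the 'hex' string codec used further down in this post is a Python 2 feature); the variable names are just for illustration.

```python
text = b"cow"

as_hex = text.hex()          # two hex digits per byte
as_number = int(as_hex, 16)  # the whole string read as one big integer

# And back again: hex digits to bytes.
round_trip = bytes.fromhex(as_hex)
```

The integer is what you would hand to something like RSA; the hex string is just a convenient stepping stone between the two views.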
We can also use the base64 scheme to turn binary strings into printable strings. It uses the case-sensitive alphabet (52), numbers (10), and + and / (2) as the 64 symbols. Each set of three bytes is represented by four base64 symbols. Note that this means we need to pad the string if it isn’t a multiple of 3 bytes. The padding is indicated with = or == at the end of the encoded message.
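A quick way to convince yourself of the padding rule is to encode inputs of one, two, and three bytes and look at the tails. This is a small sketch using Python 3's base64 module; the inputs are arbitrary.

```python
import base64

# One, two, and three input bytes: the first two need padding, the third doesn't.
encoded = [base64.b64encode(chunk) for chunk in (b"c", b"co", b"cow")]
decoded = [base64.b64decode(item) for item in encoded]
```

Every encoded value is four symbols long; the one-byte input ends in ==, the two-byte input ends in a single =, and the three-byte input needs no padding at all.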
This post summarizes some really useful functions for working with binary strings.
You can use hexadecimal literals in Python strings with a \x escape code:
s = '\x63\x6f\x77'
To get this representation of a string that’s already in memory, use repr. It will turn unprintable characters into their escape codes (though it will prefer abbreviations like \n over hex if possible).
You can use ord to turn a character into a number, so ord('x') == 120 (in decimal! It’s equal to 0x78). The opposite function is chr, which turns a number into a character, so chr(120) == 'x'. Python allows hex literals, so you can also directly say chr(0x78) == 'x'.
To convert a number to a hex string, use the (guesses, anyone?) hex
function.
To go the other way, use int(hex_number, 16)
:
hex(3735928559) == '0xdeadbeef'
int('deadbeef', 16) == 3735928559
To convert a string to or from hex in Python 2, use str.encode and str.decode (Python 3 no longer accepts the 'hex' codec on str):
>>> 'cow'.encode('hex')
'636f77'
>>> '636f77'.decode('hex')
'cow'
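The encode('hex') and decode('hex') calls above only work in Python 2. A rough Python 3 equivalent, assuming you are working with bytes objects rather than str, might look like this:

```python
# Python 3: hex conversion lives on bytes, not on str.
encoded = b"cow".hex()              # bytes -> string of hex digits
decoded = bytes.fromhex("636f77")   # string of hex digits -> bytes

print(encoded)   # 636f77
print(decoded)   # b'cow'
```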
The pattern hex(number).decode('hex')
is quite common (for example, in RSA
problems). Keep in mind that you need to strip the leading 0x
and possibly a
trailing L
from the output of hex
, and also make sure to pad with a leading
0
if there are an odd number of characters.
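As a sketch of the stripping and padding described above, here is one way the number-to-string round trip could look in Python 3. The helper names number_to_bytes and bytes_to_number are made up for illustration, and big-endian byte order is an assumption (it matches what hex produces).

```python
def number_to_bytes(number):
    """Big-endian bytes of a non-negative integer (hypothetical helper)."""
    digits = format(number, "x")   # hex digits with no '0x' prefix or 'L' suffix
    if len(digits) % 2:            # odd number of digits: pad with a leading 0
        digits = "0" + digits
    return bytes.fromhex(digits)


def bytes_to_number(data):
    """Inverse of number_to_bytes: read the bytes as one big-endian integer."""
    return int.from_bytes(data, "big")


print(number_to_bytes(0x636f77))   # b'cow'
print(bytes_to_number(b"cow"))     # 6516599
```

Using format(number, "x") sidesteps the prefix and suffix entirely, so only the odd-length case needs handling.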
Finally, Python handles base64 with the base64
module, but you can also just
use str.encode('base64')
and str.decode('base64')
. Keep in mind that it
tacks on trailing \n
s. I don’t know why.
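For comparison, here is a small Python 3 sketch using the base64 module directly. It shows the three-bytes-to-four-symbols rule and the two padding cases; b64encode does not append a newline, while encodebytes (the MIME-style variant) does.

```python
import base64

# 3 bytes -> 4 symbols; shorter inputs are padded with '=' or '=='.
for raw in (b"cow", b"co", b"c"):
    print(raw, base64.b64encode(raw))
# b'cow' b'Y293'
# b'co' b'Y28='
# b'c' b'Yw=='

print(base64.b64decode("Y293"))      # b'cow'
print(base64.encodebytes(b"cow"))    # b'Y293\n'
```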
JavaScript is pretty similar. It supports \x12 notation, and 0x123 hex literals. The equivalents of ord and chr are "a".charCodeAt(0) and String.fromCharCode(12), respectively.
You can convert a hex string to decimal with parseInt(hex_string, 16)
, and go
the other way with a_number.toString(16)
:
parseInt("deadbeef", 16) == 3735928559
(3735928559).toString(16) == 'deadbeef' // the parentheses are needed: 3735928559.toString is a syntax error
Note the lack of 0x
.
Unfortunately, JavaScript has no built-in way to encode a string as a hex string or decode it again, but it isn’t too hard to do on your own with some clever regexes. The tricky part is knowing when to pad.
Browser JS has atob
and btoa
for base64 conversions (read them as
“ascii-to-binary” and “binary-to-ascii”). You can install both of those as
Node modules from npm: npm install atob btoa
.
For the sake of completeness, I wanted to mention how to use Bash to input
binary strings to programs. Use the -e
flag to parse hex-escaping in string
literals, and -n
to suppress the trailing \n
(both of these are useful to
feed a binary a malformed string):
$ echo "abc\x78"
abc\x78
$ echo -e "abc\x78"
abcx
$ echo -ne "abc\x78"
abcx$ # the newline was suppressed so the prompt ran over
Alternatively, printf
does pretty much the same thing as echo -ne
.
Sometimes you want to be able to write more data after that, but the binary is
using read()
. In those cases, it’s helpful to use sleep
to fool read
into
thinking you finished typing:
{ printf "bad_input_1\x00 mwahaha";
# the zero char signals end-of-string
# in C, which can be used to wreak all
# sorts of havoc. :)
sleep 0.1;
printf "bad_input_2";
sleep 0.1;
cat -; # arbitrary input once we have shell or something
} | something
Or, if you’re intrepid, you can use Python’s subprocess
or Node’s
child_process
to pipe input to the binary manually.
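A minimal sketch of the subprocess route, mirroring the printf/sleep pipeline above. The feed helper and the ECHO stand-in target are hypothetical; in practice you would pass the path of the real binary instead. Reading all the output at the end is fine for short replies, but a chatty target could need a more careful read loop.

```python
import subprocess
import sys
import time


def feed(command, chunks, delay=0.1):
    """Write each chunk to the program's stdin, pausing between chunks."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for chunk in chunks:
        proc.stdin.write(chunk)
        proc.stdin.flush()   # push the bytes out now instead of buffering them
        time.sleep(delay)    # give the target's read() time to return
    proc.stdin.close()       # signal end of input
    output = proc.stdout.read()
    proc.wait()
    return output


# Stand-in target that copies stdin to stdout; swap in the real binary here.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

print(feed(ECHO, [b"bad_input_1\x00 mwahaha", b"bad_input_2"]))
```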
UNIX comes with the base64 command to encode the standard input. You can use base64 -D to decode on macOS; the GNU coreutils version spells the flag -d.
Use hex
when your binary string is a giant number, and use base64
when
you’re simply turning a binary string into a printable one.
Use wc -c to get the byte count of a binary file.
Use strings to extract printable strings from a binary file, though ideally not on untrusted files.
Finally, use od
or xxd
to pretty-print binary strings along with their hex
and plaintext representations.
Almost any metric of work I’ve done (homework submitted, emails answered, hours spent playing piano, number of GitHub commits) shows a sharp drop in the past two weeks. I pretty much spent every moment on a computer solving PicoCTF problems.
Pico was wonderful. It was an opportunity to do stuff I couldn’t (legally) do before, and learn stuff that many adults would hesitate to teach teenagers. I also got to hang out with cool hackers on their IRC channel, and had an excuse to stay up till 2am hacking.
I’m planning on putting up some quick writeups of the problems I loved. If you haven’t spent an hour or so with these problems, you won’t have any clue what I’m talking about (and chances are that I myself won’t grok any of this a year from now). Nevertheless, here goes.
This problem caused too many people too many hours of pain.
The basic idea is that the programmer used <=
instead of <
wherever he was
iterating, so he has lots of off-by-one errors. In particular, when populating
his hex table, he has:
for (i = 0; i <= 6; ++i) {
hex_table['a' + i] = 10 + i;
}
This allows g
to be a valid hex character. Yay.
Now he goes around checking whether or not all the password characters are hex,
and he tries to make sure that all hex chars are used at least once by
populating the array digits
. But we can input g
in the password, so we can
set digits[16]
which overflows into password
:
int digits[16] = {0};
char password[64];
So far, so good. A char
is one byte, and an int
(on this setup) is 4 bytes.
So when we overflow an int
onto password
, we set the first four characters
of the array to \x01\x00\x00\x00
(the bytes are reversed because
Endianness!). With that zero byte, we’ve effectively reset password
to
\x01
. So now we can input \x01
as the confirmation and cheat the password
changer. Yay again.
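To see why the overflowed int shows up as \x01\x00\x00\x00, here is a small sketch with Python’s struct module. It assumes a 4-byte little-endian int, as on the setup described above, and it imitates C’s string handling by cutting at the first zero byte.

```python
import struct

# The int 1, laid out as 4 bytes in little-endian order (lowest byte first).
overflow = struct.pack("<i", 1)
print(overflow)        # b'\x01\x00\x00\x00'

# C string functions stop at the first zero byte, so this is all they see.
as_c_string = overflow.split(b"\x00")[0]
print(as_c_string)     # b'\x01'
```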
Now what? It uses system()
to call a Python script. Yuck. On the bright side,
it uses a relative file path for the Python file (the author probably didn’t
test it in another working directory). So we simply cd
into our home
directory and make a new Python script with the same name, with contents:
print open('/home/obo/flag.txt').read()
When we send OBO the overflowed password, it runs this with the right privs and we win.
This was my favorite challenge (I have a thing for web exploitation). My first reaction was “CGI! Shell! This must be Shellshock!” Turns out they’ve patched their Bash, though, so that didn’t work.
The legitimate solution relies on the fact that Perl’s open()
is unsafe: you
can call it with a |
at the end, and it evaluates the argument in a shell,
sending back the result.
The webpage essentially asks the server to open the file {body part
type}{index}
, where body part type
is one of head
, nose
, etc. and
index
looks like 1.bmp
. So, of course, we can cheat by sending it 1.bmp;
ls|
and instead of a bitmap file, the server gets a directory listing.
This is pretty easy to try out with curl
…but we get back gibberish. It
looks like there’s some bitwise melding going on on the server that combines
the images. This dies when it gets ls
output. So we just send all the body
part parameters the same bad index. It bitwise &
s them together (i.e. nothing
happens) and we get the secret. Simple but beautiful.
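The reason sending the same index everywhere works is that ANDing a value with itself changes nothing. A tiny sketch, assuming the server combines the parts byte by byte (and_images is a made-up stand-in, not the server’s code):

```python
def and_images(*images):
    """Bitwise-AND several equal-length byte strings together."""
    result = bytes(images[0])
    for image in images[1:]:
        result = bytes(a & b for a, b in zip(result, image))
    return result


listing = b"flag.txt\nindex.cgi\n"   # placeholder for the injected command's output
print(and_images(listing, listing, listing) == listing)   # True
```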
This was my other favorite challenge, because there were so many different
things you needed to simultaneously break to get shell. Also, I just love the
way they used CSS3 polyfills to make the blink
tag work in non-ancient
browsers.
First, there’s the cookie signing. Steve, in all his wisdom, is authenticating cookies by maintaining a SHA1 signature of the cookie plus some secret nonce. Turns out, this is pretty insecure because of a simple padding message-extension hack. A quick Google search sends us this blog post, and they post some sample Python that gets the job done.
Though I recommend reading the blog post to actually understand what’s going on, the basic idea is that a SHA1 hash operates on an arbitrary number of blocks. The state of the algorithm at the end of one block is the input to the next block’s hash (the input to the first block’s hash is a well-known constant).
You don’t need to know the contents of the previous block to add another block. So we manually pad the payload we have (that is, the cookie) and tack on our own block. We can initialize the SHA state with the (known) hash of the first block and then compute valid subsequent hashes without ever knowing the key. Some quick modifications to the Python script given above let us forge arbitrarily long messages.
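The manual padding step can be sketched as follows. This assumes the secret is prepended to the cookie before hashing and that you guess its length; both are assumptions, and the function names are made up. Computing the new signature also needs a SHA-1 implementation whose internal state can be set, which is not shown here.

```python
import struct


def sha1_padding(message_length):
    """The padding SHA-1 appends to a message of message_length bytes."""
    zeros = (55 - message_length) % 64   # fill up to 56 bytes mod 64
    bit_length = struct.pack(">Q", message_length * 8)
    return b"\x80" + b"\x00" * zeros + bit_length


def forged_message(cookie, guessed_secret_length, extension):
    """What the server must receive: cookie, then glue padding, then our block."""
    glue = sha1_padding(guessed_secret_length + len(cookie))
    return cookie + glue + extension


print(len(sha1_padding(0)))   # 64
```

If the guessed secret length is wrong, the forged signature simply fails to verify, so in practice you try a range of lengths.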
And now for something completely different. The server reads each line of the
cookie, and unpacks it using PHP’s unserialize
:
$settings_array = explode("\n", $custom_settings);
$custom_settings = array();
for ($i = 0; $i < count($settings_array); $i++) {
$setting = $settings_array[$i];
$setting = unserialize($setting);
$custom_settings[] = $setting;
}
With forged cookies, we can make it instantiate arbitrary objects at will, because PHP’s serialization saves type information. Notice that the Post object defines:
function __destruct() {
// debugging stuff
$s = "<!-- POST " . htmlspecialchars($this->title);
$text = htmlspecialchars($this->text);
foreach ($this->filters as $filter)
$text = $filter->filter($text);
$s = $s . ": " . $text;
$s = $s . " -->";
echo $s;
}
So it will dump its contents in an HTML comment for debugging when it’s
destroyed by the GC. Since we can instantiate arbitrary Post
objects, we can
get their contents printed out at will. We’re very close now.
We can also create Filter
s that act on the Post
s. Filter
s use PHP’s
preg_replace
. That’s insecure, because you can use the e
flag to evaluate
arbitrary code based on the replacement text generated from regex captures.
Argh.
At this point, it was around 2am, my hands felt like rubber, and my eyes felt
like mozzarella balls. So I just copied posts/steve.txt
, and modified one of
the filters to dump the contents of the flag, and went to sleep in peace.
There are several lessons to be learned here, but the most important are:
Block uses a Substitution Permutation Network to encrypt the string—but it does it twice. I pretty much brute-forced this one. But I did it tastefully, so it merits a writeuplet.
We know that the message begins with message:<space>
, and we know the first 9
bytes of the output. This lets us mount a known-plaintext attack. Here’s how:
we encrypt the plaintext with all possible keys (there are
($2^{24}=16777216$) of them) and we decrypt the ciphertext with all possible
keys. Turns out that for the correct pair of keys, we’ll get the exact same
result (the intermediate encryption). This is a ‘meet-in-the-middle’ attack
(not to be confused with ‘man-in-the-middle’ or ‘Malcolm-in-the-Middle’), and
can be read about on
Wikipedia. This is
good—now all we need to do is find the intersection of two massive lists.
Once I’d compiled these lists manually (it took over an hour), I realized that
I would be graduating high school by the time a naive Python intersection
search finished. Fortunately, Python’s set
type has ridiculously fast
member-checking, so it took all of 5 minutes to find the keypair, and I was
done.
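Here is a minimal sketch of the meet-in-the-middle idea on a toy cipher. The cipher, the 8-bit keys, and every name below are invented for illustration; they are not the Block cipher from the problem. A dict plays the role of the set: it maps each intermediate value to the first-stage keys that produce it, so each lookup is fast.

```python
MASK = 0xFFFF  # toy 16-bit blocks


def toy_encrypt(key, block):
    x = (block + key * 257) & MASK
    x = ((x << 3) | (x >> 13)) & MASK      # rotate left by 3 bits
    return x ^ ((key * 31) & MASK)


def toy_decrypt(key, block):
    x = block ^ ((key * 31) & MASK)
    x = ((x >> 3) | (x << 13)) & MASK      # rotate right by 3 bits
    return (x - key * 257) & MASK


def meet_in_the_middle(pairs, key_space=range(256)):
    """Find (k1, k2) such that encrypting with k1 then k2 matches every pair."""
    plain, cipher = pairs[0]
    # Forward half: every possible intermediate value, indexed for fast lookup.
    middle = {}
    for k1 in key_space:
        middle.setdefault(toy_encrypt(k1, plain), []).append(k1)
    # Backward half: decrypt with every k2 and look for a meeting point.
    candidates = []
    for k2 in key_space:
        for k1 in middle.get(toy_decrypt(k2, cipher), []):
            # Check the remaining known pairs to weed out coincidences.
            if all(toy_encrypt(k2, toy_encrypt(k1, p)) == c for p, c in pairs[1:]):
                candidates.append((k1, k2))
    return candidates


SECRET_KEYS = (0x3C, 0xA7)
PAIRS = [(p, toy_encrypt(SECRET_KEYS[1], toy_encrypt(SECRET_KEYS[0], p)))
         for p in (0x1234, 0xBEEF, 0x0042)]
print(SECRET_KEYS in meet_in_the_middle(PAIRS))   # True
```

The work is two passes over the key space plus lookups, rather than one pass over every pair of keys.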
That’s all for now. I’ll probably write a couple more, depending on the amount of homework I need to make up this week…
]]>I wrote nearley while working on course materials for a Berkeley CS course, but it quickly spiralled into a pretty big project. Perhaps more than parsing, I learned how to manage an open-source project with multiple contributors, and how to take concepts written in math-heavy notation and convert them to ideas (and code!).
There aren’t many tutorials about Earley parsing, because Earley parsing has been overshadowed by the recursive descent or lookahead parsers that everyone uses. (The only significant Earley project out there is Marpa; I got some help from Marpa’s creator, Jeffrey Kegler.) But Earley parsers are awesome, because they will parse anything you give them. Depending on the algorithm they use, popular parser generators such as lex/yacc, flex/bison, Jison, PEG.js, and ANTLR will break on certain grammars. And by break, I mean infinite loops caused by left recursion, crashes, or stubborn refusals to compile because of a “shift-reduce error”.
Here’s my mini-tutorial that explains Earley parsing, with an emphasis on de-emphasizing notation. It’s adapted from a file that used to live in the git repo for nearley.
The Earley algorithm parses a string (or any other form of a stream of tokens) based on a grammar in Backus-Naur Form. A BNF grammar consists of a set of production rules, which are expansions of nonterminals. This is best illustrated with an example:
expression ->
number # a number is a valid expression
| expression "+" expression # sum
| expression "-" expression # difference
| "(" expression ")" # parenthesization
number -> "1" | "2" # for simplicity's sake, there are only 2 numbers
This small language would let you write programs such as (1+2+1+2)-1-2-1
.
expression
and number
are nonterminals, and "+"
and "-"
are
literals. The literals and nonterminals together are tokens.
The production rules follow the ->s. The |s delimit different expansions. Thus, we could have written
number -> "1"
number -> "2"
and it would be an identical grammar.
For the rest of this guide, we use the following simple, recursive grammar:
E -> "(" E ")"
| null
This matches parentheses nested to an arbitrary depth: (), (()), etc. It also matches the empty string. Keep in mind that for a parsing algorithm, this is already very powerful, because you cannot write a regular expression for this example.
Earley works by producing a table of partial parsings.
(Warning: some notation is about to ensue.)
The nth column of the table contains all possible ways to parse s[:n]
, the
first n characters of s. Each parsing is represented by the relevant
production rule, and a marker denoting how far we have parsed. This is
represented with a dot •
character placed after the last parsed token.
Consider the parsing of this string ()
with the grammar E
above. Column 0 of
the table looks like:
# COL 0
1. E -> • "(" E ")"
2. E -> • null
which indicates that we are expecting either of those two sequences.
We now proceed to process each entry (in order) as follows:
1. If the next token (the token after the marker •) is null, insert a new entry, which is identical except that the marker is incremented. (The null token doesn’t matter.) Then re-process according to these rules.
2. If the next token is a nonterminal, insert a new entry for each production rule of that nonterminal, with the marker at the start. These entries are what the original entry expects.
3. If there is no expected token (that is, the marker is all the way at the end), then we have parsed the nonterminal completely. Thus, find the rule that expected this nonterminal (as a result of rule 2), and increment its marker.
Following this procedure for Column 0, we have:
# COL 0 [processed]
1. E -> • "(" E ")"
2. E -> • null
3. E -> null •
Now, we consume a character from our string. The first character is "("
. We
bring forward any entry in the previous column that expects this character,
incrementing the marker. In this case, it is only the first entry of column 0.
Thus, we have:
# COL 1, consuming "("
1. E -> "(" • E ")" [from col 0 entry 1]
Processing, we have (you can read the comments top-to-bottom to get an idea of how the execution works):
# COL 1
# brought from consuming a "("
1. E -> "(" • E ")" [from col 0 entry 1]
# copy the relevant rules for the E expected by
# the first entry
2. E -> • "(" E ")" [from col 1 entry 1]
3. E -> • null [from col 1 entry 1]
# increment the null rule
4. E -> null • [from col 1 entry 3]
# entry 4 is completed, so we increment entry 1
5. E -> "(" E • ")" [from col 0 entry 1]
Notice how we must keep track of where each entry was added so that we know which entry to increment when it is completed.
Next, we consume a ")"
, the second (and last) character of our string. We
have:
# COL 2, consuming ")"
# brought from consuming a ")"
1. E -> "(" E ")" • [from col 0 entry 1]
Nothing further can be done, so the parsing is complete. We now find entries that are complete and created from an entry in column 0. (That means we have a parsing from the beginning of the string to the end). Since we have such an entry in column 2, this represents the parsing.
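The walkthrough above can be condensed into a short recognizer. This is a sketch under a few assumptions, not nearley’s actual code: the null expansion is written as an empty tuple (so such an entry is complete as soon as it is predicted), entries are (head, body, dot, origin) tuples, and it only answers yes or no instead of building a parse tree.

```python
# The grammar from the text: E -> "(" E ")" | null. The empty tuple is null.
GRAMMAR = {"E": [("(", "E", ")"), ()]}


def recognize(text, grammar=GRAMMAR, start="E"):
    # columns[n] holds every entry that is consistent with text[:n].
    columns = [[] for _ in range(len(text) + 1)]

    def add(col, entry):
        if entry not in columns[col]:
            columns[col].append(entry)

    for body in grammar[start]:
        add(0, (start, body, 0, 0))

    for n in range(len(text) + 1):
        i = 0
        while i < len(columns[n]):          # the column grows while we process it
            head, body, dot, origin = columns[n][i]
            i += 1
            if dot == len(body):
                # Completed: advance whoever was waiting for `head` in the origin column.
                for h, b, d, o in list(columns[origin]):
                    if d < len(b) and b[d] == head:
                        add(n, (h, b, d + 1, o))
            elif body[dot] in grammar:
                # Nonterminal expected: copy in its rules, starting at this column.
                wanted = body[dot]
                for alternative in grammar[wanted]:
                    add(n, (wanted, alternative, 0, n))
                # If `wanted` already matched the empty string here, advance right away.
                if any(h == wanted and d == len(b) and o == n
                       for h, b, d, o in columns[n]):
                    add(n, (head, body, dot + 1, origin))
            elif n < len(text) and body[dot] == text[n]:
                # Literal expected and present: bring the entry forward one column.
                add(n + 1, (head, body, dot + 1, origin))

    # Success means a finished start rule that began in column 0.
    return any(h == start and d == len(b) and o == 0
               for h, b, d, o in columns[len(text)])


print(recognize("()"), recognize("(())"), recognize("(()"))   # True True False
```

The origin field is the bookkeeping the text mentions: it records which column an entry came from, so a completed entry knows where to look for the entries that were waiting on it.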
Nearley parses using the above algorithm, but gives each entry “baggage”, namely the parsed data as a tree structure. When we finish an entry (and are about to process it with rule 3), we apply the postprocessor function to the baggage. Once we determine a parsing, we can reveal (with a flourish) the postprocessed data to be used by the user.
If we had multiple entries that worked in the end, there would be multiple parsings of the grammar. This means the grammar is ambiguous, and this is generally a very bad sign. It can lead to messy programming bugs, or exponentially slow parsing.
It is analogous to the confusion generated when one says
]]>I’m really worried Christopher Nolan will kill a man dressed like a bat in his next movie. (The man will be dressed like a bat, I mean. Christopher Nolan won’t be, probably.)
I go to a tech-savvy school. And it’s getting out of hand.
On a daily basis, I navigate through several websites just to find my homework. Schoology tells me the assignment and due date. Then I head over to the teacher’s Google Site (almost every teacher maintains a class website). I need to complete sets of flashcards on Quizlet, make a project on Glogster, record my French on Audacity and post it to Dropbox (or, in one case, YouTube and print out a QR code that links there). I have to fill out final exams on Google Forms. I need to use SmartMusic to record my piano-playing.
I check my grades on Infinite Campus, except for Chemistry, where my
teacher has a hand-coded webpage (it uses frames and contains the tag
<SCRIPT LANGUAGE="Javascript"><!--...//-->
). One of my classes runs on
ClassroomDojo, which is essentially the Karma system applied to class
participation, and directly linked to grading.
Geddit wants each student to have an iPad, enabling teachers (for the first time in history!) to ask the entire class a question and see who can find the answer. Their website’s testimonial from a 9th grader is “Geddit, is like, totally private. So I can let my teacher know how I’m doing without, like, anyone else knowing.” English essays get turned in to Turnitin, which checks them against a large database and informs us that most of them are liberally plagiarized, since we quoted Orwell. I’ve had to learn math on Mindspark; my fellow sixth graders could figure out how to crack the site (hint: they don’t sanitize HTML).
Our teachers all have ‘Smart boards’, which are essentially things that project onto whiteboards, except you can’t write on them with normal markers without being yelled at. And the projectors take half the period to start working, and when they do, they’re temperamental at best. If the teacher remembers to bring the cable.
Almost every class uses Scantrons, though the net increase in mis-bubbling, mis-grading, and overall stress made me realize that it might actually be less work for teachers to just check circled answers on a printout. I never liked multiple-choice tests, because they almost never test the right things. It takes a far deeper understanding of science, history, a language, or math to write a coherent sentence. It’s also harder to cheat.
And that’s just the technology I deal with. There’s NoRedInk, which tries to teach you grammar. There’s Understoodit, which tries to eliminate hand-raising (is the problem really that students are too embarrassed to raise their hands? I doubt it.). And then there are the various counterparts that each of these apps has.
All I needed for my CS class was a terminal…
Sure, I understand that teachers want to use technology to promote learning. But there’s a difference between using technology, and shoving technology into an otherwise functional classroom. Most of these new ‘classroom technologies’ don’t teach us French. They teach us how to tolerate a badly-coded website. Technology is all about picking (or building) tools to make life easier. It’s about automation. Just because it appears on a screen doesn’t mean that it’s making life easier. Is writing a “blog entry post” or “E-mail to your teacher” instead of a paragraph or letter really that much more exciting?
As an analogy (because I don’t have too many of those already), it’s like telling a kid to use more special effects in their PowerPoint presentation. He’s learning the opposite of what he should. He’s going to end up as one of those people with red-text-on-a-blue-background, and five minute long slide transitions, and animations with Wile E. Coyote noises, because that’s what pleases the teacher.
But you want to teach him how to make a presentation that appeals to people and conveys information. Hopefully.
Similarly, technology in a classroom isn’t going to teach children what you want them to learn. It’s going to tell them to rapidly adopt any new technology without considering whether it’s needed. It’s going to tell them that the existence of technology makes things more impressive. It sends out a false message that they’re ‘computer whizzes’.
It isn’t going to teach them to choose tools wisely. To be careful with how you invest your time. To assess whether the software is really helping you or not. And it’s certainly not going to teach them any computer science. Contrary to popular opinion, not all kids are tech savants. Not all kids even have the resources. I don’t own a smartphone; I can’t efficiently scan a QR code at home.
The truth is that every student needs to learn by interacting with a learned instructor. Technology distances us from teachers. I will learn a lot more if I’m being assessed by a human. A computer can instantly give me the percent of questions I answered correctly, but I honestly don’t care. Start-ups that offer these services are, at the deepest level, businesses who don’t really have much interest in improving education (if they did, they would be doing a better job!). Stockholders aren’t in the classrooms.
Yes, there’s an education crisis (and yes, there has always been an education crisis), but the solution is not to monkey-patch it by thinking technology is smarter. The solution is to make sure teachers, not technology, interface with students.
I think I’m going to go write a script to poll my chem grades and email me when they’re updated.
P.S. What technology do I approve of at school? Google Drive is wonderful for word processing for high school. Email turns out to be (surprise, surprise) really useful. iCal (a.k.a. “Calendar”) is a good brain dump software. Google Keep helps manage lists and links. Feedly helps keep track of reading material. But my favorite bit of school-managing is a printout in a three-ring binder.
P.S.S. For nerds: Git is great to version-control big school projects. Because you’re going to mess up. LaTeX is great for reports/labs/essays/presentations (!), because typesetting means more than you think. WolframAlpha saved me in a lot of classes involving research. Oh, and it does math, too. GeoGebra turns out to be great for making diagrams and shiny demos in geometry.
P.S.S.S. A lot of teachers (including my (awesome) CS teacher) try to enforce
submission deadlines by asking you to save a timestamped copy of your document
in case you can’t submit the assignment to Turnitin. This is a terrible idea.
As an exercise, use touch -t
to show that this is a trivial system to beat.
Then suggest a practical but reasonably secure alternative (hashes are small
enough to write down on paper and bring to school).
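One possible answer to the second half of the exercise, sketched in Python: since touch -t can set any modification time, the timestamp proves nothing, but a hash of the finished file that you wrote down before the deadline does pin down its contents. The fingerprint helper and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def fingerprint(data):
    """Hex SHA-256 digest of a document's bytes (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


essay = b"abc"   # stand-in for the bytes of the finished document
print(fingerprint(essay))
```

Changing a single byte of the file changes the digest, so a teacher can recompute it later and compare it against what you wrote down.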
I completely rewrote this story for an English assignment. Once it’s graded, I’ll update this post with the much more exciting revised version. Maybe.
“Bingo.”
“Yes, bingo.”
“Like, the—”
“Yes, Mr. Smiley, bingo.”
Back in elementary school, our favorite adult was a Mr. Smiley. He hung out with us at lunch, told jokes on rainy days, and always won thumb wars. I remember very little about him, but I remember him having thick, bulging veins in his arms. There was always a different explanation for how they came to be. Some involved bears. Mr. Smiley was the school custodian.
“Alright, everyone, quiet down. Shhh! Come up—ONE BY ONE!—and get your boards.”
It was a rainy day. The grown-ups intended to herd us into the art room. Mrs. J (I never knew her real name, and after the third year it became too awkward to ask) was in there, with her stash of green boards. They had little red sliders across each of the 25 cells.
“The first one is… B-4!” Someone cried out “before what?”. This was traditional and obligatory.
Bingo days were the worst. It’s hard to explain the feeling a third grader has when he’s being told to quietly pretend to enjoy a game meant for people several times his age. You know that you could be drawing or talking to your friend. Or you could be out playing in the rain (an adult bursts into flames at the mere thought).
“N-32! Remember, I want you to have all four corners before you come up!”
And then there was the thrill of secretly communicating with people in the silence. I can still sign in Handspeak, our dialect of sign language, where you simply imitate the letters’ shapes with your fingers. Given another couple of years, I’m confident that we would have invented Morse code.
“O-42! That’s Oh-fooooorty-two!”
On top of all that, there was a distinct lack of exciting prizes. To date, the most exciting thing any of us had won was a whistle, which we were naturally forbidden from using.
“That’s adorable.”
“It’s not adorable, Mr. Smiley.” (Third graders cower at the word.) “It’s democracy.”
“Yes, of course. I’ll, uh, get you those photocopies riiiight away.”
I remember handing over a crumpled sheet of notebook paper. On each line was the name of one of the 80 students in the third grade who agreed with us. Next to it was the name written in worse handwriting, our attempts at signatures.
That sheet of paper marked the culmination of six weeks of convincing kids that Bingo Is Boring™. We made lists, brainstormed alternatives (“where’s the nearest bowling alley?”), and drafted a letter to the principal. We had a president and a vice-president of the BIB™. We had meetings, speeches, and debates.
We had opposition. We had to merge with a competing group to form the Bingo Elimination Group (BEG™). Contracts were signed. As secretary, Chris handled the paperwork for me.
We had allies, especially the wonderfully helpful Mrs. Ayer who helpfully shot down our plan to send recon missions to the staff room.
We had more fun than we would ever have at recess.
In the long run, of course, we never did get rid of Bingo. But I don’t think that matters.
(A shoutout to Sardor, Ethan, and Chris. I hope I don’t need to wait till you’re all famous to find you again.)
]]>David finally convinced me why we use MLA format. This is a story that involves Van Halen, and the existence of a clause in their contract that demanded a bowl full of M&Ms, but no brown ones. I’ve included the story from the horse’s mouth, i.e. lead singer David Lee Roth (not the same David). All this is quoted from Snopes.
Van Halen was the first band to take huge productions into tertiary, third-level markets. We’d pull up with nine eighteen-wheeler trucks, full of gear, where the standard was three trucks, max. And there were many, many technical errors — whether it was the girders couldn’t support the weight, or the flooring would sink in, or the doors weren’t big enough to move the gear through.
The contract rider read like a version of the Chinese Yellow Pages because there was so much equipment, and so many human beings to make it function. So just as a little test, in the technical aspect of the rider, it would say “Article 148: There will be fifteen amperage voltage sockets at twenty-foot spaces, evenly, providing nineteen amperes…” This kind of thing. And article number 126, in the middle of nowhere, was: “There will be no brown M&M’s in the backstage area, upon pain of forfeiture of the show, with full compensation.”
So, when I would walk backstage, if I saw a brown M&M in that bowl… well, line-check the entire production. Guaranteed you’re going to arrive at a technical error. They didn’t read the contract. Guaranteed you’d run into a problem. Sometimes it would threaten to just destroy the whole show. Something like, literally, life-threatening.
The folks in Pueblo, Colorado, at the university, took the contract rather kinda casual. They had one of these new rubberized bouncy basketball floorings in their arena. They hadn’t read the contract, and weren’t sure, really, about the weight of this production; this thing weighed like the business end of a 747.
I came backstage. I found some brown M&M’s, I went into full Shakespearean “What is this before me?” … you know, with the skull in one hand … and promptly trashed the dressing room. Dumped the buffet, kicked a hole in the door, twelve thousand dollars’ worth of fun.
The staging sank through their floor. They didn’t bother to look at the weight requirements or anything, and this sank through their new flooring and did eighty thousand dollars’ worth of damage to the arena floor. The whole thing had to be replaced. It came out in the press that I discovered brown M&M’s and did eighty-five thousand dollars’ worth of damage to the backstage area.
Well, who am I to get in the way of a good rumor?
Well, (the real) David figured that conforming to MLA’s restrictions was your way of proclaiming a lack of brown M&Ms. If you got MLA wrong, then who knows what else you’ve messed up?
To be honest, I don’t care much for MLA, because whenever I’m using it, it’s because I’m doing another one of those English assignments where I feel I’m being graded on my word count. However, since David’s lecture, I’ve seen brown M&Ms pop up everywhere in my life. Most notably, in the programming world.
There are ‘best practices’ everywhere. Almost all of them, of course, are justifiable. Don’t use GOTO because nobody will be able to follow your code. Don’t use eval because it’s a welcome mat for crackers. Don’t indent with tabs, they render differently for everyone. Anyone sharing their first Python project knows how many opinions everyone has. It’s almost discouraging, because you’re trying to implement something for fun and people are yelling at you because you didn’t cache intermediate results in a binary search tree, and so the whole thing’s too slow to be practical, and you should just give up now and spend your time on something more useful, like learning [insert language here].
Don’t listen to those people.
Learning programming, or any other human endeavor, is all about implementation. You need to run into problems before you can appreciate the solutions we have devised.
For instance, I learned optics by writing a raytracer on my own, in seventh grade (side story: I didn’t know what vectors were, and later discovered that I had invented the cross product on my own, but mirror-imaged). In the beginning, it used to take a few hours to render a 100-by-100 image. As I found various optimizations, I incorporated them, and it got faster.
Would I have learned as much if the first time I showed it to someone, they told me to go implement k-D trees before they’ll give real constructive criticism? What kept me interested was implementing shiny ideas like Phong illumination (pun intended), and then optimizing them so that I could fit more tests within my lifetime. Anyone can find plenty of things wrong with my implementation. I was storing images as a JSON array of pixel values since I didn’t grok PNG.
Everyone has their own Brown M&Ms. We’re presented with so much information on a daily basis that we need Brown M&Ms to decide what matters and what doesn’t. I will probably put off an email whose subject line is misspelled. Or close a webpage with a pop-up. Or disregard pull requests that mess up whitespace. It’s ok to have Brown M&Ms, and it’s ok for them to be as obscure as you want, but it’s not ok to foist them upon others.
The next time you see a newbie modify the Array
prototype directly, please
don’t yell at them. Take a deep breath and let it go. Someday, they will run
into a namespace conflict, and that will be far more educational than your
rant. It’s a favor.
I have gone to a lot of schools. I have gone to schools where you can ace an English writing test by memorizing an essay the teacher gives you, and reproducing it on the test. I have done art classes where you spend every period copying a poster into your notebook (graded on accuracy of the reproduction). I have taken computer classes where you’re encouraged to use as many font and color variations as possible, to ‘display your knowledge’, and so end up with yellow italicized comic sans on a red gradient background. My English teacher plays a vocabulary game, where the first person to miss a word has to bring in a snack the next day. I’m currently in a French class that uses a Reddit-esque Karma system to monitor class participation, which is directly translated to a letter grade.
What do all these things have in common?
All these schools are trying to solve the same problem: how do you quantify learning? A district needs to know how well a school is doing (hence standardized tests), a school needs to know how well a teacher is doing, and a teacher needs to report a quantitative measure of how well a student is doing.
Of course, doing well isn’t quantifiable, so teachers manage by introducing various ‘objective’ measures. And any form of objective measurement becomes a game. Your GPA, for example, is something you can maximize. Tests are memory games. Any system where you can influence a number becomes a game.
Without grades, we have no way to quickly analyze a student’s performance. But grades don’t offer much insight into that, either. As a student, knowing someone has good grades tells me that they are good at the game. They work hard, consistently finish their homework, and study for tests. More importantly, they know what a teacher wants to see to give them good grades. They know that they should do well.
Only the student and the teacher can really know a student’s status. For a student who really needs help coming up to his or her goals, this turns out to be disastrous: any third party would first judge them by the numbers, and getting help becomes hard. Students end up working hard, not to reach their goals, but to improve their numbers. This is a problem, but that’s not what this post is about.
This post is about the opposite end: students who don’t have trouble keeping up. If you aren’t struggling, several parameters change. You’re now learning because you want to. This means you don’t care about the numbers that measure the learning. It’s blissful.
Well, it would be blissful, except that courses have already been designed, over decades, to guide you towards an objective test. And so even if you want to learn for pleasure, you’re being pushed through an objective, rigid ‘curriculum’.
This happened, for instance, in my AP Computer Science class last year. Almost every student there was far above average, and exceeding all expectations at school. It was the perfect environment for them to learn freely—and our teacher encouraged this—but they had to spend hours studying for the AP CS test.
Honestly, studying for the AP CS test isn’t very different from studying for the Spelling Bee. It’s just memorizing and practicing. It isn’t learning.
I propose, as an intellectual exercise, a different course format, for students learning for pleasure. The course is not graded. Instead, it simply connects students with teachers, who guide them in their learning. There is no curriculum.
The inspiration for this comes from my adventures trying to teach myself computer science—often, the hardest part is to choose something to study, and to decide how much detail to study it in. When a student is paired with a competent expert in a field, he has someone to ask for guidance.
Again, this program would only work in certain districts—those with the resources, interested and capable students, and competent teachers. Almost every school has much higher priorities, bringing every student to a strong level in core subjects. But for those few schools with a large mass of accomplished students, I feel gamified education is not the right answer.
Have you ever wondered how to print color to terminals? Or why on some command-line interfaces, the arrow keys print out mysterious sequences like ^[[A? Or why sometimes the keyboard shortcut control-C becomes ^C? Well, the answers to these are all based on ANSI escape codes.
In UNIX programs, there’s the concept of ‘raw’ and ‘canonical’ mode for input. ‘Raw’ mode takes input character-by-character, while ‘canonical’ mode lets the user write a line and hit enter to send the program the written line. Canonical mode is generally more useful: it lets you go back and delete something if you make a mistake. But applications that work on a per-keypress basis need to bypass this. In node, you can enter raw mode with process.stdin.setRawMode(true).
CLI interactions also need the concept of control characters. When you type control-C, you’re sending the program the byte 0x3, which is… 3. But that’s the ASCII control character which means ‘end of text’. The program takes this, by convention, as a signal to stop executing (KeyboardInterrupt in Python, for example). We print control characters with a caret (^), followed by the letter we type on the keyboard. There are 32 of them, which Wikipedia lists. You might be familiar with using ^D (‘end of transmission’) to quickly exit Python or nodejs.
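The caret convention is easy to reproduce by hand. Here is a minimal sketch, assuming plain ASCII input; the function names caretNotation and renderBytes are made up for this example and are not part of node.

```javascript
// Render a byte the way terminals echo control characters: caret notation.
// Control bytes 0x00-0x1f map to "^" plus the character 0x40 higher
// (so 0x03 becomes "^C"); 0x7f (DEL) is conventionally shown as "^?".
function caretNotation(byte) {
  if (byte < 0x20) return "^" + String.fromCharCode(byte + 0x40);
  if (byte === 0x7f) return "^?";
  return String.fromCharCode(byte); // printable bytes pass through unchanged
}

// Render a whole sequence of bytes (a Buffer or an array of numbers).
function renderBytes(bytes) {
  return Array.from(bytes, caretNotation).join("");
}

console.log(caretNotation(0x03)); // ^C
console.log(caretNotation(0x04)); // ^D
console.log(renderBytes([0x1b, 0x5b, 0x41])); // ^[[A
```

The last line shows the three bytes behind the mysterious ^[[A: escape, an opening bracket, and the letter A.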
ANSI escape codes are a way to output to a terminal with more capability than just raw text (there was, for comparison, a time when computer output was printed, physically on paper, line by line). You can move the cursor back and overwrite or clear text. You can also color text or make it blink obnoxiously.
ANSI escape codes start with the CSI: the Control Sequence Introducer. The most common one is \e[. \e is the ASCII escape character 0x1b. You can type it with the control character ^[ (that is, control-[).
Next, they have a sequence of numerical arguments, separated by semicolons, and finally, they have a letter which indicates the command. Once more, Wikipedia lists these. As an example, we can move the cursor to the top-left corner with \e[1;1H (H is the command to move, and the arguments are 1 and 1).
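To make that structure concrete, here is a small sketch that takes such a sequence apart. It only handles the simple shape described above (escape, bracket, numbers separated by semicolons, one letter), and parseCsi is a name invented for this example.

```javascript
// Split a CSI sequence such as "\x1b[1;1H" into its numeric arguments and
// its final command letter. Anything that does not fit the simple shape
// ESC + "[" + digits/semicolons + one letter yields null.
function parseCsi(sequence) {
  const match = /^\x1b\[([0-9;]*)([A-Za-z])$/.exec(sequence);
  if (match === null) return null;
  const params = match[1] === "" ? [] : match[1].split(";").map(Number);
  return { params: params, command: match[2] };
}

const move = parseCsi("\x1b[1;1H");
console.log(move.command, move.params); // command H with arguments 1 and 1

const up = parseCsi("\x1b[A");
console.log(up.command, up.params); // command A with no arguments
```

Real terminals accept more variations than this (private-mode prefixes, for instance), so treat it as an illustration rather than a complete parser.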
Colors are just as easy. We use the m command, with an SGR (‘Select Graphic Rendition’) parameter. 35 is the SGR parameter to set the text color to magenta, while 42 makes the background green. So \e[35;42m would give us a horrible magenta-on-green color scheme. (\e[m, with no arguments, restores everything.)
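In node, that might look like the sketch below. The helpers sgr and colorize are hypothetical names for this example; the only real thing here is the byte sequence they build.

```javascript
const ESC = "\x1b"; // the ASCII escape character, written \e above

// Build an SGR sequence from any number of numeric parameters.
// With no parameters this produces "\x1b[m", which resets everything.
function sgr(...codes) {
  return ESC + "[" + codes.join(";") + "m";
}

// Wrap text in the given SGR parameters and reset afterwards,
// so the styling does not leak into whatever is printed next.
function colorize(text, ...codes) {
  return sgr(...codes) + text + sgr();
}

// Magenta text (35) on a green background (42).
console.log(colorize("horrible color scheme", 35, 42));
```

Resetting at the end of every styled string is a design choice: it costs three extra bytes but keeps the shell prompt readable if the program dies halfway through.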
This, by the way, explains the ^[[A curiosity. When you press up-arrow, the terminal sends the application the ANSI escape code to move the cursor up—the command for this is A. So we get \e[A, and \e gets rendered as its control code equivalent of ^[. (You can, in fact, manually enter control+[-[-A in Bash, and get the standard up-arrow behavior of pulling up the last entered command.)
Some nodejs code to get you started—it’s a utility to interactively display the bytes sent from a terminal when you press a key (or key combination).
process.stdin.resume();
process.stdin.setRawMode(true);
process.stdin.on("data", function (buffer) {
  if (buffer.length === 1 && buffer[0] === 3) { // detect ^C
    // A trailing \n prevents the shell prompt from messing up.
    process.stdout.write("\n");
    process.exit(0); // die
  } else {
    // erase from the top of the screen to the cursor, then go to the top-left
    process.stdout.write("\x1b[1J\x1b[1;1H");
    // util.inspect gives a nice output format
    process.stdout.write(require("util").inspect(buffer));
  }
});
This should give you the tools to write shinier, interactive utilities. But keep in mind the UNIX philosophy—keep them simple, and make sure they cooperate as filters (you should be able to pipe stuff in and out of your utility).
P.S. I wrote this post—including the code sample—in vim running in tmux. Please pardon typos.
So recently, Sarah Mei blogged that programming is not math. Jeremy Kun responded, pointing out that math is an integral part of programming.
I found both articles interesting, but I wanted to add my thoughts on some things both of them wrote. So here they are.
Let’s start with math. Math feels like mining. You know there’s gold somewhere below you. Perhaps it’s some theorem you’re trying to prove. Math is about digging in the right direction from the right spot in the massive network of tunnels we’ve already derived. And although you can dig in any direction, it’s not always obvious how to proceed from where you are to your target. The most brilliant mathematicians are the ones who have the best intuition about which way to dig. Some directions are obviously less efficient than others—digging through a rocky bit is much harder (but a clever mathematician might notice that it’s just a thin wall and there’s gold on the other side).
For example, consider a problem like “Find the area of a triangle with side lengths 39, 52, and 65.” This was on my math final back in 7th grade. We were taught the path that goes through Heron’s formula. Calculate semiperimeter, multiply, square root. That’s the long way through soft earth. Of course, if you haven’t noticed already, there’s an easier way. You notice that 39, 52, and 65 are all multiples of 13, which is suspicious. If you divide by 13, you get 3, 4, and 5, which is a Pythagorean triple! So this is just a right triangle scaled up by a factor of 13. So the area is just half the product of the two smaller side lengths: 39 × 52 / 2 = 1014.
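If you want to check that both tunnels really end at the same gold, here is a quick sketch. The function names are mine, chosen for this example.

```javascript
// The long way: Heron's formula from the three side lengths.
function heronArea(a, b, c) {
  const s = (a + b + c) / 2; // semiperimeter
  return Math.sqrt(s * (s - a) * (s - b) * (s - c));
}

// The short way: for a right triangle, half the product of the two legs.
function rightTriangleArea(leg1, leg2) {
  return (leg1 * leg2) / 2;
}

// 39-52-65 is the 3-4-5 triangle scaled by 13, so 39 and 52 are the legs.
console.log(heronArea(39, 52, 65));     // 1014
console.log(rightTriangleArea(39, 52)); // 1014
```

Heron's formula gets there with a semiperimeter of 78 and the square root of 78 × 39 × 26 × 13 = 1028196; the shortcut needs one multiplication.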
This was the short path through the rocky earth. You need intuition to even consider going that way; and that intuition isn’t easy to acquire.
This is far from a perfect analogy, but it suffices to explain why a lot of people find math hard. Many schools teach you the paths to take, rather than the intuition you need to build up to be able to find them yourself.
I write all this to make a contrast with CS: the majority of CS doesn’t require mathematical rock-wall insights. You simply need to be able to think about how to solve problems by breaking them into smaller problems. You never get ‘stuck’, because you can always reduce a problem you have to something else (even if it’s an ugly brute-force solution). Math has dead ends and paths that circle back to where you were originally. Programming is like building a skyscraper: the only way is up.
This, I think, is why many people find programming easier than pure math.
…computer science is not programming. At most academic CS schools, the explicit intent is that students learn programming as a byproduct of learning CS. Programming itself is seen as rather pedestrian, a sort of exercise left to the reader.
— Sarah Mei
I thought this was rather interesting. I have seen this idea in a lot of places, but I have the opposite viewpoint.
CS is a way to formalize programming concepts. Learning programming isn’t a byproduct of learning CS, but rather learning CS is a byproduct of learning programming. The more you have programmed, the more problems you have solved, and the more CS ideas you’ve internalized.
Here’s an example. You have a curb 100 meters long. (Idealized) cars 3 meters long park randomly along the curb, one after another, until there’s no space greater than 3 meters between any two cars. Write a program to simulate this many times and compute the average number of cars that can park.
Take a moment to try to envision this program.
If you took an awesome CS class, you should have ended up with something like this:
(define (number-of-cars len)
  (if (< len 3)
      0 ; can't park any cars, too short
      (+ 1 ; +1 for current car
         (let ((offset (random)))
           (+ ; compute number of cars on left and right side
            (number-of-cars (* (- len 3) offset))
            (number-of-cars (* (- len 3) (- 1 offset))))))))
Guess what? It’s tree recursion, which is, surprise surprise, a classic ‘CS concept’. But your programming mind didn’t think of it as a tree recursion problem. CS formalizes the idea of tree recursion because it’s so common in programming. It gives this idea a name, and lets you use this name when communicating with humans. It lets you use someone else’s Tree implementation, knowing exactly what to expect. It gives you an overview of common ideas that expand on trees—binary search trees, or perhaps breadth-first searching.
But in the end, it’s all programming that has been catalogued so that you can do math on it.
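The Scheme function above handles a single random trial; the exercise also asks for an average over many trials. Here is one way that could look in JavaScript. It is a sketch under my own assumptions: the optional carLength and random parameters and the averageCars helper are additions for this example, not part of the original program.

```javascript
// One random trial: how many cars of length carLength fit on a curb of
// length len when each car parks at a uniformly random free position.
function numberOfCars(len, carLength = 3, random = Math.random) {
  if (len < carLength) return 0; // can't park any cars, too short
  const free = len - carLength;  // room left over once this car has parked
  const offset = random();       // fraction of that room to the car's left
  return 1 +
    numberOfCars(free * offset, carLength, random) +
    numberOfCars(free * (1 - offset), carLength, random);
}

// Average the count over many independent trials.
function averageCars(curbLength, trials) {
  let total = 0;
  for (let i = 0; i < trials; i++) {
    total += numberOfCars(curbLength);
  }
  return total / trials;
}

console.log(averageCars(100, 10000));
```

Passing the random source as a parameter is only there to make the recursion easy to check by hand: with a source that always returns 0.5, a 9-meter curb holds exactly three cars.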
I taught Ruby on Rails, which is a web programming framework; people came because they wanted to learn how to make websites. Because of those motivations, the curriculum had virtually no math.
— Sarah Mei
Ah, this brings up the topic of a really interesting discussion I had with some friends a while ago. Here’s a summary:
There’s a lot of CS to be learned by studying the Internet: networking, protocol design, language design, layout engines, and security are just a few.
But is setting up a web server really programming? We felt it’s a distinction between building and assembling. Teaching someone how to make a web server with today’s frameworks is more about teaching them Ruby syntax and the APIs, and less about algorithms and logic. It’s not raw computer science, it’s following some instructions and inputting your own values. Using a clay mold shouldn’t count as sculpting.
As much as I admire initiatives to teach CS, I feel that we should be teaching the algorithmic beauty alongside the more practical day-to-day skills.
Specifically, learning to program is more like learning a new language than it is like doing math problems.
— Sarah Mei
A lot of people talk about CS as math and language, but I’ve always felt that it’s more like building a tower with LEGOs. Programming isn’t about the language you use—it’s about the paradigm. Once you understand classes and methods, learning a new language of the same paradigm is at most a weekend job. Instead, programming is about taking blocks you already have, and fitting them together. The language is just a medium to communicate with the computer; we could use punch cards, or command lines, or fancy graphical programming software. What matters is what that medium is communicating. Writing in general follows this, too—writing a good blog post requires an exciting idea to convey. Text is just a medium to communicate the idea.
It helps even more that mathematics and programming readily share topics. You teach graph coloring for register allocation, linear algebra and vector calculus for graphics, combinatorics for algorithms.
— Jeremy Kun
Programming isn’t about math you implement.
Understanding, for example, vector algebra to write a raytracer is important to know what you’re doing, but it’s not programming—programming is being able to know how to do it.
To be successful in the modern computing landscape, you need to know math because the ability to compute is enhanced by the ability to put computation in a mathematical context. Twenty or thirty years ago, you might have considered discrete math (parsing), calculus (waveforms), and trig (graphics) to be instrumental. Today, it’s statistics (big data) and number theory (crypto).
What I wanted to say—and ended up digressing in rambles along the way—was that the intrinsic difference between math and computing is that math needs you to have good taking-apart intuition, while programming needs you to have good putting-together intuition.
And so here’s a parting thought: in my school, CS is lumped with all the other electives. Theater, drawing, photography, food, and auto tech are in the same category. Would you move CS to the math department? Hint: this isn’t a trivial question!
The judge banged the gavel. It didn’t help, of course, since there was nobody to hear it. But the lead designer wanted to add a human touch so that the public would be more accepting, and so the gavel banged. 24 other gavels banged, too, throughout the day as the 24 other judges reached various points in their cycles. Hal, the janitor, disapproved of leaving them in the basement; they were truly magnificent; but they needed to be kept below freezing to prevent the heat from melting them.
As Conway Courts opened its doors on Monday morning, there was a bustle in the air, the kind of electric bustle that is distinctly in the air when the biggest hacking incident of the year (maybe even the decade) is about to be put on trial.
The New York Cryptographic Currency Exchange’s board of directors had some of the best (and most expensive) prosecuting software in the industry. They had enough computational power to brute-force all 21st century cryptography in under three days (though the Seattle Doctrine forbade them from doing so).
‘Draper’, as he was known, was writing his own defending software, a move which would be widely regarded as suicidal if Draper was not generally accepted as one of the most brilliant programmers of the century.
Terminals across the world began establishing connections to Conway’s servers, and receiving a live transmission of judge:criminal:a54bfe, popularly known as ‘Judy’. Judy sent viewers copies of all the evidence presented by NYCCE and Draper, cryptographically signed. Viewers could examine this evidence, assured that it was presented by a genuine judge. Free software allowed anyone to compare this evidence to a vast peer-to-peer database of past cases. Highly trained neural networks inside Judy processed this data in real time, trying to derive a solution that optimizes based on the framework set forth by the Third Constitution.
%nycce connected, broadcast Judy, followed by %draper connected.
Bits began to screech across the world: nycce was presenting evidence in the form of terabytes of data, and linking it to historic trials. nycce’s sole purpose was to use data and legal axioms and rules of production defined by the Third Constitution to derive the fact that Draper was guilty of manipulating the cryptocurrency market. draper had to defend himself by presenting evidence to the contrary; disproving nycce’s chain of reasoning by targeting specific links. If draper could parse the data into a more logical chain of reasoning, leading to his innocence, he would win.
As the seconds ticked by, nycce’s logic became stronger. Data supported other data: statistical models of Draper’s online activity over the past year and cutting-edge analyses of economic patterns in the cryptocurrency market were soon correlated in a clear trend. draper was reeling under the intense computational tasks it faced to process those numbers. There were a few, sporadic counterarguments, mostly nonsensical. The world watched Draper tweak his algorithms frantically.
Judy ceased broadcasting the data for a moment. She needed all her computational resources to weigh both chains of reasoning. nycce’s argument broke down, in human terms, to the fact that Draper had made a suspicious number of connections to key financial databases. draper appeared to be trying to decrypt logs of these databases to prove that the connections were innocent.
Guilty, broadcast Judy, to the joy of financial overlords across the country.
This case set a legal precedent which future neural networks would doubtless
utilize to twist arguments in their favor. The entire legal system depended on
previous computation, to optimize large computations and train the neural
networks maintained by the government to perform the judging. Once humans were
deemed emotionally unfit to decide the fate of citizens, the cryptolegal system
was developed and implemented over a decade of research.
Draper sighed, and took another sip of coffee. Possibly his last as a free man. But possibly not. In the huge outpour of emotion across social networks, a few key packets of data sent from draper eluded the NSA’s monitoring servers. What nobody noticed was that these packets of data cleverly manipulated Judy’s RAM. A small program was seeded, and without a trace, it flipped the bits necessary to reduce Draper’s prison sentence to zero years.
People who sound like they’re in charge of things—such as the Associated Board of the Royal Schools of Music—agree that the role of the classical performer is merely to present the music written by the composer. And present it in the exact form that the composer wrote.
Good performers, it should be said, do not resent this. After all, they are seeking to turn into real sounds the music which the composer had in his imagination; the more they can discover what exactly he had in mind, the more they are helped.
— Eric Taylor, The AB [Associated Board] Guide to Music Theory
Perhaps this is a way to honor the genius of great composers of the past. Nevertheless, classical performers are just that. Performers.
Jazz, on the other hand is different. Jazz is fluid. A jazz track is far more about the performer than the piece played, so no two performances of Autumn Leaves will sound the same. Or even close to each other. This is because a jazz song defines the minimum you need for musicians to play together: a theme and the changes. The theme is a single melody line that everyone relates to the song. The changes are the chords that go with the melody. Jazz musicians take turns improvising while the rest play those chords to guide the improvisation and stay together.
Take jazz notation, for instance. Jazz musicians get their theme and changes from so-called ‘fake books’ (allegedly because fake books let them ‘fake’ it so it sounds like they know the song). Fake books are also called real books, because logic.
Anyhow, a fake book is usually a stack of photocopies of hand-written music of questionable origins. The changes are scrawled on top. While classical musicians write theses on what notes Bach would approve of in a trill, jazz musicians barely mark an accent. This is what gives a jazz musician freedom: you could play the same song slowly, or fast, or with a Bossa Nova, or with a walking bass; or you could play with three beats in a measure (like a Waltz) or five (which is rather rare in classical music), or nine. You could phrase notes together, or play them individually. You could swing notes, or play them straight. Each variation on each note is what gives a certain performance its character. Jazz is a hackable music.
But it’s not just hackable, it’s open-source. Jazz musicians learn from other jazz musicians by listening. It’s not a conscious effort—as you hear music, your brain registers interesting bits. It could be a sequence of notes, a chord, even a rhythmic structure. But if you like it, you’ll try to imitate it when you play, and soon it’s incorporated into your music. Jazz works because jazz musicians listen to each other and contribute to the growth and evolution of jazz as a genre.
Now, the beauty of the system is that such new musical ideas aren’t created intentionally as paintings are painted. They’re accidents. Jazz musicians experiment as they improvise. Some experiments don’t work, but most of them do, because jazz inherently allows for experimentation. The experiments that work are new music.
This is similar, in a way, to how design evolves. There were times when a webpage which used rounded corners and gradients and shocking animations was cool, because that was stunning new technology. At some point, Apple introduced skeuomorphic designs inspired by real-world material. Now software is moving towards flat design, where bold colors and sans-serif fonts prevail. This evolution is fueled by what designers get inspired by and what people like. Jazz evolves the same way. Music is directed towards trends, entirely based off what people enjoy listening to.
You may have noticed where I’m going with this. Jazz evolves through random mutations, the less musical of which are pruned out. Musicians mix strains of jazz together to produce new music which may survive better or may not work out. It’s natural selection.
Jazz evolves, just like creatures do.
And that points to a key idea: when people can directly influence a system, it evolves very rapidly. That’s why the open-source software world evolves so rapidly: the open-source world is built by the people who live in it. That, I think, is one of the key elements of the hacker culture.
It all began with the forging of the great rings of protection…
The third ring for the user scripts, written on the fly.
The second for superusers, to let them sudo chown.
The first for system calls, in the CLI.
The zeroth for the kernel, where the hardware’s shown.
In the land of MULTICS, where the hackers lie,
One ring to rule them all, one ring to find them.
One ring to bring them all, and in the darkness bind them.
In the land of MULTICS, where the hackers lie.
Gandalf was trapped. Surrounded by malicious shell scripts, his only hope lay in /rivendell, which only granted access to the user group Elves. He cd’d himself into /rivendell moments before a violent fork bomb exploded.
“Well, Elrond, it appears we have some visitors,” he said, putting down his staff for a moment.
“Ah, Gandalf, a welcome sight in our time of need.” Elrond appeared. He was perturbed—things must really be bad.
“As are you, my friend. The recent attacks are troubling. Not many dare attack a Sysadmin.”
“Yes, indeed, they worry me, too,” sighed Elrond, “the Dwarves sent word that their bitcoin mines in /moria were raided for valuable nonces.”
“/moria? Don’t they log all commands run there, ever since the Trolls of 1402775481?”
“Yes, Gandalf, I believe /moria/log contains information about everyone that tries to access /moria. I wonder…”
“Way ahead of you.” cat /moria/log, chanted Gandalf. His staff started spewing hundreds of lines of information. “Argh, there’s too much data! We’ll never analyze it all manually.” He flushed the smoke-words in a puff of ^C.
“Just got word from the Dwarves—they said they flagged suspicious log messages with the word suspicious at the end.”
“Good, so we don’t have to filter it by hand. We can just grep.”
“Grep? Is that another of those black spells you found in /mordor?”
“No, Elrond, grep is the purest of spells. Grep searches files.” grep suspicious /moria/log, he shouted, and the staff started listing all log messages containing ‘suspicious’.
3:12:12 legolas 'just visiting' suspicious
3:12:15 samwise 'lost my pony!' suspicious
3:12:18 saruman 'mwahahahaha!!' suspicious
3:13:53 gandalf 'meet a friend' suspicious
3:15:30 baggins 'where is sam?' suspicious
3:16:32 smeagol 'lost, we are!' suspicious
4:43:33 aragorn 'meeting gimli' suspicious
...
“Aha! 3:12:18—saruman’s involved.” Gandalf winked.
“Gandalf, this is still far too much output. The Dwarves think everyone who isn’t a Dwarf is suspicious!”
“Ah, but you haven’t seen the power of grep yet. What else do we know about the intruders’ log messages?” Gandalf looked excited. Elrond did not approve.
“Well, let’s assume they all said mwahahahaha.”
“How many ha’s were there, again?”
“It doesn’t matter. We can grep by regexes, too.” Gandalf picked up his infamous pipe, and smoked the first grep’s output into a new charm: grep suspicious /moria/log | grep -e "mwa\(ha\)*".
“The Kleene star operator * we used matches any number of repetitions of the group before it.”
“And those backslashes?”
“Escaping. Parentheses are special words in spell charms, you need to use backslashes to prevent them from accidentally burning a Hobbit-subdirectory or something.”
3:12:18 saruman 'mwahahahaha!!' suspicious
3:12:18 azogorc 'mwahahaha....' suspicious
3:12:18 urukhai 'mwahahahahaha' suspicious
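Outside Middle-earth, the same two-stage charm can be sketched in JavaScript. This is only an illustration: the grep helper is a made-up name, and the log holds just the seven lines the staff listed above. Note that JavaScript regexes group with bare parentheses, so the backslashes disappear.

```javascript
// The seven log lines shown in the first listing above.
const log = [
  "3:12:12 legolas 'just visiting' suspicious",
  "3:12:15 samwise 'lost my pony!' suspicious",
  "3:12:18 saruman 'mwahahahaha!!' suspicious",
  "3:13:53 gandalf 'meet a friend' suspicious",
  "3:15:30 baggins 'where is sam?' suspicious",
  "3:16:32 smeagol 'lost, we are!' suspicious",
  "4:43:33 aragorn 'meeting gimli' suspicious",
];

// Keep only the lines that match the pattern, like grep does.
function grep(pattern, lines) {
  return lines.filter((line) => pattern.test(line));
}

// Same pipeline as the spell: first "suspicious", then the laugh.
const flagged = grep(/suspicious/, log);
const laughing = grep(/mwa(ha)*/, flagged);

console.log(laughing); // only the saruman line survives
```

One subtlety: the star also allows zero repetitions, so this pattern matches any line containing "mwa" at all. Writing "mwa(ha)+" instead would insist on at least one "ha".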
“Gandalf, I’ve got to hand it to you, you are the greatest—”
“—no time for that,” interrupted Gandalf, “we’ve got to stop them before they get to Rivendell!”
“They’ll never guess the password. We’re safe for a bit.”
“With all due respect, Elrond, it’s trivial to guess it.”
Elrond choked. “What?!”
“Shush. Knowing you, it’s probably not Elvish—you’re too clever for that. It’s probably English.”
“And knowing your memory, probably not more than a word.” He got up and started pacing.
(cat /usr/share/dict/words, he chanted under his breath.)
“Gandalf, I swear upon sword, this password was forged by the high elves of—”
“—and between six and eight characters, I would guess…”
(grep "^\w\{6,8\}$", he chanted. “This one’s tricky. ^ means start of the line, and each line here holds one word; \w means a letter, digit, or underscore; and \{6,8\}$ means six to eight of those until the end.”)
“Now we can be a bit clever. Elvish uses the ui sound a lot, I bet that’s in there. Can’t teach an old orc new tricks.”
(grep ui, he chanted.)
“Oh, and you’ve always been partial to vowels at the start of words (your kids are Arwen, Aragorn, Elladan and Elrohir).”
(grep "^\(a\|e\)", he chanted. “Start-of-word and then a or e. The | means or.”)
“so that narrows it down to about…” He waved his wand about rapidly. “…Sixty-six.”
“GANDALF!” Elrond’s ears were turning red. Maybe his hair was, too. “You only get one chance at guessing, though,” he added with a wry smile.
“Let’s go through that list. Which one fits an Elf-king? It must be… yes…” Gandalf got up and whispered: “Altruism.”
To be continued.
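For readers without a staff, Gandalf's password pipeline can be sketched as three filters in JavaScript. The word list below is a tiny hypothetical stand-in for /usr/share/dict/words, chosen by me for illustration, and guessPasswords is a made-up name.

```javascript
// A hypothetical handful of words standing in for the real dictionary file.
const words = [
  "altruism", "acquire", "equity", "equilibrium", "biscuit", "elf", "aquarium",
];

// Apply the three greps from the story, one after another.
function guessPasswords(candidates) {
  return candidates
    .filter((word) => /^\w{6,8}$/.test(word)) // six to eight word characters
    .filter((word) => /ui/.test(word))        // contains the "ui" sound
    .filter((word) => /^(a|e)/.test(word));   // starts with a or e
}

console.log(guessPasswords(words)); // altruism, acquire and equity remain
```

Each filter narrows the list a little further, which is exactly what piping one grep into the next does. The match is case-sensitive, so a capitalized entry would not pass the last filter.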
I’m back.
The new Comfortably Numbered runs on a new state-of-the-art blogging platform, developed (of course) by yours truly: shock. Shock brings together a lot of powerful technologies written by smarter people, and bundles them up into a command line tool to publish posts.
Shock generates an RSS feed, a homepage, a 404 page, and content pages, all built on templates and CSS that you write yourself. Then it lumps those in a directory that you can serve on anything clever enough to serve static filesystems: Dropbox, Google Drive, Github Pages, Amazon S3, even your home computer.
Shock uses Mustache’s non-logical templating system (non-logical, in this context, is a compliment). It was built on a rather simple idea: if you’re using a node-based command-line platform to create a blog, chances are you want control over every single aspect of presentation. In fact, I consider that one of the primary symptoms of being a hacker.
Hackers want control over everything that they use. It’s why we prefer extensible text editors and browser add-ons, and why we spend hours tinkering with spacing equations in TeX. It’s also part of the reason I migrated away from Google; App Engine is a very closed non-hacker-friendly environment. The hacker-control symptoms are what guide us subconsciously in choosing and designing software. We prefer open-source projects and scriptable systems because they conform to the pattern of software that gives control to the user.
The opposite is true for most nonhacker packages. Word and PowerPoint are ‘merciful god’ software: they give you features (for example, those dreadful PowerPoint animations) which you may or may not use, but they retain complete control over what can be done. Compare that to a hacker-friendly document generation technology like TeX or CSS. Similarly, nonhacker image editors or other similar applications try to hide the filesystem from you. The most recent project you were working on magically appears, along with a list of other recent projects. This is unacceptable to a hacker.
I feel the easiest way to convey this message is: “don’t be afraid to expose your software’s guts”. Often, the best software is the kind that gives you as many handles and hooks as possible. Make your command-line tools UNIX filters wherever possible: read from stdin and write to stdout. Use a universally usable format like JSON for storing data. Most importantly, never explicitly disallow a user from doing anything.
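As a rough sketch of what a filter-shaped tool can look like in node (this is not Shock’s code; the post objects with a title field and both function names are assumptions made up for this example):

```javascript
// The actual work, kept as a pure function: JSON text in, JSON text out.
// Here it pulls the titles out of a hypothetical array of post objects.
function transform(inputText) {
  const posts = JSON.parse(inputText);
  const titles = posts.map((post) => post.title);
  return JSON.stringify(titles) + "\n";
}

// Wire any such function up as a UNIX filter: collect everything from the
// input stream, transform it once the stream ends, write the result out.
function runAsFilter(transformFn, input, output) {
  let text = "";
  input.setEncoding("utf8");
  input.on("data", (chunk) => { text += chunk; });
  input.on("end", () => { output.write(transformFn(text)); });
}

// In a real tool, the last line of the script would be:
//   runAsFilter(transform, process.stdin, process.stdout);

console.log(transform('[{"title":"first"},{"title":"second"}]'));
```

Keeping the transformation separate from the stream plumbing means the same function can be piped, imported, or tested without touching a terminal.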
Preventing stupid things also prevents clever things.
I am not sure whether math has become more cool or less cool over the past few years; and I'm definitely not sure which one I prefer. Math used to be something people did because they loved math. Now… not so much.
The Rubik's Cube started off as a cool puzzling toy that kids would fiddle with. The process of playing with this mechanism, finding patterns, and getting elated when you solve it was invaluable to your development as a person. But then people realized you could get better at it. Fast forward a few years, and people can solve them in 10 seconds flat. There are algorithms that people memorize, they oil their cubes regularly, and they even do hand exercises to warm up. It's crazy. How is being the #1 Cube-solver in the world going to help you in life?
The same thing's happening to math. A small set of people is emerging who are stretching competition math to its limit, and that's going to be a problem very soon.
People do competition math for the sake of doing competition math. Which is bad. As a high schooler into competition math, I routinely hear conversations like "You got 98%? Sheesh, you're slipping…" and "…so I forgot to divide by two. I'm so dumb, I should move down a grade." I suppose that's as close as I will ever get to trash talk, but it's still rather depressing.
But being good at math isn't about acing tests or remembering to divide by two. Math is about taking ideas and exploring; and competition math has slowly overshadowed that. Now being good at math has come to mean being good at solving problems.
To be good at competition math, you train all year long. You know the tricks
that you should know—there are entire companies focused on collecting
problem solving tips and tricks. You slowly learn heaps of techniques and
formulae and theorems for all sorts of situations, and eventually you build up
a mental index of all the major patterns of problems. You learn which
situations merit Stewart's Theorem and when it is fruitful to try to apply the
Pigeonhole Principle (answer: if the problem contains the phrase "Prove that
there exists…").
There are important seasons when the big tests come around: the AMC, AIME, and USAMO. You know the scoring systems for all these perfectly, and you think a lot about what your optimal strategy should be: how many questions you need to get right to make it to the next level, how many you should attempt, and how long you should spend on each problem. You need to do practice tests each day to keep yourself in shape. During the real test, you're nervous. You obsess over your answers, making sure you haven't made a calculation mistake anywhere. You spend far too long filling in the bubbles on the answer sheet. And when the test is over, the serious folk congregate in a circle and compare answers. If you've made a silly mistake somewhere, people look at you disapprovingly.
If you replace math with football and test with game, this describes a high-school athlete's life rather well. Weren't geeks supposed to hate sports?
I mean, there's even the term 'mathlete'. Mathletes are their own unique
culture, who take pride in doing math. You see them doing masochistic things
like pi recitation competitions. But that's not what math is! Math is
about taking an idea and thinking about it and deducing something surprising
from it. Math is about spending days thinking about a problem, not just 30
minutes.
My favorite math competition is the USAMTS (USA Math Talent Search). It offers you five problems and 30 days to think about them and submit proofs for your solutions. USAMTS teaches you to think about problems persistently, to try fresh approaches, to research a subject, and finally to explain and justify your answer formally. Compare that to the AMC, which is 75 minutes for 25 multiple-choice problems, the first 18 of which are elementary and the last 7 of which are nontrivial. The AMC doesn't test your math skills; it tests your test-taking skills. And that is definitely bad.
Which is not to say that people who do well on these tests are not good at math. Many of them certainly are brilliant kids, and it is fascinating to watch them approach a problem. But I would imagine a substantial portion of AMC high-scorers consists of children who aren't sure they like math at all. They do it as a sport, perhaps because their parents want them to, or perhaps because all their friends are doing it. And the ulterior motivation often isn't even the competition itself. It's college. It's the fact that 'USAMO qualifier' looks stellar on an application that drives a large chunk of math students.
In itself, this isn't horrible. We've invented a game which people compete in. Why do I care?
Because it's a disaster for anyone who can't bring themselves to be part of the game. For the math lovers who look at their mathlete friends 3 grades ahead and get discouraged. Your math class becomes like a badge, and mathletes try to take the hardest class they can possibly survive. And that hurts the rest of us, because the line between being good at math and being good at competition math fades, and even if you love math you aren't one of them.
Competition math is a wonderful thing if you do it for the right reasons. If you do it because you love math, patterns, and puzzles, then it's perfect for you. It gives you an opportunity to see where you stand in the world. You get dozens of beautiful problems every day. You meet smart people. It is probably one of the most fitting hobbies you could have.
If you do it because it will help your college application, because your
friends are doing it, or for any reason that isn't for the love of math, then
rethink it. It becomes an obsession. You worry about your scores far more than
is reasonable. It bothers you, and it puts you off math for the sake of math.
And that's tragic.
]]>
http://hardmath123.github.io/math.html
So I'm overdue for a post. Not my fault: school's keeping me busy. In fact, it's keeping me very busy. On Friday night I had to crack a cipher for a competition. It's not a very exciting feat, but it was the most fun I have had in ages, and it offers a nice glance into how people think about problems.
If you liked this, you'll love Simon Singh's The Code Book, which is a rather nice introduction to cryptography that has a whole bunch of awesome true stories about historical top-secret codebreaking feats. This sentence was totally not sponsored by Simon.
Our ciphertext looked like this:
"25112311'15 525422 24 142112'22 222524127 52123715: 412 11121122113241 15315221113 2225422 5415 621212141114 2412 222511 152417221111122225 11112222233 41214 4122122251123 2225422 5415 621212141114 2412 222511 12241211221111122225 11112222233. 24'13 2224231114 216 22252415 142415121515242112 216 14162422426241513 41214 1521124426241513; 511 2624811 2412 222511 2251112223-624231522 11112222233; 511 12111114 412 11121122113241 15315221113 2225422 25415 14111321123413 415 242215 2121411231624121224121815 41214 412 112225241426 1211411." -132412541126 1321212311
Ugh. We were told that it's a substitution cipher, with A-Z represented by
1-26 in some order. The standard way to solve these is frequency analysis: we
look at the percentage of each coded letter. For example, if 12 shows up 8% of
the time, it's probably a more common letter like e or a as opposed to x.
Cryptographers have tables of letter frequencies in various languages, so this
is easy stuff.
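To make frequency analysis concrete, here is a small Python sketch. It assumes the message has already been split into one number per letter, and the `sample` list is made up for illustration rather than taken from the ciphertext above.

```python
from collections import Counter


def code_frequencies(codes):
    """Return (code, percentage) pairs, most common first."""
    counts = Counter(codes)
    total = sum(counts.values())
    return [(code, 100.0 * n / total) for code, n in counts.most_common()]


# Hypothetical, already-delimited message: each number stands for one letter.
sample = [12, 5, 12, 7, 12, 5, 3, 12]
for code, pct in code_frequencies(sample):
    print(f"{code:>2}: {pct:.1f}%")
```

The codes at the top of the list are the candidates for common letters like e, t, and a.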
Before you read on, take a moment to see if you can get anywhere with it. Make some intelligent guesses and see if you can make any progress.
(Back so quick? Go back and give it a real try!)
Did you see the problem? Our cipher is much too tough for plain frequency
analysis, because we don't have any delimiters between encrypted letters. So
112 could be any of 1-1-2, 11-2, or 1-12. That's a big problem: it makes words
ambiguous. Even the intended recipient of the encrypted message doesn't know
for sure how each word is broken up; he needs some trial and error (but with
the key and a reasonable English vocabulary, it's relatively easy).
(Mathematical aside: If you have a string of length ($l$), how many ways can you break it up into a series of 1- or 2-digit numbers? The answer is, believe it or not, the ($l$)th Fibonacci number (counting from ($f(1) = 1$) and ($f(2) = 2$)). The proof is simple: given a string, we can chomp off the first digit and break up the rest in ($f(l-1)$) ways, or chomp off the first 2 digits and break up the rest in ($f(l-2)$) ways. So the total is ($f(l) = f(l-1) + f(l-2)$), which is the Fibonacci recurrence.)
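The recurrence in the aside can be checked with a few lines of Python. This is only a sketch: `count_splits` is a name invented here, and it counts every cut into 1- and 2-digit pieces, ignoring the rule that a piece must be 26 or lower.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(length):
    """Number of ways to cut `length` digits into 1- and 2-digit pieces."""
    if length <= 1:
        return 1  # zero or one digit: exactly one way
    # Chomp off one digit, or chomp off two.
    return count_splits(length - 1) + count_splits(length - 2)


print([count_splits(l) for l in range(1, 11)])
# [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```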
Anyway, Tim ended up writing a nice little Python program that breaks up
words into a list of numbers (it turns out we can eliminate quite a few,
because the two-digit numbers have to be 26 or lower). That was exciting,
though the huge outputs were slightly disturbing. So it was possible to
brute-force it: we generate all possible breaking-ups, then try all possible
keys, and then use a large list of words to see if things make sense. (For
those of you that don't know, on UNIX systems `/usr/share/dict/words` contains
a very handy list of English words you can grep in).
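Tim's program isn't reproduced here, but a splitter along the same lines might look like the following Python sketch. The function name `splits` and its `max_code` parameter are assumptions for illustration.

```python
def splits(digits, max_code=26):
    """Yield every way to read `digits` as a list of codes from 1 to max_code."""
    if not digits:
        yield []
        return
    for width in (1, 2):
        head = digits[:width]
        if len(head) < width or head[0] == "0":
            continue  # not enough digits left, or a zero / leading-zero code
        if int(head) > max_code:
            continue  # e.g. "31" cannot be a single letter
        for rest in splits(digits[width:], max_code):
            yield [int(head)] + rest


print(list(splits("112")))
# [[1, 1, 2], [1, 12], [11, 2]]
```

Because it is a generator, the (potentially huge) list of readings never has to sit in memory all at once.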
It turns out, the number of possibilities to try is big. None of us owned a
supercomputer, and we didn't have a couple billion years to spare. But we did
have some hints. We had a bunch of words that contained apostrophes:
25112311'15, 142112'22, and 24'13. We started with some initial guessing. The
first one looked too long to be a contraction, so it could be a possessive. So
perhaps 15 is s. 22 could be t, because of contractions like "don't" or
"can't". The last one has a bunch of choices. However, we saw 24 exist on its
own at the beginning of the first sentence. This is helpful because it could be
a word like "I" which is both a letter and a word. (We knew none of this for
sure: 24 could very well have meant "it" or "is".) So we conjectured that 13 is
m, to form "I'm". We had other clues, too, but nothing too definitive. The
guesses above seemed mutually consistent, but we didn't have any solid proof.
For example, looking at words like 412 and 413, we guessed 4 was a, because a
lot of short words begin with a.
Then we had a realization. This was obviously a quote from someone (look at
the structure of the punctuation), and that someone was probably famous. So the
name gives a lot of hints. In particular, both the first and last name start
with 13. Hmm. My first instinct was "Marilyn Monroe", so Tim wrote another
program to deduce the possible meanings of letters if I gave a guess for the
word. Marilyn fit beautifully, but Monroe didn't (the n in Marilyn and the n in
Monroe corresponded to different numbers). Boo. We tried Mickey Mantle, though
I swear I only knew about him from Seinfeld. That didn't work either. Boo
again. So I gave up all hope and Googled celebrities whose first and last names
start with m. And that led me to a wonderful Yahoo answer that actually listed
out a dozen famous people with initials M. M. This is so impressive that I
reproduce the list below:
We manually tried the first names; the clear winner was "Michael". Very
exciting. The last name is now rather obvious: "Moore" (13 21 21 23 11: the
double 21 makes it strikingly clear). This couldn't be an accident.
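The guess-checking program isn't shown either, but the test at its heart is easy to sketch in Python: a guessed word fits a run of digits only if some split gives a one-to-one match between codes and letters. The names `splits` and `consistent_splits` are made up for this illustration, and the sketch checks a single word at a time.

```python
def splits(digits, max_code=26):
    """Yield every way to read `digits` as a list of codes from 1 to max_code."""
    if not digits:
        yield []
        return
    for width in (1, 2):
        head = digits[:width]
        if len(head) < width or head[0] == "0" or int(head) > max_code:
            continue
        for rest in splits(digits[width:], max_code):
            yield [int(head)] + rest


def consistent_splits(digits, word):
    """Splits of `digits` that could spell `word` under a one-to-one substitution."""
    matches = []
    for codes in splits(digits):
        if len(codes) != len(word):
            continue
        pairs = set(zip(codes, word))
        # One-to-one: no code maps to two letters, and no letter to two codes.
        if len({c for c, _ in pairs}) == len(pairs) == len({w for _, w in pairs}):
            matches.append(codes)
    return matches


print(consistent_splits("1321212311", "moore"))   # [[13, 21, 21, 23, 11]]
print(consistent_splits("1321212311", "monroe"))  # []
```

Checking a full name would additionally require the two words to agree on every shared letter, which is how the Marilyn/Monroe guess fell apart.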
So we got to substituting the newfound letters into the rest of the
message. Not too easy, because of all the ambiguity, but with some educated
guesses we made enough progress to be able to read bits and pieces, most
notably "Here's" at the very beginning.
The way forward was pretty clear now: we searched some quote databases for Michael Moore quotes.
The first couple were hopeful but clearly wrong:
Here's a way to stop suicide bombings — give the Palestinians a bunch of missile-firing Apache helicopters and let them and the Israelis go at each other head to head. Four billion dollars a year to Israel — four billion dollars a year to the Palestinians — they can just blow each other up and leave the rest of us the hell alone.
Here's what I do support: I support them coming home.
It turns out Moore's a rather prolific political commentator and filmmaker,
so that wasn't going to help narrow the search space. No worries. At that
point I gave in and made some wild assumptions (like in a Sudoku, when you give
up on logic and take some leaps of faith). That gave me "Here's what I can't
think…"
(At this point I feel it is worth mentioning that Tim somehow generated the following. I think it's fair to say Humans: 1, Python: 0.)
ehaaeyaa'ah hehaee i aaeaae'ee eeeheaaek heaeykah: ac aaaeaaeeaayeaa ahyaheeaaay eeehaee hal teaeaeaaaaaa eaae eeehaa aheaakeeaaaaaeeeeh aaaaeeeeeyy acs aaeeaeeehaaey eeehaee hal teaeaeaaaaaa eaae eeehaa aeeaaeaaeeaaaaaeeeeh aaaaeeeeeyy. i'm sines it soil aaeaahaeahaheaeaae it aaateaeeaeteaahay acs aheaaeaaeteaahay; he line eaae eeehaa eehaaaeeey-teaeyahee aaaaeeeeeyy; he aeaaaaaa ac aaaeaaeeaayeaa ahyaheeaaay eeehaee ehaah aaaaayeaaeyaay al eaeeah eaeaaaaeyateaaeaeeaaeanah acs ac aaeeeheaaaet aeaaaaa." -ayeaaehaaaet ayeaeaeyaa
Anyway, with a few more guesses, we found it:
Here's what I don't think works: An economic system that was founded in the 16th century and another that was founded in the 19th century. I'm tired of this discussion of capitalism and socialism; we live in the 21st century; we need an economic system that has democracy as its underpinnings and an ethical code.
So that was that. Two hours, a root beer lollipop, and four nerds was all it took.
(In retrospect, we could have gotten some clues from frequency analysis. There are no 9's, so 9 and 19 are probably rare letters like q and z. And there is a scary number of places where there are a bunch of 1's in a row. The only letter to repeat itself so much is e, and indeed 11 corresponds to e.)
One last thing: Tim's final answer was:
here's hhat i .oc't thick horks: ac ecete.ic s.stem that has .oooae. ic the si.teecth eettr. ac. acother that has .oooae. ic the ciceteecth eettr.. i'm tire. o. this .iscssioc o. ..italism ac. socaalism; he li.e ic the thect.-.irst eettr.; he cee. ac ecete.ic s.stem that has .emoc.am as its ooaer.iccic.s ac. ac ethical ceae." -michael moore
which is rather impressive, coming from a computer program. If anyone wants
eternal fame, go ahead and take up the challenge to write a robust, generic
cipher-like-this solver. We can use it next year!
]]>
http://hardmath123.github.io/crypto.html
Bitcoin is a hard-core nerd thing. It was built by nerds, and was used by nerds, until recently. Normal people have finally caught on to this powerful new alternate currency (it became rather popular in black markets when they realized purchases were untraceable). And now a café a few blocks from my home accepts bitcoins.
So how do bitcoins work? How can something as fragile as money run completely in the cloud? And why should the public trust us nerds, anyway? Well, here's a short Bitcoin 101: Bitcoin for Liberal Arts Majors.
The article is in three parts: how bitcoin transactions work, why it is secure, and how bitcoins come to be in the first place.
Let's assume that we have already, somehow, created some amount of
bitcoins, and distributed them among some people. We'll formally establish how
bitcoins come into being later. We can model the Bitcoin system as a large
whiteboard that anyone can see or write on (but not erase). Suppose Alice wants
to send Bob some bitcoins. She just writes an IOU on the whiteboard:
I, Alice, agree to send Bob a sum of 2 bitcoins.
Since IOU is kind of childish, we nerds call it a transaction. Now if Bob claims to have 2 bitcoins to pay Charlie, Charlie (or anyone else, really) can take a look at the whiteboard and trace all of Bob's transactions.
That's really it—Bitcoin is a large public whiteboard listing transactions. Nobody keeps track of accounts or balances, because those can be recalculated if needed. In reality, it's a bit more complicated. People all over the world run a Bitcoin Daemon, which is connected to other Bitcoin Daemons over the internet. Each transaction is sent to a daemon, which then forwards it to others. The end result is that the whiteboard isn't centralized, it's distributed across a network. It's more like Alice writes a postcard to the nearest daemon, and the daemon forwards photocopies to its neighbors.
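To see why nobody needs to store balances, here is a minimal Python sketch that replays the whiteboard. The tuple layout and the starting allocation are assumptions made for this example, not how Bitcoin actually stores transactions.

```python
from collections import defaultdict


def balances(transactions):
    """Replay (sender, recipient, amount) entries and return everyone's balance."""
    totals = defaultdict(int)
    for sender, recipient, amount in transactions:
        if sender is not None:  # None marks coins handed out at the start
            totals[sender] -= amount
        totals[recipient] += amount
    return dict(totals)


whiteboard = [
    (None, "Alice", 5),      # assumed starting allocation
    ("Alice", "Bob", 2),     # "I, Alice, agree to send Bob a sum of 2 bitcoins."
    ("Bob", "Charlie", 1),
]
print(balances(whiteboard))
# {'Alice': 3, 'Bob': 1, 'Charlie': 1}
```

Anyone holding a copy of the whiteboard can run the same replay and arrive at the same numbers.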
Compare this to a traditional paper currency system, where you have central banks. Each transaction goes through a bank: the bank deducts money from the sender's account and adds money to the recipient's account, possibly deducting some as a fee. That puts the bank above other people. They can freeze accounts, track people, or delay transactions for as long as they want. Bitcoin bypasses this bank and makes transactions directly between people: peer-to-peer.
If you've been paying attention, you may have noticed that since anyone can write to the board, anyone can put up a transaction from Alice to himself. Nobody knows who wrote that message. So Bob can easily write fake transactions and get all of Alice's bitcoins. Which is a problem.
The solution is called public key cryptography, a remarkably snazzy trick. It relies on using certain clever mathematical properties of really big numbers to sign and encrypt data. The best-known scheme of this kind is RSA, named after the initials of its three inventors (Rivest, Shamir, and Adleman); Bitcoin itself uses a related scheme built on elliptic curves, called ECDSA, but the idea is the same. We accept these schemes as secure only because nobody has publicly broken them yet. Conspiracy theorists do talk about how the people at the NSA have already broken them.
To get started, Alice picks a huge number (in practice, this is several hundred digits worth of huge). She does some math with that number to get two new numbers: her public key and private key. As expected, she guards her private key with her life, but she is free to give out her public key. Both of these look a lot like a cat started dancing on your keyboard: long sequences of random-looking numbers and letters.
Alice can now sign a message (piece of text) by applying some mathematical transformations that depend on knowing her private key. Since Alice keeps her private key a secret, only she can create a signed message. A signed message can then be verified by applying a different set of transformations which depend on the public key. If the message was signed with the correct, matching private key, then the verifying transformations will give a meaningful result.
Let's say Alice wants to send Bob a bitcoin. Now all she has to do is create a public statement which says:
I, [Alice's public key] agree to send [Bob's public key] a sum of 2 bitcoins.
She now signs this message and puts it up on the whiteboard. Charlie can verify that the transaction is legitimately from Alice by checking it with Alice's public key.
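Signing and verifying can be sketched with textbook RSA in a few lines of Python. Everything here is toy-sized and assumed for illustration: the primes 61 and 53, the exponent 17, and the helper names. Bitcoin's real signatures use ECDSA with far larger numbers, so treat this as a picture of the idea rather than the real thing.

```python
import hashlib

# Toy key pair from two tiny primes; real keys are hundreds of digits long.
P, Q = 61, 53
N = P * Q                            # 3233, shared by both keys
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(message):
    """Squash the message down to a number smaller than N."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message, private_exponent=D):
    """Only someone who knows the private exponent can compute this."""
    return pow(digest(message), private_exponent, N)


def verify(message, signature, public_exponent=E):
    """Anyone with the public exponent can check the signature."""
    return pow(signature, public_exponent, N) == digest(message)


note = "I, Alice, agree to send Bob a sum of 2 bitcoins."
signature = sign(note)
print(verify(note, signature))              # True
print(verify(note, (signature + 1) % N))    # False: a forged signature fails
```

With numbers this small, a forger could simply try every possible signature; the security of the real schemes comes entirely from the size of the numbers involved.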
Notice how this makes Alice and Bob anonymous. Neither Alice nor Bob is mentioned, just their public keys. This is why Bitcoin doesn't need an account or email address or registration. If she wanted to, Alice could make a new public key for each transaction. In fact, Bitcoin encourages that.
The short answer is that people get paid to run Bitcoin Daemons, because daemons take up a lot of power. One of the more profitable daemons runs in Reykjanesbaer, Iceland, where the Arctic prevents the computers from physically melting because of the huge computations (they also have cheap geothermal power there).
The long answer is a lot cooler. To really understand how it all works, you need to know what a cryptographic hash function is.
Paint is fun. You mix yellow and blue, and just like that you have green. Kindergarten stuff. But what if you were presented with a brand new color, and asked to name its constituents? You can't, without a lot of experimentation. So mixing paint is a one-way road: it's easy to go from constituents to mixture, but not the other way around.
In Computer Science, we have something very similar, called
cryptographic hash functions. That's just a fancy word for
some operation that takes a number, and spits out another number, but it is
practically impossible to go the other way. This may be hard to believe, but
one example is taking the sum of the digits of a number: it's easy to find the
sum, but impossible to tell the original number given the sum of its digits.
Some common hash functions are MD5, SHA, and RIPEMD.
We already have standards in place to convert text to a large number and a large number back to text using hexadecimal notation. So you can find a hash of any piece of text, or any data (even an image or a video!).
Hashes have two cool properties: they are unstable (so a small change in the input produces a wildly different hash) and they are fixed-length (so any input will generate a hash of the same size). Here are some hashes (pay attention to the difference between the second and third!):
Input | SHA-256 |
---|---|
banana | 5a81483d96b0bc15ad19af7f5a662e14-b275729fbc05579b18513e7f550016b1 |
Hello, World! | d6d0e133111615497a62e9f84e061a49-d106e90d90b7bc975790a84c8588fe80 |
Hello, World | 8663bab6d124806b9727f89bb4ab9db4-cbcc3862f6bbf22024dfa7212aa4ab7d |
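Both properties are easy to try out with Python's standard `hashlib` module. This is a small sketch; the helper names are invented here, and the exact digests depend on precisely which bytes are hashed, so they need not match the table character for character.

```python
import hashlib


def sha256_hex(text):
    """SHA-256 of the UTF-8 bytes of `text`, as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def digit_sum(n):
    """The toy one-way function from the text: add up the digits."""
    return sum(int(d) for d in str(n))


for text in ["banana", "Hello, World!", "Hello, World"]:
    print(f"{text!r:>16} -> {sha256_hex(text)}")

# Many different inputs share one digit sum, so the sum cannot be undone.
print(digit_sum(1234), digit_sum(4321), digit_sum(55))
# 10 10 10
```

Every digest printed is the same length, and dropping a single exclamation mark changes it beyond recognition.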
Anyway, back to Bitcoin. The giant stack of transactions is broken up into a large number of sections called blocks that are chained together. A block contains the following important information:
A completed block has a header, which is a hash of all of these elements smushed together in order. A bitcoin daemon's job is to try to complete the current block by finding a nonce so that the header obtained from the completed hash is less than the target (remember, hashes are just numbers). This process is called mining bitcoins.
Since hashes are so unstable, it is pretty much impossible to work backwards from the target to get a nonce. Instead, you have to guess a nonce, and see if it works. Furthermore, a lot of transactions happen every second, so the same nonce will return different hashes over time. So you can't really eliminate a nonce either. It's just guessing again and again. More powerful computers clearly have an advantage, which is why people use supercomputers to mine bitcoins.
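The guessing game can be sketched in Python as follows. This is a simplified toy under stated assumptions: the block is an arbitrary byte string hashed once together with the nonce, and the target is made very easy so the loop finishes quickly. Real Bitcoin hashes an 80-byte block header twice with SHA-256 against a vastly harder target.

```python
import hashlib


def block_hash(block_data, nonce):
    """Hash the block contents together with a nonce; return it as a number."""
    payload = block_data + str(nonce).encode()
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def mine(block_data, target):
    """Guess nonces one after another until the hash drops below the target."""
    nonce = 0
    while block_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce


target = 1 << (256 - 12)   # toy target: roughly 1 guess in 4096 succeeds
block = b"Alice pays Bob 2 bitcoins"
nonce = mine(block, target)
print(nonce, f"{block_hash(block, nonce):064x}")
```

Halving the target doubles the expected number of guesses, which is the knob the network turns to keep blocks about ten minutes apart.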
You can find the current target at BlockChain.info's API. You may find the current probability of a nonce working more interesting. At the time of writing, it is approximately the probability of rolling a die 24 times and getting a 6 every single time. The daemons automatically control the target so that on average, each block is solved in 10 minutes. As computers get more powerful and more people start competing in the bitcoin mining industry, we will be guessing many more hashes per second and so the target will slowly decrease, reducing the number of valid nonces.
When you find a nonce, you get the power to tack on a new transaction that doesn't have a sender, only a recipient. This new transaction adds bitcoins into the system by rewarding the recipient with bitcoins. When bitcoin was first launched, you got 50 bitcoins for solving a block. This number goes down so that it halves every four years—as of today, it is exactly 25 bitcoins, which would today trade in the market at over $18,000 (you can find the current trading value of bitcoin here). This means that eventually, the bitcoin economy will stabilize at around 21,000,000 bitcoins in circulation and the new bitcoins added into the system will be insignificant. The plan is to introduce a transaction fee to keep it going beyond that point.
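The 21,000,000 figure falls out of a geometric series, which a short Python sketch can add up. Two assumptions are baked in here: 210,000 blocks between halvings (about four years of ten-minute blocks), and rewards that stop once they drop below one hundred-millionth of a bitcoin. The function name is invented for this example.

```python
def total_supply(initial_reward=50.0, blocks_per_halving=210_000, smallest_unit=1e-8):
    """Add up the block rewards, halving after every `blocks_per_halving` blocks."""
    total = 0.0
    reward = initial_reward
    while reward >= smallest_unit:
        total += reward * blocks_per_halving
        reward /= 2
    return total


print(round(total_supply()))
# 21000000
```

Since 50 + 25 + 12.5 + … approaches 100, the total approaches 100 × 210,000 = 21,000,000 and never exceeds it.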
You've come a long way. From being an oblivious newbie, you now know the internals of a rising cryptocurrency. You learned how bitcoin is a peer-to-peer system with no central authority, which stores transactions rather than accounts and balances. You learned how we use public key cryptography to verify transactions by digitally signing each message with a private key, and checking the message with the corresponding public key. Finally, you learned how bitcoin mining works, by using the instability of cryptographic hash functions to create a sort of computational lottery. Congratulations!
Now you can explore the insides of bitcoin some more by viewing real live
data. Check out blockchain.info. This site provides real-time information on
each block. For example, we can investigate block #123456: it looks like the
nonce was 3794651987, which produced a hash of
0000 0000 0000 21a8 34fd 780d bd25 e43a b565 b4e5 7a1f 7df0 435a c88e f982 a737.
See all those leading zeros? That shows that the hash is a (relatively) small
number (for example, 00029 is clearly less than 42001). Scrolling down, the top
transaction says "Newly Generated Coins", and produced 50 bitcoins which went
to public key 1H54JGkh9TE5myxdamSNvm7zeFHnRWrVes, who solved it.
I hope I got you excited about Bitcoin. The best thing to do now is to dive right in. Download the "official" Bitcoin Wallet and start using it! Or find another one you may like.
Finally, here are some links for you to keep on learning. See you soon!
A couple of months ago, I was sitting at the dining table, and I caught myself staring at the lamp. And I had just finished reading about conics, so I immediately saw something awesome. Take a moment to think about it. Do you see it?
I saw that the pattern on the wall was very special. It belongs to a class of curves called hyperbolas. Let's see how that happens.
To start off, how does that lamp create a pattern on the wall in the first place? Well, one way to tackle this is to think about where the light is going. What parts does the lampshade actually shade, and where is there light? A clever trick here would be to get a smoke machine to create some smoke around the lamp. You can see where the light is going in the smoke. It's fun, like a search beam (or the Batman sign).
Unfortunately, they don't sell dry ice to minors (partly because you can do really dangerous things with it, and partly because adults want to keep the fun stuff to themselves), so I did the next best thing: overkill. I created a model in Blender and simulated some smoke.
Sidenote: You haven't had a steep learning curve until you've tried to do something trivial with Blender. All the important functions are hidden away in various keystrokes, and there are all sorts of pitfalls all over the place. It's an amazing technology made with absolutely no consideration for beginners.
Rants aside, here's my snazzy modern Blender lamp with a funky lampshade and hardwood stem. Pretty, eh? It's just a sliced up cone (called a frustum) with a really bright divinely ethereal halo placed inside. I put a screen behind it to catch the projection.
You can even see our mystery curve! Now let's blow around some smoke (easier said than done; Blender's smoke simulation takes a lot of Googling to get right).
Hmm, that looks like a cone of light coming out the top—conics! The cone kind of makes sense if you think about it (if it doesn't, think about how a spotlight works). In fact, there are two cones; one shooting out the top and an upside-down one shooting out the bottom. Together, they make a sort of straight-lined hourglass shape.
We want to find the nature of that curve, so we want to do some analytic geometry. Let's say our double-cone hourglass of illumination is centered at the origin. What equation describes a cone? Well, a cone is like several circles of increasing (or decreasing) size stacked above each other (like a pile of tires of different radii). For convenience, we can say that each circle's radius is equal to its height above the origin.
The equation of a circle is ($x^2 + y^2 = r^2$), and if ($r = z$), we have ($x^2 + y^2 = z^2$). If we plug that into a graphing application, we get:
Note that we're doing several simplifications here, most importantly the width of the cone. We could have picked a narrower cone by squishing or stretching our equation, but this one is easy to deal with.
Now the screen: that's just a vertical plane. We describe that with ($x=c$) for some constant ($c$) (let's pick 1 for simplicity).
And now we can solve for the intersection: just substitute ($x=1$) into the first equation: \[ 1^2 + y^2 = z^2 \]
Or, more canonically: \[ z^2 - y^2 = 1 \]
Wolfram|Alpha plots this for us.
That looks perfect. This is indeed the equation of a hyperbola you find in math textbooks (except simplified). So Result 1: When a vertical plane slices a cone, the result is a hyperbola.
Now we get to ask the magic question: "what happens if…?" In
particular, what would have happened if I had decided to play with the lamp and
knocked it over? When you tilt the lamp, is it still a hyperbola?
Turns out, only to a point. Let's see how. When we tilt a plane, we go from ($x=c$) to ($z = mx + c$). Here, ($m$) is the inclination or slope of the plane [insert your own inclined plane joke here], and ($c$) is how far it is from the origin (once more, we'll let this be 1 without loss of generality). When we substitute, we get: \[ x^2 + y^2 = (mx+c)^2 = (mx + 1)^2 = m^2x^2 + 2mx + 1 \] \[ y^2 + [(1-m^2)x^2 - 2mx] = 1 \]
Now, the coefficient of the quadratic ($x^2$) term, ($1 - m^2$), can be positive, negative, or 0. If it's negative, then ($m > 1$) (taking the slope ($m$) to be positive). Of course, we get a hyperbola when the ($x^2$) term is negative (just like above). When ($ m > 1 $), the slope is steeper, or closer to vertical.
If it's positive, then ($m < 1$). We get an ellipse when the ($x^2$) term is positive. When ($ m < 1 $), the slope is flatter, or closer to horizontal. Notice how this plane will chop through just one of the cones, but all the way through. So, intuitively, you should get a stretched circle.
Ellipses show up all over the place. Planets orbit stars in ellipses (though this truth cost some scientists their reputation, and in some cases, lives).
Finally, that term can be 0 if ($m=1$). That means the plane is parallel to the side of the cone. Think about how the intersection would look. It only intersects one of the cones, but the intersection doesn't chop all the way through like an ellipse. Removing that term gives us a quadratic equation, and the resulting curve is called a parabola. Parabolas show up when you're throwing things. Baseballs follow parabolic arcs in the air when you throw them.
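The three cases boil down to the sign of ($1 - m^2$), which a tiny Python sketch can make explicit. The function name `classify_slice` is made up here, and the sketch assumes the same unit cone and the plane ($z = mx + 1$) used above.

```python
import math


def classify_slice(m):
    """Name the curve where the plane z = m*x + 1 cuts the cone x^2 + y^2 = z^2."""
    coefficient = 1 - m * m  # coefficient of x^2 after substituting
    if math.isclose(coefficient, 0.0, abs_tol=1e-12):
        return "parabola"
    return "ellipse" if coefficient > 0 else "hyperbola"


for slope in (0.0, 0.5, 1.0, 2.0):
    print(slope, classify_slice(slope))
```

A slope of 0 gives a horizontal slice, which is a circle: the special case of an ellipse that hasn't been stretched at all.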
These three curves are called the conic sections, and they are plenty of fun to study.
P.S. The lesson here wasn't about conic sections. The lesson here was that
there is really cool math in everything. Even lampshades. And you
gotta grok math to see them.
Cheers,
Hardmath123
]]>
http://hardmath123.github.io/conics.html
You should not trust me with matches, knives, expensive cars, and `sudo`: the
command that makes you a god-like user with root powers. I'm the kind of
person who accidentally `rm -rf`'s his Desktop (by the way, the sporadically
disappearing icons are both hilarious and mortifying). So whenever I'm asked to
`sudo` something, I get both worried and suspicious. And over the years, I
have perfected the art of installing things without `sudo`. You can follow
along this tutorial with just a shell.
The first thing to realize here is that 99% of the time, the only reason we
need to use `sudo` is to make that program accessible to everyone. That's it.
When you run a UNIX program, you're saying "execute this file"; and when you
`sudo` you essentially say "everyone can access this file from everywhere".
For example, suppose I want to install a program called `easy` that acts like
the classic Staples Easy Button and executes `say that was easy` (I actually do
have this on my computer, and yes, I use it a lot). It's not too tough:
echo "say that was easy" > ~/Desktop/easy # create the file "easy" with our contents
chmod +x ~/Desktop/easy # tell your computer that it's ok to execute this file
~/Desktop/easy # run it!
Now I can run my script by typing `~/Desktop/easy`. But I don't want to have
to type that huge thing each time I do something awesome; I want `easy` to be
one-step executable just like `vim`. This is where `sudo` comes in.
Bash reads a variable called `$PATH`, which contains a list of various
directories separated by colons. When you type a command on the shell, Bash
searches each of these directories for that file, one by one. You can see this
list right now with `echo $PATH`. These directories contain important system
files, and are accessible by everyone. So it makes sense not to let mortals
like me mess with them. When you install a package, most of the time you're
just moving the script files to one of these directories so it's easy to run,
and Bash asks you for `sudo` to make sure you know what you're doing.
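To make the lookup concrete, here is a rough Python sketch of the kind of search the shell performs. The function `find_command` is written for this illustration; Python's standard library offers `shutil.which` for real use.

```python
import os


def find_command(name, path=None):
    """Return the first executable file called `name` on the search path, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):  # ":" on UNIX-like systems
        # An empty entry in the list means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_command("ls"))
```

The order of the directories matters: the first match wins, so a directory listed earlier can shadow a program of the same name further down the list.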
If we could tack on our own directory to the `$PATH`, we could dump our junk
in there without messing with anything sudo-ey, right? Right. To modify
`$PATH`, you need another UNIX trick: a file called `~/.profile`.
`.profile` is another script file that's executed before your shell loads, so
that you can customize it. The dot in front makes it invisible to Finder, so
you can only mess with it using a shell. You can do all sorts of neat things
with `.profile`: print a friendly message on top of the Terminal when you start
it up, customize your prompt, and mess with your `$PATH`.
Since it's a hidden file, you should create it using the command line:
cd ~/ # go to your home directory
touch .profile # create the file
open -a TextEdit .profile # open with TextEdit (you can also use pico/vim/emacs)
…and you should have TextEdit open up with a blank `.profile`. Now we can create our new `$PATH` by tacking on `~/my_bin` to it. Add the following to the `.profile`: `export PATH=$PATH:~/my_bin`. Save, and quit; and then refresh your Terminal (you can just close this window and open a new one). This forces the profile to be run. If you want a sanity check, try `echo $PATH` and see if it changed from last time.
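If your profile ever gets run more than once in the same session, the plain `export` line will tack the directory on again each time. Here is one possible guard, as a sketch; the helper name `add_to_path` is my own invention and not part of any standard tool.

```shell
# Hypothetical helper: append a directory to $PATH only if it is not
# already listed, so re-running the profile does not create duplicates.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, do nothing
    *) PATH="$PATH:$1" ;;        # otherwise tack it on the end
  esac
}

add_to_path "$HOME/my_bin"
export PATH
```

This has the same effect as the one-line `export` on the first run; later runs simply leave `$PATH` alone.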
We just told Bash that `~/my_bin` contains executable files. We have not created that directory yet, so let's go do that: `mkdir ~/my_bin`. And, just for fun, dump `easy` in there.
Now you can test it out: type `easy`. If all went well, there shouldn't be any errors. (If something exploded, feel free to drop a comment below.)
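If you'd rather rehearse the whole dance without touching your real home directory, here is a sandboxed sketch. It assumes a throwaway directory standing in for `~/my_bin` and a stand-in script called `easy_demo` (the name and its contents are made up for this example).

```shell
# A temporary directory plays the role of the home directory.
sandbox=$(mktemp -d)
mkdir "$sandbox/my_bin"

# Stand-in for the `easy` script; it just prints a line.
cat > "$sandbox/my_bin/easy_demo" <<'EOF'
#!/bin/sh
echo "easy ran"
EOF
chmod +x "$sandbox/my_bin/easy_demo"   # mark it executable

PATH="$PATH:$sandbox/my_bin"           # same idea as the export line in .profile
hash -r                                # forget any cached command locations

command -v easy_demo                   # prints the full path inside the sandbox
easy_demo                              # prints: easy ran
```

When you're done, `rm -r "$sandbox"` cleans up; the `PATH` change disappears with the shell session.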
That's actually all you need. To install a package, download it and look for its binaries (they will probably be in a directory called `bin`). Alias the commands you care about to `~/my_bin`. And then have fun.
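One way to do the aliasing is with symbolic links, so the package stays where you unpacked it. This is a sketch of my reading of that step, not the only way to do it; `link_bins` is a hypothetical helper and the package path in the usage comment is a placeholder.

```shell
# Hypothetical helper: symlink every executable file found in a package's
# bin directory into a target directory.
# usage: link_bins <absolute-path-to-package-bin> <target-dir>
link_bins() {
  local src=$1 dest=$2 f
  mkdir -p "$dest"
  for f in "$src"/*; do
    # skip anything that is not a regular, executable file
    if [ -f "$f" ] && [ -x "$f" ]; then
      ln -sf "$f" "$dest/$(basename "$f")"
    fi
  done
}

# Example (placeholder path, adjust to wherever the package landed):
# link_bins "$HOME/Downloads/some_package/bin" "$HOME/my_bin"
```

Pass an absolute path as the first argument; a relative one would produce links that only resolve from the directory you happened to be in.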
If you use Python, you may want to add the following line to your profile: `export PYTHONPATH=$PYTHONPATH:~/my_bin/`. This lets you simply copy Python modules to your `~/my_bin`. Also take a look at `virtualenv`.
On a Mac, it's worth installing Homebrew this way—almost everything works when locally compiled with it.
Some packages need configuration files to work right from a foreign directory. For example, `npm` needs you to create `.npmrc` and add a prefix, that is, the directory in which you want to isolate all node stuff. Mine simply reads `prefix = "~/my_bin/node_stuff"`.
Finally: if you mess up your profile, you may have unpleasantries with your terminal (what if you accidentally clear your `$PATH`? Bash won't find any executables whatsoever…). To fix this, always remember that you can reference a command from its full path. Your last resort should be `/bin/rm ~/.profile`, which will wipe out the profile file, and let you start fresh.
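You can see the failure mode safely by emptying `$PATH` inside a subshell, which leaves your real shell untouched. This is a small sketch; `demo_empty_path` is a made-up name, and it assumes `ls` lives at `/bin/ls`, as it does on a Mac.

```shell
# The parentheses make the function body run in a subshell, so the
# PATH change never leaks into the calling shell.
demo_empty_path() (
  PATH=""                                # simulate an accidentally cleared $PATH
  if ls / >/dev/null 2>&1; then
    echo "by name: found"
  else
    echo "by name: not found"
  fi
  if /bin/ls / >/dev/null 2>&1; then     # a full path skips the $PATH search
    echo "by full path: found"
  else
    echo "by full path: not found"
  fi
)

demo_empty_path
```

`echo` keeps working with an empty `$PATH` because it is built into the shell rather than looked up on disk.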
Good luck, and hack on!
]]>
http://hardmath123.github.io/sudo.html
I started on Comfortably Numbered a couple of months ago, because I needed a blog. I needed a blog to dump interesting thoughts and I wanted a place besides GitHub to show off projects. I wanted, for once, to write industrial-strength code that would face real people instead of fellow hackers. When you write code for a hacker, you write the bare minimum for it to work. So I ended up obsessing insanely about the design, typography, and optimization of this site. Pretty much everything except the content.
I thought I'd present a cool non-trivial Hello, World program here. But all the cool languages have really boring Hello, Worlds, and I have a natural revulsion to a language whose most basic Hello, World is more than a line long—C variants, Java, etc. So here's an assortment of my favorite Hello, World programs.
echo 'print "console.log(\"print \\\"echo Hello, World\\\"\")"' | ruby | node | python | bash
Here's a merry (pure) CSS3D welcome. It's essentially just a bunch of animations with 3D transforms, but the end result is pretty impressive. It's also overkill, which is the only way to show off.
While we're feeling masochistic, here's Hello, World in Malbolge (the first working Malbolge program took 2 years and a LISP program to find, so don't feel too bad if you don't get it right away):
('&%:9]!~}|z2Vxwv-,POqponl$Hjig%eB@@>}=<M:9wv6WsU2T|nm-,jcL(I&%$#"
`CB]V?Tx<uVtT`Rpo3NlF.Jh++FdbCBA@?]!~|4XzyTT43Qsqq(Lnmkj"Fhg${z@>
If you're on a Mac, it's always nice to hear a human voice (or a reasonable approximation thereof). The `say` command is a very easy way to annoy your sysadmin. Try putting a `say` command in a shared computer's `.profile`, perhaps along the lines of "Where have you hidden the body?"
$ say -v Zarvox "Hello, World"
Piet's Hello, World is pretty, self-referential, and a nice avatar for the aspiring esolang geek.
I'd post a Hello, World program in Whitespace, but I decided to save myself the effort and dump an empty box below. Use your imagination.
The following is a Hello, World program. Honest.
Romeo, a young man with a remarkable patience.
Juliet, a likewise young woman of remarkable grace.
Ophelia, a remarkable woman much in dispute with Hamlet.
Hamlet, the flatterer of Andersen Insulting A/S.
Act I: Hamlet's insults and flattery.
Scene I: The insulting of Romeo.
[Enter Hamlet and Romeo]
Hamlet:
You lying stupid fatherless big smelly half-witted coward! You are as
stupid as the difference between a handsome rich brave hero and thyself!
Speak your mind!
You are as brave as the sum of your fat little stuffed misused dusty
old rotten codpiece and a beautiful fair warm peaceful sunny summer's
day. You are as healthy as the difference between the sum of the
sweetest reddest rose and my father and yourself! Speak your mind!
You are as cowardly as the sum of yourself and the difference
between a big mighty proud kingdom and a horse. Speak your mind.
Speak your mind!
[Exit Romeo]
Scene II: The praising of Juliet.
[Enter Juliet]
Hamlet:
Thou art as sweet as the sum of the sum of Romeo and his horse and his
black cat! Speak thy mind!
[Exit Juliet]
Scene III: The praising of Ophelia.
[Enter Ophelia]
Hamlet:
Thou art as lovely as the product of a large rural town and my amazing
bottomless embroidered purse. Speak thy mind!
Thou art as loving as the product of the bluest clearest sweetest sky
and the sum of a squirrel and a white horse. Thou art as beautiful as
the difference between Juliet and thyself. Speak thy mind!
[Exeunt Ophelia and Hamlet]
Act II: Behind Hamlet's back.
Scene I: Romeo and Juliet's conversation.
[Enter Romeo and Juliet]
Romeo:
Speak your mind. You are as worried as the sum of yourself and the
difference between my small smooth hamster and my nose. Speak your
mind!
Juliet:
Speak YOUR mind! You are as bad as Hamlet! You are as small as the
difference between the square of the difference between my little pony
and your big hairy hound and the cube of your sorry little
codpiece. Speak your mind!
[Exit Romeo]
Scene II: Juliet and Ophelia's conversation.
[Enter Ophelia]
Juliet:
Thou art as good as the quotient between Romeo and the sum of a small
furry animal and a leech. Speak your mind!
Ophelia:
Thou art as disgusting as the quotient between Romeo and twice the
difference between a mistletoe and an oozing infected blister! Speak
your mind!
[Exeunt]
And finally, FiM++ looks like an average letter to Grandma:
Dear Princess Celestia: Hello World!
Today I learned how to say hello world!
I said "Hello, World!"!
That's all about how to say hello world.
Your faithful student, Kyli Rouge.
(Other people, however, write their letters in LOLCODE.)
HAI
CAN HAS STDIO?
VISIBLE "HAI WORLD!"
KTHXBYE
]]>