What You Don’t Know About Closures Can Hurt You

Some of you out there may still be wondering why it is that closures and macros are so great. “What exactly can you do with them that I can’t do in my more widely used Turing-complete language,” you ask. That’s reasonable. Well, I don’t think you’ll fully appreciate closures until you grasp some of the basics of functional programming, so let’s start there.

Functional programming languages accord a first-class status to procedures:

* They may be named by variables.
* They may be passed as arguments to procedures.
* They may be returned as the results of procedures.
* They may be included in data structures.

In other words, you can treat procedures as primitives. In the same way that you operate on integers, juggle them, shuffle them, combine them, transform lists of them, do anything you want with them… a functional programmer does the exact same thing with procedures.

Now if you’ve never thought much about this or experimented with these ideas, this will sound bizarre. You might have a hard time imagining how this can possibly be so useful. The only way I can think of to describe it is to imagine that you lived in a world where there was no such thing as subroutines, yet– and then suddenly some experimental language came out that had that feature. People would not admit that it was particularly useful and they would point to their terrible hacks and workarounds as evidence that such “ivory tower” concepts were useless in the Real World. Of course, you’ve seen almost the exact same thing in your workplace already, more than likely. When you see code repeated across the code base and you abstract it out into a class that is referenced by multiple projects, suddenly your coworkers are peeved at you because they can’t understand a section of code that they’re looking at without hitting “right-click — Definition” to jump to your new class file. This is why the default code architecture for a crummy business app is to open up a hundred separate form files and freely mix, match, and repeat business logic, data access code, and gui instructions all together. It’s a wonder that these guys even use subroutines…. (Oh wait, you see that, too: gigantic subs that go on for screen after screen that you can’t understand….)

Anyways, when you get procedures being treated as first-class citizens of the language, new abstractions are possible. You can create procedures that take procedures as an argument and that return procedures as a value. This means you can take several similar functions and abstract out the basic process that they all share. The procedures that you use as arguments can be used customize as many other procedures as you like. So on the one hand, you get code reuse via the basic “skeleton” procedures… and on the other, you get opportunities to reuse the code that is being used to customize the “skeletons.” Anonymous functions can be used to in-line the customizations when they’re not going to be reused, so you end up with cleaner, more expressive code.

Before I looked much into it, functional programming sounded like a step backward. I thought that the whole idea of separating code and data out and encapsulating it into classes was the answer to everything. Of course, you don’t have to give up on classes altogether. You can use functional style techniques to expand the configurability of your objects. In some cases this might be a more appropriate choice than inheritance, for example.

Now, one thing I’ve noticed with my OOP designs is that I end up spending a lot of time tinkering with my interfaces. It seems like I spend more time fiddling with the properties and methods lists and avoiding breaking the Law of Demeter than I do actually working on real code. And then I go back and break apart classes and rework the object model some more. With first class procedures, you can often skip the tedious object instantiation ritual and the maintenance of a separate class file and just jump right to what you’re trying to do. And if you can apply your procedure to a wide variety of data structures, so much the better.

Along into this scene arrives the closure. Again, if you aren’t jumping up and down with excitement for what first class procedures can do for you, you won’t really appreciate what closures add in to the mix. A closure is just a procedure that carries an environment with it. It’s like a class that has just one method and as many private variables as you’d want to cram into it. If you’re entrenched in an OOP mindset, you might think that’s ridiculous– you can make such things in your current language, you think to yourself. But this is one of those cases where less is more… and when mixed with other functional techniques you get some powerful options at your disposal.

So a closure turns out to be just a procedure with state. It’s sort of like an extremely stripped down object. The first thing people tend to show off when they’re going over closures is how you can add a counter to a procedure. Not very exciting, eh? But yeah, you can actually use that simple counter to package all kinds of logging and diagnostic functionality with your procedures. You can make a procedure that takes a procedure as an argument and returns that exact same function with your diagnostic code bolted onto the inside of it. The rest of your code doesn’t care if you’re using the plain vanilla procedure or the one with the extra logging utilities running with it.

Another thing that attaching a counter to a procedure makes possible is the creation of iterators. You might have seen an article recently that talked about how one of these days you just might write your last for loop. This, my friends, is how you get rid of for loops: you write a procedure that takes a procedure and an integer as arguments and returns a closure as its value. Each time you call the procedure, the counter gets incremented and it returns the next value from the procedure. When the counter reaches the same value as the integer argument, the closure returns a null value. Then what you need is a means of applying your other procedures to the values that are kicked out of your iterators– and with that you are now set free from the tedium of cryptic for-next loop code.

Now you may be thinking that there’s nothing wrong with your trusty for-next loop. And if you’re an OOPy type of person, you might think that your for-each loop does the job just fine. But how would you say “for each prime number from 200 to 300, execute the following function with that number as an argument.” Functional programming gives you a very expressive way to code exactly that. Another situation where iterators are useful is where you’re working with infinite lists or loops that repeat back on themselves. You might not be able to store an entire list of objects in a collection all at once– but with iterators you can simulate that very thing. You probably already use something very similar to an iterator when you open up a text file and operate on it on a line by line basis.

It occurs to me as I write this, that there was a time when I’d write “do while not rs.eof” at least a dozen times a day. I’d crash my IDE at least once every week or so because I forgot to put in “rs.movenext” at the back end of loop. That redundant code peppered throughout the application was a major liability– especially when we ended up having to move to a different data access paradigm. Oh the pain! What we needed was an iterator that could be instantiated with a SQL string. Then we could have written procedures to operate on those iterators. No more annoying do-loops! As a special bonus, if we had been writing unit tests back then, we could have made dummy iterators that don’t even call a live database– they just ould have pulled in a stream of test data instead!

There’s a lot more we could delve into on this topic, but the point is that with even just a smidgen of functional programming techniques, we could have made our code much more concise. We would have had additional means of separating out redundant code, and it would have been much easier to “rewire” the code to change its behavior.

Now, I mentioned briefly already that in functional programming we’ll strive for making our procedures able to operate on a wide range of data structures. It turns out that the key to making your functions really powerful in a wide range of situations is to use recursion. In your Imperative/OOP world, you may have only used recursion in a few rare cases– like searching through directories or setting up a tree view. Well, in functional programming, the default approach to just about everything is to use recursion. Your data structures are often defined recursively, so your functions that operate on them are consequently recursive as well. Recursion becomes your meat and drink and even the air that you breath… and you become intimate with every variety of it.

And just as closures are always introduced with a lame “counter” example, so recursion is introduced with fibonacci numbers. This is a really good bad example, because it shows not only how recursion can be the easiest way to express something, but also that recursion can spin dangerously out of control if it is mishandled.

Anyways, as you study recursion, it often takes a lot of trial and error to get it right. You test your function with several arguments to make sure it’s coming out right. You add in code to print stuff out so that you can get a feel for what’s going on. You take out the text code when you think you’re done and you put it back in when you find a new problem you need to fix. There has to be a better way. Wouldn’t it be great if we could write code to help us write code? Sounds like a job for a macro!

What we’ve got for show and tell in today’s code example is a tool called memoization. When you memoize a procedure, you create a closure that contains the procedure and a hashtable to go with it. Each time you call it, the closure checks to see if the argument is already in the hash table. If it is, it returns what’s in the hash table instead of running the potentially time consuming code to determine the value in the usual way. If the argument is in the hashtable, the closure finds the value using your procedure and then stores it in the hash table. In the case of the typical fibonacci function, this results in a dramatic performance increase.

Looking at the code, we have two defuns, a defmacro, and a macro call. The testing-memoize function takes a function as an argument and returns a closure. The print statements we use to show us what our recursive function is doing are here. We also added a “back door” to allow us access to the hash table that is attached to the function. The other function, tree-substitutes, is a recursive function that searches a hierarchical list for a specific symbol and splices in a list of things in its place. The macro takes the same arguments as a defun, but sets up a parameter to hold the closure, defines the a slightly modified version of the function using the tree-substitutes function, stores the closure in the parameter, and then redefines the function to call the closure instead. Finally, we have an example of how to apply our memoization macro with our ubiquitous fibonacci procedure.

If we like the behavior of our recursive function, we can turn off the memoization by changing “memoize” to “defun” in our procedure definition. If we like the performance gains and want to keep them, we can create our closures with a different momoization technique that doesn’t print out diagnostic text or expose a back door.

So, we can see that closures give us a means of abstracting away any extraneous or temporary behavior in our procedures. These techniques don’t replace subroutines and libraries, but they do provide an additional means of eliminating redundant code. We also see that macros can be used to set up all kinds of behind-the-scenes code by giving us a means of establishing additional ways of defining functions. We could potentially set up “testing” and “production” variants of our definition macros to radically change the behavior of our application by switching between which package we reference.

Hopefully you now have a better idea of what you can gain from studying functional programming, closures and macros. We’ve only scratched the surface, of course. If you’d like to know more, I highly recommend the classic Structure and Interpretation of Computer Programs. If you’d rather learn from a cool Perl hacker (instead of a “smug Lisp weenie”) then you might try Mark Jason Dominus’s Higher Order Perl instead.

(By the way, if you take a peek at the hash table after running the fibonacci code, it makes a pretty picture for you at the REPL prompt.)

This entry was posted on October 25, 2007 at 7:05 pm and is filed under Closures, Computer Science, Functional, LISP, OOP, Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Learning Lisp