By now, I'm assuming that you've read Part 1. If you haven't, that's ok. Although this post follows thematically from Part 1, it can stand alone. Enough talk, let's get to it.
In Part 1 I talked a lot about this programming problem that I was asked to solve during a technical interview. Here's a rough reconstruction of that problem:
Use the BitTorrent encoding (Bencode) protocol to serialize the following data structures: String, Integer, List, Dictionary.
That's not the exact wording, but it's roughly equivalent. I suggest that you quickly review the Bencode protocol if you aren't familiar with it; just so that we'll be on the same page. I'll give a couple of examples from the Bencode wiki:
IN -> OUT1: "Spam" -> 4:spam
2: 3 -> i3e3: ["spam", "eggs"] -> l4:spam:4:eggse
4: {"cow" => "moo", "spam" => "eggs"} -> d3:cow3:moo4:spam4:eggse5: {"spam" => ["a", "b"]} -> d4:spaml1:a1:bee
When this problem was presented to me, I approached it from the perspective of OO imperative programming. Let me tell you: that got really messy, really fast.
The first thing working against me was that I tried to write the solution in PHP (which was the language I was most comfortable with at the time). No disrespect intended to the language, but since PHP treats Arrays and Hashes (or Associative Arrays, as they are also known in the PHP world) as essentially equivalent, there was added complexity when trying to distinguish a List from a Dictionary. My solution started simple; a Bencode class with some instance methods to take care of the data type encoding. Then came the conditionals. Which lead to a set of Interfaces to allow for type checking. Eventually I had to turn to recursion. That's the point where I should have known that I was coming at it from the wrong perspective. Like I said, it got messy.
In the end I had a very complex solution to a seemingly simple problem. That thought stuck with me for a long time. I expected that a simple problem should have a simple, or at least more elegant, solution. Especially for something that was being asked in a technical interview. I really started to understand functional programming, it dawned on me that Bencoding is just about transforming the supplied data types into a specially encoded string. The kind of problem that lends itself well to the functional paradigm.
The String data type gets transformed into a new encoded string which consists of the length of the original String, followed by a colon, followed by the original String. That's pretty easy to accomplish. Let's see how that might look in Clojure:
(defn bencode-string [s] (str (count s) ":" s))
A function that accepts a String and returns the encoded version. Nice and simple, right? Transforming an Integer is just as simple:
(defn bencode-integer [i] (str "i" i "e"))
Integer goes in, the encoded version of the integer comes out. If you want to see these in action, you can head over to Try Clojure and paste these function definitions into the REPL. Once the functions are defined, you can call them with examples like:
(bencode-string "spam")
and (bencode-integer 3)
. The REPL will echo the return value of each function.Things start to get a bit tricky when we look at the encoding for Lists. A List is just a collection of data types; it can contain Strings, Integers, other Lists, and/or Dictionaries. In order to Bencode a List we need to apply the appropriate bencoding function to each element in the List. Luckily, this is something that functional programming does really well. Assuming we have a generic
bencode
function that knows how to encode each data type, encoding a List might look like:(defn bencode-list [l] (str "l" (reduce str (map bencode l)) "e"))
Ah, the classic Map/Reduce paradigm. Let's dig into this a little bit. We'll start with
(map bencode l)
since it's the first part of the function that will be processed. As I mentioned above, this makes use of a bencode
function that we haven't defined yet, but that we can assume knows how to encode each of the four data types. Stick with me, we'll define bencode
shortly. The
map
function takes two arguments: a function and a List. It then applies that function to each element of the List, returning a new List of the resulting values. It's roughly equivalent to the following imperative code:new_list = []
l.each do |e|
new_list.push(bencode(e))
end
After
map
has created a new List containing the bencoded versions of the elements in l
, it then passes that new List to the reduce
function. The reduce
function takes two arguments: a function and a List. It is often used to reduce a List to a single value. A simple use case for the reduce
function is summing a List, i.e. reducing a List to a single Integer value by using the addition function. In our example, (reduce str (map bencode l))
is taking the resulting List from the map
function and reducing it to a single String using the str
function. str
is a function that can be used to concatenate Strings, e.g. (str "Hello" "World")
will return "HelloWorld". reduce
applies str
to the first two elements of the List, takes the resulting String and applies str
to it and the third element. It continues doing this until it has exhausted the List. Finally, we use str
to prepend the "l" (lower case L) and append the "e" to the result of the reduce
function (as per the Bencode specification). Given a List like
["spam", "eggs"]
, the map
function will return a List that looks like ["4:spam", "4:eggs"]
, the reduce
function will return a String that looks like "4:spam4:eggs"
and the outermost str
function will return "l4:spam4:eggse"
.Before moving on to the Dictionary data type, let's take a quick look at the
bencode
function that bencode-list
relys on. bencode
is the main function will be used to bencode data. It needs to accept a data type, determine how to encode it, and return the encoded result. Based on what we've done so far, we might start with something like this:(defn bencode [data]
(cond
(string? data) (bencode-string data)
(number? data) (bencode-integer data)
(vector? data) (bencode-list data)))
It's just a simple conditional structure that checks the type of the supplied data, then delegates to other functions that know how to do the actual encoding. Think back to how
bencode-list
uses bencode
to encode each element in the List. Hopefully that is clear now! The only thing that our bencode function doesn't know how to do yet is encode Dictionaries. There are some really cool Clojure functions that are going to make the Dictionary encoding a snap, but those functions will need some special attention and explanation. I'm going to save that for Part 3.
Stay tuned!