BEncoding Objects in Clojure

I plan on playing with the BitTorrent protocol. I already have a bencode decoder for reading torrent files, but since I need to communicate with trackers, I also need an encoder. This post walks through the steps required to encode objects using bencoding.

(import 'java.io.ByteArrayOutputStream)

(declare encode-object) ; defined further down

(defn encode [obj]
  (let [stream (ByteArrayOutputStream.)]
    (encode-object obj stream)
    (.toByteArray stream)))

To encode an object, we call encode on it. We get back a byte array representing the encoded object, which you can then write to a file or inspect by creating a String from it.
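As a minimal sketch of both options, the snippet below uses a hand-written byte array as a stand-in for the result of encode. The names sample-bytes, sample-text, sample-file and bytes-on-disk are illustrative only and not part of the encoder. ISO-8859-1 is used because it maps every byte to exactly one character.

```clojure
(require '[clojure.java.io :as io])

;; Stand-in for the result of (encode {"cow" "moo"}); hypothetical sample data.
(def sample-bytes (.getBytes "d3:cow3:mooe" "ISO-8859-1"))

;; Look at the encoded data by building a String from the bytes.
(def sample-text (String. ^bytes sample-bytes "ISO-8859-1"))

;; Write the bytes to a temporary file, cleaned up when the JVM exits.
(def sample-file
  (doto (java.io.File/createTempFile "bencode" ".bin")
    (.deleteOnExit)))

(with-open [out (io/output-stream sample-file)]
  (.write out ^bytes sample-bytes))

;; Size of the file we just wrote, in bytes.
(def bytes-on-disk (.length sample-file))
```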

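(declare encode-string encode-number encode-list encode-dictionary)

;; Dispatch on the type of obj; unsupported types are rejected.
(defn- encode-object [obj stream]
  (cond (string? obj) (encode-string obj stream)
        (integer? obj) (encode-number obj stream) ; bencode has no floats
        (vector? obj) (encode-list obj stream)
        (map? obj) (encode-dictionary obj stream)
        :else (throw (IllegalArgumentException. (str "Cannot bencode: " (pr-str obj))))))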

encode-object is where encoding begins. Depending on the type of the object passed to it, it calls the appropriate function.
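To see which branch a value would take, it can help to try the predicates on a few values. This is only a quick check, and predicate-checks is a name made up for the example. Two things stand out: lists and lazy seqs are not vectors, so they need vec before encoding, and number? also accepts floats, which bencode cannot represent, while integer? does not.

```clojure
;; Which values do the dispatch predicates accept?
(def predicate-checks
  {:string-literal (string? "spam")        ; true
   :vector-literal (vector? ["spam"])      ; true
   :quoted-list    (vector? '("spam"))     ; false: lists are not vectors
   :lazy-seq       (vector? (map inc [1])) ; false: neither are lazy seqs
   :converted-seq  (vector? (vec (map inc [1]))) ; true after vec
   :map-literal    (map? {"cow" "moo"})    ; true
   :float-number   (number? 1.5)           ; true
   :float-integer  (integer? 1.5)})        ; false
```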

(defn- encode-string [obj stream]
  (let [bytes (.getBytes obj "ISO-8859-1")
        bytes-length (.getBytes (str (count bytes) ":") "ISO-8859-1")]
    (.write stream bytes-length 0 (count bytes-length))
    (.write stream bytes 0 (count bytes))))

An encoded string has the format:

<string length encoded in base ten ASCII>:<string data>
4:spam -> "spam"

We turn the string into a byte array, calculate its length, and write the length prefix followed by the bytes to the stream.
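Note that the length prefix counts bytes, not characters. A small sketch of the difference, using a hypothetical helper length-prefix that mirrors the prefix calculation in encode-string:

```clojure
;; Length prefix for a string in a given charset (illustrative helper).
(defn length-prefix [^String s ^String charset]
  (str (count (.getBytes s charset)) ":"))

(def ascii-prefix  (length-prefix "spam" "ISO-8859-1"))      ; "4:"
(def latin1-prefix (length-prefix "caf\u00e9" "ISO-8859-1")) ; "4:" one byte per character
(def utf8-prefix   (length-prefix "caf\u00e9" "UTF-8"))      ; "5:" the accented e takes two bytes
```

With ISO-8859-1 the byte count always equals the character count, which is convenient when a String is used to carry raw binary data.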

(defn- encode-number [number stream]
  (let [string (str "i" number "e")
        bytes (.getBytes string "ISO-8859-1")]
    (.write stream bytes 0 (count bytes))))

An encoded number has the format:

i<integer encoded in base ten ASCII>e
i3e -> 3

We build a string by prepending "i" and appending "e" to the number, then write its bytes to the stream.
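The string-building step can be tried on its own. The helper number-text below is a made-up name for this sketch; it returns the text instead of writing to a stream. It also shows why only integers should reach this function.

```clojure
;; Same string-building step as encode-number, returned as a String.
(defn number-text [n]
  (str "i" n "e"))

(def positive-text (number-text 3))   ; "i3e"
(def negative-text (number-text -3))  ; "i-3e"
(def zero-text     (number-text 0))   ; "i0e"
(def float-text    (number-text 1.5)) ; "i1.5e", which is not valid bencode
```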

(defn- encode-list [list stream]
  (.write stream (int \l))
  (doseq [item list]
    (encode-object item stream))
  (.write stream (int \e)))

In my implementation, bencoded lists are represented as Clojure vectors. A bencoded list has the following format:

l<bencoded values>e
l4:spam4:eggse -> [ "spam", "eggs" ]

We iterate over the vector and call encode-object on each item.

(defn- encode-dictionary [dictionary stream]
  (.write stream (int \d))
  ;; The bencode spec requires keys to be strings, written in sorted order.
  (doseq [[k v] (sort-by key dictionary)]
    (encode-object k stream)
    (encode-object v stream))
  (.write stream (int \e)))

An encoded map has the format:

d<bencoded string><bencoded element>e
d3:cow3:moo4:spam4:eggse -> { "cow" => "moo", "spam" => "eggs" }

The technique for encoding a map is the same as for a vector: we iterate over the map, but call encode-object twice per entry, once for the key and once for the value.
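To check the four format examples in one place, here is a compact sketch that returns a String instead of writing to a stream. It is not the encoder from this post: bencode-str is a hypothetical name, and the sketch assumes dictionary keys are strings and writes them in sorted order, as the bencode specification requires.

```clojure
(defn bencode-str
  "Returns the bencoded form of obj as a String (illustration only)."
  [obj]
  (cond
    ;; <length in bytes>:<data>
    (string? obj)  (str (count (.getBytes ^String obj "ISO-8859-1")) ":" obj)
    ;; i<integer>e
    (integer? obj) (str "i" obj "e")
    ;; l<bencoded values>e
    (vector? obj)  (str "l" (apply str (map bencode-str obj)) "e")
    ;; d<bencoded string><bencoded element>e, keys in sorted order
    (map? obj)     (str "d"
                        (apply str
                               (mapcat (fn [[k v]] [(bencode-str k) (bencode-str v)])
                                       (sort-by key obj)))
                        "e")
    :else (throw (IllegalArgumentException.
                  (str "Cannot bencode: " (pr-str obj))))))

(def string-example (bencode-str "spam"))            ; "4:spam"
(def number-example (bencode-str 3))                 ; "i3e"
(def list-example   (bencode-str ["spam" "eggs"]))   ; "l4:spam4:eggse"
(def map-example    (bencode-str {"spam" "eggs" "cow" "moo"}))
;; map-example is "d3:cow3:moo4:spam4:eggse"

;; Nested values work the same way, since every branch recurses.
(def nested-example (bencode-str {"spam" ["a" "b"] "cow" 1}))
;; nested-example is "d3:cowi1e4:spaml1:a1:bee"
```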