Processing XML With Clojure

Although XML is nice in theory, I have always hated dealing with it. It requires so much boilerplate code just to parse or create a simple XML file. Recently I needed to do some XML processing, and I still can't believe how easy it was to create and parse XML in Clojure.

Clojure contrib includes a library for creating XML called prxml, in which vectors become XML tags. For example:

(prxml [:p {:class "greet"} [:i "Ladies & gentlemen"]])
; => <p class="greet"><i>Ladies &amp; gentlemen</i></p>
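To make the vector-to-tag idea concrete, here is a minimal sketch of how such a conversion could work using only clojure.core and clojure.string. The names escape-xml and vec->xml are hypothetical and this is not prxml's actual implementation; it only handles tags, an optional attribute map, nested sequences, and text.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: escape the characters that are special in XML.
(defn escape-xml [s]
  (-> (str s)
      (str/replace "&" "&amp;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")
      (str/replace "\"" "&quot;")))

;; Hypothetical sketch: a vector is [tag attrs? & children];
;; other sequences are spliced in; anything else is treated as text.
(defn vec->xml [node]
  (cond
    (vector? node)
    (let [[tag & more] node
          attrs        (when (map? (first more)) (first more))
          children     (if attrs (rest more) more)]
      (str "<" (name tag)
           (apply str (for [[k v] attrs]
                        (str " " (name k) "=\"" (escape-xml v) "\"")))
           ">"
           (apply str (map vec->xml children))
           "</" (name tag) ">"))

    (sequential? node) (apply str (map vec->xml node))
    :else              (escape-xml node)))

(vec->xml [:p {:class "greet"} [:i "Ladies & gentlemen"]])
;; => "<p class=\"greet\"><i>Ladies &amp; gentlemen</i></p>"
```

Unlike prxml, this sketch returns a string instead of printing, which makes it easier to experiment with at the REPL.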

First, let's define some data to turn into XML.

(def data #{{:title "Clojure" :link ""  
             :description "Clojure Homepage"}

            {:title "Java"    :link "" 
             :description "JVM Homepage"}

            {:title "Debian"  :link ""   
             :description "Debian Homepage"}})

By default, the prxml function prints to standard output. If you want the output as a string instead, use prxml in combination with with-out-str.
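As a small illustration of that capture pattern, the sketch below uses a stand-in function, print-greeting (a hypothetical name), in place of a real prxml call. Anything printed to *out* inside with-out-str ends up in the returned string.

```clojure
;; Hypothetical stand-in for any function that prints its result,
;; as prxml does.
(defn print-greeting []
  (print "<p>hello</p>"))

;; with-out-str rebinds *out* and returns everything printed as a string.
(def captured (with-out-str (print-greeting)))

captured
;; => "<p>hello</p>"
```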

(defn articles []
  ;; Fold over data, wrapping each entry's fields in an [:item ...] vector.
  (reduce
   (fn [feed v]
     (conj feed
           [:item
            [:title (:title v)]
            [:link (:link v)]
            [:description (:description v)]]))
   () data))

This builds a list containing one vector for every item node in the XML:

([:item
  [:title "Clojure"]
  [:link ""]
  [:description "Clojure Homepage"]]

 [:item
  [:title "Java"]
  [:link ""]
  [:description "JVM Homepage"]]
 ...)
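One detail worth knowing about building a list by reducing with conj onto an empty list: conj adds to the front of a list, so the result comes out in reverse input order. The sketch below compares that approach with a for comprehension on a small ordered sample; sample, items-reduce, and items-for are hypothetical names used only for illustration.

```clojure
;; Hypothetical ordered sample data (a vector, so the order is predictable).
(def sample [{:title "A"} {:title "B"}])

;; reduce + conj onto a list: each new item is added at the front.
(defn items-reduce [entries]
  (reduce (fn [feed v] (conj feed [:item [:title (:title v)]]))
          ()
          entries))

;; for comprehension: items come out in input order.
(defn items-for [entries]
  (for [v entries]
    [:item [:title (:title v)]]))

(items-reduce sample)
;; => ([:item [:title "B"]] [:item [:title "A"]])

(items-for sample)
;; => ([:item [:title "A"]] [:item [:title "B"]])
```

Since the data in this post is a set, which has no guaranteed iteration order, the difference does not matter here, but it would for ordered input.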

Putting it all together, it takes fewer than 20 lines of code to produce an RSS feed.

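(defn xml-data []
  (with-out-str
    (prxml [:decl! {:version "1.0"}]
           [:rss {:version "2.0"}
            ;; RSS 2.0 nests the feed metadata and items inside a channel element.
            [:channel
             [:title "The Site"]
             [:link ""]
             [:description "The Site"]
             (articles)]])))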

Parsing XML is even easier: Clojure has built-in support for it in the clojure.xml namespace. clojure.xml/parse takes a File, an InputStream, or a String naming a URI and returns a tree of xml/element struct-maps. Calling xml-seq on that tree lets you treat it like any other sequence.
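To show what that tree looks like, here is a minimal sketch that parses a small XML string. The helper parse-str is a hypothetical name; it wraps the string in a ByteArrayInputStream because parse treats a plain String as a URI rather than as XML content.

```clojure
(require 'clojure.xml)
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper: parse XML held in a string.
(defn parse-str [s]
  (clojure.xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def tree (parse-str "<p class=\"greet\"><i>Hi</i></p>"))

;; Each element is a map with :tag, :attrs and :content keys.
(:tag tree)
;; => :p
(:attrs tree)
;; => {:class "greet"}
(:content tree)
;; => [{:tag :i, :attrs nil, :content ["Hi"]}]
```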

For example, to collect all the titles in the XML file:

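(import 'java.io.ByteArrayInputStream)
(use '[clojure.xml :only [parse]])

;; Wrap the XML string in a stream, since parse treats a String as a URI.
(let [input-stream (ByteArrayInputStream. (.getBytes (xml-data) "UTF-8"))]
  (for [x (xml-seq (parse input-stream))
        :when (= :title (:tag x))]
    (:content x)))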

;;rss=> (["The Site"] ["Clojure"] ["Java"] ["Debian"])