Converting HTML to Hiccup DSL

Hiccup DSL for creating HTML/XML is great unless you have a lot of HTML code already written. At first my plan was to parse it, write it to a file and manually format it, then I stumbled on this post from compojure mailing list, it is a small utility function written by Robin Brandt. It converts the given HTML file to Hiccup DSL.

#^:shebang '[
             exec java -cp "/home/nakkaya/.m2/repository/org/clojure/clojure/1.4.0/clojure-1.4.0.jar:/home/nakkaya/.m2/repository/org/clojure/data.xml/0.0.7/data.xml-0.0.7.jar" clojure.main "$0" -- "$@"
             ]

(ns hiccup-converter
  (:use [clojure.data.xml :only (parse)])
  (:use clojure.pprint)
  (:import (java.io File)))

(defn format-attrs
  [m]
  (when m
    (format "%s" m)))

(defn empty-when-null
  [x]
  (if (nil? x)
    ""
    x))

(declare format-full-node)

(defn format-node
  [node]
  (cond
   (string? node) (format "\"%s\"" (.trim node))
   (nil? node) nil
   :else (format-full-node node)))

(defn format-full-node
  [node]
  (format "[%s %s %s]\n"
          (:tag node)
          (empty-when-null (format-attrs (:attrs node)))
          (clojure.string/join " " (map format-node (:content node)))))

(defn transform-str
  [str]
  (->> str
       java.io.StringReader.
       parse
       format-node
       read-string
       pprint
       print))

(defn transform-file [f-name]
  (transform-str (slurp f-name)))
(transform-str "<html>
                 <head>
                   <title>Tutorial: HelloWorld</title>
                 </head>
                 <body>
                   <h1>HelloWorld Tutorial</h1>
                 </body>
               </html>")
[:html {}
 [:head {} [:title {} "Tutorial: HelloWorld"]]
 [:body {} [:h1 {} "HelloWorld Tutorial"]]]

It will complain if you have badly written HTML, in my case it only complained about a bunch of br statements, a simple search and replaced fixed it. If you can't get it to accept your HTML try running it through JTidy, that should fix it.