Building a del.icio.us Link Recommender
This is a port of the link recommendation engine described in the Programming Collective Intelligence book, to Clojure. In the book author used pydelicious API to build his dataset, I tried a bunch of Java APIs but they all sucked one way or another so I decided to use their public feeds instead of wasting time fighting with the APIs.
Let's get some includes out of the way,
(ns delicious
(:require [clojure.zip :as zip]
[clojure.xml :as xml])
(:use clojure.contrib.zip-filter.xml))
request function will take a URL to a RSS feed, parse it and return a zip structure,
(defn request [url]
(let [conn (-> (java.net.URL. url) .openConnection)]
(zip/xml-zip (xml/parse (.getInputStream conn)))))
To build the dataset, I retrieve the popular bookmarks for clojure and build a list of users to get bookmarks for,
(defn popular-users []
(let [feed (request "http://feeds.delicious.com/v2/rss/popular/clojure")]
(vec (xml-> feed :channel :item :dc:creator text))))
Then fetch bookmarks tagged Clojure for each user,
(defn user-bookmarks [user]
(let [feed (request
(str "http://feeds.delicious.com/v2/rss/" user "/clojure"))]
(reduce (fn[h v] (assoc h v 1) )
{} (xml-> feed :channel :item :link text))))
And build a map similar to the one used in Making Recommendations,
(defn preferences [users]
(reduce (fn[m v] (assoc m v (user-bookmarks v))) {} users))
The difference this preferences has from the previous one is that everyone in the map will have the same set of URLs, with a value of one if the user bookmarked it or a value of zero if user did not bookmarked it. We need to build a list of all the URLs in the preferences and add to each user URLs that they did not bookmarked with a value of zero,
(defn fill-in [prefs]
(let [all-items (reduce (fn[s v] (merge s v) ) {} (vals prefs))]
(reduce (fn[h v]
(let [user (first v)
prefs (second v)
diff (reduce (fn[h v]
(if (nil? (get prefs v))
(assoc h v 0) h)) {} (keys all-items))]
(assoc h user (merge prefs diff)))) {} prefs)))
Since building the preferences requires network I/O which takes time, I defined a variable to hold the preferences to save time while playing,
(def dic (fill-in (preferences (popular-users))))
As similarity score I choose to use Euclidean Distance.
(defn euclidean [person1 person2]
(let [shared-items (filter person1 (keys person2))
score (reduce (fn[scr mv]
(let [score1 (person1 mv)
score2 (person2 mv)]
(+ scr (Math/pow (- score1 score2) 2))))
0 shared-items)]
(if (= (count shared-items) 0)
0
(/ 1 (+ 1 score)))))
delicious=> (euclidean (dic "clojurebot") (dic "infrared"))
0.0625
delicious=> (euclidean (dic "clojurebot") (dic "clojurebot"))
1.0
Now we have everything setup to calculate how similar users are,
(defn similarities [prefs person algo]
(reduce
(fn[h p] (assoc h (first p) (algo (prefs person) (second p))))
{} (dissoc prefs person)))
delicious=> (similarities dic "clojurebot" euclidean)
{"snearch" 0.0625, "infrared" 0.0625, "rrc" 0.0625,
"atreyu_bbb" 0.0625, "precip" 0.0625, "pelleb" 0.07142857142857142,
"studiomaestro" 0.5, "mreid" 0.07142857142857142, "drcabana" 0.0625,
"jolby" 0.0625, "agriffin73" 0.0625}
Following four functions are copied and pasted from Making Recommendations, only one weight-prefs function has changed slightly since everyone has the same set of URLs, refer to that post for how they work,
(defn weight-prefs [prefs similarity person]
(reduce
(fn [h v]
(let [other (first v) score (second v)
diff (dic other)
weighted-pref (apply hash-map
(interleave (keys diff)
(map #(* % score) (vals diff))))]
(assoc h other weighted-pref))) {} similarity))
(defn sum-scrs [prefs]
(reduce (fn [h m] (merge-with #(+ %1 %2) h m)) {} (vals prefs)))
;(sum-scrs (weight-prefs dic (similarities dic "snearch" pearson) "snearch"))
(defn sum-sims [weighted-pref scores sim-users]
(reduce (fn [h m]
(let [movie (first m)
rated-users (reduce
(fn [h m] (if (contains? (val m) movie)
(conj h (key m)) h))
[] weighted-pref)
similarities (apply + (map #(sim-users %) rated-users))]
(assoc h movie similarities) ) ) {} scores))
(defn recommend [prefs person algo]
(let [similar-users (into {} (similarities prefs person algo))
weighted-prefs (weight-prefs prefs similar-users person)
scores (sum-scrs weighted-prefs)
sims (sum-sims weighted-prefs scores similar-users)]
(interleave (keys scores) (map #(/ (second %) (sims (first %))) scores))))
Now that everthing is set, lets get some recommendations,
delicious=> (take 5
(reverse
(sort-by second
(apply hash-map
(recommend dic "clojurebot" euclidean)))))
(["http://clojure.org/" 0.4375]
["http://intensivesystems.net/tutorials/cont_m.html" 0.2265624999]
["http://intensivesystems.net/tutorials/web_sessions.html" 0.226562]
["http://www.bestinclass.dk/index.php/2009/12...++Blog%29" 0.171875]
["http://www.tbray.org/ongoing/...ng-Clojure" 0.1640625])