<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head><meta content="text/html; charset=UTF-8" http-equiv="content-type" /><meta name="description" /><meta content="clojure programming-collective-intelligence" name="keywords" /><meta content="Nurullah Akkaya" name="author" /><link href="/images/favicon.ico" rel="icon" type="image/x-icon" /><link href="/images/favicon.ico" rel="shortcut icon" type="image/x-icon" /><link href="/default.css" rel="stylesheet" type="text/css" /><link href="/rss-feed" rel="alternate" title="An explorer&apos;s log" type="application/rss+xml" /><link href="http://nakkaya.com/2009/11/13/pearson-correlation-score/" rel="canonical" /><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><title>Pearson Correlation Score</title></head><body><div id="wrap"><div id="header"><h1><a href="/">nakkaya<span class="fade-small">dot</span><span class="fade">com</span></a></h1><div class="pages"><a class="page" href="/">Home</a> | <a class="page" href="/projects.html">Projects</a> | <a class="page" href="/archives.html">Archives</a> | <a class="page" href="/tags/">Tags</a> | <a class="page" href="/contact.html" rel="author">About</a><form action="http://www.google.com/search" id="searchform" method="get"><div><input class="box" id="s" name="q" type="text" /><input name="sitesearch" type="hidden" value="nakkaya.com" /></div></form></div></div><div id="content"><div id="post"><h2 class="page-title">Pearson Correlation Score</h2><p>
This post covers another topic from <a href="http://oreilly.com/catalog/9780596529321">Programming Collective
Intelligence</a>: the
<a href="http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient">Pearson correlation score</a>, a measure of similarity between items.
The formula is:
</p>

\begin{equation}
  r = \frac{\sum XY - \frac{\sum X \times \sum Y }{N}}
           {\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right) \times \left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}
\end{equation}

<p>
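This calculation returns a value between -1 and 1. A score of 1 means
the two users' ratings rise and fall together perfectly, which includes
the case where they rated every item identically; a score of -1 means
their ratings move in exactly opposite directions. Unlike the <a href="http://nakkaya.com/2009/11/11/euclidean-distance-score/">Euclidean Distance Score</a>,
the result needs no further normalization. The Pearson correlation score
also corrects for each user's average rating: a user who consistently
rates every item two points higher than another user still has a
similarity of 1 with that user. (A user who gives every item the same
rating has no variation to correlate, so the denominator is 0 and the
score is undefined.) This may or may not be the behavior you want,
depending on your situation.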
</p>
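
<p>
As a quick check of how the formula treats a consistent offset, take
the made-up ratings X = (1, 2, 3) and Y = (3, 4, 5) over N = 3 shared
items, where the second user always rates two points higher. These
numbers are purely illustrative and are not taken from the critics
data. The sums are
</p>

\begin{equation}
  \sum X = 6, \quad \sum Y = 12, \quad \sum XY = 26, \quad \sum X^2 = 14, \quad \sum Y^2 = 50
\end{equation}

<p>
and substituting them, with the two bracketed terms under the square
root multiplied together, gives
</p>

\begin{equation}
  r = \frac{26 - \frac{6 \times 12}{3}}
           {\sqrt{\left(14 - \frac{6^2}{3}\right) \times \left(50 - \frac{12^2}{3}\right)}}
    = \frac{2}{\sqrt{2 \times 2}} = 1
\end{equation}

<p>
so the constant offset between the two users does not lower the score.
</p>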

<div class="org-src-container">

<pre class="src src-clojure">(<span style="color: #ff5f00; font-weight: bold;">defn</span> <span style="color: #d7af00; font-weight: bold;">pearson</span> [x y]
  (<span style="color: #ff5f00; font-weight: bold;">let</span> [shrd (filter x (keys y))] 
    (<span style="color: #ff5f00; font-weight: bold;">if</span> (= 0 (count shrd))
      0
      (<span style="color: #ff5f00; font-weight: bold;">let</span> [sum1  (reduce (<span style="color: #ff5f00; font-weight: bold;">fn</span>[s mv] (+ s (x mv))) 0 shrd)
            sum2  (reduce (<span style="color: #ff5f00; font-weight: bold;">fn</span>[s mv] (+ s (y mv))) 0 shrd)
            sum1sq  (reduce (<span style="color: #ff5f00; font-weight: bold;">fn</span>[s mv] (+ s (<span style="font-weight: bold; text-decoration: underline;">Math</span>/pow (x mv) 2))) 0 shrd)
            sum2sq  (reduce (<span style="color: #ff5f00; font-weight: bold;">fn</span>[s mv] (+ s (<span style="font-weight: bold; text-decoration: underline;">Math</span>/pow (y mv) 2))) 0 shrd)
            psum (reduce (<span style="color: #ff5f00; font-weight: bold;">fn</span>[s mv] (+ s (* (x mv) (y mv)))) 0 shrd)
            num (- psum (/ (* sum1 sum2) (count shrd)))
            den (<span style="font-weight: bold; text-decoration: underline;">Math</span>/sqrt (* 
                            (- sum1sq (/ (<span style="font-weight: bold; text-decoration: underline;">Math</span>/pow sum1 2) (count shrd)))
                            (- sum2sq (/ (<span style="font-weight: bold; text-decoration: underline;">Math</span>/pow sum2 2) (count shrd)))))]
        (<span style="color: #ff5f00; font-weight: bold;">if</span> (zero? den)
          0
          (double (/ num den)))))))
</pre>
</div>

<p>
Using the same critics map from <a href="http://nakkaya.com/2009/11/11/euclidean-distance-score/">Euclidean Distance Score</a>:
</p>

<pre class="example">
user=&gt; (pearson (critics "Lisa Rose") (critics "Gene Seymour"))
0.39605901719066977

user=&gt; (pearson (critics "Lisa Rose") {})
0
</pre>
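
<p>
The boundary values can be checked with a few made-up rating maps. What
follows is a minimal, self-contained sketch rather than the function
above: the name pearson-sketch and the sample maps a, doubled, reversed
and constant are hypothetical, chosen only for illustration. It
assumes ratings are numbers keyed by item name and uses zero? for the
denominator test, since the square root is a floating point value.
</p>

<div class="org-src-container">

<pre class="src src-clojure">;; Hypothetical helper, not part of the original post.
(defn pearson-sketch
  "Pearson correlation over the items two rating maps share.
   Returns 0 when there is no overlap or no variation."
  [x y]
  (let [shared (filter x (keys y))      ; items rated by both users
        n      (count shared)]
    (if (zero? n)
      0
      (let [xs  (map x shared)
            ys  (map y shared)
            sum (fn [coll] (reduce + coll))
            sx  (sum xs)
            sy  (sum ys)
            sxx (sum (map * xs xs))     ; sum of squares for x
            syy (sum (map * ys ys))     ; sum of squares for y
            sxy (sum (map * xs ys))     ; sum of products
            num (- sxy (/ (* sx sy) n))
            den (Math/sqrt (double (* (- sxx (/ (* sx sx) n))
                                      (- syy (/ (* sy sy) n)))))]
        (if (zero? den)
          0
          (double (/ num den)))))))

;; Made-up sample ratings, assumed only for this example.
(def a        {"A" 1 "B" 2 "C" 3})
(def doubled  {"A" 2 "B" 4 "C" 6})   ; same ordering, different scale
(def reversed {"A" 3 "B" 2 "C" 1})   ; exactly opposite ordering
(def constant {"A" 5 "B" 5 "C" 5})   ; no variation at all
</pre>
</div>

<p>
With these maps the sketch gives the two extremes and falls back to 0
for the user whose ratings never vary,
</p>

<pre class="example">
user=&gt; (pearson-sketch a doubled)
1.0

user=&gt; (pearson-sketch a reversed)
-1.0

user=&gt; (pearson-sketch a constant)
0
</pre>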
<div class="post-tags">Tags: <a href="/tags/#clojure">clojure </a><a href="/tags/#programming-collective-intelligence">programming-collective-intelligence </a></div></div></div><div id="footer"><a href="/rss-feed"> RSS Feed</a><p>&copy; 2018<a href="http://nakkaya.com"> Nurullah Akkaya</a></p></div></div></body></html>