| 1 |
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
<html> |
| 2 |
<html><head> |
<head> |
| 3 |
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><title>Recommender API README</title> |
<title>Recommender API README</title> |
|
</head> |
|
| 4 |
|
|
| 5 |
|
</head> |
| 6 |
<body> |
<body> |
| 7 |
|
<h1>Recommender API -- Developer's Companion</h1> |
| 8 |
|
(The updated version of this document can be found at |
| 9 |
|
http://drupal.org/project/recommender) |
| 10 |
|
|
| 11 |
<H1>Recommender API -- Developer's Companion</H1> |
<h3>Basic idea</h3> |
| 12 |
|
|
| 13 |
|
<p>Suppose we have three mice, Alex, Becky and Carol. They all love |
| 14 |
|
cheese, but different kinds. Alex likes Spanish cheese and Italian |
| 15 |
|
cheese. Becky also likes Spanish and Italian cheese, but she likes Swiss |
| 16 |
|
too. Carol, on the other hand, hates Spanish and Italian cheese, but |
| 17 |
|
loves Cheddar cheese. Now, suppose we also learn Alex hates French |
| 18 |
|
cheese, can we guess what Becky and Carol would say about French cheese?</p> |
| 19 |
|
<p>First, based on the fact that both Alex and Becky like Spanish |
| 20 |
|
and Italian cheese, we can infer that Alex and Becky have similar |
| 21 |
|
tastes. For the same reason, we can infer that Alex and Carol have quite |
| 22 |
|
opposite tastes. Then, since we know Alex hates French cheese, then we |
| 23 |
|
can reasonably guess that Becky hates French cheese too, whereas Carol |
| 24 |
|
probably likes French cheese.</p> |
| 25 |
|
|
| 26 |
|
<p>That is the basic idea of recommender systems. It can be used in |
| 27 |
|
many applications too. For an online store, we can think of the |
| 28 |
|
customers as the mice in the mouse-cheese analogy, and the products as |
| 29 |
|
cheese, then we can generate products recommendations for customers. For |
| 30 |
|
a taxonomy system, we can think of the content nodes as the mice, and |
| 31 |
|
the tags to describe the nodes as the cheese, then we can generate |
| 32 |
|
similarity scores among different content nodes based on the nodes-terms |
| 33 |
|
relationship, and we can suggest recommended terms for other nodes.</p> |
| 34 |
|
|
| 35 |
|
<p>The Recommender API module provides a set of APIs to calculate |
| 36 |
|
similarities among the mice, and then predict how each mouse would |
| 37 |
|
evaluate each cheese based on the evaluation from other mice, and |
| 38 |
|
finally generate a list of recommended cheese to each mouse. To use the |
| 39 |
|
APIs, you just need to have the mouse-cheese relationship stored in a |
| 40 |
|
table, and then select the right algorithm to do the calculation. |
| 41 |
|
Different algorithms will be explained below.</p> |
| 42 |
|
|
|
<h3>Basic idea</h3> |
|
|
<p>The tale of the mice and cheese: |
|
|
<br> |
|
|
Mouse: |
|
|
Cheese: |
|
|
Weight: if no weight, just use "1" or null<br></p> |
|
| 43 |
|
|
| 44 |
<h3>Algorithms</h3> |
<h3>Algorithms</h3> |
| 45 |
|
|
| 46 |
<p><strong>Classical</strong>: This is the classical family of collaborative filtering algorithms based on correlation coefficients. The most famous examples include the User-User algorithm, and the Item-Item algorithm. More readings:</p> |
<h4>Classical</h4> |
| 47 |
<ul><li>Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Paper presented at the Proceedings of the 1994 Computer Supported Collaborative Work Conference.</li> |
<p>This is the classical family of collaborative filtering |
| 48 |
<li>Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. Internet Computing, IEEE, 7(1), 76-80.</li></ul> |
algorithms based on correlation coefficients. The most famous examples |
| 49 |
|
include the User-User algorithm, and the Item-Item algorithm. <strong>If |
| 50 |
|
you are not sure which algorithm to use, just use this one</strong> because it |
| 51 |
|
performs well in most cases, and is widely used in many applications.</p> |
| 52 |
|
<p>More readings:</p> |
| 53 |
|
<ul> |
| 54 |
|
<li>Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & |
| 55 |
|
Riedl, J. (1994). GroupLens: An open architecture for collaborative |
| 56 |
|
filtering of netnews. Paper presented at the Proceedings of the 1994 |
| 57 |
|
Computer Supported Collaborative Work Conference.</li> |
| 58 |
|
<li>Linden, G., Smith, B., & York, J. (2003). Amazon.com |
| 59 |
|
recommendations: item-to-item collaborative filtering. Internet |
| 60 |
|
Computing, IEEE, 7(1), 76-80.</li> |
| 61 |
|
</ul> |
| 62 |
|
|
| 63 |
|
<h4>Slope-One</h4> |
| 64 |
|
<p>This algorithm has much better performance. In some cases, it |
| 65 |
|
generates better results too (see the reading below). But it's not |
| 66 |
|
widely studied in the academia or widely appied to many real world |
| 67 |
|
practices. Another drawback is that it cannot compute similarities among |
| 68 |
|
the mice, but can only predict mouse-cheese scores. </p> |
| 69 |
|
<p>More readings:</p> |
| 70 |
|
<ul> |
| 71 |
|
<li>http://en.wikipedia.org/wiki/Slope_One</li> |
| 72 |
|
<li>Lemire, D., & Maclachlan, A. (2005). Slope One Predictors |
| 73 |
|
for Online Rating-Based Collaborative Filtering. Paper presented at the |
| 74 |
|
SIAM Data Mining (SDM'05).</li> |
| 75 |
|
</ul> |
| 76 |
|
|
| 77 |
|
<h4>Co-ocurrences</h4> |
| 78 |
|
<p>This is a very simple and high performance algorithm. It only |
| 79 |
|
calculates similarities among the mice by how many cheese they share. |
| 80 |
|
For example, if mouse A and mouse B like 4 types of cheese in common, |
| 81 |
|
then the similarity score between A and B would be 4. This is the |
| 82 |
|
algorithm used in the "Similar By Terms" module. However, this algorithm |
| 83 |
|
has one major drawback -- suppose a mouse simply loves all cheese, then |
| 84 |
|
that mouse would have the highest similarity score with all other mice, |
| 85 |
|
which is incorrect.</p> |
| 86 |
|
|
| 87 |
|
<h4>PageRank</h4> |
| 88 |
|
<p><em>To be developed.</em> More readings: |
| 89 |
|
http://en.wikipedia.org/wiki/PageRank</p> |
| 90 |
|
|
| 91 |
|
<h4>SVD</h4> |
| 92 |
|
<p><em>To be developed.</em> More readings: |
| 93 |
|
http://en.wikipedia.org/wiki/Singular_value_decomposition</p> |
| 94 |
|
|
| 95 |
|
<h4>PCA</h4> |
| 96 |
|
<p><em>To be developed.</em> More readings: |
| 97 |
|
http://en.wikipedia.org/wiki/Principal_components_analysis</p> |
| 98 |
|
|
| 99 |
|
<h4>Influence Limiter</h4> |
| 100 |
|
<p><em>To be developed.</em> This algorithm is supposed to prevent |
| 101 |
|
manipulation of the recommender system. More readings:</p> |
| 102 |
|
<ul> |
| 103 |
|
<li>Resnick, P., & Sami, R. (2007). The influence limiter: |
| 104 |
|
provably manipulation-resistant recommender systems. Paper presented at |
| 105 |
|
the Proceedings of the 2007 ACM conference on Recommender systems.</li> |
| 106 |
|
</ul> |
| 107 |
|
|
|
<p><strong>Slope-one</strong>: TBA</p> |
|
| 108 |
|
|
| 109 |
<h3>Comparison of algorithms</h3> |
<h3>Comparison of algorithms</h3> |
| 110 |
|
|
| 111 |
<table> |
<table> |
| 112 |
<tbody> |
<tbody> |
| 113 |
<tr> |
<tr> |
| 114 |
<td></td> |
<td></td> |
|
<th>Missing data auto append</th> |
|
|
<th>In-memory calculation</th> |
|
|
<th>Incremental update</th> |
|
| 115 |
<th>Similarity</th> |
<th>Similarity</th> |
| 116 |
<th>Prediction</th> |
<th>Prediction</th> |
| 117 |
|
<th>Incremental update</th> |
| 118 |
|
<th>In-memory calculation</th> |
| 119 |
|
<th>Missing data auto append</th> |
| 120 |
|
<th>0-1 Weight field</th> |
| 121 |
|
<th>Performance</th> |
| 122 |
|
<th>Accuracy</th> |
| 123 |
</tr> |
</tr> |
| 124 |
<tr> |
<tr> |
| 125 |
<th>Classical</th> |
<th>Classical</th> |
| 126 |
<td>X</td> |
<td>X</td> |
| 127 |
<td>X (partial)</td> |
<td>X</td> |
| 128 |
<td></td> |
<td></td> |
| 129 |
|
<td>Partial</td> |
| 130 |
<td>X</td> |
<td>X</td> |
| 131 |
<td>X</td> |
<td>X</td> |
| 132 |
|
<td>poor</td> |
| 133 |
|
<td>high</td> |
| 134 |
</tr> |
</tr> |
| 135 |
<tr> |
<tr> |
| 136 |
<th>Slope-one</th> |
<th>Slope-one</th> |
| 137 |
<td></td> |
<td></td> |
| 138 |
|
<td>X</td> |
| 139 |
|
<td></td> |
| 140 |
<td></td> |
<td></td> |
| 141 |
<td></td> |
<td></td> |
| 142 |
<td></td> |
<td></td> |
| 143 |
<td>X</td> |
<td>medium</td> |
| 144 |
|
<td>medium</td> |
| 145 |
</tr> |
</tr> |
| 146 |
<tr> |
<tr> |
| 147 |
<th>Coocurrence</th> |
<th>Coocurrence</th> |
|
<td></td> |
|
|
<td></td> |
|
| 148 |
<td>X</td> |
<td>X</td> |
| 149 |
|
<td></td> |
| 150 |
<td>X</td> |
<td>X</td> |
| 151 |
<td></td> |
<td></td> |
| 152 |
|
<td></td> |
| 153 |
|
<td>X</td> |
| 154 |
|
<td>high</td> |
| 155 |
|
<td>poor</td> |
| 156 |
</tr> |
</tr> |
| 157 |
<tr> |
<tr> |
| 158 |
<th>SVD (TBD)</th> |
<td><em>PageRank (TBD)</em></td> |
| 159 |
|
<td></td> |
| 160 |
|
<td></td> |
| 161 |
|
<td></td> |
| 162 |
<td></td> |
<td></td> |
| 163 |
<td></td> |
<td></td> |
| 164 |
<td></td> |
<td></td> |
| 166 |
<td></td> |
<td></td> |
| 167 |
</tr> |
</tr> |
| 168 |
<tr> |
<tr> |
| 169 |
<th>PageRank (TBD)</th> |
<td><em>SVD (TBD)</em></td> |
| 170 |
|
<td></td> |
| 171 |
|
<td></td> |
| 172 |
|
<td></td> |
| 173 |
<td></td> |
<td></td> |
| 174 |
<td></td> |
<td></td> |
| 175 |
<td></td> |
<td></td> |
| 177 |
<td></td> |
<td></td> |
| 178 |
</tr> |
</tr> |
| 179 |
<tr> |
<tr> |
| 180 |
<th>PCA (TBD)</th> |
<td><em>PCA (TBD)</em></td> |
| 181 |
|
<td></td> |
| 182 |
|
<td></td> |
| 183 |
|
<td></td> |
| 184 |
<td></td> |
<td></td> |
| 185 |
<td></td> |
<td></td> |
| 186 |
<td></td> |
<td></td> |
| 188 |
<td></td> |
<td></td> |
| 189 |
</tr> |
</tr> |
| 190 |
<tr> |
<tr> |
| 191 |
<th>Influence Limiter (TBD)</th> |
<td><em>Inflence Limiter (TBD)</em></td> |
| 192 |
|
<td></td> |
| 193 |
|
<td></td> |
| 194 |
|
<td></td> |
| 195 |
<td></td> |
<td></td> |
| 196 |
<td></td> |
<td></td> |
| 197 |
<td></td> |
<td></td> |
| 201 |
</tbody> |
</tbody> |
| 202 |
</table> |
</table> |
| 203 |
|
|
| 204 |
|
<ul> |
| 205 |
|
<li>Similarity: whether it supports calculation of similarity |
| 206 |
|
scores among the mice.</li> |
| 207 |
|
<li>Prediction: whether it upports calculation of predictions |
| 208 |
|
scores for the mouse-cheese pair.</li> |
| 209 |
|
<li>Incremental update: whether it supports incremental update, or |
| 210 |
|
each time it rebuilds the whole index again</li> |
| 211 |
|
<li>In-memory calculation: whether it supports in-memory |
| 212 |
|
calculation (usually it's in database calculation)</li> |
| 213 |
|
<li>Missing data auto append: When a mouse doesn't rate a cheese, |
| 214 |
|
by default we treat it as missing data. However, we can ask the |
| 215 |
|
algorithm to automatically treat the missing data as 0.</li> |
| 216 |
|
<li>0-1 weight field: whether the "weight" field of the |
| 217 |
|
mouse-cheese pair can be only 0s and 1s.</li> |
| 218 |
|
<li>Performance: whether it is computation intensive.</li> |
| 219 |
|
<li>Accuracy: whether the result has a high quality.</li> |
| 220 |
|
</ul> |
| 221 |
|
|
| 222 |
|
<h3>How to use the APIs</h3> |
| 223 |
|
|
| 224 |
|
<p>The APIs usually require these parameters:</p> |
| 225 |
|
<ul> |
| 226 |
|
<li>$app_name: The name of your application that calls the API. |
| 227 |
|
This is used to identify different applications that uses Recommender |
| 228 |
|
API.</li> |
| 229 |
|
<li>$table_name: The name of the table that saves the mouse-cheese |
| 230 |
|
rating. Recommender API will do all the calculations based on the |
| 231 |
|
mouse-cheese relationships stored in the table.</li> |
| 232 |
|
<li>$field_mouse: The field of the mouse_id in $table_name to |
| 233 |
|
identify different mice.</li> |
| 234 |
|
<li>$field_cheese: The field of the cheese_id in $table_name to |
| 235 |
|
identify different cheese.</li> |
| 236 |
|
<li>$field_weight: The field of mouse-cheese rating, or weight, in |
| 237 |
|
$table_name. It signifies how strong is the relationship between a |
| 238 |
|
mouse and a cheese.</li> |
| 239 |
|
<li>$options: This is an array of options passed to the API, for |
| 240 |
|
example, whether to use the 'basic' or 'weighted' extensions of |
| 241 |
|
SlopeOne algorithm.</li> |
| 242 |
|
</ul> |
| 243 |
|
|
| 244 |
|
<p>The mouse-cheese table might not exists, or it's a part of a |
| 245 |
|
larger table. In that case, you might want to create a view from the |
| 246 |
|
larger table, or create a table and insert mouse-cheese records into |
| 247 |
|
that table. Then pass the view name or table name as $table_name to the |
| 248 |
|
API.</p> |
| 249 |
|
|
| 250 |
|
<p>Functions like recommender_similarity_*() are targeted to |
| 251 |
|
calculate similarity scores for the mice. Functions like |
| 252 |
|
recommender_prediction_* are to calculate prediction scores for the |
| 253 |
|
mice-cheese pairs. Functions starts with the underscore are private |
| 254 |
|
functions, not public APIs, and are subjects to change. Other functions |
| 255 |
|
are helper functions.</p> |
| 256 |
|
|
| 257 |
|
<p>Please refer to the documentation in the PHP file for more |
| 258 |
|
details. You can also read the code of the <a |
| 259 |
|
href="http://drupal.org/project/uurec">User-to-user Recommendation</a> |
| 260 |
|
module as an example of how to use the APIs.</p> |
| 261 |
|
|
| 262 |
|
<p>For support or other questions, please submit your request to the |
| 263 |
|
issue queue. Thanks.</p> |
| 264 |
|
|
|
<h3>Caveats</h3> |
|
|
<ol> |
|
|
<li>You might want to create a view as the "recommender_link" |
|
|
table</li> |
|
|
<li>Make sure no data got changed in "recommender_link" while |
|
|
we calculate the outcome. Otherwise it might lead to unpredictable |
|
|
results. |
|
|
One way to do it is to restrict the view to a certain time frame.</li> |
|
|
</ol> |
|
|
|
|
|
<h3>Future plan</h3> |
|
|
<p>Include basic reputation system algorithms which are similar to the |
|
|
recommender algorithms.</p> |
|
|
</body></html> |
|
| 265 |
|
</body> |
| 266 |
|
</html> |