/[drupal]/contributions/modules/recommender/README.html
ViewVC logotype

Diff of /contributions/modules/recommender/README.html

Parent Directory Parent Directory | Revision Log Revision Log | View Revision Graph Revision Graph | View Patch Patch

revision 1.3, Sat Feb 28 20:49:53 2009 UTC revision 1.4, Sun Mar 1 04:12:56 2009 UTC
# Line 1  Line 1 
1  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">  <html>
2  <html><head>  <head>
3  <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><title>Recommender API README</title>  <title>Recommender API README</title>
 </head>  
4    
5    </head>
6  <body>  <body>
7    <h1>Recommender API -- Developer's Companion</h1>
8    (The updated version of this document can be found at
9    http://drupal.org/project/recommender)
10    
11  <H1>Recommender API -- Developer's Companion</H1>  <h3>Basic idea</h3>
12    
13    <p>Suppose we have three mice, Alex, Becky and Carol. They all love
14    cheese, but different kinds. Alex likes Spanish cheese and Italian
15    cheese. Becky also likes Spanish and Italian cheese, but she likes Swiss
16    too. Carol, on the other hand, hates Spanish and Italian cheese, but
17    loves Cheddar cheese. Now, suppose we also learn Alex hates French
18    cheese, can we guess what Becky and Carol would say about French cheese?</p>
19    <p>First, based on the fact that both Alex and Becky like Spanish
20    and Italian cheese, we can infer that Alex and Becky have similar
21    tastes. For the same reason, we can infer that Alex and Carol have quite
22    opposite tastes. Then, since we know Alex hates French cheese, then we
23    can reasonably guess that Becky hates French cheese too, whereas Carol
24    probably likes French cheese.</p>
25    
26    <p>That is the basic idea of recommender systems. It can be used in
27    many applications too. For an online store, we can think of the
28    customers as the mice in the mouse-cheese analogy, and the products as
29    cheese, then we can generate products recommendations for customers. For
30    a taxonomy system, we can think of the content nodes as the mice, and
31    the tags to describe the nodes as the cheese, then we can generate
32    similarity scores among different content nodes based on the nodes-terms
33    relationship, and we can suggest recommended terms for other nodes.</p>
34    
35    <p>The Recommender API module provides a set of APIs to calculate
36    similarities among the mice, and then predict how each mouse would
37    evaluate each cheese based on the evaluation from other mice, and
38    finally generate a list of recommended cheese to each mouse. To use the
39    APIs, you just need to have the mouse-cheese relationship stored in a
40    table, and then select the right algorithm to do the calculation.
41    Different algorithms will be explained below.</p>
42    
 <h3>Basic idea</h3>  
 <p>The tale of the mice and cheese:  
 <br>  
 Mouse:  
 Cheese:  
 Weight: if no weight, just use "1" or null<br></p>  
43    
44  <h3>Algorithms</h3>  <h3>Algorithms</h3>
45    
46  <p><strong>Classical</strong>: This is the classical family of collaborative filtering algorithms based on correlation coefficients. The most famous examples include the User-User algorithm, and the Item-Item algorithm. More readings:</p>  <h4>Classical</h4>
47  <ul><li>Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Paper presented at the Proceedings of the 1994 Computer Supported Collaborative Work Conference.</li>  <p>This is the classical family of collaborative filtering
48  <li>Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. Internet Computing, IEEE, 7(1), 76-80.</li></ul>  algorithms based on correlation coefficients. The most famous examples
49    include the User-User algorithm, and the Item-Item algorithm. <strong>If
50    you are not sure which algorithm to use, just use this one</strong> because it
51    performs well in most cases, and is widely used in many applications.</p>
52    <p>More readings:</p>
53    <ul>
54            <li>Resnick, P., Iacovou, N., Sushak, M., Bergstrom, P., &amp;
55            Riedl, J. (1994). GroupLens: An open architecture for collaborative
56            filtering of netnews. Paper presented at the Proceedings of the 1994
57            Computer Supported Collaborative Work Conference.</li>
58            <li>Linden, G., Smith, B., &amp; York, J. (2003). Amazon.com
59            recommendations: item-to-item collaborative filtering. Internet
60            Computing, IEEE, 7(1), 76-80.</li>
61    </ul>
62    
63    <h4>Slope-One</h4>
64    <p>This algorithm has much better performance. In some cases, it
65    generates better results too (see the reading below). But it's not
66    widely studied in the academia or widely appied to many real world
67    practices. Another drawback is that it cannot compute similarities among
68    the mice, but can only predict mouse-cheese scores.&nbsp;</p>
69    <p>More readings:</p>
70    <ul>
71            <li>http://en.wikipedia.org/wiki/Slope_One</li>
72            <li>Lemire, D., &amp; Maclachlan, A. (2005). Slope One Predictors
73            for Online Rating-Based Collaborative Filtering. Paper presented at the
74            SIAM Data Mining (SDM'05).</li>
75    </ul>
76    
77    <h4>Co-ocurrences</h4>
78    <p>This is a very simple and high performance algorithm. It only
79    calculates similarities among the mice by how many cheese they share.
80    For example, if mouse A and mouse B like 4 types of cheese in common,
81    then the similarity score between A and B would be 4. This is the
82    algorithm used in the "Similar By Terms" module. However, this algorithm
83    has one major drawback -- suppose a mouse simply loves all cheese, then
84    that mouse would have the highest similarity score with all other mice,
85    which is incorrect.</p>
86    
87    <h4>PageRank</h4>
88    <p><em>To be developed.</em> More readings:
89    http://en.wikipedia.org/wiki/PageRank</p>
90    
91    <h4>SVD</h4>
92    <p><em>To be developed.</em> More readings:
93    http://en.wikipedia.org/wiki/Singular_value_decomposition</p>
94    
95    <h4>PCA</h4>
96    <p><em>To be developed.</em> More readings:
97    http://en.wikipedia.org/wiki/Principal_components_analysis</p>
98    
99    <h4>Influence Limiter</h4>
100    <p><em>To be developed.</em> This algorithm is supposed to prevent
101    manipulation of the recommender system. More readings:</p>
102    <ul>
103            <li>Resnick, P., &amp; Sami, R. (2007). The influence limiter:
104            provably manipulation-resistant recommender systems. Paper presented at
105            the Proceedings of the 2007 ACM conference on Recommender systems.</li>
106    </ul>
107    
 <p><strong>Slope-one</strong>: TBA</p>  
108    
109  <h3>Comparison of algorithms</h3>  <h3>Comparison of algorithms</h3>
110    
111  <table>  <table>
112          <tbody>          <tbody>
113                  <tr>                  <tr>
114                          <td></td>                          <td></td>
                         <th>Missing data auto append</th>  
                         <th>In-memory calculation</th>  
                         <th>Incremental update</th>  
115                          <th>Similarity</th>                          <th>Similarity</th>
116                          <th>Prediction</th>                          <th>Prediction</th>
117                            <th>Incremental update</th>
118                            <th>In-memory calculation</th>
119                            <th>Missing data auto append</th>
120                            <th>0-1 Weight field</th>
121                            <th>Performance</th>
122                            <th>Accuracy</th>
123                  </tr>                  </tr>
124                  <tr>                  <tr>
125                          <th>Classical</th>                          <th>Classical</th>
126                          <td>X</td>                          <td>X</td>
127                          <td>X (partial)</td>                          <td>X</td>
128                          <td></td>                          <td></td>
129                            <td>Partial</td>
130                          <td>X</td>                          <td>X</td>
131                          <td>X</td>                          <td>X</td>
132                            <td>poor</td>
133                            <td>high</td>
134                  </tr>                  </tr>
135                  <tr>                  <tr>
136                          <th>Slope-one</th>                          <th>Slope-one</th>
137                          <td></td>                          <td></td>
138                            <td>X</td>
139                            <td></td>
140                          <td></td>                          <td></td>
141                          <td></td>                          <td></td>
142                          <td></td>                          <td></td>
143                          <td>X</td>                          <td>medium</td>
144                            <td>medium</td>
145                  </tr>                  </tr>
146                  <tr>                  <tr>
147                          <th>Coocurrence</th>                          <th>Coocurrence</th>
                         <td></td>  
                         <td></td>  
148                          <td>X</td>                          <td>X</td>
149                            <td></td>
150                          <td>X</td>                          <td>X</td>
151                          <td></td>                          <td></td>
152                            <td></td>
153                            <td>X</td>
154                            <td>high</td>
155                            <td>poor</td>
156                  </tr>                  </tr>
157                  <tr>                  <tr>
158                          <th>SVD (TBD)</th>                          <td><em>PageRank (TBD)</em></td>
159                            <td></td>
160                            <td></td>
161                            <td></td>
162                          <td></td>                          <td></td>
163                          <td></td>                          <td></td>
164                          <td></td>                          <td></td>
# Line 67  Weight: if no weight, just use "1" or nu Line 166  Weight: if no weight, just use "1" or nu
166                          <td></td>                          <td></td>
167                  </tr>                  </tr>
168                  <tr>                  <tr>
169                          <th>PageRank (TBD)</th>                          <td><em>SVD (TBD)</em></td>
170                            <td></td>
171                            <td></td>
172                            <td></td>
173                          <td></td>                          <td></td>
174                          <td></td>                          <td></td>
175                          <td></td>                          <td></td>
# Line 75  Weight: if no weight, just use "1" or nu Line 177  Weight: if no weight, just use "1" or nu
177                          <td></td>                          <td></td>
178                  </tr>                  </tr>
179                  <tr>                  <tr>
180                          <th>PCA (TBD)</th>                          <td><em>PCA (TBD)</em></td>
181                            <td></td>
182                            <td></td>
183                            <td></td>
184                          <td></td>                          <td></td>
185                          <td></td>                          <td></td>
186                          <td></td>                          <td></td>
# Line 83  Weight: if no weight, just use "1" or nu Line 188  Weight: if no weight, just use "1" or nu
188                          <td></td>                          <td></td>
189                  </tr>                  </tr>
190                  <tr>                  <tr>
191                          <th>Influence Limiter (TBD)</th>                          <td><em>Inflence Limiter (TBD)</em></td>
192                            <td></td>
193                            <td></td>
194                            <td></td>
195                          <td></td>                          <td></td>
196                          <td></td>                          <td></td>
197                          <td></td>                          <td></td>
# Line 93  Weight: if no weight, just use "1" or nu Line 201  Weight: if no weight, just use "1" or nu
201          </tbody>          </tbody>
202  </table>  </table>
203    
204    <ul>
205            <li>Similarity: whether it supports calculation of similarity
206            scores among the mice.</li>
207            <li>Prediction: whether it upports calculation of predictions
208            scores for the mouse-cheese pair.</li>
209            <li>Incremental update: whether it supports incremental update, or
210            each time it rebuilds the whole index again</li>
211            <li>In-memory calculation: whether it supports in-memory
212            calculation (usually it's in database calculation)</li>
213            <li>Missing data auto append: When a mouse doesn't rate a cheese,
214            by default we treat it as missing data. However, we can ask the
215            algorithm to automatically treat the missing data as 0.</li>
216            <li>0-1 weight field: whether the &quot;weight&quot; field of the
217            mouse-cheese pair can be only 0s and 1s.</li>
218            <li>Performance: whether it is computation intensive.</li>
219            <li>Accuracy: whether the result has a high quality.</li>
220    </ul>
221    
222    <h3>How to use the APIs</h3>
223    
224    <p>The APIs usually require these parameters:</p>
225    <ul>
226            <li>$app_name: The name of your application that calls the API.
227            This is used to identify different applications that uses Recommender
228            API.</li>
229            <li>$table_name: The name of the table that saves the mouse-cheese
230            rating. Recommender API will do all the calculations based on the
231            mouse-cheese relationships stored in the table.</li>
232            <li>$field_mouse: The field of the mouse_id in $table_name to
233            identify different mice.</li>
234            <li>$field_cheese: The field of the cheese_id in $table_name to
235            identify different cheese.</li>
236            <li>$field_weight: The field of mouse-cheese rating, or weight, in
237            $table_name. It signifies how strong is the relationship between a
238            mouse and a cheese.</li>
239            <li>$options: This is an array of options passed to the API, for
240            example, whether to use the 'basic' or 'weighted' extensions of
241            SlopeOne algorithm.</li>
242    </ul>
243    
244    <p>The mouse-cheese table might not exists, or it's a part of a
245    larger table. In that case, you might want to create a view from the
246    larger table, or create a table and insert mouse-cheese records into
247    that table. Then pass the view name or table name as $table_name to the
248    API.</p>
249    
250    <p>Functions like recommender_similarity_*() are targeted to
251    calculate similarity scores for the mice. Functions like
252    recommender_prediction_* are to calculate prediction scores for the
253    mice-cheese pairs. Functions starts with the underscore are private
254    functions, not public APIs, and are subjects to change. Other functions
255    are helper functions.</p>
256    
257    <p>Please refer to the documentation in the PHP file for more
258    details. You can also read the code of the <a
259            href="http://drupal.org/project/uurec">User-to-user Recommendation</a>
260    module as an example of how to use the APIs.</p>
261    
262    <p>For support or other questions, please submit your request to the
263    issue queue. Thanks.</p>
264    
 <h3>Caveats</h3>  
 <ol>  
 <li>You might want to create a view as the "recommender_link"  
 table</li>  
 <li>Make sure no data got changed in "recommender_link" while  
 we calculate the outcome. Otherwise it might lead to unpredictable  
 results.  
 One way to do it is to restrict the view to a certain time frame.</li>  
 </ol>  
   
 <h3>Future plan</h3>  
 <p>Include basic reputation system algorithms which are similar to the  
 recommender algorithms.</p>  
 </body></html>  
265    </body>
266    </html>

Legend:
Removed from v.1.3  
changed lines
  Added in v.1.4

  ViewVC Help
Powered by ViewVC 1.1.2