Fuzzy Search Module Read Me

Project Home: http://drupal.org/project/fuzzysearch

To install this module:
1. Install as usual; see http://drupal.org/node/70151 for further information.
2. Configure permissions for which roles can: Search content, Administer the
   module, View scoring information, and View debugging information.
3. At admin/build/block, put the "Fuzzy search form" block into a region in
   your theme.
4. Configure the module at admin/settings/fuzzysearch.
5. Run cron until your site is 100% indexed.
6. Consider using a stopwords file to keep common words from bloating your
   index. See fuzzysearch/stopwords/README.txt.
7. Set up a regular cron job to keep your site fully indexed.

=== What is indexed? ===

Currently this module indexes all filtered node content, taxonomy terms
associated with the nodes, CCK text fields associated with a node, comments left
on the node, and any text returned by the call to hook_nodeapi with the
'update index' operation.

=== Fuzzysearch Index Settings ===

--Any time you change an index setting you must reindex your site for
the change to take effect.--

* You can choose the ngram length. This is the size of the chunks that words are
broken into on indexing and searching. The default value is 3. The lower the
value, the more results (and more noise) you will get. A lower value will also
increase the size of your fuzzysearch_index table.

* Nodes to index per cron run: Lower this value if PHP is timing out during
cron and Fuzzysearch is the culprit.

* HTML tag scoring settings: A score is assigned to each of the tags listed. You
can adjust this to your preference. If, for example, links are especially
important, raise the a tag score. If you don't want any extra importance for
titles, set the h1 tag to 0.

* Rebuild index: Check this to requeue all nodes for indexing without clearing
the index.

* Rebuild and clear index: Check this to requeue all nodes for indexing and
clear the index. Searching will be incomplete until all nodes are once again
indexed.

=== Fuzzysearch Search Settings ===

Assume missing letters in search terms:
A search term as entered by a user may be missing letters. If you want to search
for longer words than the user has entered, you can increase this number. In
English, for example, you will need at least 1 if you want a singular search
term to return the plural form of the word ending with "s". 0 means the term
will not return longer words in the results. 1 means a 4-letter search term will
also check 5-letter words in the index.

Assume extra letters in search terms:
A search term as entered by a user may have extra letters. If you want to search
for shorter words than the user has entered, you can increase this number. In
English, for example, you will need at least 1 if you want a plural search term
ending with "s" to return the singular word. 0 means the term will not return
shorter words in the results. 1 means a 5-letter search term will also check
4-letter words in the index.
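
The effect of the two settings above can be pictured as a window of word
lengths around the search term. The following is only a sketch of that
arithmetic; example_word_length_window() is a hypothetical helper and is not
part of Fuzzysearch, which may apply the settings differently in its queries.

```php
<?php
/**
 * Hypothetical helper: range of indexed word lengths checked for a term.
 *
 * $missing is the "Assume missing letters" value (allows longer words).
 * $extra is the "Assume extra letters" value (allows shorter words).
 * Uses strlen(), so it assumes single-byte text.
 */
function example_word_length_window($term, $missing, $extra) {
  $length = strlen($term);
  return array(
    'min' => max(1, $length - $extra),
    'max' => $length + $missing,
  );
}

// A 4-letter term with "missing letters" set to 1 also checks 5-letter words.
$window = example_word_length_window('bird', 1, 0);
print $window['min'] . '-' . $window['max'] . "\n";
```

With both settings at 0, the window collapses to the length of the search term
itself.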

When indexed, each ngram is saved with the percentage of the word it belongs to.
"app" is 33.33% of "apple" because it is one ngram out of three (app, ppl, ple).
When searching, Fuzzysearch lets you specify a minimum sum of percentages of
the ngrams it finds in each node. A lower number lets in more noise. Enter a
value between 0 and 100 to set the completeness required in the returned results.

It is best to set this value 10 points below your ideal minimum percentage. So
if you want to match results with at least 50% of the word matching, set this
value to 40. The match is calculated per indexed word and not per search
phrase, ensuring that matches are relevant to the words in the phrase and not
just to all the letter combinations in the phrase.

Also note that when a phrase matches more than a single word, the completeness
can be higher than 100%. This is because the completeness of each word is summed
and then used to sort the result set as a measure of accuracy.
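
The percentage arithmetic above can be sketched in a few lines of PHP. This is
an illustration only: example_ngrams() and example_completeness() are
hypothetical names, and the module's own indexing and SQL may differ in detail.

```php
<?php
// Split a word into overlapping ngrams: 'apple' gives app, ppl, ple.
function example_ngrams($word, $n = 3) {
  $ngrams = array();
  for ($i = 0; $i + $n <= strlen($word); $i++) {
    $ngrams[] = substr($word, $i, $n);
  }
  return $ngrams;
}

// Sum the percentages of an indexed word's ngrams found in the search term.
function example_completeness($search_term, $indexed_word, $n = 3) {
  $word_ngrams = example_ngrams($indexed_word, $n);
  if (empty($word_ngrams)) {
    return 0.0;
  }
  // Each ngram carries an equal share of its word (33.33% for 'apple').
  $share = 100 / count($word_ngrams);
  $hits = array_intersect($word_ngrams, example_ngrams($search_term, $n));
  return $share * count($hits);
}

// 'appl' shares two of the three ngrams of 'apple'.
print round(example_completeness('appl', 'apple'), 2) . "\n";
```

Here the search term "appl" reaches about 66.67% of "apple", so it would pass a
minimum completeness of 40 but not one of 70.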

* You can filter results output by node type. This does not affect search
indexing, so you don't have to reindex if you change this.

* Checking the "Display scoring" checkbox is helpful for debugging when you are
trying to fine tune score modifiers. It will output completeness and score
values under each of the returned results.

=== Fuzzysearch Display Settings ===

* Search results path:
Choose the search results path, for example: search/results. Do not use leading
or trailing slashes. The path must be unique in your site.

If selected, the results will be sorted by score first and completeness second,
which can make tag scores even more important. The default is to sort by
completeness first. You may want to try this if you find high scoring nodes
being pushed down in the results below lower scoring nodes with higher
completeness.
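
The difference between the two sort orders can be sketched as follows. The
result rows and example_sort_results() are hypothetical, and using score as the
tie-breaker in the default order is an assumption; the module's real result
structure and query ordering may differ.

```php
<?php
// Sort hypothetical result rows, highest values first.
function example_sort_results(array $results, $score_first = FALSE) {
  $primary = $score_first ? 'score' : 'completeness';
  $secondary = $score_first ? 'completeness' : 'score';
  usort($results, function ($a, $b) use ($primary, $secondary) {
    if ($a[$primary] != $b[$primary]) {
      return $a[$primary] < $b[$primary] ? 1 : -1;
    }
    if ($a[$secondary] != $b[$secondary]) {
      return $a[$secondary] < $b[$secondary] ? 1 : -1;
    }
    return 0;
  });
  return $results;
}

$results = array(
  array('nid' => 1, 'score' => 12, 'completeness' => 66),
  array('nid' => 2, 'score' => 3, 'completeness' => 100),
);

// Default order puts node 2 first; sorting by score first puts node 1 first.
$by_completeness = example_sort_results($results);
$by_score = example_sort_results($results, TRUE);
print $by_completeness[0]['nid'] . ' ' . $by_score[0]['nid'] . "\n";
```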

* Checking the "Display debugging information" checkbox can also help you
understand how Fuzzysearch queries the index. You'll see the query, the
ngrams, and the regex used when highlighting misspelled or partial words.

* Result excerpt length:
Set the length of the displayed text excerpt surrounding a found search term.
Applies per found term.

Maximum result length:
Set the maximum length of the displayed result. Set to 0 for unlimited length.

Minimum spelling score:
Fuzzysearch tries to highlight search terms that may be misspelled. You can set
the minimum threshold, which is calculated as a ratio of ngram hits to misses in
a term. 0 may cause a misspelling to highlight everything, and 100 will only
highlight exact terms. Enter a value between 0 and 100.

* Fuzzysearch will try to highlight misspelled words. You can
adjust the accuracy by setting a minimum spelling score, which is calculated as
a ratio of ngram hits to misses in a term, from 0 to 100, where 100 means no
misspellings are highlighted.

This works by replacing bad (misspelled, missing letters, extra letters) ngrams
with a wildcard. It is possible to get false matches. For example, searching for
"rendition" will also highlight "condition" if your spelling score is low
enough. However, these kinds of matches are likely to have lower completeness
and to be sorted to the bottom of your results if your search term exists in
your content.

=== Fuzzy Search Blocks ===

* Fuzzy search form: This block provides the form where users will enter the
search terms.

* Fuzzy search title query: The Drupal 6 version provides a block that performs a
fuzzysearch on a query in the path. This may be performance intensive, so use
with caution. If the fuzzysearch query is in the path, the block will return
search matches of node titles. It's up to you to put the query in your path like
this:

http://example.com/node/add/question?fuzzysearch=arthritis%20knees%20pain

This would be good for similar content blocks, or to suggest existing content
before letting the user create new content.
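
One way to build such a URL from free text is sketched below.
example_title_query_url() is a hypothetical helper, not something Fuzzysearch
provides; inside Drupal you would more likely pass the query to url() instead
of concatenating strings.

```php
<?php
// Hypothetical helper: append a fuzzysearch query to a site path.
function example_title_query_url($base_url, $path, $query) {
  // rawurlencode() encodes spaces as %20, matching the example URL.
  return rtrim($base_url, '/') . '/' . ltrim($path, '/')
    . '?fuzzysearch=' . rawurlencode($query);
}

print example_title_query_url('http://example.com', 'node/add/question', 'arthritis knees pain') . "\n";
```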

The module provides a template, fuzzysearch-result.tpl.php, that
you can copy to your theme folder and modify. This affects the search results
page. There are some theme functions you can override to theme the fuzzysearch
block, and you can also override block.tpl.php.

=== About Fuzzy Search ===

This module provides a fuzzy matching search engine for nodes.
Nodes are indexed when the site's cron job is run. The module automatically
queues a node for indexing once it is submitted or updated, or once a comment
has been made on it. Nodes can also be queued for reindexing by other modules by
calling the function fuzzysearch_reindex($nid, $module), where $nid is the nid
of the node to be reindexed and $module is a string identifying the module
requesting the reindex.
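
A call from another module might look like the sketch below. The module name
"custom" and the helper custom_queue_for_fuzzysearch() are hypothetical; only
the fuzzysearch_reindex($nid, $module) signature comes from this document.

```php
<?php
/**
 * Hypothetical helper for a module named "custom": queue nodes for reindexing.
 *
 * Returns the number of nodes queued, or 0 if Fuzzysearch is not available.
 */
function custom_queue_for_fuzzysearch(array $nids) {
  if (!function_exists('fuzzysearch_reindex')) {
    return 0;
  }
  foreach ($nids as $nid) {
    // The second argument identifies the module requesting the reindex.
    fuzzysearch_reindex($nid, 'custom');
  }
  return count($nids);
}
```

The queued nodes are then picked up on the next cron run, as described above.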

Fuzzy matching is implemented by using qgrams. Each word in a node is split
into 3 (default) letter lengths, so 'apple' gets indexed with 3 smaller strings:
'app', 'ppl', 'ple'. The effect of this is that as long as your search matches
X percent of the word (configurable in the admin settings), the node will be
pulled up in the results. One issue inherent in this method arises when a user
searches for a word like 'athens', which contains the word 'the': the indexed
word 'the' then matches with a completeness of 100%. To account for this,
qgrams from longer words must match qgrams from words of a similar length.
This is an imperfect solution but it does a good job of returning the most
relevant results.

=== Fuzzysearch Submodules ===

Fuzzysearch comes with the following example submodules:

1. fuzzysearch_filter_example
When enabled, this module provides an example of how to use
hook_fuzzysearch_filter().

See the API section below for information about this hook.

=== Fuzzysearch API ===

=== hook_fuzzysearch_score($op, $node) ===

This hook allows other contributed modules to modify the score of any
node being indexed. This affects nodes, not words. Site administrators can then
set how important these modifiers are to their particular site's use. Changing
the modifier score to 0 means that the modification being returned by that
particular module will have no effect on the scoring of the nodes on the site.
Setting the modifier to 10 means it will have maximum effect.

This simple example code from a contributed module implementing the scoring hook
returns a score multiplier of 5 if the node author is user 1. Any time a node is
changed, fuzzysearch will apply the modifiers on the next cron run. You must
reindex your site to affect existing nodes, or resave the nodes.

/**
 * Implementation of hook_fuzzysearch_score().
 *
 * @param $op
 *   'settings' returns an array with information about the module (seen in
 *   the admin settings form); 'index' returns a score modifier to the node
 */
function custom_fuzzysearch_score($op, $node) {
  // ...
      'title' => t('Author is User 1'),
      'description' => t('This multiplier lets you increase the score of nodes authored by user 1.'),
  // ...
      $score = $node->uid == 1 ? 5 : 0;
  // ...
}

=== hook_fuzzysearch_index($node) ===

Before fuzzysearch indexes a node, other modules have the chance to change the
node or prevent it from being indexed. If a module implements this hook and
returns FALSE the node will not be indexed. If it already existed in the index
it will be removed. Modules should check that they have a node object to work
with, as another module may have already returned FALSE.

Changes returned to the node object are only reflected in the fuzzysearch index,
not in the node as saved in the database.

Some uses for this hook include preventing a node type from being indexed,
boosting a node's search score for certain words, or adding additional text to
a node. This is slightly different than hook_nodeapi's update index operation in
that you can replace or change parts of the node rather than just adding text.

Example code of a contributed module implementing the indexing hook:

// Prevent private nodes from being indexed by fuzzy search.
function custom_fuzzysearch_index($node) {
  if (!is_object($node) || $node->type == 'private') {
    return FALSE;
  }
  return $node;
}

=== hook_fuzzysearch_filter($op, $text) ===

hook_fuzzysearch_filter($op, $text) gives modules an opportunity to filter the
text before it is indexed and/or searched. The common use for this is
to do more complicated filtering than is allowed by the stop words text files.

$op == 'index' will filter words on indexing of content. $op == 'search' will
filter the search terms before the index is searched for results.
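
A minimal sketch of an implementation follows, for a hypothetical module named
"custom". It assumes the hook returns the filtered text, which this document
does not state; check the bundled fuzzysearch_filter_example submodule for the
actual contract before relying on it.

```php
<?php
/**
 * Sketch of hook_fuzzysearch_filter() for a hypothetical module "custom".
 *
 * Assumption: the hook returns the filtered text.
 */
function custom_fuzzysearch_filter($op, $text) {
  switch ($op) {
    case 'index':
    case 'search':
      // Drop purely numeric words, a rule a stop words file cannot express.
      $kept = array();
      foreach (preg_split('/\s+/', trim($text)) as $word) {
        if ($word !== '' && !preg_match('/^\d+$/', $word)) {
          $kept[] = $word;
        }
      }
      return implode(' ', $kept);
  }
  // Unknown operations leave the text untouched.
  return $text;
}

print custom_fuzzysearch_filter('search', 'room 101 is open') . "\n";
```

Applying the same rule for both operations keeps search terms consistent with
what was indexed.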

=== About The Author ===

Drupal 6 version maintained by awolfey.

This module was created for Drupal as part of Google Summer of Code 2007 by
Blake Lucchesi www.boldsource.com blake@boldsource.com