// $Id$

DESCRIPTION
-----------
This module provides static page caching for Drupal 5.x, enabling a
potentially very significant performance and scalability boost for
heavily-trafficked Drupal sites.

For an introduction, read the original blog post at:
  http://bendiken.net/2006/05/28/static-page-caching-for-drupal

FEATURES
--------
* Maximally fast page serving for the anonymous visitors to your Drupal
  site, reducing web server load and boosting your site's scalability.
* On-demand page caching (static file created after first page request).
* Full support for multi-site Drupal installations.

INSTALLATION
------------
Please refer to the accompanying file INSTALL.txt for installation
requirements and instructions.

HOW IT WORKS
------------
Once Boost has been installed and enabled, page requests by anonymous
visitors will be cached as static HTML pages on the server's file system.
Periodically, when the Drupal cron job runs, stale pages (i.e. files
exceeding the maximum cache lifetime setting) are purged, so that each page
is recreated the next time an anonymous visitor requests it.

New rewrite rules are added to the .htaccess file supplied with Drupal,
directing the web server to try to fulfill page requests by anonymous
visitors first and foremost from the static page cache, and to pass the
request through to Drupal only if the requested page is not cacheable,
hasn't yet been cached, or its stale copy has been purged.

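The cron-time purge described above can be pictured with the following
minimal sketch. It is written in Python purely for illustration and is not
the module's actual code; the function name purge_stale_files and its
parameters are hypothetical, and it assumes a file counts as stale once its
modification time is older than the maximum cache lifetime.

```python
import os
import time


def purge_stale_files(cache_dir, max_lifetime_seconds, now=None):
    """Delete cached .html files older than max_lifetime_seconds.

    Hypothetical helper: returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # lstat() judges a symlink by its own age, not its target's.
            age = now - os.lstat(path).st_mtime
            if age > max_lifetime_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

Once a file has been removed this way, the next anonymous request for that
page falls through to Drupal, which regenerates it.
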
FILE SYSTEM CACHE
-----------------
The cached files are stored (by default) in the cache/ directory under your
Drupal installation directory. The Drupal pages' URL paths are translated
into file system names in the following manner:

  http://mysite.com/
  => cache/mysite.com/0/index.html

  http://mysite.com/about
  => cache/mysite.com/0/about.html

  http://mysite.com/about/staff
  => cache/mysite.com/0/about/staff.html

  http://mysite.com/node/42
  => cache/mysite.com/0/node/42.html

You'll note that the directory path includes the Drupal site name, enabling
support for multi-site Drupal installations. The zero that follows, on the
other hand, denotes the user ID the content has been cached for -- in this
case the anonymous user (which is the default, and only, choice available
for the time being).

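The mapping above can be expressed as a small function. This is only a
sketch in Python that reproduces the four examples; the name cache_path_for
and its parameters are hypothetical, and the module's real translation may
handle cases (such as special characters) that are not covered here.

```python
import posixpath
from urllib.parse import urlsplit


def cache_path_for(url, cache_root="cache", uid=0):
    """Translate a page URL into its cache file name (illustrative only)."""
    parts = urlsplit(url)
    # The front page has an empty path and is stored as index.html.
    path = parts.path.strip("/") or "index"
    # Layout: <cache root>/<site host name>/<user ID>/<URL path>.html
    return posixpath.join(cache_root, parts.hostname, str(uid), path + ".html")
```

For example, cache_path_for("http://mysite.com/about/staff") returns
"cache/mysite.com/0/about/staff.html".
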
DISPATCH MECHANISM
------------------
For each incoming page request, the new Apache mod_rewrite directives in
.htaccess will check if a cached version of the requested page should be
served as per the following simple rules:

1. First, we check that the HTTP request method being used is GET.
   POST requests are not cacheable, and are passed through to Drupal.

2. Next, we make sure that the URL doesn't contain a query string (i.e.
   the part after the `?' character, such as `?q=cats+and+dogs'). A query
   string implies dynamic data, and any request that contains one will
   be passed through to Drupal. (This also allows one to easily obtain the
   current, non-cached version of a page by simply adding a bogus query
   string to a URL path -- very useful for testing purposes.)

3. Since only anonymous visitors can benefit from the static page cache at
   present, we check that the page request doesn't include a cookie that
   is set when a user logs in to the Drupal site. If the cookie is
   present, we simply let Drupal handle the page request dynamically.

4. Now, for the important bit: we check whether we actually have a cached
   HTML file for the request URL path available in the file system cache.
   If we do, we direct the web server to serve that file directly and to
   terminate the request immediately after; in this case, Drupal (and
   indeed PHP) is never invoked, meaning the page request will be served
   by the web server itself at full speed.

5. If, however, we couldn't locate a cached version of the page, we just
   pass the request on to Drupal, which will serve it dynamically in the
   normal manner.

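The five rules above amount to a short decision procedure. The following
Python sketch mirrors that logic for illustration only; the real checks are
performed by mod_rewrite directives in .htaccess, not by Python. The
function name dispatch and the cookie name LOGIN_COOKIE are placeholders,
since this document does not name the actual login cookie.

```python
import os


def dispatch(method, query_string, cookies, cached_file,
             login_cookie="LOGIN_COOKIE"):
    """Return "static" if the cached file should be served, else "drupal".

    login_cookie is a placeholder name, not the cookie Boost really uses.
    """
    # Rule 1: only GET requests are cacheable.
    if method != "GET":
        return "drupal"
    # Rule 2: a query string implies dynamic data.
    if query_string:
        return "drupal"
    # Rule 3: logged-in users always get dynamic pages.
    if login_cookie in cookies:
        return "drupal"
    # Rule 4: serve the cached file if it exists (symlinks are followed).
    if os.path.isfile(cached_file):
        return "static"
    # Rule 5: nothing cached yet, so let Drupal build the page.
    return "drupal"
```

Note that the order of the first three checks doesn't affect the outcome;
any one of them failing sends the request to Drupal.
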
IMPORTANT NOTES
---------------
* Drupal URL aliases get written out to disk as relative symbolic links
  pointing to the file representing the internal Drupal URL path. For this
  to work correctly with Apache, ensure your .htaccess file contains the
  following line (as it will by default if you've installed the file shipped
  with Boost):
    Options +FollowSymLinks
* To check whether you got a static or dynamic version of a page, look at
  the very end of the page's HTML source. You have the static version if the
  last line looks like this:
    <!-- Page cached by Boost at 2006-11-24 15:06:31 -->
* If your Drupal URL paths contain non-ASCII characters, you may have to
  tweak your locale settings on the server in order to ensure the URL paths
  get correctly translated into directory paths on the file system.
  Non-ASCII URL paths have not yet been tested at all, and feedback on
  them would be appreciated.

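Two of the notes above lend themselves to a short illustration. The Python
sketch below shows one way to test for the trailing Boost comment and one
way to create a relative symbolic link for a URL alias. Both helper names
(is_static_version, link_alias) are hypothetical, the timestamp format is
inferred from the single example above, and neither function is taken from
the module itself.

```python
import os
import re

# Matches the trailing comment shown above; format inferred from that example.
BOOST_MARKER = re.compile(
    r"<!-- Page cached by Boost at \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} -->\s*$"
)


def is_static_version(html):
    """Return True if the HTML source ends with the Boost cache comment."""
    return BOOST_MARKER.search(html) is not None


def link_alias(cache_dir, internal_path, alias_path):
    """Create <alias_path>.html as a relative symlink to <internal_path>.html.

    Raises FileExistsError if the alias file already exists.
    """
    target = os.path.join(cache_dir, internal_path + ".html")
    link = os.path.join(cache_dir, alias_path + ".html")
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # Express the target relative to the directory that holds the link.
    relative_target = os.path.relpath(target, start=os.path.dirname(link))
    os.symlink(relative_target, link)
    return relative_target
```

With an internal path of node/42 and an alias of about/staff, the link
about/staff.html points to ../node/42.html.
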
LIMITATIONS
-----------
* Only anonymous visitors will be served cached versions of pages; logged-in
  users will get dynamic content. This may somewhat limit the usefulness of
  this module for those community sites that require user registration and
  login for active participation.
* Only content of the type `text/html' will get cached at present. RSS feeds
  and URL paths that have some other content type (e.g. set by a third-party
  module) will be silently ignored by Boost.
* In contrast to Drupal's built-in caching, static caching will lose any
  additional HTTP headers set for an HTML page by a module. This is unlikely
  to be a problem except for some very specific modules and rare use cases.
* Web server software other than Apache is not supported at the moment.
  Adding Lighttpd support would be desirable but is not a high priority for
  the author at present (see TODO.txt). (Note that while the LiteSpeed web
  server has not been specifically tested by the author, it may, in fact,
  work, since its developers claim to support .htaccess files and to have
  mod_rewrite compatibility. Feedback on this would be appreciated.)
* At the moment, Windows users are out of luck due to the use of symlinks
  and Unix-specific shell commands. The author has no personal interest in
  supporting Windows but will accept well-documented, non-detrimental
  patches to that effect (see http://drupal.org/node/174380).

BUG REPORTS
-----------
Post feature requests and bug reports to the issue tracking system at:
  http://drupal.org/node/add/project_issue/boost

CREDITS
-------
Developed and maintained by Arto Bendiken <http://bendiken.net/>
Ported to Drupal 5.x by Alexander I. Grafov <http://drupal.ru/>
Miscellaneous contributions by: Jacob Peddicord, Justin Miller, Barry
Jaspan.