// $Id: README.txt,v 1.1 2006/11/21 03:39:19 arto Exp $

NOTE: this module is currently in an alpha state. Come back in a bit unless
you're an experienced user and don't mind figuring things out on your own.

DESCRIPTION
-----------
This module provides static page caching for Drupal 4.7, enabling a
potentially very significant performance and scalability boost for
heavily-trafficked Drupal sites.

For an introduction, read the original blog post at:
  http://bendiken.net/2006/05/28/static-page-caching-for-drupal

FEATURES
--------
* Maximally fast page serving for the anonymous visitors to your Drupal
  site, reducing web server load and boosting your site's scalability.
* On-demand page caching (static file created after first page request).
* Full support for multi-site Drupal installations.
* Command line administration support (requires the drush module).

INSTALLATION
------------
Please refer to the accompanying file INSTALL.txt for installation
requirements and instructions.

HOW IT WORKS
------------
Once Boost has been installed and enabled, page requests by anonymous
visitors will be cached as static HTML pages on the server's file system.
Periodically (when the Drupal cron job runs), stale pages (i.e. files
exceeding the maximum cache lifetime setting) will be purged, so that each
one is recreated the next time an anonymous visitor requests that page.
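
As a rough illustration of the purge step, here is a minimal Python sketch.
It is not Boost's actual code (the module itself is written in PHP); the
function name, its parameters and the use of file modification times to
judge staleness are all assumptions made for the example.

```python
import os
import time


def purge_stale_files(cache_dir, max_lifetime_seconds, now=None):
    """Delete cached .html files older than the lifetime; return their paths.

    Hypothetical helper: the name and signature are illustrative only.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # lstat() judges a symlink by its own timestamp, not its target's.
            if now - os.lstat(path).st_mtime > max_lifetime_seconds:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Calling purge_stale_files("cache", 3600) from a periodic job would remove
every cached page older than one hour, leaving it to be regenerated on the
next anonymous request.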

New rewrite rules are added to the .htaccess file supplied with Drupal,
directing the web server to try to fulfill page requests by anonymous
visitors first and foremost from the static page cache, and to pass the
request through to Drupal only if the requested page is not cacheable,
hasn't yet been cached, or the cached copy has gone stale and been purged.

FILE SYSTEM CACHE
-----------------
The cached files are stored (by default) in the cache/ directory under your
Drupal installation directory. The Drupal pages' URL paths are translated
into file system names in the following manner:

  http://mysite.com/
  => cache/mysite.com/0/index.html

  http://mysite.com/about
  => cache/mysite.com/0/about.html

  http://mysite.com/about/staff
  => cache/mysite.com/0/about/staff.html

  http://mysite.com/node/42
  => cache/mysite.com/0/node/42.html

You'll note that the directory path includes the Drupal site name, enabling
support for multi-site Drupal installations. The zero that follows, on the
other hand, denotes the user ID the content has been cached for -- in this
case the anonymous user (which is the default, and only, choice available
for the time being).
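
The mapping above can be sketched in a few lines of Python. This is only an
illustration of the naming scheme shown in the examples, not the module's
own implementation; the function name and its cache_root and uid parameters
are assumptions, and no sanitizing of unusual paths is attempted.

```python
import posixpath
from urllib.parse import urlsplit


def cache_file_path(url, cache_root="cache", uid=0):
    """Translate a page URL into its cache file name (hypothetical helper)."""
    parts = urlsplit(url)
    # The front page has an empty path and is stored as index.html.
    path = parts.path.strip("/") or "index"
    # Layout: <cache root>/<site host name>/<user ID>/<URL path>.html
    return posixpath.join(cache_root, parts.hostname, str(uid), path + ".html")
```

For instance, cache_file_path("http://mysite.com/node/42") gives
cache/mysite.com/0/node/42.html, matching the last example above.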

DISPATCH MECHANISM
------------------
For each incoming page request, the new Apache mod_rewrite directives in
.htaccess will check if a cached version of the requested page should be
served as per the following simple rules:

  1. First, we check that the HTTP request method being used is GET.
     POST (and other non-GET) requests are not cacheable, and are passed
     through to Drupal.

  2. Next, we make sure that the URL doesn't contain a query string (i.e.
     the part after the `?' character, such as `?q=cats+and+dogs'). A query
     string implies dynamic data, and any request that contains one will
     be passed through to Drupal. (This also allows one to easily obtain the
     current, non-cached version of a page by simply adding a bogus query
     string to a URL path -- very useful for testing purposes.)

  3. Since only anonymous visitors can benefit from the static page cache at
     present, we check that the page request doesn't include a cookie that
     is set when a user logs in to the Drupal site. If the cookie is
     present, we simply let Drupal handle the page request dynamically.

  4. Now, for the important bit: we check whether we actually have a cached
     HTML file for the request URL path available in the file system cache.
     If we do, we direct the web server to serve that file directly and to
     terminate the request immediately after; in this case, Drupal (and
     indeed PHP) is never invoked, meaning the page request will be served
     by the web server itself at full speed.

  5. If, however, we couldn't locate a cached version of the page, we just
     pass the request on to Drupal, which will serve it dynamically in the
     normal manner.
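
The five rules can be restated as a small decision function. The Python
sketch below only models the logic; the real dispatch is done by
mod_rewrite directives in .htaccess, which are not reproduced here. The
function name, its parameters and the cookie name "example_login_cookie"
are hypothetical, since this file does not name the actual login cookie.

```python
import os


def static_file_for(method, path, query_string, cookies, host,
                    cache_root="cache", login_cookie="example_login_cookie"):
    """Return the cached file to serve, or None to pass the request to Drupal."""
    # Rule 1: only GET requests are cacheable.
    if method != "GET":
        return None
    # Rule 2: a query string implies dynamic data.
    if query_string:
        return None
    # Rule 3: logged-in users (login cookie present) get dynamic pages.
    if login_cookie in cookies:
        return None
    # Rule 4: serve the cached file for the anonymous user (ID 0) if it exists.
    relative = path.strip("/") or "index"
    candidate = os.path.join(cache_root, host, "0", relative + ".html")
    if os.path.isfile(candidate):
        return candidate
    # Rule 5: nothing cached, so Drupal serves the page dynamically.
    return None
```

A return value of None corresponds to every "passed through to Drupal"
case in the list; any other value is the file the web server would serve
without invoking PHP.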

IMPORTANT NOTES
---------------
* Drupal URL aliases get written out to disk as relative symbolic links
  pointing to the file representing the internal Drupal URL path. For this
  to work correctly with Apache, ensure your .htaccess file contains the
  following line (as it will by default if you've installed the file shipped
  with Boost):
    Options +FollowSymLinks
* To check whether you got a static or dynamic version of a page, look at
  the very end of the page's HTML source. You have the static version if the
  last line looks like this:
    <!-- Page cached by Boost at 2006-11-24 15:06:31 -->
* If your Drupal URL paths contain non-ASCII characters, you may have to
  tweak your locale settings on the server in order to ensure the URL paths
  get correctly translated into directory paths on the file system.
  Non-ASCII URL paths have not yet been tested at all, and feedback on
  them would be appreciated.
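
The static-versus-dynamic check described above is easy to automate. The
following Python sketch assumes the marker always has exactly the form
shown in the example comment; the function and constant names are made up
for illustration.

```python
import re

# Matches the Boost marker only when nothing but whitespace follows it.
BOOST_MARKER = re.compile(
    r"<!-- Page cached by Boost at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) -->\s*$"
)


def cached_at(html):
    """Return the cache timestamp if the page ends with the marker, else None."""
    match = BOOST_MARKER.search(html)
    return match.group(1) if match else None
```

Feeding it the HTML source of a fetched page returns the timestamp string
for a static copy and None for a dynamically generated one.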

LIMITATIONS
-----------
* Only anonymous visitors will be served cached versions of pages; logged-in
  users will get dynamic content. This may somewhat limit the usefulness of
  this module for those community sites that require user registration and
  login for active participation.
* Only content of the type `text/html' will get cached at present. RSS feeds
  and URL paths that have some other content type (e.g. set by a third-party
  module) will be silently ignored by Boost.
* In contrast to Drupal's built-in caching, static caching will lose any
  additional HTTP headers set for an HTML page by a module. This is unlikely
  to be a problem except for some very specific modules and rare use cases.
* Web server software other than Apache is not supported at the moment.
  Adding Lighttpd support would be desirable but is not a high priority for
  the author at present (see TODO.txt). (Note that while the LiteSpeed web
  server has not been specifically tested by the author, it may, in fact,
  work, since its vendor claims to support .htaccess files and to have
  mod_rewrite compatibility. Feedback on this would be appreciated.)
* At the moment, Windows users are out of luck due to the use of symlinks
  and Unix-specific shell commands. The author has no personal interest in
  supporting Windows but will accept well-documented, non-detrimental
  patches to that effect.

BUG REPORTS
-----------
Post feature requests and bug reports to the issue tracking system at:
  http://drupal.org/node/add/project_issue/boost

CREDITS
-------
Developed and maintained by Arto Bendiken <http://bendiken.net/>