When and how caching can save your Drupal site

This is the first in a series of blog posts discussing caching strategies in Drupal. In this first post we will look at what Drupal can do out of the box regarding caching, and what the options are to extend it so that sites perform well under high load.
Unlike a static HTML website, Drupal pages consist of small building blocks that are rendered independently of one another before they are bundled together and sent to the browser as an atomic unit. Because Drupal is a dynamic content generation platform, a series of complex steps is executed behind the scenes to generate the page that is sent to the browser: establishing a database connection, loading settings and modules, initializing a user session, mapping the URL to a PHP page callback function that runs the application’s business logic, and collecting the fringe elements that surround the main content of the page.

Most of these steps are executed every time a page is rendered, but an effective caching strategy can minimize their resource consumption by storing the results of repeated calculations. This gets tricky with authenticated traffic, because some elements require slightly personalized pages, which eliminates the ability to cache the page as a whole. In these more complex situations, a lower-level cache has to be used to store individual building blocks.
As more items are cached, it becomes increasingly important to pay attention to the backend that stores the information.
By default, Drupal caches information directly in the database. This is evident from the many tables starting with “cache_” in the Drupal database.
Drupal can cache generated pages, or the parts used to render those pages, and reuse that information when rendering new pages.
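
To make the idea of a database-backed cache concrete, here is a minimal sketch in Python rather than Drupal's own PHP. It is a toy model, not Drupal's implementation: the table name cache_example, its columns, and the cache_set/cache_get helpers are assumptions made up for this illustration.

```python
import json
import sqlite3
import time

# Hypothetical cache table, loosely modeled on the idea of a "cache_" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache_example (cid TEXT PRIMARY KEY, data TEXT, expire REAL)"
)


def cache_set(cid, value, lifetime, now=None):
    """Store a JSON-serializable value under cid for `lifetime` seconds."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache_example (cid, data, expire) VALUES (?, ?, ?)",
        (cid, json.dumps(value), now + lifetime),
    )


def cache_get(cid, now=None):
    """Return the cached value, or None if it is missing or has expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT data, expire FROM cache_example WHERE cid = ?", (cid,)
    ).fetchone()
    if row is None or row[1] <= now:
        return None
    return json.loads(row[0])
```

Every lookup in this model costs a database query, which is why the choice of backend matters more as the number of cached items grows.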

Caching layers

Page caching
If pages look the same in all situations, and the user context does not matter at all, they can be cached whole and served directly from cache. There are several options for where and how to cache rendered pages:

  • Reverse proxies (external caching): a reverse proxy is a type of proxy server that retrieves resources on behalf of a client from one or more servers. Using a reverse proxy such as Varnish is a popular option in high-performance Drupal architectures. Varnish sits in front of the web servers and caches responses to requests that have no cookies associated with them. Varnish is extremely fast because it keeps cache information in memory, saving it to disk only when needed, and it can be used directly to cache Drupal pages for anonymous users. Drupal 7 has native support for reverse proxies, while Drupal 6 needs Pressflow to guarantee that no session cookies are present for anonymous requests. By default, the Varnish Drupal module uses the page cache lifetime to define the expiration date; other modules such as Expire can be used to expire the Varnish cache when content or structure has changed and the cache should be updated, so that visitors are served fresh information when needed. Pages served directly from Varnish usually carry debug information in the response headers that helps you understand whether caching is working as expected. At Acquia we are huge fans of Varnish: any page served from Acquia Cloud carries this Varnish information. For instance, acquia.com returns:
    Via          1.1 varnish
    X-Cache      HIT
    X-Cache-Hits 9970
    X-Varnish    816323757 811625039
  • Boost: Boost is a module that uses a modified Apache .htaccess configuration so that requests can be served from static files on disk instead of being generated dynamically by PHP. To allow that, the module generates an HTML page whenever a page is accessed and saves it as a real HTML file on disk. On the next request, the web server checks whether a static HTML version is available and tries to serve it before asking Drupal to generate a new version. Boost ships with a built-in crawler that runs on cron and makes sure expired content is quickly regenerated for fast page loading. Pages served directly from the Boost static cache contain a short HTML comment at the end of the HTML code, like:
    <!-- Page cached by Boost @ 2012-03-05 10:55:30, expires @ 2012-03-05 16:55:30 -->
  • Normal caching: Drupal can save pages directly in the preferred caching backend. When page caching is active, Drupal saves a cached copy for anonymous users whenever a page is generated. The next anonymous request gets the cached version instead of regenerating the page.
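
The debug markers shown above can be checked programmatically. The following Python sketch is one hedged way to do it: it assumes the header names from the acquia.com sample (Via, X-Cache) and the exact Boost comment format quoted above, both of which may differ on other setups. The function names varnish_hit and boost_window are hypothetical helpers, not part of Varnish or Boost.

```python
import re
from datetime import datetime

# Matches the Boost footer comment in the format quoted in this post.
BOOST_COMMENT = re.compile(
    r"<!-- Page cached by Boost @ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), "
    r"expires @ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) -->"
)


def varnish_hit(headers):
    """Return True when the headers report a Varnish cache hit."""
    normalized = {name.lower(): value for name, value in headers.items()}
    via_varnish = "varnish" in normalized.get("via", "").lower()
    is_hit = normalized.get("x-cache", "").strip().upper() == "HIT"
    return via_varnish and is_hit


def boost_window(html):
    """Return (cached_at, expires_at) from a Boost comment, or None."""
    match = BOOST_COMMENT.search(html)
    if match is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S"
    cached_at, expires_at = (datetime.strptime(s, fmt) for s in match.groups())
    return cached_at, expires_at
```

Feeding the sample headers from this post into varnish_hit, or the sample comment into boost_window, is a quick sanity check that a page really came from the cache layer you expect.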

By default, page caching only works for anonymous requests. The main reason is that pages commonly have customizations that look different to different users. These can vary from small messages such as “Welcome Joe Black” at the top of the page to customized blocks that show a user's favourite links or the content they have commented on. Because pages usually look different to different users, they cannot be cached as a whole, which means that by default Drupal cannot cache pages for authenticated users.

When pages are served from the cache, most Drupal hooks are not called at all (hook_boot and hook_exit are the only ones that are always called). Before enabling page caching, it is important to make sure that your site does not depend on other hooks being called on every page request.
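
The behaviour described above can be pictured with a small toy model. This Python sketch is not Drupal code: make_site, the has_session flag, and the "other_hooks" label are hypothetical names used only to show that a cache hit skips everything except the boot and exit steps, and that requests with a session bypass the page cache entirely.

```python
def make_site(build_page):
    """Return a request handler plus a log of the steps each request ran."""
    cache = {}
    calls = []

    def handle(path, has_session):
        calls.append("boot")  # stands in for hook_boot: always runs
        if not has_session and path in cache:
            body = cache[path]  # cache hit: no other hooks, no page build
        else:
            calls.append("other_hooks")  # the rest of the request lifecycle
            body = build_page(path)
            if not has_session:
                cache[path] = body  # only anonymous pages are stored
        calls.append("exit")  # stands in for hook_exit: always runs
        return body

    return handle, calls


handle, calls = make_site(lambda path: "<h1>" + path + "</h1>")
handle("/about", has_session=False)  # miss: page is built and cached
handle("/about", has_session=False)  # hit: served from cache
```

In this model, any logic placed under "other_hooks" silently stops running for anonymous visitors once the page is cached, which is exactly the situation to check for before turning page caching on.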

However, there are solutions to mitigate this situation. Next time, we will look at strategies to achieve partial caching and significantly speed up Drupal even when authenticated users interact with your site.