A short time ago I published a presentation I gave at DrupalACT entitled 'Varnish for Beginners'. Whilst the presentation itself went down well and those attending hopefully garnered a good amount of knowledge, I thought I'd share the basics in this blog post for those who would like to know more about it.
What is Varnish?
Varnish is a reverse proxy HTTP accelerator that is often placed in front of Drupal sites to act as a first line of defense against the swathe of anonymous users who are likely to want to view all the interesting content present. Because Varnish is a separate service, it doesn't matter whether the web server that the Drupal site runs on uses Apache, NGINX, Comanche, IIS or another piece of software; it'll just work.
Varnish acts as a transparent go-between for users and the web server backend: any content the backend returns with the correct cache headers is stored for a limited time. The advantages of Varnish are many, with the main ones being:
-Serving content from an in-memory cache means no slow PHP execution and no slow MySQL queries;
-Varnish is capable of delivering at a rate that makes it an F-15 to vanilla Drupal's Cessna;
-Cache headers sent by Drupal are respected entirely, so unless something is specifically overridden in the Varnish configuration, whatever Drupal says to cache, Varnish will cache.
All of these together make websites behind Varnish fast! Even though the effects are only felt by anonymous users, the majority of traffic for most sites is likely anonymous so the benefit would be great.
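To make the point about cache headers a little more concrete, here's a minimal sketch of pulling the cache lifetime out of a response's Cache-Control header. The function name cache_max_age is my own invention, and the logic is a simplification: it assumes s-maxage is preferred over max-age and ignores everything else (Expires, cookies, VCL overrides) that can affect what really gets cached.

```shell
# Print the cache lifetime in seconds advertised by the Cache-Control
# header read from stdin, or "none" if no lifetime is given.
# Simplifying assumption: s-maxage wins over max-age; nothing else is considered.
cache_max_age() {
  tr -d '\r' | awk '
    tolower($1) == "cache-control:" {
      line = tolower($0)
      if (match(line, /s-maxage=[0-9]+/)) {
        ttl = substr(line, RSTART + 9, RLENGTH - 9)   # skip "s-maxage="
      } else if (match(line, /max-age=[0-9]+/)) {
        ttl = substr(line, RSTART + 8, RLENGTH - 8)   # skip "max-age="
      }
    }
    END { print (ttl == "" ? "none" : ttl) }
  '
}

# Example: a header of the kind Drupal sends for a cacheable anonymous page.
printf 'Cache-Control: public, max-age=21600\r\n' | cache_max_age
```

Piping real response headers into the function should give a rough idea of how long a page may live in the cache before Varnish goes back to Drupal for it.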
Whilst Drupal has a number of different potential caching strategies, Varnish arguably offers the best balance between ease of setup and speed benefits.
-Drupal's built-in database cache system is locked to the database, a slow system compared to memory.
-The contributed module 'Boost' is generally an acceptable choice on memory-poor servers but caches pages as files in the filesystem. This can lead to problems on network filesystems like Gluster due to the high rate of file read/write operations performed.
-Memcache is a great choice that works well with Varnish, although it is primarily used for lower level caching (i.e. caching bootstrap modules and views). Memcache forms a cache backend that Drupal can interact with actively to set and get cached items, whereas Varnish is a passive cache that Drupal does not know about.
-Alternative caches like Redis and MongoDB do exist but, in a similar way to Memcache, they act as an active cache that the Drupal site can set and get items from. Drupal also requires a little work to get Redis or MongoDB working, which makes them a little more technical to implement.
Installation, configuration and VCL
Even though CentOS requires an additional repository, installing Varnish is trivial on Debian, Red Hat and OS X.
(yum|apt-get|brew) install varnish
is enough to get Varnish on the server and ready to start. The default configuration options in /etc/sysconfig/varnish or /etc/default/varnish should likely be changed to listen on port 80 and have a memory limit sufficient for the server.
DAEMON_OPTS="-a :80 \
             -u varnish -g varnish \
             -T localhost:6082 \
             -f /etc/varnish/adam.vcl \
             -S /etc/varnish/secret \
             -s malloc,256M";
The Varnish Configuration Language (VCL) file can be used to route requests to the cache or the backend depending on the logic it defines. Whilst the default VCL bundled with Varnish does an acceptable job of handling a cache and speeding up sites for anonymous users, a lot may be changed to make pages more cacheable. Following guidance provided by the Four Kitchens VCL and with additional logic from the Varnish documentation pages you can ensure you have a cache that flies!
Shielding Drupal from the Internet
Whilst I'm a huge advocate and evangelist for Drupal, and the ease with which sites may be created and extended, it must also be said that when a Drupal site is exposed to the internet and the potentially thousands of users who hit the site, it can struggle. With this in mind, we need to shield Drupal from the anonymous users, bots and spammers so Drupal can go on managing content without the relentless barrage of hits.
The above picture shows a number of the backend connections that Drupal makes on a typical site. With data stored in the database, and cache stored in Memcache, these contribute to a lot of the network back and forth of the Drupal CMS. Occasional additional connections to Mollom to counter spam, Drupal's update service and Google services mean that each time a user request skips Varnish and makes it to the backend, there's more delay. Any additional interactions with a local or remote service will both increase the time taken for the response and potentially overload the server.
By stopping user requests getting to the backend we can prevent the execution of PHP and prevent slow queries to the database and other services.†
† Slow in comparison to fast in-memory caching from Varnish
Whether you're looking to provide a little proof to those who make the business decisions about implementing Varnish, want to compile some stats about the software, have debugging to do, or just want to see some cool graphs, Varnish provides a number of tools for all these purposes:
-varnishstat - Provides a live view of a number of stats provided by Varnish about the state of the cache;
-varnishlog - Provides more information than is even possible to comprehend that can be grepped and processed to provide further information, usually for debugging purposes;
-varnishncsa - Gives an output from Varnish that mimics that of an Apache access log;
-varnishtop - Provides a ranked list of log entries, especially useful with the -i RxURL and -i TxURL parameters to show the top requests and the top requests that miss Varnish, respectively;
-varnishhist - Best used with the '-d' flag, provides a histogram where the '|' indicates a cache hit and the '#' a cache miss; the units being hits vs time. An example varnish histogram is beneath for viewing enjoyment!
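As a quick illustration of turning these tools into a number for the decision makers, here's a rough sketch that works out a cache hit percentage from the one-shot output of varnishstat -1. The function name varnish_hit_ratio is hypothetical, and I'm assuming the counters are called cache_hit and cache_miss (newer Varnish versions prefix them with MAIN., which the sketch strips), with the value in the second column.

```shell
# Read `varnishstat -1` style output on stdin and print the cache hit
# percentage to one decimal place, or "n/a" if no hits or misses were seen.
# Assumption: counters are named cache_hit / cache_miss, optionally
# prefixed with "MAIN.", and the counter value is the second column.
varnish_hit_ratio() {
  awk '
    { sub(/^MAIN\./, "", $1) }
    $1 == "cache_hit"  { hit = $2 }
    $1 == "cache_miss" { miss = $2 }
    END {
      total = hit + miss
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hit / total
    }
  '
}

# On a server running Varnish this would be used as:
#   varnishstat -1 | varnish_hit_ratio
# Here it is fed made-up sample counters instead:
printf 'cache_hit 900 0.50 Cache hits\ncache_miss 100 0.05 Cache misses\n' | varnish_hit_ratio
```

The sample counters are invented purely for demonstration; real figures will depend entirely on the site and its traffic.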
Checking Varnish works
One of the simplest ways to ensure Varnish is working as expected is to query it using isvarnishworking.com, a quick and easy method of observing Varnish headers. It's a lazy alternative to curling the site or inspecting the headers in the browser, either of which would produce the following results:
$ curl -kLsiXGET www.adammalone.net
...snip...
Cache-Control: public, max-age=21600
Last-Modified: Tue, 17 Sep 2013 00:00:15 +0000
Date: Tue, 17 Sep 2013 09:32:36 GMT
X-Varnish: 1607010525 1607008083
Age: 12741
Via: 1.1 varnish
X-Varnish-Cache: HIT
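If you'd rather script the check than eyeball the headers, something along these lines might do. It's only a sketch: the function name varnish_cache_status is made up, and it leans on the assumption that an X-Varnish header carrying two IDs (the current request plus the earlier request that populated the cache) means a hit, whilst a single ID means a miss. A header like X-Varnish-Cache is typically added by custom VCL, so the sketch doesn't rely on it.

```shell
# Read HTTP response headers on stdin and print HIT, MISS, or NONE.
# Assumption: "X-Varnish: <id> <id>" indicates a cache hit,
# "X-Varnish: <id>" a miss, and no X-Varnish header means no Varnish.
varnish_cache_status() {
  tr -d '\r' | awk '
    tolower($1) == "x-varnish:" {
      # $2 is this request; $3, if present, is the request that filled the cache.
      status = (NF >= 3) ? "HIT" : "MISS"
    }
    END { print (status == "" ? "NONE" : status) }
  '
}

# Against a live site (headers only, body discarded):
#   curl -s -o /dev/null -D - http://www.example.com/ | varnish_cache_status
# Here it is fed two of the headers from the output above:
printf 'X-Varnish: 1607010525 1607008083\r\nAge: 12741\r\n' | varnish_cache_status
```

Requesting the same URL twice in quick succession and seeing MISS followed by HIT is a reasonable sign that pages are being cached.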
Try Varnish out!
Since Varnish is so simple to set up and start using, even at a beginner level, I'd recommend everyone try installing it to benefit from the speed boost! All Acquia Cloud servers come with Varnish in front of Drupal, so if you want the most hassle-free way of seeing the benefits of Varnish then sign up for a free account and fly with the cache!