In my previous post, I wrote about how multi-site is a powerful mechanism for code management. In this post, I am going to talk about how to manage multi-site at an infrastructure level, and the considerations you should take into account when deciding upon your site architecture.
Gotchas and misconceptions
It's important to remember that multi-site is a way of managing code, not a way of deploying it - all of the scalability best practice that years of Drupal hosting has taught us applies as much to multi-site as it does to multi-docroot. You could have a single multi-site codebase deploying to any number of different server clusters. Storing your code in a single codebase doesn't lock you into a single server, cluster, or configuration.
Multi-site doesn't need to mean a Single Point Of Failure
All sites in a multi-site share the same underlying Drupal core code, at a minimum. But as I've said, this doesn't mean that you are locked into any particular hosting model. The key consideration is how you manage updates; hosting a multi-site installation is not that different to hosting any other Drupal site.
Multi-site is a very efficient use of a server’s resources.
PHP is an interpreted language. This means that there is no separate compilation step when deploying code - you simply deploy your source code text files. When PHP executes, it first compiles this source code into an intermediate form called opcodes. Without an opcode cache, this compilation happens on every single PHP request. An opcode cache keeps the compiled opcodes in memory, so files that have already been compiled skip that step on later requests, greatly speeding up execution. APC, an add-on extension, is the most commonly used opcode cache today, but PHP 5.5 introduced a bundled opcode cache, OPcache, into PHP itself.
With multi-site, many sites share the same Drupal and module code. By extension, this means that many sites can share a single opcode cache: each shared file is compiled and cached once, rather than once per site. This saves memory, and frees that memory for other uses. For most use cases, memory is the most contended resource, so anything that frees up memory is very welcome.
If you run 50 sites from a single cluster, and one of them gets a huge traffic spike - does that mean the other 49 might be starved of resources? Certainly. But on the other hand, do you want those sites to reserve server capacity that isn't needed? Wouldn't you rather use that spare capacity to get you through the spike?
If the site that gets most of the traffic continues to use more than its fair share, you could spin it onto another cluster - or just increase the size of your existing one. The resources will be shared, so the 49 quiet sites will just dip into the resource pool as needed.
Dealing with traffic spikes is a challenge regardless of your code or server infrastructure. The most important aspect of performance management in any application is the quality of the code - a view that often gets lost as a generation of IT professionals take advantage of the fantastic power and flexibility we now have with modern hardware and Cloud environments. There's a balancing act to be struck between the investment in development hours and the cost of throwing extra hardware at a site under load. Acquia provides a lot of tools and services to try to ensure that customer code and configuration follows best practice, whether through self-service SaaS tools like Acquia Insight, or through our super-smart Professional Services or Support Teams.
Of course, if scaling up hardware is the best course of action, we can scale up Acquia Cloud hosted sites at any time, then scale them back down if or when the extra resources are no longer needed.
Database Server Load
Because many sites may potentially share a single database server, each can begin to impact others if they come under heavy load or have poorly optimised queries.
At Acquia, we have customers running scores of sites on multi-site installations and sharing a single database server. There's a lot of scope for optimisations in your Drupal architecture and code which can mitigate this potential issue.
Multi-site doesn't preclude you from running multiple database servers. Each site has its own settings.php, so pointing a busy site at a different database server is a simple configuration change in that file.
While there's a lot of value in having shared code across many sites, it does mean that, in some cases, you need to be careful about managing updates. When you update Drupal core, you're updating all of your sites at once, which can be a double-edged sword.
It is absolutely best practice to run the latest point release of Drupal and contributed modules - there are serious security concerns with not doing so. You don't want to have sites in the wild that are out of date. Running multi-site just means you need to test ALL of your sites up front, rather than doing them piecemeal.
This encourages you to follow Drupal best practices - if your code is in good shape, you can upgrade 50 sites in the time it would otherwise take to do one. You don't want per-site hacks.
Often Drupal or module updates require running update.php or drush updb to make schema or configuration changes. This means that you need to run these updates across all of your sites' databases as soon as possible after you deploy your code.
Until every site has been updated, new code is running against an old database schema, which could potentially lead to a loss of data integrity or increased server load during the update process. In practice, most db updates I have run for production sites have been very lightweight and there has been no issue with this process. If you put each site into maintenance mode (as is recommended) before you run the updates, this is a non-issue.
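As a rough sketch of the maintenance-mode approach for a single site, assuming Drupal 7 (where the flag is stored in the maintenance_mode variable) and a drush binary on the PATH. The function name and the DRUSH override are hypothetical, introduced here only for illustration. On other Drupal versions the command for toggling maintenance mode differs.

```shell
#!/bin/sh
# Hypothetical knob: lets you point at a specific drush binary.
DRUSH="${DRUSH:-drush}"

# Put one site into maintenance mode, run its pending database updates,
# and bring it back online only if the updates succeeded.
update_site_with_maintenance() {
  site="$1"
  # Assumption: Drupal 7, where maintenance mode is a plain variable.
  "$DRUSH" --uri="$site" vset -y maintenance_mode 1 || return 1
  if "$DRUSH" --uri="$site" updb -y; then
    "$DRUSH" --uri="$site" vset -y maintenance_mode 0
  else
    # Leave the site offline so a failed update can be inspected.
    echo "updb failed for $site; leaving it in maintenance mode" >&2
    return 1
  fi
}
```

Calling update_site_with_maintenance www.example.com would run the three steps for that one site. Leaving a failed site offline is a design choice, and you may prefer to bring it back up and alert someone instead.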
In a previous job, I had a text file with all of my domains in it, one per line. I then ran a very simple shell script to run drush updb for each site, with a brief pause between each update. It would be only slightly more effort to put each site into maintenance mode before running the updates, or to spawn multiple processes which ran the updates in parallel.
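A script along those lines might look something like the following. This is a reconstruction of the idea rather than the original script: the function name, the DRUSH and PAUSE variables, and the file name in the usage note are all assumptions.

```shell
#!/bin/sh
# Hypothetical knobs, overridable from the environment.
DRUSH="${DRUSH:-drush}"   # which drush binary to call
PAUSE="${PAUSE:-5}"       # seconds to wait between sites

# Run pending database updates for every domain listed in a file,
# one domain per line. Blank lines and lines starting with # are skipped.
update_all_sites() {
  sites_file="$1"
  while IFS= read -r domain || [ -n "$domain" ]; do
    case "$domain" in
      ''|'#'*) continue ;;
    esac
    echo "Updating $domain"
    # --uri selects the site within the multi-site install; -y answers prompts.
    # stdin is redirected so drush cannot swallow the rest of the domain list.
    "$DRUSH" --uri="$domain" updb -y < /dev/null || echo "FAILED: $domain" >&2
    sleep "$PAUSE"
  done < "$sites_file"
}
```

With the domains saved in a file such as sites.txt, running update_all_sites sites.txt from the docroot works through the list in order. A failure on one site is reported on stderr and the loop carries on with the next.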
As always, make sure you test in staging first. Once your process is in place, you can update 50 sites with just a couple of commands.
Running cron for many sites that share a single database or web server can put a large load on the server, especially if the cron jobs run at the same time. This is true whether you are using multi-site or separate docroots, but it's something we see placing a lot of load on servers from time to time.
It’s best to spread out your cron runs, running one for each site as far apart as possible to avoid concurrent load.
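One way to spread the runs out is to generate the crontab entries rather than write them by hand. The sketch below assumes hourly cron runs and a file of domains, one per line. The function name is hypothetical, and a real crontab would probably need the full path to drush and to the docroot.

```shell
#!/bin/sh
# Print one hourly crontab entry per site, with the start minutes
# spread evenly across the hour so the runs do not overlap.
stagger_cron() {
  sites_file="$1"
  # Count the non-empty lines; "|| true" keeps an empty file from aborting.
  count=$(grep -c . "$sites_file" || true)
  [ "$count" -gt 0 ] || return 0
  i=0
  while IFS= read -r domain || [ -n "$domain" ]; do
    [ -n "$domain" ] || continue
    # Site i of count starts at minute i * 60 / count (integer division).
    minute=$(( i * 60 / count ))
    printf '%d * * * * drush --uri=%s cron\n' "$minute" "$domain"
    i=$(( i + 1 ))
  done < "$sites_file"
}
```

For three sites this produces entries starting at minutes 0, 20 and 40. With more than 60 sites some minutes are shared, but the load is still spread across the whole hour instead of landing on one minute.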
Scaling is hard, but we can help
Whenever I talk to people about Drupal or web development in general, either in my capacity at Acquia or in the Drupal community, I try to reiterate that this stuff is complex. But that's what appeals to me about open source, and Drupal in particular - people are ready to admit that we are pushing boundaries and that we can all benefit from each other's experience.
Managing a lot of sites is a tough job, but the fantastic thing about working at Acquia is I get to interact with people who run some of the biggest and most complex sites in the world, every single day. If you are wondering what the best solution for your goals is, please get in touch. We're always happy to help.