I’ve only been with Acquia for three weeks as an intern, and I have to say I love every moment of it. I work on the Operations and Hosting Engineering team, and last week, my second week, I was placed on an on-call rotation for Operations. An on-call rotation is like being an E.R. doctor, except my patients are servers. You have to set up your phone and laptop to receive ‘critical’ notifications from our internal monitoring and notification system, and be prepared with all the tools you could possibly need to handle any situation. Being on call is like being a technology Boy Scout: always prepared.
I won’t lie to you, I was very nervous: two weeks into working at Acquia, I was being trusted to handle any event that could arise, from a server failing, to database servers falling out of sync (which occurs rarely), to any client request that needs to be resolved right away.
My week on call wasn’t too rough. The worst that occurred was a web server’s hard drive dying at 3 AM on Saturday; I had to wake up and revive the server as quickly as possible. Rather than wait for the drive to be recovered, I took the web server out of rotation and replaced it with a brand new server in less than 30 minutes. I’ll be honest, the next time this type of emergency occurs I will have it resolved in less than 20 minutes. The great thing is, our client’s website was never offline. If you’re wondering how that is possible, even our basic level of Acquia Managed Cloud hosting provides our customers with redundant servers and automatic failover at every level of the infrastructure. This particular web server happened to fail in a multi-tier configuration, pictured here:
I have experience working in hosting environments specifically for Drupal, and I have to admit Acquia really has the recipe for all our systems down to a science. We have built a world-class monitoring infrastructure on Nagios with many of our own customizations. The operations team knows about a problem well before our customers ever notice it, and our goal is to have any issue resolved before they do. I have to say our uptime numbers are pretty impressive for an infrastructure as large as ours.
We now have over 1,000 servers in operation supporting our Managed Cloud, Dev Cloud, and Drupal Gardens customers. The majority are in the United States; however, our fastest growing region is Europe, and we plan to deploy servers in the Asia-Pacific region very soon. Our average uptime ranges from 99.98% to 100%. This statistic is the average of our daily, seven-day, and 30-day system uptime statistics. You have to admit, that’s impressive.
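To put those percentages in perspective, here is a quick back-of-the-envelope sketch in Python. It is only an illustration, not how our monitoring actually computes the figures: the function names are my own, and the three sample values passed to `average_uptime` are hypothetical placeholders rather than real measurements.

```python
def allowed_downtime_minutes(uptime_percent: float, window_days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a window of days."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_percent / 100)


def average_uptime(*percentages: float) -> float:
    """Plain arithmetic mean of several uptime percentages."""
    return sum(percentages) / len(percentages)


if __name__ == "__main__":
    # Downtime implied by the low end of the range (99.98%) over each window.
    for days in (1, 7, 30):
        minutes = allowed_downtime_minutes(99.98, days)
        print(f"{days:>2}-day window at 99.98%: about {minutes:.2f} minutes of downtime")

    # Hypothetical daily, seven-day, and 30-day figures, for illustration only.
    print(f"average uptime: {average_uptime(100.0, 99.99, 99.98):.2f}%")
```

At 99.98%, a 30-day window works out to under nine minutes of downtime in total, and 100% of course means none at all.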
I hope to follow this blog post up with many more about what Acquia’s Operations and Hosting Engineering teams do for you and everyone who’s a customer or partner of Acquia. If you have any specific topics you would like to learn about, I’ll see what I can do for you.