Level up your New Relic monitoring

Every Acquia subscription comes with a New Relic Pro account for application performance monitoring (APM). If you haven't tried it, claim your Pro account today.

It may take some time to develop familiarity with New Relic's interface and the unique performance attributes of your application, but don't be intimidated. In this hands-on guide, I explain what you will find in New Relic and leave you with four simple tips that you can implement today to get more out of your New Relic monitoring solution.

Overview of APM

The “APM & Services” page is a good place to start your exploration of New Relic. You will find four graphs that provide an overall picture of your application’s health: Web transactions time, Throughput (in requests per minute), Error rate, and Apdex score.

Take a look at the primary graph — Web transactions time. By default, the last 30 minutes of application activity are displayed, but you may adjust the reporting window to reveal longer periods such as 24 hours, 3 days, even 12 months. 

example of Web transactions time graph

For most applications, Web transactions time should average one second (1000 ms) or less. On high-performing applications, response times of 500 ms or better are achievable, but your mileage may vary depending on your Drupal site architecture and any custom functionality enabled by your application.

How do I know if my application is healthy?

Compare the "Web transactions time" graph with the "Throughput" graph. As a general rule, the average response time should remain about the same when throughput increases significantly. In other words, your application should not bog down whether it is fulfilling 5,000 PHP requests per hour, or 10,000 (the graph's trendline should be relatively even).

If you see dramatic up and down variation in your graphs, start digging into APM data until you understand the root cause for the variation. Perhaps your application has a few slow function calls or heavy cron processes that may be dragging down the average. On the other hand, if response times increase only when Throughput increases, this may indicate poor caching. 
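If you would rather put a number on this comparison than eyeball it, here is a minimal sketch that computes the correlation between throughput and response time. It assumes you have exported per-interval values from the two graphs yourself; the function name and the sample numbers are made up for illustration. A result near 0 suggests response time is holding steady under load, while a result close to 1 suggests the application slows down as traffic grows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # A perfectly flat series has no variation to correlate with.
        return 0.0
    return cov / (spread_x * spread_y)


# Hypothetical hourly samples: requests per minute and average response time (ms).
throughput = [80, 120, 160, 200, 240]
response_ms = [410, 405, 420, 400, 415]
print(round(pearson(throughput, response_ms), 2))
```

Treat the result as a hint about where to dig, not a verdict: a handful of slow cron runs can skew it just as they skew the graph.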

With a little practice, you should be able to identify your current, optimal performance baseline simply by scanning these graphs visually. Adjust your reporting window until you identify a healthy-looking period of time where the trendline is relatively "flat": this represents your current baseline!

What is Apdex?

What are the business objectives for your application? Is it to generate qualified leads? Reduce time to complete a task? Whatever your business objectives, you should be monitoring — and aiming to improve — your application's performance over time. Use your Apdex score to accomplish that goal!

Apdex (short for “Application Performance Index”) is an industry-standard methodology that attempts to distill numerous technical data points into a single user satisfaction score. A perfect score of 1.0 means that 100% of your application's users are "satisfied," whereas a score below 0.5 is considered unacceptable: in Apdex terminology, your users are “frustrated.” A more middling score of 0.7 suggests users are “tolerating” your application performance, which means performance could be better!
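To make the score less abstract, here is a small sketch of the standard Apdex formula: requests at or under the threshold T count as satisfied, requests between T and 4T count as tolerating (worth half a point each), and anything slower counts as frustrated. The function name and the sample response times are made up for illustration; New Relic computes this for you.

```python
def apdex(durations, t):
    """Apdex score for a list of response times and a threshold t (same unit)."""
    if not durations:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    # Frustrated requests (slower than 4 * t) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(durations)


# Hypothetical response times in seconds, scored against T = 0.5 s.
samples = [0.2, 0.4, 0.5, 0.9, 1.8, 2.5]
print(round(apdex(samples, t=0.5), 2))  # 3 satisfied, 2 tolerating, 1 frustrated -> 0.67
```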

One challenge with the Apdex is that no two applications are the same. New Relic’s default T value (i.e. "Threshold value") assumes all PHP applications will take an average of 0.5 seconds to fulfill dynamic requests. This may not be realistic for you, so I want to show you how to recalibrate your T value using your real-world production data.

Tip #1: Calibrate the Apdex score

To ensure your Apdex score is a meaningful metric for measuring success, recalibrate its T value. You will need your App ID, which you can locate by visiting the services page. Click on “APM & Services,” then click the small “…” adjacent to your application in the list.

Once you have this App ID, select “Query Your Data” from the main navigation menu. Enter the query below (replace the App ID where appropriate):

select percentile(duration, 70) from Transaction where appId=<AppId> since 12 hours ago

This query calculates a new performance threshold: it takes your application's 70th-percentile transaction duration as the T value, which should yield an Apdex score of about 0.85. With your new T value generated, complete the calibration by updating the Application settings (see the screenshot below).
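Why 0.85? If T is the 70th percentile, roughly 70% of requests are satisfied. If the remaining 30% all finish within 4T, they count as tolerating at half a point each, so the score is 0.70 + 0.30 / 2 = 0.85. The sketch below checks that arithmetic on made-up durations. It assumes no request is slower than 4T (real traffic usually has a few, which pulls the score a little lower), and it uses a simple nearest-rank percentile, whereas New Relic's percentile() is an approximation, so results will not match exactly.

```python
def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct percent of values are <= it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def apdex(durations, t):
    """Standard Apdex: satisfied <= t, tolerating <= 4t, frustrated beyond."""
    satisfied = sum(1 for d in durations if d <= t)
    tolerating = sum(1 for d in durations if t < d <= 4 * t)
    return (satisfied + tolerating / 2) / len(durations)


# Hypothetical transaction durations in milliseconds.
durations_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
t = nearest_rank_percentile(durations_ms, 70)
print(t, apdex(durations_ms, t))  # T = 700 ms: 7 satisfied, 3 tolerating
```

Keep in mind that the query reports duration in seconds, so enter the T value in seconds as well.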

Menu > APM & Services > Application

Now wait a few days and return to New Relic to confirm your Apdex score is trending at around 0.85 in the graph. If it isn't, you may re-run the calculation with different parameters until your healthy time frames reflect an Apdex score of 0.85.

It is okay to leave room for improvement when you calibrate your Apdex. The goal here is to measure improvement and guard against regressions. It is very gratifying to see your average Apdex score improve over several weeks or several months. On an annual basis, consider recalibrating your Apdex to reflect changes in your application and business expectations over time.

Tip #2: Configure performance monitoring

As the application owner, you need to know when your users are frustrated. Apdex score is a good proxy for performance, so let's use it for alerting purposes!

Navigate to the “Alerts & AI” page, select “Alert Conditions & Policies,” and choose the option to create a “New alert policy.” Name your monitor “Apdex” and accept the other defaults provided. On the next screen, choose “APM” as shown: 

settings screen

Click through a few more screens to get to where you will “Define thresholds.” It's generally smart to have the alert triggered whenever the Apdex score drops below 0.5 for more than 5 minutes.
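If it helps to see the threshold rule spelled out, here is a rough sketch of the logic. It is not New Relic's implementation; it simply assumes one Apdex reading per minute and fires once the score has stayed below the threshold for a given number of consecutive readings. The function and parameter names are hypothetical.

```python
def should_alert(apdex_per_minute, threshold=0.5, minutes=5):
    """True if the score stays below threshold for `minutes` readings in a row."""
    consecutive = 0
    for score in apdex_per_minute:
        # A single healthy minute resets the streak, so brief dips stay quiet.
        consecutive = consecutive + 1 if score < threshold else 0
        if consecutive >= minutes:
            return True
    return False


print(should_alert([0.9, 0.4, 0.4, 0.4, 0.4, 0.4]))  # sustained dip
print(should_alert([0.4, 0.4, 0.9, 0.4, 0.4, 0.4]))  # dip interrupted by recovery
```

The duration requirement is what keeps a single slow minute from paging your team.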

settings preview

Since New Relic records all your incidents, an effective alert will enable your team to review past performance for future optimizations.

Tip #3: Configure uptime monitoring

Okay, this one's easy. New Relic's ping monitors will poke your site from multiple locations around the world. Hopefully your site is always responding. New Relic uses this data to compute your application's uptime (daily, weekly, and monthly). You can even download reports in CSV format to share with your team!

Everyone's goal is 100% uptime, but let's see how you do! From the main menu, select “Synthetic Monitoring.” Click where it says “Create monitor” then choose “Ping” as monitor type. Finally, enter the URL of your Acquia-hosted application.

Note: I recommend entering your origin hostname, which on Acquia takes the form https://[docroot].prod.acquia-sites.com, where [docroot] is the machine name of your application. Although you could enter your vanity domain, customers with a CDN or third-party proxy may find that a basic "Ping" check gives a false uptime reading (for example, if your CDN serves a cached version of your homepage while your Drupal CMS is not actually responding).
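Before you create the monitor, you may want to confirm that the origin URL answers at all. The sketch below builds the URL from the pattern above and makes one request with Python's standard library. The helper names and the "mysite" docroot are placeholders, and this is only a quick manual check, not a replacement for the ping monitor.

```python
import urllib.request


def origin_url(docroot):
    """Build the Acquia origin URL for a given docroot machine name."""
    return f"https://{docroot}.prod.acquia-sites.com"


def is_responding(url, timeout=10):
    """True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers connection failures, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, an OSError subclass, for 4xx and 5xx).
        return False


print(origin_url("mysite"))
# To run the check yourself: is_responding(origin_url("mysite"))
```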

Tip #4: Implement deployment markers

Here's how to bring your monitoring strategy full circle. New Relic enables you to "mark" code deployments and directly correlate changes in your custom code with changes in your application's performance. You can use this data to your advantage to demonstrate the business value of developer time spent on performance and optimization tickets!

In the example below, we see deployment markers overlaid on the Apdex score graph. Notice how, over successive code deployments, the Apdex score dropped from blue (excellent!) to green (“satisfied”) and ultimately to yellow (which is borderline “frustrating”). If you do not have a good monitoring strategy you may not notice these regressions until it is too late and poor application performance is causing bigger problems for your business. 

graph showing markers with reduced performance after each marker

There are several ways to create deployment markers: you can use the New Relic CLI, implement a custom script using PHP, or simply add the New Relic module to your Drupal codebase and configure it to create deployment markers automatically whenever your site configuration changes.
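As an illustration of the custom script route, here is a minimal sketch (in Python rather than PHP) that builds a deployment marker request for New Relic's REST API v2 deployments endpoint. This is my understanding of the legacy endpoint, so please verify the URL, header, and payload against New Relic's current documentation before relying on it. The App ID, API key, and revision values are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_deployment_request(app_id, api_key, revision, description=""):
    """Build (but do not send) a POST request that records a deployment."""
    url = f"https://api.newrelic.com/v2/applications/{app_id}/deployments.json"
    payload = {"deployment": {"revision": revision, "description": description}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute your own App ID, key, and release tag.
request = build_deployment_request(12345, "YOUR_API_KEY", "release-1.2.3")
print(request.get_method(), request.get_full_url())
# To actually send it: urllib.request.urlopen(request)
```

A natural place to call a script like this is the last step of your deployment pipeline, so every release gets a marker without anyone having to remember.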

Happy monitoring!