The performance of a web application plays a critical role in how it is perceived by its users. It is important to measure it, identify the cause when it changes and react swiftly to any unexpected changes. This article describes an industry-leading tool, New Relic, and how it can be used to monitor and improve your site performance. Setting up a good web application monitoring system can be tiresome, but it's well worth it. Without monitoring tools, the only thing we can tell is whether our site is performing as expected or not. To improve performance, we have to be able to identify the worst-performing user actions and profile them independently to pinpoint the cause. New Relic achieves that and more in just a few screens, all without manually adding any profiling code to your application.
New Relic is a real-time application monitoring service, providing various metrics about the performance of your production site, covering everything from application database queries through to the time it takes for the end-user to view a page. This data is then collected, post-processed and converted to simple and clean charts presented in the New Relic web interface. Since the New Relic agent has to collect and report the data, it does add some overhead to the application stack. Unless you're running a service that has to respond in a few milliseconds, however, the overhead added is minimal and is far outweighed by the value of the reports enabling you to detect and solve problems early.
This article covers the basic functionality of New Relic (which can be used for free) and describes what the Enterprise version has to offer.
The New Relic installation is split into several distinct components:
- the agent component – a PHP extension that collects the data and reports it to a locally running New Relic daemon
- the daemon component – a proxy between the PHP agents and the New Relic datacenters. Its main responsibility is to reduce the time the agents spend reporting data to New Relic
- the New Relic reporting suite – the main New Relic website, where the data is presented to the user
Both the agent and the daemon components are installed using the provided newrelic-install script. The script will detect the available PHP installations and deploy the agent extension to all of them.
Please refer to the official documentation for more detailed information regarding the installation and configuration of the New Relic package on your specific platform.
Forever. Seriously. That’s right. Free. Just the basics.
While the free version of New Relic does not have all the bells and whistles of the Enterprise version, it does provide some basic, yet useful, feedback regarding your site performance.
If you are interested in an overview of how your application performs at the PHP level, this is the chart to look at. It displays the average execution time of your PHP scripts in real time, split into separate layers by execution type:
- time spent to execute database queries
- time spent in PHP
- external web service calls
Depending on which of the application layers takes the most time, different optimisation (or scaling) techniques can be applied.
- The time spent in the database can usually be reduced by doing one or more of the following:
  - analysing the queries that are executed, ensuring that they use indexes correctly or creating new indexes for them
  - caching the most frequently accessed and computationally expensive result sets from the database
  - optimising the structure of your application’s data using techniques such as database partitioning
  - scaling up the database layer (e.g. adding more slave nodes)
- The PHP time is the time your application spends processing the data. Good tactics for bringing it down include:
  - caching intermediate results
  - optimising the code, for example by using faster algorithms
  - adding more web nodes (if the hardware limits are reached)
- The time spent in external web service calls can be reduced by optimising the network operations:
  - caching everything that can be cached
  - introducing an asynchronous job queue where possible
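Caching appears in all three lists above, so here is a minimal sketch of the idea in Python: a decorator that keeps a function's result for a fixed time before recomputing it. The names `ttl_cache` and `load_top_products` are hypothetical, and the cache lives in the memory of a single process; a real deployment would more likely use a shared cache service, which this sketch does not attempt to model.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # maps args -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator


calls = []  # records each real (uncached) execution, for illustration only


@ttl_cache(ttl_seconds=60)
def load_top_products(category):
    # Hypothetical stand-in for an expensive database query or web service call.
    calls.append(category)
    return [f"{category}-product-{i}" for i in range(3)]


first = load_top_products("books")
second = load_top_products("books")  # served from the cache, no second query
```

The trade-off is staleness: a longer time-to-live saves more database or network time but lets users see older data for longer.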
Even if the performance of your application is good, there is no guarantee that the users of the site will see it load within a reasonable time. The New Relic chart for browser page load time provides an overview of how your website is performing from the user's side.
Similar to the application performance overview shown previously, this chart is composed of several separate layers:
- Web application time tells how long it takes for your application to process the requests. More information about this layer can be seen in the application performance chart shown earlier in this article.
- Network time is the time spent purely on the user request travelling to your application server and the response travelling back to the browser, disregarding the time spent in the application itself. To optimise this component, review the network performance of your system architecture by considering the following questions:
  - do you compress the response sent to the site users (depending on Accept-Encoding)?
  - is the response simply huge, and could some of it be loaded later?
  - is the bandwidth reaching its limits, so that you simply need a faster network link?
  - would distributing your servers worldwide (see more on this later) help?
- DOM processing is the time the browser spends parsing the HTML and building the DOM. To reduce it:
  - set caching headers so that static files can be cached by the client
  - reduce the number of DOM elements
  - use a CDN for the static files
- Page rendering is the time the browser needs to display the page, including loading images and stylesheets. It can be improved by:
  - optimising image files
  - using CSS sprites instead of multiple images
  - using correct caching headers
  - using a CDN to bring your content closer to the user (there's also a chart for that, shown later in this post)
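To make the compression question above concrete, the following Python sketch decides whether a response body may be gzip-compressed based on the request's Accept-Encoding header. The function names are hypothetical, and the header parsing is deliberately simplified (it ignores the `*` wildcard, for example). In practice this job is usually delegated to the web server rather than written by hand.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value allows gzip."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            # A quality value of 0 ("gzip;q=0") explicitly refuses gzip.
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_body(body, accept_encoding):
    """Return (payload, headers) for a response body given the request header."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


# Hypothetical, highly repetitive HTML body: it compresses very well.
page = b"<li>product</li>" * 500
payload, headers = encode_body(page, "gzip, deflate")
```

Text formats such as HTML, CSS and JavaScript usually shrink considerably, while already-compressed images gain little from a second pass.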
An extended list of rules and how they affect your website’s frontend performance can be found at “Best Practices for Speeding Up Your Web Site”. Also, there are several tools that can help you to analyse the frontend performance, such as Yahoo’s YSlow or Google’s PageSpeed.
More information about the Real User Monitoring functionality can be found at “How Does Real User Monitoring Work?” (New Relic documentation) or “How we provide real user monitoring: A quick technical review” (New Relic blog).
Even if your application is performing very well, it is only performing this way at the current request rate. As the number of users on the site increases, new bottlenecks “appear”, which can slow the site down or even bring it down. As a result, the rate at which your application handles requests is at least as important as the time it takes to send a response.
There are two separate throughput lines available: one for the browser requests and one for the application. The browser throughput tells how many pages were requested per minute. Some of those requests may be served from cache before even reaching the application server, while other pages may trigger additional application requests via Ajax. As a result, the two lines may be completely different and suggest different optimisation targets.
In addition to measuring your website performance in time, New Relic provides an Apdex score, which tells how many of your site visitors were satisfied, tolerating or frustrated by the response time of the application.
Once the target times (T) for the browser and the application server are set, they are used to classify every request and calculate the Apdex score:
- satisfied requests are those that completed within the target time T; each counts as 1.0
- tolerating requests are those that took longer than T, but no more than 4*T; each counts as 0.5
- frustrated requests are all the rest; each counts as 0.0
The Apdex score is the average of these values over all requests, so it always lies between 0.0 (everyone frustrated) and 1.0 (everyone satisfied).
The main difference between the Apdex score and the average application response time is that no single request (outlier) can affect the overall score more than any other. This makes it a more robust metric for the global overview of your site’s performance if your goal is to answer the question “what proportion of the site visitors see a page loaded quickly enough?”.
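The scoring rules and the outlier argument can be sketched in a few lines of Python. This is an illustration rather than New Relic's implementation: the function name `apdex`, the sample timings and the inclusive treatment of the T and 4*T boundaries are assumptions of this sketch.

```python
def apdex(response_times, target):
    """Apdex score for a list of response times (seconds) and a target time T."""
    if not response_times:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerating = sum(1 for t in response_times if target < t <= 4 * target)
    # Frustrated requests contribute 0, so they only appear in the total.
    return (satisfied + 0.5 * tolerating) / len(response_times)


# Hypothetical sample: eight fast requests, one slow one, one extreme outlier.
times = [0.2] * 8 + [1.0, 30.0]
score = apdex(times, target=0.5)   # (8 + 0.5 * 1) / 10 = 0.85
mean = sum(times) / len(times)     # about 3.26 s, dominated by the outlier

# Making the outlier ten times slower moves the mean, but not the score.
worse = [0.2] * 8 + [1.0, 300.0]
worse_score = apdex(worse, target=0.5)
worse_mean = sum(worse) / len(worse)
```

With a mean of roughly 3.26 seconds the site looks slow, yet eight out of ten visitors were served in 0.2 seconds. The Apdex score of 0.85 reflects that more faithfully.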
For more information about the Apdex score see the New Relic documentation about this metric.
Worldwide site delivery
Is your site performing well for local users? What about the users overseas? The Internet is really fast these days, however it is not instant. The further your user physically is from your servers, the longer distance the information packets will have to travel.
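A rough back-of-the-envelope calculation shows why distance matters. Light in optical fibre travels at roughly 200,000 km/s (about two thirds of its speed in a vacuum), which puts a hard lower bound on the round-trip time. The sketch below uses that approximation, and the 10,000 km distance is a hypothetical example. Real routes are longer than the straight-line distance and add routing delays on top.

```python
FIBRE_SPEED_KM_PER_S = 200_000  # approximate speed of light in optical fibre


def min_round_trip_ms(distance_km):
    """Lower bound on the network round-trip time over a fibre link, in ms."""
    return 2 * distance_km * 1000 / FIBRE_SPEED_KM_PER_S


# Hypothetical user 10,000 km away from the servers.
far_user_ms = min_round_trip_ms(10_000)   # 100.0 ms per round trip
near_user_ms = min_round_trip_ms(100)     # 1.0 ms per round trip
```

Loading a page usually takes several round trips (establishing the connection, requesting the page, fetching its assets), so this per-trip minimum is paid more than once.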
To get a glimpse of how your site is performing for different countries, you could look at the worldwide Apdex chart. New Relic also provides more detailed information for the enterprise customers.
Since the performance problem in this case is usually due to the global network speed, there is no fix that can be applied locally – you’ll need to bring your service closer to the user. Depending on your application needs one or more of the following measures can be employed:
- use a CDN to serve static files from local servers
- use local dynamic content caching servers for slower areas
- implement your service locally for the slower areas
“The Total Package!”
The basic functionality that New Relic offers for free can give us a lot of valuable insights about the global site performance. We can see which areas need more attention than others, and this alone can save some precious time while optimising the site. Yet it does not answer one (sometimes crucial) question: where exactly is the bottleneck?
In addition to the free Lite account New Relic offers two more plans (Standard and Pro) which extend the basic reports and introduce some new ones, allowing you to drill down to the root of performance problems quickly and efficiently.
Application profile traces
One of the best features offered to help debug performance problems is the comparison between different web transactions and the ability to see timed application traces of slow calls. New Relic provides charts similar to those described above for each web transaction type (provided that New Relic supports the framework you’re using). Also, a list of slow transaction traces is included with the detailed information.
There are already several tools available to profile your PHP code, such as Xdebug and XHProf. Xdebug is a really powerful development tool as well as offering profiling capabilities. XHProf is simple to configure and relatively easy to use, and there are also companion tools such as XHGui which make life even easier. So what is different about New Relic?
The code profiling trace that New Relic provides is a call tree showing only the inclusive wall time (“Incl. Wall”) of each call, both as an absolute value and relative to the whole request. This tree alone is not very well suited for generic code analysis, since it provides neither the number of times a method was invoked nor its total time during the application run. Its power is that it is integrated with all the other New Relic features and is easily accessible for a quick review once a slower transaction is detected. In addition to PHP code profiling, New Relic also provides a separate report for slow SQL statements, with their execution times and call counts.
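To illustrate what a call tree of inclusive wall times can and cannot tell you, here is a small Python sketch. The `Call` class, the method names and the timings are all hypothetical; this is not how New Relic stores or exposes its traces. Inclusive time covers a call and everything it invokes, so subtracting the children's inclusive times gives the time spent in the call itself.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    inclusive_ms: float  # wall time including all child calls
    children: list = field(default_factory=list)

    @property
    def exclusive_ms(self):
        """Time spent in this call itself, excluding its children."""
        return self.inclusive_ms - sum(c.inclusive_ms for c in self.children)


def render(call, total_ms=None, depth=0):
    """Return the tree as lines of 'name  inclusive ms (percent of total)'."""
    total_ms = call.inclusive_ms if total_ms is None else total_ms
    percent = 100 * call.inclusive_ms / total_ms
    indent = "  " * depth
    lines = [f"{indent}{call.name}  {call.inclusive_ms:.0f} ms ({percent:.0f}%)"]
    for child in call.children:
        lines.extend(render(child, total_ms, depth + 1))
    return lines


def hottest(call):
    """Return the call with the largest exclusive time in the tree."""
    best = call
    for child in call.children:
        candidate = hottest(child)
        if candidate.exclusive_ms > best.exclusive_ms:
            best = candidate
    return best


# Hypothetical trace of one slow transaction.
trace = Call("index.php", 800, [
    Call("ProductController::list", 750, [
        Call("ProductRepository::findAll", 600),
        Call("Template::render", 100),
    ]),
])
report = render(trace)
slowest = hottest(trace)  # ProductRepository::findAll, 600 ms of its own time
```

Following the largest inclusive times down the tree leads to the slow method, but the tree does not say whether that method was called once or a thousand times. For that, a full profiler such as Xdebug or XHProf is still needed.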
With the help of these integrated traces, finding slower pieces of the application code is a straightforward task, helping to keep the focus on the site as a whole while still being able to detect problems and pinpoint them to the method level.
Compare with historical data
In addition to displaying the current state, New Relic also provides a comparison mode. When this mode is turned on, all the basic charts are affected – in addition to the current data they now also provide information from one day and one week ago. This mode is especially useful to show whether the site is performing any better (or worse) than before.
How well is your website performing under load? The easiest way to answer that is to look at the scalability chart that New Relic provides. The chart plots the application response time versus the throughput.
This chart can quickly give you an idea of how well your website responds at a given number of requests per minute. If the response time stays constant as the throughput increases, your site is scaling well. However, if you notice that the response time increases together with the throughput, it is time to take action. Finding the bottleneck should now be an easy task with the database and application code profiling tools described above.
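The same reading of the chart can be expressed numerically. The Python sketch below fits a least-squares line through (throughput, response time) samples and looks at its slope. The function name and both data sets are hypothetical, and New Relic itself only draws the scatter plot; this is just one way to quantify what the eye sees.

```python
def slope(points):
    """Least-squares slope of response time (ms) against throughput (rpm)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        raise ValueError("throughput must vary between samples")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return cov_xy / var_x


# Hypothetical samples: (requests per minute, average response time in ms).
flat = [(100, 120), (200, 121), (400, 119), (800, 122)]
degrading = [(100, 120), (200, 150), (400, 240), (800, 520)]

flat_slope = slope(flat)            # close to zero: response time holds steady
degrading_slope = slope(degrading)  # clearly positive: each extra request costs time
```

A slope near zero corresponds to the flat chart of a site that scales well. A clearly positive slope is the signal to start looking for the bottleneck.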
More information about this chart can be found in the New Relic blog.
New Relic is an amazing service to monitor your web application. It is simple and powerful: all the numbers are presented in such a way that a quick glance at a chart tells you a lot about the site’s performance. In this blog post we have reviewed the common problems that New Relic can help us to detect and provided several suggestions for how to fix them.
It is also worth mentioning that, while New Relic is very good at what it does, it is a service to monitor your application. It usually works best when combined with a separate system that monitors the server resources or the performance of each service you’re using. Understanding how the whole application ecosystem behaves is essential in order to build a stable and well-performing web service.