Techniques for improving performance on a higher traffic Drupal site

June 12, 2014

Last week we were thrilled to learn that a site we developed for the North Carolina High School Athletic Association (NCHSAA) was selected as Best Sports Website in the 2014 Blue Drop Awards.

Over the coming weeks we'll write a few posts highlighting some of the features of the site. In this post, I want to focus on some of the performance techniques we used to ensure quick page loads.

Score ticker

One of the key requirements for the site redesign was to implement a score ticker based on a feed of data from NCHSAA-affiliated schools. I encourage you to visit nchsaa.org to interact with the score ticker (Frank did an outstanding job writing it in jQuery and making it adapt well to mobile devices), but here is a screenshot showing what it looks like:

The feed of data is generated by MaxPreps, a site which crowd-sources data from athletic events.

The problem is that the data retrieved from MaxPreps comes in at a few megabytes, much more than we would want to serve to clients visiting the homepage, especially those on mobile devices. The feed is so large because the ticker can contain multiple sports, each with multiple divisions, and each division can have dozens or hundreds of games.

To deal with this issue, we did a few things:

  1. On page load, we check a custom cache bin (cache_nchsaa_banners) to see if we have data for each sport that we want to display in the ticker. Each row in the cache bin corresponds to a valid sport, e.g. boys_basketball or girls_soccer. (As an aside, Lullabot's guide to caching is an invaluable resource for those new to Drupal and caching.)
    • (a) If there is no data in the cache, then we call out to the MaxPreps API. MaxPreps might send hundreds of games for a particular sport, and a large number of these may belong to schools that are not affiliated with NCHSAA, so we make a further optimization by removing all of these entries.
    • (b) Once we've pared down the data, we run it through a function that generates the HTML that we'll output into the page; that HTML is then cached.
    • (c) A div is output to the page, <div id="banners-ajax"></div>.
  2. Next, our client-side code kicks in.
    • (a) On page load, the client's browser makes a call to /api/v1/banners/onload/json and downloads about 8 KB of data that populates the visible part (#banners-ajax) of the score ticker block: the names of the sports that have data, the divisions for the first sport, and the games for the first division of that sport. So in the screenshot, the labels "Baseball", "Soccer (M)" and "Class 1A - 4A" are loaded, along with the game data for Class 1A. But no data is downloaded for the other sports or divisions.
    • (b) From there, when a user clicks on other sports or divisions, API calls are made from the client-side code to URLs like /api/v1/banners/soccer/json or /api/v1/banners/soccer/division-1a/json. In this way, we limit the amount of data requested from the client and served by the site to just what the user has requested.
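
The server-side steps 1(a) through 1(c) can be sketched in plain PHP. This is a minimal illustration, not the site's actual code: the array standing in for the cache_nchsaa_banners bin, the function names, the game fields, the sample data, and the rule that a game is kept when either team is affiliated are all assumptions. On a real Drupal 7 site the cache reads and writes would go through cache_get() and cache_set(), and the fetch callback would call MaxPreps.

```php
<?php
/**
 * Keep only games involving at least one affiliated school (assumed rule).
 */
function ticker_filter_games(array $games, array $affiliated_ids) {
  $keep = array_flip($affiliated_ids);
  return array_values(array_filter($games, function ($game) use ($keep) {
    return isset($keep[$game['home_id']]) || isset($keep[$game['away_id']]);
  }));
}

/**
 * Render the pared-down games as the HTML fragment that gets cached.
 */
function ticker_render_games(array $games) {
  $items = array();
  foreach ($games as $game) {
    $items[] = sprintf('<li>%s %d, %s %d</li>',
      htmlspecialchars($game['home']), $game['home_score'],
      htmlspecialchars($game['away']), $game['away_score']);
  }
  return '<ul>' . implode('', $items) . '</ul>';
}

/**
 * Return cached HTML for a sport, fetching and filtering only on a cache miss.
 */
function ticker_get_html($sport, array &$cache_bin, callable $fetch, array $affiliated_ids) {
  if (isset($cache_bin[$sport])) {
    return $cache_bin[$sport];
  }
  $games = ticker_filter_games($fetch($sport), $affiliated_ids);
  $html = ticker_render_games($games);
  $cache_bin[$sport] = $html;
  return $html;
}

// Usage with made-up data: the second call is served from the cache.
$calls = 0;
$fetch = function ($sport) use (&$calls) {
  $calls++;
  return array(
    array('home' => 'School A', 'home_id' => 1, 'home_score' => 3,
          'away' => 'School B', 'away_id' => 2, 'away_score' => 1),
    array('home' => 'School X', 'home_id' => 90, 'home_score' => 2,
          'away' => 'School Y', 'away_id' => 91, 'away_score' => 2),
  );
};
$cache_bin = array();
$first = ticker_get_html('boys_soccer', $cache_bin, $fetch, array(1, 2));
$second = ticker_get_html('boys_soccer', $cache_bin, $fetch, array(1, 2));
```

Caching the rendered HTML rather than the raw feed means a cache hit skips the filtering and rendering work as well as the remote call.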

The trade-off with this approach is that there is a small delay as the user clicks from one sport to the next, but for us this was far preferable to the alternative of sending hundreds of kilobytes or 1-2 MB to the client.
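
To make the "serve only what was requested" idea concrete, here is a hedged sketch of how the three URL shapes mentioned above could map to slices of the data. The nested sport => division => games layout, the function names, and the exact contents of the per-sport response are assumptions for illustration; the post only specifies what the onload response contains.

```php
<?php
/**
 * Payload for one sport: its division names plus games for the first division.
 */
function ticker_sport_payload(array $divisions) {
  $names = array_keys($divisions);
  return array(
    'divisions' => $names,
    'games' => $names ? $divisions[$names[0]] : array(),
  );
}

/**
 * Return only the slice of ticker data that a request path asks for,
 * or NULL when the path does not match a known shape.
 */
function ticker_slice($path, array $data) {
  $prefix = '/api/v1/banners/';
  $suffix = '/json';
  if (strpos($path, $prefix) !== 0 || substr($path, -strlen($suffix)) !== $suffix) {
    return NULL;
  }
  $parts = explode('/', (string) substr($path, strlen($prefix), -strlen($suffix)));
  $sports = array_keys($data);

  // Initial load: sport names, plus divisions and games for the first sport.
  if ($parts === array('onload')) {
    $first = $sports ? $data[$sports[0]] : array();
    return array('sports' => $sports) + ticker_sport_payload($first);
  }

  $sport = $parts[0];
  if (!isset($data[$sport])) {
    return NULL;
  }
  // One sport: its divisions and the first division's games.
  if (count($parts) === 1) {
    return ticker_sport_payload($data[$sport]);
  }
  // One division of one sport: just that division's games.
  if (count($parts) === 2 && isset($data[$sport][$parts[1]])) {
    return array('games' => $data[$sport][$parts[1]]);
  }
  return NULL;
}

// Usage with placeholder game identifiers.
$data = array(
  'baseball' => array('division-1a' => array('g1', 'g2'), 'division-2a' => array('g3')),
  'soccer' => array('division-1a' => array('g4')),
);
$onload = ticker_slice('/api/v1/banners/onload/json', $data);
$division = ticker_slice('/api/v1/banners/soccer/division-1a/json', $data);
```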

Sports menu

Another performance optimization we made was under the "Sports" menu. Each sport menu item contains some "fast facts" about the sport (start and end of season, play-off dates, etc). All of this is managed in each sport node in a specific field. To load the data, we JSON-encoded an array of sport names and their fast facts, then output it at the top of the page (this comes out to around 24 KB).

<meta name="nchsaa-sports-menu" content='<?php print _nchsaa_core_get_sports_menu(); ?>'>

This data is cached, so it loads very quickly. A further optimization would be to load it only once the user hovers over the Sports menu, but we have not implemented that yet. Below you can see the code that checks the cache and, on a miss, loads the data fresh from the database.

/**
 * Get a JSON encoded array of sports menu Fast Fact items.
 */
function _nchsaa_core_get_sports_menu() {
  if ($cache = cache_get('nchsaa_sports_menu_data')) {
    $menu_data = $cache->data;
  }
  else {
    $result = db_query("SELECT n.nid FROM {node} n WHERE n.type = 'sport' AND n.status = 1");
    $data = array();
    foreach ($result as $record) {
      $node = node_load($record->nid);
      $alias = "/" . drupal_get_path_alias(sprintf('node/%d', $node->nid));
      $wrapper = entity_metadata_wrapper('node', $node);
      $fast_facts = $wrapper->field_season_fast_facts->value();
      // isset() is FALSE when the field is empty, so no separate count() check is needed.
      if (isset($fast_facts['safe_value'])) {
        $data[$alias] = $fast_facts['safe_value'];
      }
      else {
        $data[$alias] = '';
      }
    }
    $menu_data = drupal_json_encode($data);
    cache_set('nchsaa_sports_menu_data', $menu_data, 'cache', CACHE_TEMPORARY);
  }
  return $menu_data;
}

Varnish and Memcache

The site overall benefits tremendously from having a Varnish cache sit in front of it. What is Varnish?

Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 - 1000x, depending on your architecture. (source)

Since the site has relatively few authenticated users, most of its content is served directly from the Varnish cache instead of requiring time- and resource-consuming database queries. (Caching for authenticated users is a very different set of problems; for that I recommend reading "Authenticated User Caching Concepts in Drupal 7".)

Memcache is also enabled in production, along with Entity Cache. This further reduces the load on the database server and makes retrieving content quicker.

Kudos to Acquia Cloud for making it incredibly simple to enable and configure both of these services.
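
For readers on a self-managed host, the equivalent Drupal 7 settings.php configuration looks roughly like the fragment below. It is a sketch based on the Memcache module's documented settings and Drupal core's reverse proxy variables, not a copy of this site's configuration: the module path, server addresses, and cache lifetime are assumptions that will differ per installation.

```php
<?php
// Route Drupal's cache bins to Memcache (Memcache module for Drupal 7).
// The module path below is an assumption; adjust it to where the module lives.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// The form cache must stay in persistent storage, so keep it in the database.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
// Assumed single local memcached instance.
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');

// Tell Drupal it sits behind a reverse proxy such as Varnish.
$conf['reverse_proxy'] = TRUE;
// Assumed proxy address: Varnish running on the same machine.
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
// Enable anonymous page caching and let proxies keep pages for 15 minutes.
$conf['cache'] = 1;
$conf['page_cache_maximum_age'] = 900;
```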

Opt for simpler implementations instead of module "open buffet"

Wherever possible, we opted to create simpler implementations rather than adding another module, and thus tried to avoid the "open buffet" problem that we see frequently in Drupal sites.

As an example, one of the requirements of the site was to implement an ad tracking service. Of course, an ad tracking service could be something very simple or something very complex. In the project planning process we determined that what was really needed was a way to report metrics to NCHSAA's sponsors on how many people had clicked on a sponsor's banner ad.

Since we were already using Google Analytics, it was straightforward to set up an Advertisement content type with a field for uploading an image, and then use a theme template to generate Google Analytics tracking code for each Advertisement node. No additional modules, and as a bonus we were able to re-use the functionality offered by an existing module (Google Analytics).
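
As an illustration of the kind of markup such a template might produce, here is a small sketch in plain PHP. It assumes the page already loads Google Analytics with the analytics.js-style ga() function (the older ga.js library would use _gaq.push(['_trackEvent', ...]) instead); the function name, event category, and sample values are hypothetical rather than taken from the NCHSAA theme.

```php
<?php
/**
 * Build banner markup whose click is reported as a Google Analytics event.
 */
function ad_banner_markup($sponsor, $image_url, $target_url) {
  // json_encode() turns the sponsor name into a safe JavaScript string literal.
  $js = sprintf("ga('send', 'event', 'Advertisement', 'click', %s);", json_encode($sponsor));
  // Every value is HTML-escaped because it ends up inside an attribute.
  return sprintf(
    '<a href="%s" onclick="%s"><img src="%s" alt="%s"></a>',
    htmlspecialchars($target_url, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($js, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($image_url, ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($sponsor, ENT_QUOTES, 'UTF-8')
  );
}

// Usage with placeholder values.
$html = ad_banner_markup('Example Sponsor', '/files/ads/example.png', 'https://example.com/');
```

Using the sponsor name as the event label means clicks can be reported per sponsor straight from the Google Analytics events report.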

Develop with an eye towards performance

The examples above are indicative of the overall approach we took when building the site: simpler is better, avoid adding modules if you don't really need them, and load data only when the user asks for it.

This quick overview of some of the performance techniques we used on the NCHSAA site aims to spark ideas while highlighting aspects of the project. Let me know if you have any questions or suggestions for how to do things differently.


Comments

Really nice work. Love the site, and you highlighted some really good techniques for getting and storing data that would otherwise be expensive to generate. The delay on the calendar switching isn't even noticeable since the JSON response is cached by Varnish.

Thanks for your comment! Much appreciated.

Great article. I've been looking into stats data myself for a while now on a project I'm currently working on. You mention the MaxPreps API, but I see nothing about it when I go over to their website. Is there a special way to access their API?

You'll need to get in touch with MaxPreps. You need to have an account with them to have API access. From there, you call a particular URL with some parameters (e.g. ?apikey=KEY&sport=soccer&gender=boys) to get data back.

Hi Kosta,

I recently reached out to the MaxPreps tech department and they indicated there was no MaxPreps API. Is this still functional for your Drupal site, or have you switched to a new data source?

Apart from improvements to the code, one could also tweak the web server and caching layers. I have used two different stacks. LAMP is easy to set up, but it is not really that fast. The other stack, which I am currently using, is Apache, nginx, Memcached, Varnish and PHP-FPM; it is the stack supported by the Cloudways platform (https://www.cloudways.com/en/drupal-cloud-hosting.php) that hosts my website. This stack is optimized for speed through caching and quick processing of PHP code with FastCGI. I would recommend trying this stack or this platform and running benchmarks of your Drupal website to see how much it improves performance.