Replacing PhantomJS and CasperJS with JSDOM and Chai in our Drupal Content Testing Pipeline

May 12, 2020

DesignHammer has a long relationship with Drupal, Jenkins, CasperJS, and PhantomJS. You can read through the history of the project in our previous blog posts.

In 2017, the maintainer of PhantomJS (the headless browser that powered our content tests) stepped down from the project, leaving it unmaintained. Over the last few months, we started experiencing very long build times (over 11 hours!) for our pipeline, specifically during the CasperJS/PhantomJS test suite. After a whole lot of troubleshooting, we narrowed the problem down to a concurrency issue: running multiple instances of PhantomJS caused hangs.

PhantomJS hangs caused unreasonably long build times.

Since an update for PhantomJS was certainly not forthcoming, we decided to take the plunge and completely rewrite the content tests.

We had several goals with rewriting the system:

  1. Run faster. We needed to fix the excessively long build times that were causing recurring build failures. We also wanted to shave some time off normal content test runs, as they have always been the longest portion of the build.
  2. Better logs and errors. We wanted to make test failure messages clearer and reduce the logspam for passing tests (6 lines per test, 21,000 tests!).
  3. Simpler. For obvious reasons we wanted to reduce the footprint of the test suite. This meant writing less code, using fewer dependencies, and relying on a simpler concurrency model.
  4. Report, not fail. Most test failures are due to simple typos that are easily corrected and not particularly impactful. Test failures should be logged and reported but should not cause the build to fail entirely.

To accomplish these goals we rebuilt the content test suite using JSDOM. JSDOM provides a pure-JavaScript implementation of the DOM without implementing all of the components required for a full-featured web browser. Before choosing JSDOM, we considered simply replacing PhantomJS with Headless Chrome or another modern scriptable browser. However, our tests do not rely on browser behavior. Rather, we care about making sure the server has rendered the source data properly. We need to fetch a page from the site, query the DOM of that page to find content (titles, links, citations, etc.), and compare it to the ground truth from our source data. JSDOM gives us that capability without a lot of additional overhead.

We then re-implemented the individual CasperJS tests using the Chai assertion library. We’ve used Chai on several other projects, typically alongside the Mocha or Karma unit-testing frameworks, and like the readability of its expect() assertions. For this project, we skipped the unit testing framework and just wrapped the assertions.

An example test using JSDOM and Chai looks like:

// At the top of the module: JSDOM for fetching/parsing pages, Chai for
// assertions, and chai-string for the containIgnoreSpaces() helper.
const { JSDOM } = require('jsdom')
const chai = require('chai')
chai.use(require('chai-string'))
const { expect } = chai

return JSDOM.fromURL(url)
  .then(dom => {
    const content = dom.window.document.body.textContent

    expect(content.includes('Successfully located Drupal entity')).to.equal(true, 'Loaded Drupal entity')
    expect(content.includes('We found more than one entity matching')).to.equal(false, 'Found multiple entities')
    expect(content.includes('Loaded node title')).to.equal(false, 'Loaded node title')

    expect(dom.window.document.title).to.containIgnoreSpaces(name, 'Page title includes correct Item name')

    // If URL is provided, check for that.
    if (externalUrl) {
      const loadedUrl = dom.window.document.querySelector('.field-name-field-external-url a').href
      // Here we normalize the source URL. This allows us to ignore certain differences and be
      // generally more permissive since Drupal will do the same.
      let normalizedUrl = externalUrl.toLowerCase()
      normalizedUrl = normalizedUrl.replace(/[#/]$/, '') // Remove a trailing # or /
      expect(loadedUrl.toLowerCase()).to.containIgnoreSpaces(normalizedUrl, 'URL was migrated')
    }
  })
  .catch(err => {
    switch (err.name) {
      case 'StatusCodeError':
        errorLog(err.statusCode + ' error while loading ' + err.options.uri)
        break
      case 'AssertionError':
        errorLog('\nTest failure while checking node "' + name + '"\n' + url + '\n' + err.toString())
        break
      default:
        errorLog('Error while checking node "' + name + '"\n' + url + '\n' + err.toString())
    }
  })

We have a test file like this for each type of content we are testing. A main.js file imports these content-specific tests and builds a set of jobs. In total, the system generates ~21,000 jobs. Each one is a function that loads a page on the target website into a JSDOM object and runs a set of assertions on the content using Chai. Errors or assertion failures are caught and logged.
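The job-building step might be sketched like this. The module layout and the `check` function signature here are illustrative assumptions, not the project's actual code; the idea is simply that each content-type module contributes one zero-argument async job per record:

```javascript
// main.js-style job builder: each content-specific test module supplies its
// records plus a check function, and we wrap every record in a zero-argument
// async job so a queue can run them later. The shape of `contentTypes` is a
// hypothetical stand-in for the real content-type modules.
function buildJobs (contentTypes) {
  const jobs = []
  for (const { name, records, check } of contentTypes) {
    for (const record of records) {
      // Each job, when invoked, loads the page and runs its assertions.
      jobs.push(async () => check(name, record))
    }
  }
  return jobs
}

module.exports = { buildJobs }
```

With ~21,000 records across all content types, this produces the ~21,000 jobs mentioned above.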

We then push these jobs into a queue and run it. We use the queue package for this. The queue is run with a concurrency of 4. This is a big speed win because, if you recall from a previous post, we had been using the Groovy pipeline to handle parallelization of the content tests broken down by content type. We ran multiple content types at once, but within a single content type, the tests ran serially. This meant that the content tests could never run faster than the longest individual content test.

By building a queue of ALL jobs and running them concurrently we were able to preserve concurrency throughout the entire build resulting in a major speedup.
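We use the npm queue package in the actual build; the dependency-free sketch below illustrates the same pattern — a fixed number of workers all draining one shared job list — which is why no single slow content type can serialize the run:

```javascript
// Run an array of zero-argument async jobs with at most `concurrency`
// jobs in flight at once. Results (or caught errors) are collected in
// the original job order.
async function runQueue (jobs, concurrency = 4) {
  const results = new Array(jobs.length)
  let next = 0
  async function worker () {
    // Each worker repeatedly claims the next unclaimed job. JavaScript's
    // single-threaded event loop makes the check-and-increment atomic.
    while (next < jobs.length) {
      const i = next++
      try {
        results[i] = await jobs[i]()
      } catch (err) {
        results[i] = err // report, don't fail the whole run
      }
    }
  }
  // Start `concurrency` workers pulling from the same shared list.
  const workers = Array.from(
    { length: Math.min(concurrency, jobs.length) },
    worker
  )
  await Promise.all(workers)
  return results
}

module.exports = { runQueue }
```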

The system summarizes the results of the tests and writes them to a log file that is linked in Slack and email notifications. It is easy to just click the link and see the failures without having to scroll through thousands of log lines. The summary also contains details about resource usage and timing information for the tests so we can get granular data about how long the tests take to run and how much memory they consume.


Content Test Error Log 2020-05-07 16:23:22
Ran 21839 tests in 28.27 minutes
Used ~231.40 MB of memory

-------------------------------------------------------------------
Therapeutic Areas - 33 tests, 0 errors in 23.5 seconds (cumulative)

-----------------------------------------------------------------
Publications - 6884 tests, 0 errors in 38.82 minutes (cumulative)

------------------------------------------------------------
Journals - 1359 tests, 2 errors in 6.75 minutes (cumulative)

Test failure while checking Journal "Dicp"
https://example.com/api/journal/00763
AssertionError: expected invalidurl to contain http://www.example.com ignoring spaces

Test failure while checking Journal "Apple Food B"
https://example.com/api/journal/00211
AssertionError: expected title to contain apple food b ignoring spaces

----------------------------------------------------------------
Staff Members - 671 tests, 0 errors in 3.78 minutes (cumulative)

-------------------------------------------------------------------------
Publication Authors - 12892 tests, 0 errors in 63.32 minutes (cumulative)
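A per-type summary line like those above can be generated from simple counters. This is a sketch, not the pipeline's actual reporting code, and the field names are assumptions:

```javascript
// Format one content type's results in the style of the error log above:
// a dashed rule followed by "<type> - N tests, M errors in <time> (cumulative)".
function summaryLine ({ type, tests, errors, seconds }) {
  // Report in minutes once a type's cumulative time passes a minute.
  const time = seconds >= 60
    ? (seconds / 60).toFixed(2) + ' minutes'
    : seconds.toFixed(1) + ' seconds'
  const line = `${type} - ${tests} tests, ${errors} errors in ${time} (cumulative)`
  // Underline with a dashed rule the same width as the summary text.
  return '-'.repeat(line.length) + '\n' + line
}

module.exports = { summaryLine }
```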

The new structure also ensures that test failures no longer cause the build to fail. Now, errors are logged but the build continues. We can review the error log and resolve content issues as they are identified but simple typos or character encoding problems will not prevent content from being imported.

Finally, we knew that implementing the concurrent queue and using JSDOM would result in some speedup in the tests, but we weren't certain before we started how much of an improvement we would see. Getting the content test time under three or four hours seemed reasonable. With everything implemented and tested in Jenkins on the production server, running with a concurrency of 4, the total content test time is just under 30 minutes for all 21,000 records, and the total build process runs at about 2 hours. That is an approximately 90% reduction in total build time from our longest builds.

Updating to queue, JSDOM, and Chai resulted in a major speedup for our build.

We are very happy with the results of this new JSDOM and Chai combination and are looking forward to the future of our content test pipeline as we look to Drupal 9 and beyond.
