Some of my favorite projects are website performance audits. Hunting for opportunities to make an application faster and leaner can be rewarding. Recently Tyler Thompson and I had a chance to dive into an e-commerce website’s codebase and search for performance opportunities during a quick two-week sprint.
Setting the stage
Our target for this project was a feature-rich e-commerce site. Time is money, and in e-commerce this is especially true: if your customers grow impatient and leave, you lose money. The project goals were simple: choose appropriate performance metrics, identify bottlenecks, propose solutions, and implement those solutions when possible. We’d only have two weeks for the entire project, from inspecting the codebase to profiling the application to implementing solutions.
The site was a React application and used the Next.js framework for server-side and client-side rendering. Behind the scenes Yarn managed dependencies and (like all Next.js applications) Webpack bundled the application.
Performance is a loaded word
But wait, what do we mean by website “performance”? A website with good performance could show something to readers quickly upon load, or it could respond quickly to interaction. Where performance is measured also makes a big difference: if a site feels fast on a developer’s laptop but slow on a user’s phone, is it still performant? Many times we talk about performance in a qualitative way, using words like “fast” or “sluggish”. Descriptors are great, but if we want to fix something, it’s best to be able to measure it.
For this project we chose a performance metric called “Time to Interactive,” or TTI. TTI lives up to its name: it’s the time between when a user requests a page and when they can interact with it. Specifically it’s the time for the page to display useful content, to register event handlers for visible page elements, and to respond to user interactions quickly (within 50 milliseconds).
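To make the definition concrete, here is a minimal sketch of how a TTI-like value could be estimated from a list of main-thread tasks. It assumes the five-second "quiet window" that Lighthouse's published definition uses, and it ignores the network-quiet condition in that definition. The names `MainThreadTask` and `estimateTti` are made up for illustration; this is not Lighthouse's implementation.

```typescript
// A main-thread task, with start and end times in milliseconds since navigation.
interface MainThreadTask {
  start: number;
  end: number;
}

const LONG_TASK_MS = 50; // tasks longer than this delay responses to input
const QUIET_WINDOW_MS = 5000; // assumed: how long the main thread must stay free of long tasks

// Returns the estimated TTI in ms, or null if the trace never becomes quiet.
function estimateTti(
  firstContentfulPaint: number,
  tasks: MainThreadTask[],
  traceEnd: number
): number | null {
  // Only long tasks that finish after useful content is shown can delay TTI.
  const longTasks = tasks
    .filter((t) => t.end - t.start > LONG_TASK_MS && t.end > firstContentfulPaint)
    .sort((a, b) => a.start - b.start);

  let candidate = firstContentfulPaint;
  for (const task of longTasks) {
    if (task.start - candidate >= QUIET_WINDOW_MS) {
      return candidate; // a full quiet window follows the candidate
    }
    candidate = Math.max(candidate, task.end);
  }
  return traceEnd - candidate >= QUIET_WINDOW_MS ? candidate : null;
}
```

In this simplified model, every long task that runs shortly after first paint pushes the estimate later, which is why heavy script execution shows up so directly in the metric.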
Why does it need to be so complicated? Couldn’t we just measure the time to load a page? For better or for worse, a modern webpage has many moments that could be described as a “page load”—markup has downloaded, or content is shown to the user, or all required scripts have been executed, or all styles have been applied, or all images have been displayed. TTI is a convenient moment in this process; it’s intuitive, and—most importantly—it’s user-focused. Improving TTI will almost certainly improve a user’s experience.
Measuring time to interactive with Lighthouse
TTI might be daunting to define, but with the right tool it’s easy to measure. Google’s Lighthouse provides an excellent performance auditing suite, and it’s conveniently included with Chrome. (Lighthouse can perform other audits as well, from accessibility to SEO, but we’ll focus on its performance audit.)
Lighthouse delivers a wealth of information about the audited webpage: an overall performance score; metrics like times to first byte, first contentful paint, first meaningful paint, interactive, and CPU idle; diagnostics detailing which aspects of the page affected performance; and a prioritized list of opportunities for improvement.
We targeted mobile devices with our Lighthouse performance audits. In this mode Lighthouse emulates a mobile viewport, slows the CPU by 4x, and throttles the network connection to fast 3G speeds. We targeted mobile devices for a few reasons: it’s easiest to start by optimizing for one metric rather than many, mobile users accounted for half of the website’s traffic, and almost all improvements to the mobile experience will also benefit desktop users.
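For readers who script their audits, the mobile conditions described above might be written as a settings object like the sketch below. The key names follow Lighthouse's configuration format as we understand it. The round-trip time and throughput figures are assumptions based on Lighthouse's default mobile throttling preset, so check them against the version you run. The variable name `mobileAuditSettings` is made up.

```typescript
// Illustrative settings in the shape Lighthouse accepts under `settings`.
const mobileAuditSettings = {
  onlyCategories: ["performance"], // skip accessibility, SEO, and other audits
  throttlingMethod: "simulate",
  throttling: {
    rttMs: 150, // assumed round-trip time for the throttled connection
    throughputKbps: 1.6 * 1024, // assumed "fast 3G" class throughput
    cpuSlowdownMultiplier: 4, // the 4x CPU slowdown mentioned above
  },
};
```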
We also chose to audit the homepage of the website. This page gets many visitors; improvements here would impact the rest of the site.
Conveniently, Lighthouse provides a prioritized list of opportunities, distinguishing high-impact efforts from marginal ones. For example, it showed us that deferring offscreen images would reduce TTI by about nine seconds, but deferring unneeded CSS would only reduce TTI by about 0.6 seconds.
We saw that the largest single dependency was a 400 kB chat widget developed by an internal team. This chat widget is helpful for customer service, but is shown at the bottom of the page. This means that we can treat it as offscreen and non-critical. Non-critical components are perfect candidates for lazy loading. Lazy loading is the process of splitting a module or component into a separate bundle. This new bundle is downloaded only when needed, after the primary bundles have executed and a user can interact with the page. Using React Loadable we split the chat widget out of the main bundles and instantly reduced the main bundle’s size by 13%.
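The core idea behind lazy loading can be sketched without React: wrap the expensive import in a function that starts the work only on first use and reuses the result afterwards. This is a framework-free illustration, not the code from the project. `lazyOnce`, `loadChatWidget`, and the fake widget module are hypothetical names.

```typescript
// Wraps a loader so the expensive work starts on first use and runs only once.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader();
    }
    return pending;
  };
}

// Hypothetical stand-in for a dynamic import such as `() => import("./ChatWidget")`.
let chatBundleRequests = 0;
const loadChatWidget = lazyOnce(async () => {
  chatBundleRequests += 1; // counts how often the "bundle" is requested
  return { mount: () => "chat ready" };
});
```

Nothing is requested until `loadChatWidget()` is first called, and later calls share the same promise. With React Loadable the equivalent would look roughly like `Loadable({ loader: () => import("./ChatWidget"), loading: () => null })`, where the `./ChatWidget` path is again a placeholder.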
We also saw that we could replace a few third-party dependencies. object-hash was used to key JSON objects; we replaced it with fast-stable-stringify, which was 99% smaller. A release candidate of cheerio included two HTML parsers; we downgraded to a stable version that included a single parser and saved 190 kB.
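To show why a stable stringifier can stand in for a hash when keying JSON objects, here is a small sketch of the technique. It is a simplified illustration written for this article, not the fast-stable-stringify source. The names `Json` and `stableKey` are made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes a JSON value with object keys sorted, so equal objects give equal strings.
function stableKey(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableKey(item)).join(",") + "]";
  }
  const obj = value; // narrowed to a plain object at this point
  const parts = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + stableKey(obj[k]));
  return "{" + parts.join(",") + "}";
}
```

Because keys are sorted before serializing, `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` produce the same string, which is the property needed for a lookup key.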
Since the project uses Yarn to manage dependencies, we took advantage of its resolutions feature to remove some duplicate dependencies. These duplications arise with “transitive” dependencies, which are not direct dependencies of the project but instead dependencies of those direct dependencies. Yarn’s resolutions control which versions of transitive dependencies are used, essentially overriding your own dependencies’ package.json files. This feature should be used with caution and the results should definitely be tested!
In our project we found 12 duplicated transitive dependencies whose versions differed only at the minor or patch level, which made them candidates for de-duplication. This group contained heavy libraries, including two extra versions of lodash (67 kB apiece) and react-dom (90 kB). After automated and manual testing, we confirmed it was safe to remove 250 kB of duplicated dependencies.
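The search for de-duplication candidates can be sketched as a small script. It assumes you already have a map of package names to the versions installed (for example, extracted from the lockfile) and uses "all copies share a major version" as a rough stand-in for the "minor or patch" criterion. The function names and version numbers are illustrative, pre-release tags are not handled, and the output is only a proposal for the `resolutions` field that still needs testing.

```typescript
// Installed versions per package, e.g. as read from a lockfile.
type InstalledVersions = Record<string, string[]>;

function parseVersion(version: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  return [major, minor, patch];
}

function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  return pa[0] - pb[0] || pa[1] - pb[1] || pa[2] - pb[2];
}

// Proposes one version per package when all installed copies share a major version.
function proposeResolutions(installed: InstalledVersions): Record<string, string> {
  const resolutions: Record<string, string> = {};
  for (const pkg of Object.keys(installed)) {
    const versions = installed[pkg];
    if (versions.length < 2) {
      continue; // nothing duplicated
    }
    const firstMajor = parseVersion(versions[0])[0];
    const sameMajor = versions.every((v) => parseVersion(v)[0] === firstMajor);
    if (sameMajor) {
      const sorted = versions.slice().sort(compareVersions);
      resolutions[pkg] = sorted[sorted.length - 1]; // pick the newest copy
    }
  }
  return resolutions;
}
```

Packages installed at different major versions are deliberately left alone, since forcing them onto one version is far more likely to break something.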
One of the largest single dependencies in our bundle was a metadata file inside moment-timezone. It included timezone details for every timezone in the world from 1900 to 2038, weighing in at 180 kB. Using Webpack’s NormalModuleReplacementPlugin we replaced it with a custom 1.5 kB file that included only the geographies and years we needed for this project. (Others have noticed this giant dependency, too: recently WordPress’s Gutenberg project chose to stop importing the metadata file altogether.)
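One way a trimmed metadata file might be produced is to filter the original by zone name. The sketch below assumes the packed data has `zones` entries that begin with the zone name followed by a pipe, and `links` entries of the form "Target|Alias". Treat that shape as an assumption to verify against the file you ship. `PackedTimezoneData` and `keepZones` are made-up names, and trimming by year range (which requires unpacking each zone) is not shown.

```typescript
// Shape of the packed timezone data, as far as this sketch needs it (an assumption).
interface PackedTimezoneData {
  version: string;
  zones: string[]; // each entry starts with the zone name, e.g. "America/Chicago|..."
  links: string[]; // each entry is "Target/Zone|Alias/Zone"
}

// Keeps only the wanted zones, plus any aliases that point at them.
function keepZones(data: PackedTimezoneData, wanted: string[]): PackedTimezoneData {
  const isWanted = (zoneName: string): boolean => wanted.indexOf(zoneName) !== -1;
  return {
    version: data.version,
    zones: data.zones.filter((zone) => isWanted(zone.split("|")[0])),
    links: data.links.filter((link) => isWanted(link.split("|")[0])),
  };
}
```

The result could be written to a JSON file at build time and used as the replacement module.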
Reducing image overhead
Lighthouse’s number one recommendation was deferred loading of offscreen images. One of the easiest ways to do this (in modern browsers) is to use an Intersection Observer to detect when an image is onscreen and only then load it. The handy (and tiny!) react-intersection-observer package provides a simple React abstraction.
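The pattern itself is small. Below is a framework-free sketch that copies a `data-src` value into `src` the first time an image becomes visible. To keep it runnable outside a browser it uses minimal hand-written types and takes the observer as a factory. All of the names are made up, and this is not the react-intersection-observer API.

```typescript
// Minimal structural types so the sketch runs outside a browser too.
interface DeferredImage {
  src: string;
  dataset: { src?: string };
}
interface VisibilityEntry {
  isIntersecting: boolean;
  target: DeferredImage;
}
interface VisibilityObserver {
  observe(target: DeferredImage): void;
  unobserve(target: DeferredImage): void;
}
type ObserverFactory = (
  onChange: (entries: VisibilityEntry[]) => void
) => VisibilityObserver;

// Copies data-src into src the first time each image becomes visible.
function deferImages(images: DeferredImage[], createObserver: ObserverFactory): void {
  const observer: VisibilityObserver = createObserver((entries) => {
    for (const entry of entries) {
      const realSrc = entry.target.dataset.src;
      if (entry.isIntersecting && realSrc !== undefined) {
        entry.target.src = realSrc; // triggers the actual download
        observer.unobserve(entry.target); // each image only needs this once
      }
    }
  });
  images.forEach((image) => observer.observe(image));
}
```

In a browser, the factory would typically wrap the built-in observer, along the lines of `(onChange) => new IntersectionObserver(onChange)`, with the images selected from the page.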
Refactoring every component that loaded an image (either in an img element or as the background image for a div) would take a bit of time. We were on a tight timeline, so we decided to focus on large images. Just lazy loading 30% of the images on the homepage improved time to interactive by 8%.
Reliable and automated performance testing
We wanted to test the results of our efforts reliably and easily. A complex test like Lighthouse has many moving parts—factors unrelated to code changes can affect a test. It’s best to run a performance test multiple times so that you can be more confident in the results.
Using Lighthouse’s UI inside of Chrome is great for one-off tests, but it’s cumbersome to run multiple times. Luckily Lighthouse and Chrome can both be automated. The Google Chrome team maintains the lighthouse and chrome-launcher packages, so writing Node scripts to control the profiling process is easy. We wrote a simple script to execute batches of multiple Lighthouse runs. We then combined the data to ensure we had statistically significant results.
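The batching and combining steps might look something like the sketch below. It is not the script from the project: `summarizeRuns` and `runBatch` are made-up names, and `auditOnce` is a hypothetical callback that would launch Chrome, run Lighthouse, and resolve with the reported TTI in milliseconds.

```typescript
interface RunSummary {
  runs: number;
  median: number;
  mean: number;
  standardDeviation: number;
}

// Summarizes TTI samples (in ms) from repeated audits of the same page.
function summarizeRuns(samples: number[]): RunSummary {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  // Sample variance (n - 1 in the denominator); zero when there is a single run.
  const variance =
    sorted.length > 1
      ? sorted.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / (sorted.length - 1)
      : 0;
  return { runs: sorted.length, median, mean, standardDeviation: Math.sqrt(variance) };
}

// Runs an audit several times, one after another, and summarizes the results.
async function runBatch(
  auditOnce: () => Promise<number>,
  runs: number
): Promise<RunSummary> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    samples.push(await auditOnce()); // sequential, so runs do not compete for CPU
  }
  return summarizeRuns(samples);
}
```

Comparing the median and spread of a "before" batch against an "after" batch gives a much steadier signal than comparing two single runs.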
Lessons and results
We exited the project with a few important lessons. First, large bundles don’t just hurt download time; they must be parsed, compiled, and executed, so they have an outsized effect on performance. Second, images shouldn’t be ignored; deferring image loading can have amazing benefits. Third, any other offscreen work—like loading chat components—should also be deferred when possible.