JP's blog

Tech talk with a French twist.

Front End Performance Case Study: GitHub

A couple weeks ago, someone in my Twitter timeline retweeted the following tweets from DOM Monster:

In HTML5, quotes around attribute values are optional. Which means, omit them. Smaller, faster, less clutter.

Here’s an example. This page from GitHub contains about 4,517 quote characters. Most of these can be omitted.
https://github.com/rails/rails/commit/3756a3fdfe8d339a53bf347487342f93fd9e1edb

This is a very specific and interesting performance optimization suggestion. I had never really thought about it much, or even looked into what kind of performance impact the omission of quote characters might have on a web page. However, after reading these tweets, I started to wonder: how well is GitHub doing on front end performance?

A quick look at the Chrome dev tools with an empty cache for the GitHub url mentioned above showed a total load time of 5.43 seconds (onload at 3.33s and DOM ready at 1.33s) and an HTTP request count of 73 for a total of 1.36MB. That seemed a little high (for reference, the HTTP Archive trends report an average of 84 requests for a total of about 1MB). Those of you who know me from the SF web performance meetup know that I really like dissecting web pages and figuring out what might make a given page slower.
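As an aside, the same onload and DOM ready milestones can be read programmatically. Here is a minimal sketch using the Navigation Timing API (the numbers above came straight from the Chrome dev tools Network panel, not from this snippet):

```js
// Minimal sketch: read the page's own load milestones from the
// Navigation Timing API once the load event has finished.
window.addEventListener('load', function () {
  setTimeout(function () {
    var t = window.performance.timing;
    console.log('DOM ready: ' + (t.domContentLoadedEventEnd - t.navigationStart) + 'ms');
    console.log('onload:    ' + (t.loadEventEnd - t.navigationStart) + 'ms');
  }, 0); // loadEventEnd is only populated after the load event completes
});
```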

Disclaimer: I do not work for GitHub and I have not contacted GitHub prior to writing this blog post. They might be aware of some of the issues and might be working on the front end performance of GitHub already. In any case, I would love to learn what they think of the following analysis.

Webpagetest

One of my favorite tools in my performance toolbox is WebPagetest. WebPagetest is a front end performance analysis tool that loads a web page and generates a performance report card. It includes a lot of performance metrics, waterfall charts for first and repeat views, PageSpeed/YSlow-style overview scores, and more. It also runs on a variety of real browsers (you can choose between IE, Chrome, Firefox, etc.).

I ran the previous GitHub url and you can see the results using the following links:

I tried to run the test with Chrome as well but encountered some issues with SSL in WebPagetest (I haven’t dug into the problem yet, so I am not sure whether the issue is with Chrome or WebPagetest).

Let’s dive in!

Javascript blocks rendering

The problem

According to WebPagetest, the rendering starts pretty late (at around 6 seconds in Firefox and 3 seconds in IE).

In most browsers, a javascript <script> tag will block rendering while the browser loads and parses the javascript code. The reason for this is fairly simple: a javascript file might include code that modifies the HTML document, via document.write() for instance (boooh, don’t do that) or any other DOM manipulation technique. Since this might affect the rendered page, browsers usually stop any rendering activity while parsing and evaluating javascript.

If you look at the waterfalls, the rendering start on GitHub is delayed by 2 javascript files (from the file name it seems that one is the jQuery source code and the other is the actual GitHub javascript code).

Solutions

The 2 <script> tags are located at the top of the HTML document, in the head. A performance best practice recommends placing javascript <script> tags at the bottom of the HTML document (right before the closing </body> tag) so that the browser can render the page content before it has to fetch and parse the scripts.

Another approach could be to use an asynchronous script loading technique (via a script loader such as LABjs or ControlJS; a custom solution as described here would work too).

This should allow the page rendering to start sooner.
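For reference, here is a minimal sketch of the custom, loader-less flavor of that technique; the file path is purely illustrative, not GitHub’s actual asset URL:

```js
// Minimal sketch: inject the script element from javascript so the HTML
// parser never has to stop and wait for the file to download and execute.
(function () {
  var script = document.createElement('script');
  script.src = '/assets/github.js'; // illustrative path
  script.async = true;
  document.getElementsByTagName('head')[0].appendChild(script);
}());
```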

Reduce the size of downloads

The problem

WebPagetest reports 1,425KB of data transferred. That is quite a lot.

The worst offender in GitHub’s case seems to be the Gravatar 3rd party service. Gravatar is a popular service that lets users of the interweb display a personal avatar based on their email address. In the case of that GitHub page, Gravatar is loading 34 images for a total of 500KB. That is more than a third of the payload.

The second worst offender is the host called camo.githubapp.com. It seems to be serving 2 images in our example. At first, I was a bit puzzled about that host as I could not quite figure out why GitHub was serving 2 images worth 500KB directly (that is another third of our total payload) instead of using their CDN. After a bit of research I found the Camo project. GitHub enforces SSL for every page. When GitHub users write comments and insert funny cat pictures to illustrate their point, the said cat pictures might be served from a non-SSL host, which causes a nagging mixed content warning in the browser. Camo is basically an SSL image proxy that solves the issue. See more on this blog post. Note that one of the 2 images takes 7 seconds to load.

Solutions

We have about 1,000KB of images. A common practice is to compress and optimize images, and there are a lot of tools out there for that purpose. WebPagetest’s performance review tab estimates that compressing some of the images would save about 40KB. Note that this estimate does not include one of the largest images, an animated gif proxied through Camo; I sent that gif to Smush.it and it came back 50KB lighter. Bottom line: potential savings would be around 10%.

That said, compressing these images might not be easily achievable. I don’t think it would be a good idea to compress them on the fly (it would take extra CPU resources to do that through Camo, for instance). A better approach might be to have the images compressed at the source (in the case of Gravatar, when a user uploads an avatar). That would probably require a significant effort and I am not sure it would be worth the gain.

An alternative could be to implement asynchronous image loading. When you first load the GitHub page, none of that content is visible (all the images are below the fold). Knowing that, the HTML page could be generated with a placeholder for each image, and the real image would be loaded later on via javascript. The javascript could listen for scroll events and monitor the visibility of the images in order to decide when to load them. Pamela Fox wrote about that not long ago. You can also see the technique in action on Meetup’s website (for instance, visit this meetup page and look at the sidebar with the list of attendees while you scroll). Note that users browsing with javascript disabled would miss the images; I am not sure that would be a deal breaker, as I am assuming most GitHub users have javascript turned on.
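A rough sketch of that idea, assuming placeholder <img> tags carrying the real URL in a data-src attribute (the attribute name and selector are hypothetical, not what GitHub or Gravatar actually use):

```js
// Rough sketch: only swap in the real avatar URL once the placeholder
// has scrolled into the viewport.
function loadVisibleImages() {
  var viewportBottom = window.pageYOffset + window.innerHeight;
  var placeholders = document.querySelectorAll('img[data-src]');
  for (var i = 0; i < placeholders.length; i++) {
    var img = placeholders[i];
    var imgTop = img.getBoundingClientRect().top + window.pageYOffset;
    if (imgTop < viewportBottom) {
      img.src = img.getAttribute('data-src'); // triggers the actual download
      img.removeAttribute('data-src');        // make sure we only load it once
    }
  }
}
window.addEventListener('scroll', loadVisibleImages);
window.addEventListener('load', loadVisibleImages);
```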

Reducing the number of HTTP requests

The problem

WebPagetest shows 71 HTTP requests. Trying to reduce that number is usually a good idea: there is always some overhead with an HTTP request, so the fewer you make, the better.

I mentioned earlier the presence of 2 <script> tags in the head. I am wondering if there is a specific reason for them not to be combined into one file. The same goes for the CSS stylesheets (github-xxxx.css and github2-xxxx.css).

GitHub uses some sprites, which is good and avoids some HTTP requests. I am wondering if GitHub would see any gain by combining those sprites together (aka an über-sprite). Looking at the various icons in the png files (in WebPagetest, under the waterfall, click the link “View all Images”), it seems that mini_button_icons.png, repostat.png, public.png, file_modes.png, jump.png, diffstat.png… could all be combined into one.

Some pages also include several individual icons which could be sprited. For instance, a GitHub profile/dashboard page has all the timeline icons in individual files (pull_request.png, push.png, fork.png, issues_closed.png, issues_comment.png, watch_started.png, etc.).
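To illustrate, here is a minimal sketch of what such a combined icon sprite could look like in CSS (the file name, class names and offsets are all hypothetical):

```css
/* Hypothetical combined sprite: all the timeline icons live in one image
   and are selected with background-position instead of separate requests. */
.icon {
  display: inline-block;
  width: 16px;
  height: 16px;
  background-image: url(/images/icons/uber-sprite.png);
  background-repeat: no-repeat;
}
.icon-push          { background-position:   0     0; }
.icon-fork          { background-position: -16px   0; }
.icon-pull-request  { background-position: -32px   0; }
.icon-watch-started { background-position: -48px   0; }
```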

Solutions

Well, I guess I spoiled that one. You’ve read it above. Combine all the things!

The double (sometimes quadruple) logo

The problem

In the waterfall for Firefox, you can see 2 logos being loaded (the regular black version and the alternative blue hover version).

In the waterfall for IE, you’ll notice something interesting: GitHub is loading 4 logos! It is loading a smaller version (both regular and hover) as well as a higher resolution one (given the 4x in the filename, I am assuming it is intended for retina displays). This looks like a bug in the code. The smaller logos are enclosed in a conditional HTML comment targeted at IE. The conditional has an if branch but no else branch, so when the browser is not IE, the if branch is ignored and only the high resolution logos are used. But when the browser is IE, the if branch is executed and the smaller logos are loaded, and then the high resolution logos are loaded as well. I am actually wondering why IE needs the smaller logos at all. Are the higher resolution logos not suitable for IE?

Solutions

The change between regular and hover is done with a CSS :hover rule that tweaks the logos’ opacity to hide or show the desired version. The two versions of the logo could probably be sprited together and used as a CSS background image instead of inline <img> tags. Another approach: always load the regular black logo and lazy load (or preload) the hover logo via javascript. The hover effect could be achieved via javascript as well.
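As a quick sketch of the preloading option (the path below is hypothetical), a couple of lines of javascript are enough to warm the cache so the :hover swap never waits on the network:

```js
// Minimal sketch: preload the blue hover logo so that hovering the black
// logo does not trigger a visible network fetch the first time.
var hoverLogo = new Image();
hoverLogo.src = '/images/modules/header/logo-hover.png'; // hypothetical path
```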

As for the IE-specific issue, maybe look at removing the conditional comment altogether.

Caching issue with IE over SSL?

The problem

With Firefox, the WebPagetest summary shows 82 requests and 1,420KB for the first view. The repeat view seems to be using caching and performs only 12 requests for a total of 65KB. Great!

With IE, however, WebPagetest shows that a repeat view executes 45 requests for a total of 1,329KB (the first view was 71 requests for 1,425KB). Looking at the details of the various requests (click on a filename in the waterfall), I can see that a far-future Expires header is set, as well as a Cache-Control header with max-age. According to this page on MSDN, this might be a known issue with IE (or, more accurately, with WinINET).

Epic pull requests can be big

The problem

Some pull requests on GitHub can affect a lot of files and generate a really long diff. Take this one for instance, and look at the diff. It generates an HTML document of 18.48MB (1.05MB gzipped), and it takes 10 seconds in Chrome to get the first byte of that page.

Solution

This is a tough one. My first thought would be to look at asynchronous requests, reusing the viewport visibility trick mentioned above: generate the diff skeleton without the diff content for each file, then use pjax to load the diff of each file. pjax could be triggered in three cases. First, after the initial page load, the diff of each file could be loaded one by one. Second, if a user clicks a specific file anchor at the top, that diff could be loaded immediately via pjax. Third, as the user scrolls up or down the page, some javascript could detect which diff is visible and load it via pjax.
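Here is a rough sketch of that third case, assuming jQuery and the jquery-pjax plugin are loaded, and assuming each file gets a placeholder container exposing a hypothetical data-diff-url attribute (the endpoint and markup are assumptions, not GitHub’s actual implementation):

```js
// Rough sketch: when a diff placeholder scrolls into view, fetch its content
// with pjax and replace the placeholder (push: false leaves the URL untouched).
$(window).on('scroll', function () {
  var viewportBottom = $(window).scrollTop() + $(window).height();
  $('.diff-placeholder[data-diff-url]').each(function () {
    var $placeholder = $(this);
    if ($placeholder.offset().top < viewportBottom) {
      $.pjax({
        url: $placeholder.data('diff-url'),      // hypothetical per-file diff endpoint
        container: '#' + $placeholder.attr('id'),
        push: false
      });
      $placeholder.removeAttr('data-diff-url');  // only load each diff once
    }
  });
});
```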

Closing words: A note about performance metrics

You know the saying “Premature optimization is the root of all evil”. Related: don’t optimize without measuring first.

Measure. Optimize. Monitor. Rinse & Repeat.

The above analysis looks at one very specific page, which was chosen fairly randomly from a tweet.

A better and more proper approach would be to first set up some kind of real user monitoring (such as Boomerang, New Relic or the new Torbit Insight) and find out which pages are the worst offenders. Then start optimizing. After that, monitor the performance metrics to see if the optimizations work as expected and if the metrics are improving.

Monitoring is one thing, but using web analytics in addition to performance metrics is also important to help drive optimization decisions. For instance, let’s imagine that GitHub’s visitors using IE represent less than 1% of all GitHub’s visitors. In this case, fixing the issues specific to IE (such as the double logo or SSL caching) might not be the most important optimization. Now, let’s imagine that 99% of GitHub’s visitors are repeat users. In that case, they might be visiting GitHub with a primed cache (which would already include assets such as avatars and icon images).

In the end, is GitHub slow? I actually don’t think so. When I use GitHub, I don’t have a perception of slowness or lag. Perceived performance is an important concept to consider (as opposed to the hard numbers discussed above, which tend to say “GitHub is slow”). Yet I am wondering: what would the performance business case for GitHub be? Even if GitHub does not feel slow, faster is always better. In the case of e-commerce websites or search engines, there is a direct correlation between performance and revenue (Amazon, Bing, Google and Walmart have all shared numbers over the past few years). In the case of GitHub, what is the performance business pitch? Would a faster GitHub translate to more users? Would that then drive more revenue?

Thanks to Ryan Bigg for reviewing this post and providing great feedback.
