A couple of weeks ago, someone in my Twitter timeline retweeted the following tweets from DOM Monster:
In HTML5, quotes around attribute values are optional. Which means, omit them. Smaller, faster, less clutter.
Here’s an example. This page from GitHub contains about 4,517 quote characters. Most of these can be omitted.
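For illustration, here is what dropping optional quotes looks like (the markup below is made up, not taken from GitHub's page). In HTML5, an attribute value may be unquoted as long as it contains no spaces, quotes, equals signs, backticks, or `<`/`>` characters:

```html
<!-- With quotes: 2 extra characters per attribute -->
<a class="avatar" href="/octocat" rel="author">octocat</a>

<!-- Without quotes: still valid HTML5 for these values -->
<a class=avatar href=/octocat rel=author>octocat</a>
```

Quotes remain required when the value contains one of those special characters, e.g. class="avatar large".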
This is a very specific and interesting performance optimization suggestion. I had never really thought about it much, or even looked at what kind of performance impact the omission of quote characters might have on a web page. However, after reading these tweets, I started to wonder: how well is GitHub doing on front-end performance?
A quick look at the Chrome dev tools with an empty cache for the GitHub URL mentioned above showed a total load time of 5.43 seconds (onload at 3.33s and DOM ready at 1.33s) and an HTTP request count of 73 for a total of 1.36MB. That seemed a little high (for reference, the HTTP Archive trends report an average of 84 requests for a total of about 1MB). As some of you know from the SF web performance meetup, I really like dissecting web pages and figuring out what might make a given page slower.
Disclaimer: I do not work for GitHub and I have not contacted GitHub prior to writing this blog post. They might be aware of some of the issues and might be working on the front end performance of GitHub already. In any case, I would love to learn what they think of the following analysis.
One of my favorite tools in my performance toolbox is WebPagetest. WebPagetest is a front-end performance analysis tool that crawls a web page and generates a performance report card. It includes a lot of performance metrics: waterfall charts for first view and repeat view, Page Speed/YSlow-style overview scores, and more. It also runs on a variety of real browsers (you can choose between IE, Chrome, Firefox, etc.).
I ran the previous GitHub URL through WebPagetest and you can see the results using the following links:
I tried to run the test with Chrome as well but encountered some issues with SSL in WebPagetest (I haven't looked into the problem yet, and I am not sure if the issue is with Chrome or with WebPagetest).
Let’s dive in!
According to WebPagetest, the rendering starts pretty late (at around 6 seconds in Firefox and 3 seconds in IE).
The page loads 2 <script> tags in the head. A common best practice is to place <script> tags at the bottom of the HTML document (right before the </body> closing tag), so that the browser fetches and executes the script files as late as possible instead of letting them block the parsing and rendering of the rest of the page.
This should allow the page rendering to start sooner.
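As a sketch, the recommendation looks like this (the file names here are hypothetical, not GitHub's actual assets):

```html
<html>
  <head>
    <link rel="stylesheet" href="/assets/github.css">
    <!-- no blocking <script> tags here -->
  </head>
  <body>
    <p>Page content is parsed and rendered first…</p>
    <!-- scripts are fetched and executed after the content -->
    <script src="/assets/github.js"></script>
  </body>
</html>
```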
Reduce the size of downloads
WebPagetest is accounting for 1,425KB of data transferred. That is quite a lot.
The worst offender in GitHub's case seems to be the Gravatar third-party service. Gravatar is a popular service that lets users of the interweb display a personal avatar based on their email address. On that GitHub page, Gravatar is loading 34 images for a total of 500KB. That is more than a third of the payload.
The second worst offender is the host called camo.githubapp.com, which is serving 2 images in our example. At first, I was a bit puzzled about that host, as I could not quite figure out why GitHub was directly serving 2 images worth 500KB (that is another third of our total payload) instead of using their CDN. After a bit of research I found the Camo project. GitHub enforces SSL for every page. When GitHub users write comments and insert funny cat pictures to illustrate their point, said cat pictures might be served from a non-SSL host, which would cause a nagging mixed content warning from the browser. Camo is basically an SSL image proxy that solves the issue. See more in this blog post. Note that one of the 2 images is taking 7 seconds to load.
We have about 1,000KB of images. A common practice is to compress and optimize images, and there are a lot of tools out there for that purpose. WebPagetest's performance review tab estimates that some of the images can be compressed to save about 40KB. Note that this does not include one of the largest images, an animated GIF proxied through Camo. I sent the GIF to Smush.it and it came back 50KB lighter. Bottom line: potential savings would be around 10%. Compressing images might not be easily achievable, though. I don't think it would be a good idea to compress them on the fly (it would take extra CPU resources to do that through Camo, for instance). A better approach might be to compress the images at the source (in the case of Gravatar, when a user uploads an avatar). That would probably require a significant effort, and I am not sure it would be worth the gain.
Reducing the number of HTTP requests
WebPagetest shows 71 HTTP requests. Trying to reduce that number is usually a good idea: there is always some overhead with an HTTP request, so the fewer you make, the better.
I mentioned earlier the presence of 2 <script> tags in the head. I am wondering if there is a specific reason for them not to be combined together. The same goes for the CSS stylesheets.
GitHub uses some sprites, which is good and avoids some HTTP requests. I am wondering if GitHub would see any gain by combining sprites together (aka an über-sprite). Looking at the various icons in the PNG files (in WebPagetest, under the waterfall, click the link “View all Images”), it looks like diffstat.png… could all be combined into one.
Some pages include several individual icons which could be sprited. For instance, a GitHub profile/dashboard page has all the timeline icons in individual files.
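A minimal sketch of the sprite idea (the file name, class names and pixel offsets below are made up): all the icons live in one image, so one HTTP request serves them all, and each icon is just a background position.

```html
<style>
  /* One sprite image replaces many individual icon requests.
     Offsets assume 16x16 icons laid out in a horizontal strip. */
  .icon          { display: inline-block; width: 16px; height: 16px;
                   background-image: url(/images/icons-sprite.png); }
  .icon-diffstat { background-position: 0 0; }
  .icon-commit   { background-position: -16px 0; }
  .icon-branch   { background-position: -32px 0; }
</style>

<span class="icon icon-diffstat"></span>
<span class="icon icon-commit"></span>
```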
Well, I guess I spoiled that one. You've read it above. Combine all the things!
The double (sometimes quadruple) logo
In the waterfall for Firefox, you can see 2 logos being loaded (the regular black version and the alternative hover version in blue).
In the waterfall for IE, you'll notice something interesting: GitHub is loading 4 logos! It is loading a smaller version (both regular and hover) as well as a higher resolution one (given the 4x in the filename, I am assuming it is intended for retina displays). It seems to be a bug in the code. The smaller logos are enclosed in a conditional HTML comment targeted at IE. The conditional has an if branch but no else branch. So when the browser is not IE, the if branch is ignored and the high resolution logos are used. But when the browser is IE, the if branch is executed, loading the smaller logos; then the high resolution logos are loaded as well. I am actually wondering why IE needs the smaller logos at all. Are the higher resolution logos not suitable for IE?
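For reference, conditional comments do support an "else" via the downlevel-revealed syntax, which would stop IE from downloading both sets of logos (the markup and file names below are simplified, not GitHub's actual code):

```html
<!-- IE parses the conditional and only loads the small logo -->
<!--[if IE]>
  <img src="/images/logo.png" alt="github">
<![endif]-->

<!-- Non-IE browsers treat the wrappers as plain comments and
     only load the high resolution logo; IE skips this branch -->
<!--[if !IE]><!-->
  <img src="/images/logo@4x.png" alt="github">
<!--<![endif]-->
```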
The change between regular and hover is done with a CSS :hover rule that tweaks the logo's opacity to 0 or 100 to hide or show the desired version. The two versions of the logos could probably be sprited together and used as a CSS background image instead of inline images.
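A sketch of that sprited hover approach (file name and dimensions are hypothetical): both logo states live in one image stacked vertically, and :hover simply shifts the background position, so no second image request or opacity trick is needed.

```html
<style>
  /* Sprite with the black logo on top and the blue hover
     version directly below it (made-up 120x60 image). */
  a.logo       { display: block; width: 120px; height: 30px;
                 background: url(/images/logo-sprite.png) 0 0; }
  a.logo:hover { background-position: 0 -30px; }
</style>

<a class="logo" href="/"></a>
```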
For the issue specific to the IE conditional comment, maybe look at removing the conditional comment altogether.
Caching issue with IE over SSL?
With Firefox, the WebPagetest summary shows 82 requests and 1,420KB for the first view. The repeat view seems to be using the cache and performs only 12 requests for a total of 65KB. Great!
With IE however, WebPagetest shows that a repeat view executes 45 requests for a total of 1,329KB (the first view was 71 requests for 1,425KB). Looking at the details of the various requests (click on a filename in the waterfall), I can see that a far-future Expires header is set, as well as a Cache-Control header with max-age. According to this page on MSDN, this might be a known issue with IE (or, more accurately, with WinINET).
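For reference, the caching headers in question look roughly like this (the values are illustrative, not copied from GitHub's responses); with either of these present, a browser should normally serve repeat views from its cache:

```http
HTTP/1.1 200 OK
Expires: Thu, 31 Dec 2020 23:59:59 GMT
Cache-Control: public, max-age=31536000
```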
Epic pull requests can be big
Some pull requests on GitHub can affect a lot of files and generate a really long diff. Take this one, for instance. Look at the diff: it generates an HTML document of 18.48MB (1.05MB gzipped). It takes 10 seconds in Chrome before getting the first byte of that page.
Closing words: A note about performance metrics
You know the saying: “Premature optimization is the root of all evil.” Related: don't optimize without measuring first.
Measure. Optimize. Monitor. Rinse & Repeat.
The above analysis looks at one very specific page, and that page was chosen fairly randomly from a tweet.
A better and more proper approach would be to first set up some kind of real user monitoring (such as Boomerang, New Relic or the new Torbit Insight) and find which pages are the worst offenders. Then start optimizing. After that, monitor the performance metrics to see if the optimizations work as expected and if the metrics are improving.
Monitoring is one thing, but using web analytics in addition to performance metrics is also important to help drive optimization decisions. For instance, let's imagine that GitHub's visitors using IE represent less than 1% of all GitHub's visitors. In that case, fixing the IE-specific issues (such as the double logo or the SSL caching) might not be the most important optimization. Now, let's imagine that 99% of GitHub's visitors are repeat users. In that case, they might be visiting GitHub with a primed cache (which would include assets such as avatars and icon images).
In the end, is GitHub slow? I actually don't think so. When I use GitHub, I don't have a perception of slowness or lag. Perceived performance is an important concept to consider (as opposed to the hard numbers discussed above, which tend to say “GitHub is slow”). Yet, I am wondering what the performance business case for GitHub would be. Even if GitHub does not feel slow, faster is always better. In the case of e-commerce websites or search engines, there is a direct correlation between performance and revenue (Amazon, Bing, Google and Walmart have all shared numbers over the past few years). In the case of GitHub, what is the performance business pitch? Would a faster GitHub translate to more users? Would that then drive more revenue?
Thanks to Ryan Bigg for reviewing this post and providing great feedback.