CSS woes

Implementing CSS support for the Page Analyzer

Our new web page analyzer is getting more popular by the day. Just after its release we even had a call from a well-known Internet organization that wanted to license it for internal use, to gather general statistics about web site performance. They were impressed with the analyzer, but quickly noticed a flaw: it would always load every resource referred to in a CSS file, whether or not the resource was needed to render the web page in question.

Many sites today have a single CSS file that is used for the whole site. That CSS contains references to, for example, various images used as backgrounds on different pages of the site. A web browser that loads the CSS file will parse it and realize that while the background object a.jpg is needed on the current page, another background object, b.jpg, is not: it is used on some other page of the site, but not on the page currently being loaded and rendered. The browser therefore refrains from loading b.jpg.
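To make the idea concrete, here is a minimal Python sketch of how a loader might decide which CSS-referenced images to fetch. It assumes a deliberately simplified stylesheet (flat rules, with only bare tag, #id and .class selectors), and the `needed_images` helper is a hypothetical name used for illustration; it is not the analyzer's actual code.

```python
import re

# Matches "selector { declarations }" in a flat stylesheet (no @media, no nesting).
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Matches url(...) references, with or without quotes.
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")\s]+)['\"]?\s*\)")


def needed_images(css_text, page_tags, page_ids, page_classes):
    """Return the URLs referenced by rules whose selector matches the page.

    Only bare tag, #id and .class selectors are understood; anything
    more complex is treated as non-matching in this sketch.
    """
    needed = []
    for selectors, declarations in RULE_RE.findall(css_text):
        for selector in (s.strip() for s in selectors.split(",")):
            if selector.startswith("#"):
                matches = selector[1:] in page_ids
            elif selector.startswith("."):
                matches = selector[1:] in page_classes
            else:
                matches = selector in page_tags
            if matches:
                needed.extend(URL_RE.findall(declarations))
                break  # one matching selector is enough for this rule
    return needed


css = """
#home-banner { background: url(a.jpg); }
#contact-banner { background: url('b.jpg'); }
"""
# The current page only contains the element with id "home-banner".
print(needed_images(css, {"body", "div"}, {"home-banner"}, set()))  # → ['a.jpg']
```

A real browser applies the full selector and cascade rules, which is why a proper CSS parser is needed rather than a couple of regular expressions.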

What this all means is that our page analyzer would often load a lot more objects (images etc.) than a real browser would, which of course meant that the total page load time would be longer than you would experience with a real browser. In short, our browser emulation was not very accurate in many cases.

An interesting side note here is that no other page analyzers seem to fully support CSS. We have looked around and found that all available analyzers seem to have this same problem.

We decided to see what we could do about implementing true CSS support in our analyzer, and started by looking at available CSS libraries for Python (the analyzer is written mainly in Python). We found two contenders: CSSutils and CSSEngine. After looking at both, we decided to go with CSSEngine, as its code base seemed to be a better starting point for what we wanted to do. The drawback with that library was that it had not been maintained for the last three years or so.

We cleaned up the code a bit, made it more forgiving so it wouldn't croak and die when it encountered the sometimes horrible mess that constitutes CSS files on the Internet, and made it CSS3-aware (CSS version 3 is fairly new, so the library had no support for it).

Then we started testing our new CSS parsing capabilities, and found to our disappointment that it was slow. For the sites out there with large CSS files, the parser was really slow. Glacial, in fact. What to do? We started out by using Cython to convert the Python code to C and then compile it. This made the parser three to four times faster, but it was still too slow. Parsing a really large CSS file would take 15 seconds or more, and that wasn't acceptable.

So the hunt to improve speed went on. It turned out that the culprit was the CSS parsing itself, mainly the execution of all the scattered regular expressions. We found Ply, a Python implementation of the popular Unix programs Lex and Yacc. After some more searching we found css-py, a lexer and parser grammar written using Ply. Unfortunately css-py seems to have been unmaintained for over a year, and both the lexer and the parser grammar needed a lot of modification to support our needs: handling broken CSS, and CSS3. In the end we used parts of the lexer but had to rewrite the parser grammar.
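Lex-style tools avoid running many separate regular expressions over the same text by combining all token patterns into one master pattern and scanning the input once. The sketch below illustrates that idea using only Python's standard `re` module rather than Ply itself. The token names and patterns are illustrative assumptions covering a small subset of CSS; this is not the lexer we ended up with.

```python
import re

# Token patterns, tried in order; names and patterns are illustrative only.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("URI",     r"url\([^)]*\)"),
    ("HASH",    r"#[\w-]+"),
    ("IDENT",   r"-?[A-Za-z_][\w-]*"),
    ("NUMBER",  r"\d+(?:\.\d+)?(?:%|[A-Za-z]+)?"),
    ("STRING",  r"\"[^\"]*\"|'[^']*'"),
    ("WS",      r"\s+"),
    ("PUNCT",   r"[{}:;,.>+*()\[\]=~|!/]"),
    ("BAD",     r"."),  # anything unexpected: report it instead of crashing
]
# One compiled pattern with a named group per token type.
MASTER_RE = re.compile(
    "|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC), re.DOTALL
)


def tokenize(css_text):
    """Yield (token_type, text) pairs in a single pass over the input."""
    for match in MASTER_RE.finditer(css_text):
        kind = match.lastgroup
        if kind in ("WS", "COMMENT"):
            continue  # whitespace and comments are skipped
        yield kind, match.group()


print(list(tokenize("a.x { color: #fff; /* note */ width: 10px }")))
```

The catch-all `BAD` token is one simple way to stay forgiving with broken input: the lexer keeps going and leaves it to the parser to decide what to discard.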

With the help of Ply, we managed to speed up execution enough that we were happy with the performance. The end result was that the parser/cascader became a mix of code from CSSEngine, css-py and newly written code. It supports the following:

So, while the user agent emulation we had before was very likely better than anything any other analyzer could come up with, it is now better yet. We now emulate the most important performance-affecting characteristics of modern browsers (and other user agents) and can therefore provide a fairly accurate analysis that tells users how fast their page loads in different browsers. This is something that few, if any, other analyzers can claim to do.

Two is better than one

New proxy recorder feature

As a premium (paying) Load Impact user, you get access to the HTTP recorder feature. This is a zero-configuration recorder that you will likely not find anywhere else, because it is quite tricky to implement. Unfortunately, it cannot handle 100% of all the advanced JavaScript out there. Simple JavaScript that uses e.g. XMLHttpRequest to load static URLs works fine, but when JavaScript starts generating dynamic URLs things get more complex, and our zero-configuration recorder might then fail to record all the transactions that happen.

When the zero-configuration recorder has failed, the only remaining option has been to manually enter the load script for the test, URL by URL. It is tedious work, so we decided we needed an alternative recorder, implemented as a true HTTP proxy. That would mean it would be able to record any and all transactions, no matter how they were generated.

The new recorder is live now. When you configure your test, you select "Record session" as before, and you are then presented with a choice between the old "Zero-configuration" recorder and the new "proxy recorder":

 

 

When you've chosen the new proxy recorder, a window will open that tells you how to configure your web browser to use the proxy server (which is also the recorder). At the bottom of that window, you will find a "Test proxy settings" button you can use to verify that you've managed to configure your browser correctly. If you haven't configured things correctly, this is what you will see when pressing the button:

 

 

 

And a correct proxy configuration in your browser looks like this:

 

 

 

Once the proxy is configured, you can start surfing your website. All your actions will be recorded. Note that everything you do in the browser, in all tabs and all windows, will be recorded. So make sure to close all unnecessary tabs so you don't record any transactions that happen in them.

The exception to this rule is the site loadimpact.com - nothing you load from us will be included in your load script. This is handy as it means you can simultaneously record your load script and keep a couple of tabs/windows open for loadimpact.com also, without getting any loadimpact.com URLs in your recording.
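As an illustration of this kind of exclusion rule, a recording proxy could filter each request by host name before adding it to the load script. The sketch below is only an assumption about how such a filter might look: the `should_record` helper is hypothetical, and treating subdomains of loadimpact.com as excluded is our guess for the example.

```python
from urllib.parse import urlsplit

EXCLUDED_DOMAIN = "loadimpact.com"


def should_record(url, excluded_domain=EXCLUDED_DOMAIN):
    """Return False for requests to the excluded domain or its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not (host == excluded_domain or host.endswith("." + excluded_domain))


recorded = [
    url
    for url in (
        "http://example.com/index.html",
        "http://loadimpact.com/test/config",
        "http://www.loadimpact.com/images/logo.png",
    )
    if should_record(url)
]
print(recorded)  # → ['http://example.com/index.html']
```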

 

 

 

Page analyzer update

The web page analyzer has been updated with more features

Today we installed some updates for the new web page analyzer. These are the major changes we've made:

New top-of-page layout:

The new layout emphasizes the browser emulation feature of the analyzer. This feature makes our analyzer quite unique: most analyzers have no emulation capability whatsoever, and among the few that do (we know of only one), the number of browser-specific, performance-affecting characteristics they emulate is small.

New config parameter: client bandwidth limit

The client bandwidth parameter is configurable under the "Advanced settings" panel. Only premium users can change this parameter. The client bandwidth limit emulates a client with a certain amount of bandwidth. The actual bandwidth used during the analysis will of course also be limited by the bandwidth available to (or from) the web server being tested; whichever of the two is lower applies. The default setting is a bandwidth limit of 10 Mbit/s for a test.
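The relationship between the two limits can be written down in a few lines of Python. This is a back-of-the-envelope sketch with hypothetical function names, and the 4 Mbit/s server figure is made up for the example; it ignores latency and protocol overhead.

```python
def effective_bandwidth_mbit(client_limit_mbit, server_available_mbit):
    """The slower of the two sides determines the bandwidth actually used."""
    return min(client_limit_mbit, server_available_mbit)


def transfer_time_seconds(size_bytes, bandwidth_mbit):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / (bandwidth_mbit * 1_000_000)


# Default 10 Mbit/s client limit against a server that can deliver 4 Mbit/s:
bandwidth = effective_bandwidth_mbit(10, 4)
print(bandwidth)                                  # → 4
print(transfer_time_seconds(500_000, bandwidth))  # → 1.0
```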

 

Load diagram:

We also made some small changes to the load diagram presentation. For compressed resources, we added information about the original, uncompressed file size in the mouseover tooltip:

 

Summary:

For the Summary section (below the load diagram) we added info about what browser emulation a test used, and a "Status:" line that shows whether the analysis ran to completion without any problems, or if it timed out or was aborted for some reason.

 

 

Statistics are like bikinis

Statistics are like bikinis - What they reveal is suggestive, what they conceal is vital. I thought I'd give you some statistics from Load Impact and let you judge whether they are vital or not. We continuously gather statistics about Load Impact usage because, well, it's kind of fun I guess. And interesting. Maybe you will find some of it interesting too.

How many load tests?

Load Impact went live a little more than six months ago and, at the time of writing, we have executed 36,967 load tests. Many of these are small tests, and a good percentage of them have not run to completion: they have either timed out after a while (most likely because the server we were testing responded too slowly, or not at all, at some load level) or been aborted by the user who started them. The figure is still impressive. No other load testing service comes even close, as far as we know.

[Figure: Load tests executed, per week]

How big is the average load test?

All these tests have transferred 5,363,621,242,416 bytes of data from the sites tested. This is 5.4 TB, which means each test has transferred on average about 0.15 GB, or about 150 MB, of data. Remember, though, that these figures include aborted tests and tests that timed out. Approximately 58% of all tests have run to completion. We have made 602 million transactions (requests), which gives us an average transaction size of 8,910 bytes. 34,528 unique IP addresses have been involved in the tests, and it seems that the average test loads a little over 20 different URLs/objects and consumes about 4.5 Mbit/s of bandwidth at the 50-client level.
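For anyone who wants to check the arithmetic, the averages follow directly from the totals quoted above. The short Python calculation below uses decimal units (1 MB = 1,000,000 bytes) and treats "602 million" as exactly 602,000,000, which is an assumption since that figure is rounded.

```python
total_bytes = 5_363_621_242_416
total_tests = 36_967
total_transactions = 602_000_000  # "602 million" is a rounded figure

bytes_per_test = total_bytes / total_tests
bytes_per_transaction = total_bytes / total_transactions

print(round(total_bytes / 1e12, 1))   # terabytes → 5.4
print(round(bytes_per_test / 1e6))    # megabytes per test → 145
print(round(bytes_per_transaction))   # bytes per transaction → 8910
```

The per-test figure comes out at roughly 145 MB, which is the "about 150 MB" mentioned in the text.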


HTTP response codes

Looking at aggregated results from a few thousand of the latest tests we can see how often different webserver response codes appear. We store the response (load) times per URL and per response code, so that for example we know the average response time for all 404-responses for a certain URL. We also store how frequently a certain response code appears. Users can only see data for the 200-responses as of now, but in a future version of Load Impact it will be possible to plot graphs of all response codes (and to plot their frequency, not only their response times). Anyway, this is the current distribution of the most common response codes seen in our load tests:

  • 92% of the time, the object that is requested results in a 200-response from the server. This is the "OK" response code that means the server will deliver the object.
  • 3% of the time, however, the server returns a 404 response, indicating that the resource could not be found (or returned). This is often due to obsolete HTML code that refers to old, non-existing objects.
  • 2% of the time, we get a 401 response, indicating that some resource we're trying to fetch requires authorization.
  • 1% of the time, we get no response at all from the server. The request times out. This might, for example, be due to server overload as a result of the load we generate.
  • 0.6% of the time we get a 503 response from the server. This usually means that the server is overloaded and unable to service the request but still able to respond with this error code.
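A distribution like the one above is straightforward to compute from a log of response codes. The following Python sketch shows one way to do it; the `code_distribution` helper and the sample data are made up for illustration and say nothing about how Load Impact stores its results.

```python
from collections import Counter


def code_distribution(response_codes):
    """Map each response code to its share of all responses, in percent."""
    counts = Counter(response_codes)
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.most_common()}


# Made-up sample; None stands for a request that timed out without a response.
sample = [200] * 8 + [404] + [None]
for code, share in code_distribution(sample).items():
    print(code, "%.1f%%" % share)
```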


We have seen a number of other codes as well, of course, including some exotic ones:

  • 507 - insufficient storage (WebDAV). It would be interesting to know how someone managed to fill up their WebDAV storage by running a load test for their site.
  • 509 - bandwidth limit exceeded. This is an Apache-specific response code that Apache uses when it refuses to return something because a preconfigured bandwidth limit has been exceeded. Unless you want to test your bandwidth limiter, this feature should probably be turned off during load testing.
  • 512 - what is this?  If anyone knows when this response code is sent, and by what webserver software, it would be interesting to know.



If you find statistics interesting, do write to us and tell us what data you'd be interested in, and we might dig it up and write an article about it here.

The Web Page Analyzer

Detailed page load analysis of your webpage

The beginnings

A while ago we sat down and thought a bit about who our users were, and what they needed. A large part of our users are web developers, using our service to test and optimize their web code. They need tools to help them do this, and while Load Impact may be an excellent tool to help show how web performance changes with changing load, we felt that it lacked some features to help web developers really debug their web pages.

Developers want to not only apply load to a server and see what happens, they also want to be able to analyze the performance of a single page, in detail, with or without external load applied. We felt that Load Impact was lacking a good web page analyzer feature.

We know this functionality already exists out on the net. Pingdom, for example, has a pretty good analyzer at their site. However, we had two reasons for making our own analyzer. Firstly, we knew we could do better :-) Existing analyzers out there were not very detailed, not very well documented, did not look so good (some are downright ugly), or lacked features. Secondly, we wanted to be able to integrate the analyzer with our load testing platform in the future, so that load testing configuration settings can be used when analyzing pages, and vice versa.

So, we told one of our developers to spend the summer researching existing web page analyzers out there, and create something better. And that is exactly what he did.

 

 Try it out

 The analyzer actually speaks for itself, so if you want to you can skip the rest of this article, and just try out the analyzer right away.

 

If you want to access it later on, it is easy to find through its link on the main menu.

 

 

So what is it? How does it work?

The basic idea is pretty simple: you enter the address of a web page, and the Page Analyzer loads the page as if it were a web browser. Then it shows you what objects were loaded, in what order, and how long each object took to load. It also gives you information about return codes, compression ratio, etc. and provides some general statistics about the page in a summary at the end.

Here is a screenshot of the Page Analyzer in action. It has loaded http://sun.com and shows us the objects that were loaded, in the order they were loaded. Click the image for a higher-resolution version.

The colored bars show the load time for each object. The load time is divided into several components, and if you put the mouse pointer over a bar you will get a small lightbox that tells you what these components are. On this screenshot we have put the mouse pointer over an object that spent 1 ms in the load queue (waiting to be loaded) and 15 ms establishing a TCP connection to the web server. The time to first byte (from sending the request until the server started delivering data) was 469 ms, and the download took an additional 179 ms after "first byte".
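The four components simply add up to the total load time for the object. As a small Python sketch, with a hypothetical `LoadTiming` class whose field names are our own for this example:

```python
from dataclasses import dataclass


@dataclass
class LoadTiming:
    """Load-time components for one object, all in milliseconds."""
    queued: int      # waiting in the load queue
    connect: int     # establishing the TCP connection
    first_byte: int  # request sent until the first response byte arrives
    download: int    # first byte until the last byte

    @property
    def total(self):
        return self.queued + self.connect + self.first_byte + self.download


# The object from the screenshot described above.
timing = LoadTiming(queued=1, connect=15, first_byte=469, download=179)
print(timing.total)  # → 664
```

In this case the time to first byte accounts for most of the 664 ms, which points at the server rather than the network as the place to look.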

 

 It shows pretty pictures

Moving the mouse pointer over the URL of an image causes the image to be displayed in a lightbox, like you see on the screenshot here.

 

 

 

And now for the advanced stuff

Unlike any other analyzer out there, our Page Analyzer does not just load a web page. It also makes a pretty good effort at emulating different browsers when it loads things.

The browser emulation settings control this functionality. You can configure the Page Analyzer to emulate a whole range of different browsers/clients, including search engine bots/crawlers such as Googlebot.

The emulation modifies the HTTP User-Agent header to mimic a certain browser/client, of course, but it doesn't stop at just doing that. It also emulates a lot of other parameters that vary between browsers and which may affect performance.

 

To emulate different browsers/clients, the Page Analyzer will:

  • Vary the limits for the maximum number of concurrent connections to a single web server, and for the total number of connections
  • Use different Accept- headers
  • Emulate behaviour of browsers that can't always parallelize downloads of certain resources
  • Use different keep-alive timeout settings

The end result is a page analyzer that is smart enough to let you fetch a page as if you were using a specific browser. From the start we emulate all major browsers, Googlebot and other search engine crawlers, etc. In future versions we will likely provide a way for users to modify all settings individually.

 

 More advanced features - compression ratios

On this screenshot you can see that the Page Analyzer reports compression ratio for downloaded objects.


 

 Over-compression

And on this screenshot the Page Analyzer warns about a couple of very small objects that have actually increased in size as a result of being compressed.
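The effect is easy to reproduce with Python's standard `gzip` module: the gzip container adds a fixed header and trailer, so a payload of only a few bytes ends up larger after compression. In this sketch the `compression_report` helper is a hypothetical name, and defining the ratio as compressed size divided by original size is our assumption; the analyzer may report it differently.

```python
import gzip


def compression_report(payload):
    """Return (original size, gzip size, ratio) for a non-empty bytes payload."""
    compressed_size = len(gzip.compress(payload))
    return len(payload), compressed_size, compressed_size / len(payload)


large = b"<div class='item'>hello</div>\n" * 200
tiny = b"ok"

for name, payload in (("large", large), ("tiny", tiny)):
    original, compressed, ratio = compression_report(payload)
    note = "grew when compressed" if compressed > original else "shrank"
    print(name, original, compressed, "%.2f" % ratio, note)
```

For objects this small it is usually better to have the web server skip compression altogether.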

 
