IE 8 Charset Bug
Earlier today I was playing around with a performance test case when I noticed that stylesheets and scripts were being requested twice in IE8.
Eventually I figured out that the problem had to do with how the character set was specified. This is how things seem to work: in IE8, pages that rely on the META tag to define the character set (that is, pages that do NOT use the HTTP Content-type header to specify the character set) will request some stylesheets and scripts twice. Pages that indicate a character set via the HTTP Content-type header behave properly.
Test Page Using META Tag
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
Test Page Using Content-type HTTP Header
<?php header('Content-type: text/html; charset=utf-8'); ?>
Other Test Pages
Interestingly, if you do not specify the character set at all, IE8 does not request the stylesheet and script twice. View test page.
If you use both the META tag and the Content-type HTTP header, the header will override the META tag and IE8 will only request the resources once. View test page.
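To make the four cases above easier to check against your own pages, here is a minimal Python sketch. It only encodes the behavior observed in these tests and is not a description of how IE8 works internally. The function names (`header_charset`, `meta_charset`, `ie8_double_request_expected`) are hypothetical, and the regular expressions are a rough approximation of charset detection, not a full HTML parser.

```python
import re

# Rough pattern for "charset=..." inside a Content-Type value or a META tag.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
# Rough pattern for a META tag; an approximation, not a real HTML parser.
_META_RE = re.compile(r'<meta\b[^>]*>', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type)
    return match.group(1).lower() if match else None


def meta_charset(html):
    """Return the charset named by the first META tag that declares one, or None."""
    for tag in _META_RE.findall(html):
        match = _CHARSET_RE.search(tag)
        if match:
            return match.group(1).lower()
    return None


def ie8_double_request_expected(content_type, html):
    """True only for the META-only case, per the observations in this post."""
    return header_charset(content_type) is None and meta_charset(html) is not None


META_PAGE = '<meta http-equiv="Content-type" content="text/html; charset=utf-8" />'

print(ie8_double_request_expected("text/html", META_PAGE))                 # META only
print(ie8_double_request_expected("text/html; charset=utf-8", META_PAGE))  # both
print(ie8_double_request_expected("text/html", "<html></html>"))           # neither
```

You would pass in the Content-type header value and the HTML body from whatever HTTP client you already use. Only the META-only case is flagged.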
Is it really a bug?
At first I thought it might have something to do with my webhost’s nginx and Apache configurations. But I tested some other sites at random and found the same problem with pages that did not specify the charset via HTTP header (two such pages: John Resig’s blog and css3please.com).
Billy Hoffman has a great post about charset and performance issues, in which he mentions that Firefox will re-request a URL if it detects a new charset after the initial URL request. That sounds like a very possible explanation for what is going on here: the initial stylesheet and script requests are sent, THEN the browser parses the META tag, forcing the stylesheet and script to be re-requested. But this seems odd considering the META tag occurs in the document before the stylesheet and script references.
Specifying the character set server-side instead of with the META tag is already considered a best practice, but if this IE8 behavior is accurate, the penalty for not adhering to the best practice is significant.
In the comments Dave posted a link to this post by Eric Lawrence, which explains the bug and announces that it was (mostly) fixed in a 3/30/10 update. The post is worth reading, but I wanted to highlight the part where Eric recommends using the Content-type header to specify the character set:
However, the Update kills the bug in Scenario #1 by disabling the Lookahead Downloader when a restart is encountered. Hence, we continue to strongly recommend that web developers specify the CHARSET in the HTTP Content-Type response header, as this ensures that the performance benefit of the Lookahead Downloader is realized.
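For pages that are not served by PHP, the same recommendation applies. As a rough illustration, here is a minimal sketch of a Python WSGI application that declares the character set in the Content-Type response header. The `app` function name, the page body, and port 8000 are assumptions made for this example.

```python
def app(environ, start_response):
    """Serve one HTML page with the charset declared in the HTTP header."""
    body = (
        "<!DOCTYPE html><html><head><title>Charset test</title></head>"
        "<body><p>Hello</p></body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        # The charset travels with the response headers, so the browser
        # knows the encoding before it starts parsing the document.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Local test server from the standard library; not meant for production.
    from wsgiref.simple_server import make_server

    with make_server("", 8000, app) as server:
        server.serve_forever()
```

With the header in place, the META tag becomes optional. As noted earlier, when both are present the header takes precedence.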