A Note on User Agent Identifiers and Browser Statistics

Whenever anyone gives statistics purporting to identify what percentage of users are using which browsers, this is (if it's not just somebody's wild guess) probably taken from analysis of the user agent identifiers of visitors to a Web site. This identifier is part of the HTTP protocol, and is a string that usually gives the name and version of the browser being used. Unfortunately, there is no real consistency in the format of this string, which makes analysis very difficult and statistics suspect.
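As a rough illustration of where such numbers come from, here is a minimal Python sketch that tallies the raw user agent strings found in a server log. It assumes the log is in the common "combined" format, where the user agent is the last double-quoted field on each line; the function name and the sample log lines are made up for illustration, and real logs may be laid out differently.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where each line ends with
# "referer" "user-agent". Other log layouts need a different pattern.
_TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

def tally_user_agents(lines):
    """Count how often each raw user agent string appears."""
    counts = Counter()
    for line in lines:
        match = _TAIL.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2009:00:00:01 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/2.0 (compatible; MSIE 2.0)"',
    '10.0.0.2 - - [01/Jan/2009:00:00:02 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/2.0 (compatible; MSIE 2.0)"',
    '10.0.0.3 - - [01/Jan/2009:00:00:03 +0000] "GET / HTTP/1.0" 200 512 "-" "None Of Your Business"',
    'a line that is not in the expected format',
]

for agent, hits in tally_user_agents(sample).most_common():
    print(hits, agent)
```

Note that this only counts the strings as sent; it makes no attempt to decide which browser is really behind each one, which is where the trouble starts.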
Netscape (back when it actually existed as a distinct browser) always used "Mozilla" as its name in these strings, but many other browsers "lie" and also identify themselves as "Mozilla". This practice became established during the 1990s "browser wars", because the other browser makers wanted to get past the browser sniffers on sites that disabled Netscape-specific enhancements when any other browser was being used. So they identified themselves as Mozilla/2.0 (compatible; RealBrowserName) -- even though they weren't always truly compatible with Netscape. One of the browsers doing this was MSIE, which used strings like Mozilla/2.0 (compatible; MSIE 2.0). When MSIE got enough market share to be the "browser to imitate" for many of the Brand X's, you started seeing strings like Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName), which is pretending to be MSIE pretending to be Netscape.
There was much debate among developers and testers of Mozilla in its early days on what to do about its user agent string (which starts with "Mozilla/5.0" even though this does not correspond to the actual version number of any Mozilla-based browser to date). Some wanted a "clean start" by changing its opening word to something else (even though the old pre-Firefox Mozilla Suite, once the flagship project of the Mozilla organization, was actually the only browser that could honestly call itself "Mozilla"). Others were deathly afraid to make the slightest alteration (even to change the version number with each release as Netscape always did) lest it discombobulate "browser sniffers" and lock Mozilla users out of sites. So it seems like we're stuck for the indefinite future with user agent strings that get further and further away from honestly describing the browser name and version they represent, and that contain increasing amounts of fossilized deadwood that can't be removed because some site, somewhere, allegedly depends on its presence.
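To make the layering concrete, here is a small Python sketch that peels apart a "Mozilla/x.y (compatible; ...)" string into the list of identities it claims, outermost first. The function name is hypothetical, and the sketch assumes only the pattern described above. In strings seen in the wild, the tokens inside the parentheses may also carry platform details, so this shows the layers without deciding which one is the truth.

```python
import re

# The historical spoofing pattern: "Mozilla/x.y (compatible; token; token...)"
_COMPAT = re.compile(r"^(Mozilla/\d+(?:\.\d+)*) \(compatible;([^)]*)\)")

def claimed_identities(agent):
    """Return the identities a "compatible" string claims, outermost first.

    Returns an empty list if the string does not follow the pattern.
    """
    match = _COMPAT.match(agent)
    if not match:
        return []
    inner = [token.strip() for token in match.group(2).split(";")]
    return [match.group(1)] + [token for token in inner if token]

print(claimed_identities("Mozilla/2.0 (compatible; MSIE 2.0)"))
print(claimed_identities("Mozilla/3.0 (compatible; MSIE 3.0; RealBrowserName)"))
```

For the second string, the result lists "Mozilla/3.0", then "MSIE 3.0", then "RealBrowserName": two layers of pretending wrapped around the real name.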
I think browsers that "spoof" others like this are doing the cause of independent browsers a disservice. In the short run, such dodges help users get around clueless browser detection in Web sites, but in the long run they cause those same clueless webmasters to see statistics that confirm their belief that "everybody uses [fill in currently popular browser]", even if a large chunk of those users are really using something else but pretending to be using the popular browser. (One site claims that fully 18% of browsers claiming to be "MSIE" actually are not; it measured this with a test page that both logged the presence of "MSIE" in user agent strings and used a "conditional-comments" proprietary Microsoftism to cause a particular stylesheet to load only in true MSIE browsers.) Thus, I have all the browsers I use configured to use a completely honest user agent string wherever this is an available option (e.g., my copy of Opera used an "Opera" string with no mention of Mozilla or MSIE, even before they made this the default). I wish that this were the default for all browsers, with a "spoofing" string, if available at all, only present as a settable option for the special purpose of going to a site that otherwise doesn't work.
Speaking of Opera, after a long time of defaulting to a "spoofing" identifier, they finally got honest and started using a logical user agent string with "Opera/x.xx". But, after a while of this, they found a new idiocy to perpetrate: when they reached version 10.0, the first major browser to get to a double-digit version number, they found that some moronic browser-sniffers couldn't handle such a number and looked at only one digit, reading it as either version 1 or version 0 of Opera and demanding that users upgrade before using their site. So the Opera people had to start lying once again, this time starting their strings with "Opera/9.80" and adding a "Version/10.00" later in the string with the real version. Is this a temporary workaround they'll eventually be able to drop, or are they stuck permanently this way? Are other browsers that reach 10.0 going to have to do similar things in the future? How many different version numbers will Firefox wind up with? (It's got several already, including the meaningless "Mozilla/5.0", a Gecko version number that's in the "rv:" parameter rather than in the "Gecko/" token where you might naively expect it -- that token holds the build date instead -- and an actual Firefox version number that follows "Firefox".)
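Here is a short Python sketch of both sides of that story: a sniffer that makes the one-digit mistake, and a more careful reader that prefers the "Version/" token when it is present. Both function names are hypothetical, and the sample strings are illustrative; only the "Opera/9.80" and "Version/10.00" parts come from the description above, and the rest of each string is filler.

```python
import re

def naive_major_version(agent):
    """The buggy approach: read only a single digit after "Opera/"."""
    match = re.search(r"Opera/(\d)", agent)
    return int(match.group(1)) if match else None

def opera_version(agent):
    """Prefer the "Version/" token if present, else the "Opera/" one."""
    if not agent.startswith("Opera/"):
        return None
    match = (re.search(r"\bVersion/(\d+(?:\.\d+)*)", agent)
             or re.match(r"Opera/(\d+(?:\.\d+)*)", agent))
    return match.group(1) if match else None

# Illustrative strings; the parenthesised parts are filler.
honest = "Opera/10.00 (Windows NT 5.1; U; en)"
workaround = "Opera/9.80 (Windows NT 5.1; U; en) Version/10.00"

print(naive_major_version(honest))   # the one-digit sniffer sees "1"
print(opera_version(workaround))     # the careful reader sees "10.00"
```

The naive function reads an honest "Opera/10.00" as version 1, which is exactly the mistake that pushed Opera into the "9.80" workaround.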
There seems to be no end to the degree of foolishness that gets committed in the name of browser identification. Google Chrome, for instance, uses Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13 (some of these numbers vary with different versions). Parts of this are announcing the browser to be Mozilla, AppleWebKit (the rendering engine actually used in this browser, which Apple forked from KHTML), KHTML (the Konqueror rendering engine, the ancestor of WebKit rather than what this browser runs), "like Gecko" (the Mozilla rendering engine, not used in this browser), Safari (Apple's browser that also uses WebKit and can be regarded as a "sibling" of this), and (buried in the next-to-last slot) Chrome (the actual name of the browser).
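To see all the claims in that Chrome string side by side, here is a minimal Python sketch that splits a user agent string into its "Name/version" product tokens and its parenthesised comments. The function name is hypothetical and the pattern is a simplification: it assumes comments are not nested and that product names contain no spaces.

```python
import re

# Either a parenthesised comment or a "Name/version" product token.
_PART = re.compile(r"\(([^)]*)\)|(\S+)/(\S+)")

def split_agent(agent):
    """Split a user agent string into (products, comments)."""
    products, comments = [], []
    for comment, name, version in _PART.findall(agent):
        if name:
            products.append((name, version))
        else:
            comments.append(comment)
    return products, comments

chrome = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
          "AppleWebKit/525.13 (KHTML, like Gecko) "
          "Chrome/0.2.149.27 Safari/525.13")

products, comments = split_agent(chrome)
for name, version in products:
    print(name, version)
```

Four product tokens come out (Mozilla, AppleWebKit, Chrome, Safari), and nothing in the syntax marks which one is the real browser; that knowledge has to come from outside the string.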
All of this makes it very hard to identify what browsers are really being used. To make it even harder, there are a few browsers that actually let the user change the user-agent string, and some users put in None Of Your Business, a joke name like Nutscrape, or random garbage characters. For my own Web log analysis, I use a Perl routine I developed that attempts, as best I can, to parse out the true browser being used (modified every time I run into another browser that does it differently), but it's not perfect. So don't put too much trust in anybody's browser usage statistics. (And that isn't even considering various Web caching systems that make all site hit counts suspect, and the fact that any stats based on hits to inline images like counters or ad banners will exclude text-mode browsers, browsers with image loading turned off, and accesses by users with filtering programs that skip loading online ads, etc.)
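The general idea behind such a routine can be sketched in a few lines of Python. This is not the Perl routine mentioned above, just a hypothetical illustration of one possible approach: test for the most specific tokens first, since the tokens checked later are the ones other browsers commonly imitate. The list of tokens and the labels are assumptions, and any real version would need many more special cases.

```python
# Ordered from most to least specific. Tokens further down the list are
# the ones that other browsers commonly imitate, so they are checked last.
CHECKS = [
    ("Opera", "Opera"),
    ("Chrome/", "Chrome"),
    ("Safari/", "Safari"),
    ("Firefox/", "Firefox"),
    ("MSIE ", "MSIE"),
    ("Mozilla/", "Netscape or other Mozilla"),
]

def guess_browser(agent):
    """Make a best guess at the real browser behind a user agent string."""
    for needle, label in CHECKS:
        if needle in agent:
            return label
    return "unknown"

print(guess_browser("Mozilla/2.0 (compatible; MSIE 2.0)"))
print(guess_browser("None Of Your Business"))
```

The ordering does all the work here. Checking for "MSIE" before "Opera", for example, would misfile any Opera that is spoofing MSIE, and every new browser that invents its own disguise means another entry in the list.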
Also note that "user-agents" are not fully synonymous with browsers. Browsers are user agents, but so are some other things, such as indexing robots. So some of the weird names like "Scooter" you might see in your logs are not "brand X" browsers, but indexers from a search engine. Be hospitable to them or you won't get indexed, or you'll get indexed under something inappropriate (try a Google search for "Unsupported Browser" some time, and see how many sites that were rude to the Googlebot got indexed under their "Get a better browser, loser" brushoff page rather than their real content). Unfortunately, spammers also have robots that go through Web sites harvesting e-mail addresses to annoy.
Other user agents include programs to download a site for offline browsing or to generate a map or outline of a site. Others are "download managers," such as Go!Zilla and SmartDownload, which take over when the user starts to download an executable file from the Web, managing the download process and giving the ability to resume an aborted download. You may see any of these turn up in your logs along with browsers.
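If you want your log analysis to keep these apart from browsers, one simple approach is a lookup table of known names, sketched below in Python. The table holds only the names mentioned above, the function name is hypothetical, and the sample strings are illustrative rather than copied from real logs; a usable version would need a much longer, regularly updated list.

```python
# Names taken from the examples above; a real table would be far longer.
KNOWN_AGENTS = {
    "robot": ("Googlebot", "Scooter"),
    "download manager": ("Go!Zilla", "SmartDownload"),
}

def agent_kind(agent):
    """Classify a user agent as a robot, a download manager, or neither."""
    lowered = agent.lower()
    for kind, names in KNOWN_AGENTS.items():
        if any(name.lower() in lowered for name in names):
            return kind
    return "browser or unknown"

# Illustrative strings only.
for agent in ("Scooter/3.2", "Go!Zilla 3.5", "Opera/9.80"):
    print(agent, "->", agent_kind(agent))
```

A table like this is best used for sorting log entries after the fact, not for deciding what content to serve.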

Hall of Shame

Make your site better by looking at other sites that show, by example, what not to do!
NOTE: The inclusion of a site in my "Hall of Shame" links should not be construed as any sort of personal attack on the site's creator, who may be a really great person, or even an attack on the linked Web site as a whole, which may be a source of really great information and/or entertainment. Rather, it is simply to highlight specific features (intentional or accidental) of the linked sites which cause problems that could have been avoided by better design. If you find one of your sites is linked here, don't get offended; improve your site so that I'll have to take down the link!
Well, at least none of the sites below put users in jail for using the "wrong" browser!
  • The site to get a Washington DC trip permit, for tour buses and the like, says "Customers must use Internet Explorer when trying to access the Trip Permit Website."
  • The FedEx site reportedly shoos away people using the "wrong" operating system, such as Linux users, telling them that they're using an unsupported browser and should switch to IE or Firefox (even if they're using a Linux version of Firefox).
  • This site told me "We're sorry, but this site is not currently compatible with Netscape", even though I wasn't using Netscape at the time.
  • Hall of Shame Dishonor Roll Champion: The FEMA disaster relief application required MSIE 6.0 at the time of Hurricane Katrina, and turned you away if you were using anything else. In a just world, the person responsible for this would be sentenced to a week of living in what remains of the New Orleans Superdome among the piles of excrement left by the evacuated refugees. Come to think of it, many FEMA personnel deserve this fate. However, they appear to have fixed their site now so it doesn't turn any browser users away. They still deserve a "shame" note for ever designing a site with such a stupid restriction.
  • Hilton's secure site redirects anybody whose user agent string doesn't start with "Mozilla" to this so-called Web Standards Page; I guess I missed the part of the W3C specs that made it a "Web Standard" that user agent strings must start with "Mozilla". (Opera, in its honest-identification mode, fails this test.)
  • Big Noise Music sends anybody not using IE for Windows to a page that says you need "Internet Explorer 5 (or better)". Mozilla is much better, but they still won't let it in.
  • MovieLink blocks every browser but MSIE, every platform but Windows, and also refuses you if cookies or scripting are disabled, your connection speed seems to be too slow, or you seem to be outside the United States. Reports are that even the new beta release of Internet Explorer is blocked, as the developers of this site seem to take the attitude "Ban everyone and everything, unless specifically permitted."
  • Another site that brushes off Opera but lets in Mozilla is PhotoDisc (Getty Images). Its "Get Lost" page tells you to get IE or Netscape, failing to mention Mozilla or Safari. Since their offerings are of particular interest to graphic professionals, many of whom use Macs which come with Safari as their standard browser, it makes no sense for the site to go out of its way to tick these people off like this.
  • Proffs.nu used to redirect all non-MSIE browsers to a really obnoxious page lecturing browser makers on how they need to be misfeature-compatible with MSIE because that's what the rest of the world uses, and telling users to either "upgrade" to MSIE or at least reconfigure their browsers to pretend to be MSIE to get around the redirection. (This latter piece of advice can be translated: "Please make your browser lie about what it is in order to get past the cluelessness of idiot webmasters like us.") However, they've changed it since, and now let all browsers in (and even have a "Download Mozilla Firefox" icon and a W3C icon indicating valid HTML). Nevertheless, they still say that they have some pages that are blocked from non-MSIE users because those browsers "do not show these pages the way we would like them to be shown" and "cannot handle some of the web technologies from Microsoft" -- in other words, the site author still can't keep himself from using proprietary stuff and depending on browser quirks.
  • This credit union site tells many users (including those of the Mozilla Suite) that their browser is "nonstandard", then gives them a link to enter the site anyway; however, in some cases, this link doesn't work (I think it depends on cookies being enabled).
  • NatWest supports Mozilla, but if you try to enter their site with the newly rebranded SeaMonkey (which is the exact same thing as Mozilla, with a different name) you get turned away.
  • Blue Shield of California sees fit to redirect some pages, when accessed by some of the "wrong" browsers, to this "You Need to Upgrade" page. I'm not sure exactly what browsers get sent there; it seems to work OK in Mozilla. I don't see anything in the site that couldn't have been done just fine with browser-neutral code.
  • Fidelity Investments' benefits section is reported to turn away the "wrong" browser types, but it seems to work for me in Mozilla. So I guess Mozilla isn't the "wrong" browser, but reports are that Opera is, at least when it's set to identify itself honestly.
  • NetZero uses this page to tell Netscape 6.x users that they have to downgrade to Netscape 4.x to get their service to work. (Reportedly, that page puts some browsers in an infinite loop with a blank page constantly reloading, perhaps in cases where cookies are disabled.)
  • Facebook redirects certain browsers (including Lynx and Links) to a page that says "we're not cool enough to support your browser."
  • Google Maps has a broader range of supported browsers than most of the "browser-sniffing crowd", but if you're using something other than IE, Mozilla, Firefox, or Netscape, or too old a version of any of them, you still get brushed off.
  • TotalJobs claims in its "Browser Policy" to explicitly block various browsers, including Mozilla and Firefox. However, their clueless developers apparently can't even do a stupid browser-block correctly; there doesn't seem to be any problem accessing their site with those browsers.
  • Is turnabout fair play? The blog of Ben Goodger, a fervent supporter of Mozilla's Firefox browser, used to turn away all MSIE users, sending them to a page that said "The browser you are using (Microsoft Internet Explorer) is not supported at this time due to incomplete support for web standards." Some other browser users, such as users of pre-Firefox Mozilla browsers, could see the site but got a gentle urging to upgrade to Firefox. (It doesn't seem to do this any more, however.) I think turning people away from a site because of the browser they use is wrong no matter which browser is being discriminated against, so I have to oppose this.
  • Ironically, Mozilla's own site is guilty of a user-agent-based blockage of sorts; they've restricted access to the Addons site if you're running certain old versions of Firefox, as discussed in these message threads. This was necessitated by a security issue where those versions could execute malicious code from other sites that fooled the browser into thinking it was from the Mozilla addon site, which could be stopped by blocking the actual site in question.
