Raster (Pete) points out that the companies that should care about preserving links are often the ones that break them. His example is the local “old school” News / Sports / Talk AM radio station that happens to be one of the biggest media outlets in the area (and the only local radio station that I listen to). The radio station recently re-launched their web site with a new look, new navigation and a bunch of new features (and it is much improved over their previous web site). They mentioned on the air for a couple of weeks that it was coming, and they have been talking about it on every show since it launched (the sports guy talked about the new web site for longer than he did about the Brewers win over the Giants). They probably spent hundreds of thousands of dollars on the project. And they seem to have broken every link to the site. A quick check of the search engines (both Google and Live) found that every link on the search results page takes you to an error page. (Hint: did you know that if you search for site:domainname.com, the search engine will show you every page it has indexed on that site? It is a great way to see what the search engines have found about your site.)

I don’t want to pick on the radio station too much, because while they have broken a lot of links they at least did one thing correctly - they trapped the 404 message (at least as of today they have; Pete mentioned that he got a 404 error from it). When you are doing a major site overhaul (like this one), there are two types of errors that you can create:

  • Type 1 error - a link that was created in the past no longer points to the same content on your site and the user gets a very nasty error (up to the most dreaded error of all, the HTTP 404 error)

  • Type 2 error - a link that was created in the past no longer points to the same content on your site, but you either redirect them to the content or provide them a friendly page where they can find the content themselves

If you follow any of the dead links to the radio station web site, you get re-directed to a standard error page that is branded like the rest of the site and contains the major navigation (sports, weather, etc) and gives you a search dialogue box. Not the article that you were looking for, but it is much nicer than the standard 404 error page that you get from the web server.

The Type 2 error is still bad: you can cause user frustration, and unless you redirect each old link to its content you lose out on the search engine boost that you can get from all the links pointing to that content (with our current dependence on PageRank, that cost can be extremely high).
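To make the difference concrete, here is a minimal sketch of how old links could be handled after a relaunch. Everything in it is hypothetical: the paths, the REDIRECTS table and the resolve_legacy_path function are made up for illustration and are not how the radio station (or any particular web server) does it. The idea is simply that a known old URL gets a permanent (301) redirect to its new home, and only truly unknown URLs fall through to the friendly error page.

```python
# Hypothetical map of old URLs to their new locations. A real site would
# build this from its own content inventory before the relaunch.
REDIRECTS = {
    "/sports/brewers.asp": "/sports/brewers",
    "/news/local.asp": "/news/local",
}

# Hypothetical path of the branded, friendly error page.
FRIENDLY_ERROR_PAGE = "/not-found"


def resolve_legacy_path(path, live_paths):
    """Return (status, target) for a requested path.

    200: the path still exists on the new site.
    301: the path moved; send the visitor (and the search engine) to the new URL.
    404: nothing is known about the path; show the friendly error page.
    """
    if path in live_paths:
        return 200, path
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 404, FRIENDLY_ERROR_PAGE
```

With a table like this in place, an old link such as /sports/brewers.asp lands on the article itself rather than on the search box, which is the better of the two Type 2 outcomes.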

Call to action

Test your web site for 404 trapping - I learned this neat little trick to test the lowest level of 404 trapping. Go to the root of the web server and type in a complete nonsense path, for example http://amazon.com/DHFHIHJJIASJRJT/JTYUKODIUEFHE.JGHTUIU to see how well Amazon.com traps 404s. Make up the path, page name and page extension. If the complete nonsense gets trapped/redirected, chances are that you are good. Then you can start testing for paths that actually exist and for extensions that are real. Also, if everything on your site is an .aspx, make sure that you also test alternative extensions (such as .asp and .htm). You want the 404 to be trapped in all situations.
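If you would rather not type nonsense by hand, the same check can be scripted. The following is a rough sketch using only the Python standard library; the function names (nonsense_url, probe) and the example.com address are my own placeholders, so point it at a site you are responsible for. It builds a made-up path, page name and extension, requests it, and reports the HTTP status of whatever page finally comes back. You still have to look at the page yourself to judge whether it is a friendly one.

```python
import random
import string
import urllib.error
import urllib.request


def nonsense_url(base, extension=None, rng=random):
    """Build a URL with a made-up directory, page name and extension."""
    letters = string.ascii_uppercase
    directory = "".join(rng.choice(letters) for _ in range(14))
    page = "".join(rng.choice(letters) for _ in range(12))
    if extension is None:
        # No extension given, so make that up too.
        extension = "".join(rng.choice(letters) for _ in range(6))
    return f"{base.rstrip('/')}/{directory}/{page}.{extension}"


def probe(url, timeout=10):
    """Request the URL (following redirects) and return the final HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is on the exception.
        return err.code


if __name__ == "__main__":
    base = "http://example.com"  # placeholder: replace with your own site
    # Try pure nonsense first, then the real extensions mentioned above.
    for ext in (None, "aspx", "asp", "htm"):
        url = nonsense_url(base, ext)
        print(probe(url), url)
```

A status alone does not tell you whether the visitor saw the bare web server error or a branded page, so treat the output as a list of URLs to open in a browser, not as a pass/fail result.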

Set the trapping at the web server level also - Many web application environments (ASP.NET will be my example) allow you to set 404 traps at the application level. This is fantastic, because I can set the trigger in the web.config and I have several options that are easy to change. Don’t stop there! You should also set the 404 trap in the web server itself. ASP.NET will often be set to only serve its own content (.aspx extensions). What if the link is broken and ASP.NET does not get fired for that request? Always have a “fall back” in the web server itself.

Don’t forget the load balancers - Many of us have the luxury of running load balanced servers that are behind a hardware load balancing device. These devices are great, but they add a level of complexity that you must plan for and test. My example above where I created a nonsense URL might never reach my web server if the load balancer is doing certain inspections. You have to make sure that the load balancer is also configured for proper 404 trapping.

Do as I say, not as I do - give me a day or so to get larryclarkin.com (and another domain that I own) to handle 404s properly. 🙂