INetU Managed Hosting

Three Reasons for Web Site FAIL.

August 26th, 2009 by Scott W.

Today’s websites are expected to be available 100% of the time. The Internet has connected a global marketplace of consumers and businesses where any amount of unplanned downtime can be disastrous for retail orders and/or reputation. If you desire to have a highly available website, study this list of the three reasons websites fail:

1. Management, developers and administrators aren’t invested in disciplined change management.

If you want high availability, you must manage your changes. A stack of shiny hardware alone does not provide uptime. So the last time you suffered a web site outage and your developers and admins all claimed that nothing had changed, well, odds are they were lying. After you discover the cause, you get the same song and dance: “Oh, well, that shouldn’t have caused a problem.”

Anyone with experience in IT will intuitively understand this truth. Luckily, we also have some data to back this up. Donna Scott, VP & Research Director, Gartner, notes that, “80 percent of unplanned downtime is caused by people and process issues, including poor change management practices, while the remainder is caused by technology failures and disasters.” I think we are unwilling to address this truth directly because we’d rather blame some “system” or “hardware” as the root cause than the people who should be held accountable for uptime. Throwing money at software and hardware will not solve people problems. Buying the super redundant n+4 failover hot-cold-warm standby solution won’t fix the problem that “Jimmy type-twice-think-once” has root (Administrator) access to the production environment.

Now that I’ve thrown admins and programmers under the bus, I’ll do the same for IT management. Many times, management eschews formal change management out of concern that it will cost agility. “If we have to test and then promote to production, we won’t stay competitive.” That’s the same hogwash, this time from the top down. If your people do not buy into the discipline of change management, then you live in a house divided, and your web site will not be stable.

2. Lack of separate production and test environments.

Production environment . . . you mean there are other kinds? I chuckle when I hear of the proverbial five nines (99.999%) uptime requirement. If you ask the simple question, “Exactly how much downtime is five nines on an annual basis?” and you are answered with a stunned silence, you know you are in trouble. 99.999% uptime on an annual basis allows a mere five minutes or so of downtime per year (about 5.26 minutes). Everyone wants this kind of availability but no one wants to pay for it: caviar dreams and a sardine budget! Most of this cost goes into building multiple environments, so that development, validation, and production are separate islands and releases can be put through a rigorous quality control process before being promoted to production.
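To make the five nines arithmetic concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name and the choice of a 365.25-day average year are assumptions made for illustration, not part of any standard.

```python
# Assumption: an "average" year of 365.25 days, to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Print the budget for two through five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows {budget:.2f} minutes of downtime per year")
```

Each extra nine cuts the budget by a factor of ten, which is why the price of the next nine tends to surprise people.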

3. Failure to see things from the end-user perspective.

Watching workload metrics on the server such as disk I/O, CPU, and memory usage is fine, but these metrics do not always correlate with how the end user experiences the application. Busy doesn’t always mean slow, and idle doesn’t always mean fast. I’d rather get bread from the grocery store in a Ferrari that is 50% busy than ride on the back of Grandma’s scooter that is 99% idle. Application and transaction monitors that simulate end-user activity (logging in, adding items to a shopping cart, etc.) from local and remote networks are great ways to see how your site performs. This also helps clarify conversations with clients about their issues when you need to define what is meant by slow: 2 seconds to log in? Over 10 seconds?
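As a rough illustration of a transaction monitor, the sketch below times one simulated end-user step and labels the result. The 2-second and 10-second thresholds simply reuse the figures above. The function names, the labels, and the https://example.com/login URL are hypothetical placeholders; a real monitor would script a full login or checkout and run from several networks.

```python
import time
import urllib.request
from typing import Callable

# Assumed thresholds, borrowed from the "2 seconds? Over 10 seconds?" question.
WARN_SECONDS = 2.0
SLOW_SECONDS = 10.0


def time_transaction(step: Callable[[], object]) -> float:
    """Run one simulated user step and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    step()
    return time.perf_counter() - start


def classify(elapsed_seconds: float) -> str:
    """Turn a duration into a label both you and the client can agree on."""
    if elapsed_seconds > SLOW_SECONDS:
        return "slow"
    if elapsed_seconds > WARN_SECONDS:
        return "degraded"
    return "ok"


def fetch_login_page(url: str = "https://example.com/login", timeout: float = 15.0) -> int:
    """Hypothetical user step: download the login page and return the HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # read the whole body, as a browser would
        return response.status
```

A scheduler could then call `classify(time_transaction(fetch_login_page))` every minute and raise an alert whenever the label is not "ok".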

The Solution

These problems shouldn’t really come as a surprise, but how do you deal with them? Here is a short list of tools, tips, and technologies that can help avoid outages:

  1. Virtualization is a great tool to lower the cost of non-production environments.
  2. Load tests are a great way to find out when and where your environment will break.
  3. Synthetic transaction monitors can continuously monitor your website as the end-user sees it.
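
For the load testing tip, a bare-bones sketch of the idea is to fire the same request at increasing concurrency and watch where latency starts to climb. Everything here is an assumption for illustration: the function name, the returned fields, and the stand-in request that just sleeps. Dedicated load testing tools do this far more thoroughly, and a test like this belongs in a non-production environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict


def run_load_step(request: Callable[[], object], concurrency: int, requests_total: int) -> Dict[str, float]:
    """Issue requests_total calls using `concurrency` worker threads and summarize latency."""
    if concurrency < 1 or requests_total < 1:
        raise ValueError("concurrency and requests_total must be at least 1")

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        request()  # any exception raised here propagates to the caller
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(requests_total)))

    return {
        "concurrency": concurrency,
        "median_seconds": median(latencies),
        "max_seconds": latencies[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real HTTP request against a test environment.
    def fake_request() -> None:
        time.sleep(0.01)

    # Ramp up the concurrency and look for the step where latency jumps.
    for workers in (1, 5, 10):
        print(run_load_step(fake_request, concurrency=workers, requests_total=20))
```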

The main point here is that your web site probably went down because somebody made a change that had an unanticipated consequence. Accepting this truth will help you build an environment that minimizes the chance of unplanned outages.

©1996-2010 INetU Inc, All rights reserved.