INetU Managed Hosting

Posts Tagged ‘high availability’

What Your Web Host is Not Telling You about 100% Uptime

February 3rd, 2010 by Jeff P.

In a perfect world your website would be available all day, every day, completely without fail. In reality, downtime happens. Hosting providers like to guarantee uptime, but what does that really mean? Here are three things your hosting provider isn’t telling you about 100% uptime:

#1 – Uptime is your responsibility, too.

When you talk about uptime, you mean that your site is available to your audience. When a hosting provider talks about uptime, they mean network uptime, and possibly hardware availability if you are using shared resources instead of dedicated servers.

In a dedicated hosting environment, device availability and fault tolerance are your responsibility. If a hard drive fails, did you purchase a RAID configuration to protect yourself? Did you elect to build out a database cluster? Redundant firewalls?

Application availability is also affected by your developers. In many cases, changing a single file can drop your site off the radar, even though the pipes are live and the hardware is functional. Do you have a separate development area to prevent this kind of thing from happening? What controls do you have in place to make sure that only stable edits are pushed live?

#2 – Downtime happens. There is no preventing it.

Read the full post »

Beware the uptime braggarts.

October 28th, 2009 by Scott W.

Be careful about the distinction between uptime and high availability. One should be the goal of your server infrastructure. The other is just a geek bragging right.

Many system administrators will brag about their systems having high uptimes. The uptime of a system is how long it has been running without a reboot. The current longest running uptime, as tracked by the Uptimes Project, is a VMS cluster that had been up just shy of 12 years as of the date this post was published. Since we strive for 100% uptime, shouldn’t this be an impressive record to share with all of your friends as an exemplary form of sysadmin kung-fu? Actually, no. Never rebooting your systems fosters an “if it ain’t broke, don’t fix it” mentality that carries quite a few risks:

  • First, if a machine has not been rebooted, kernel, OS, and application patches are probably not being kept up to date. On today’s Internet this can be a very dangerous practice. Your systems may be vulnerable to known exploits. In addition, if you run into problems with your system, your vendor will require applying recommended patches as a first course of action. It’s better to have these things taken care of beforehand than to add to the work during a problem. We’ve seen out-of-date Windows servers need over 4 hours of patching, with multiple reboots.
  • Second, you’re potentially leaving around rotten Easter eggs. By rotten Easter eggs I mean changes to systems that don’t make it through a reboot: production services that should be running, startup scripts that do not work properly, IP addresses that have been added, speed/duplex issues that have been resolved since the machine has been up. The worst-case scenario is that your server goes down (planned or unplanned) and after the reboot you have a server with poor network performance, missing IP addresses, and a database application that isn’t running. If the reboot was not planned (a crash), this adds to the confusion when trying to bring the system back online.
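
One way to hunt for rotten Easter eggs before a reboot finds them for you is to compare what is live right now against what the boot-time configuration would bring back. Here is a minimal sketch of that comparison. The service names, the IP addresses, and the `find_reboot_drift` helper are all made up for illustration; in practice you would have to gather both sides from your own systems.

```python
def find_reboot_drift(running, persistent):
    """Compare live state against what the boot-time configuration would restore.

    Both arguments map a category name (e.g. "services") to a set of items.
    """
    drift = {}
    for category in sorted(set(running) | set(persistent)):
        live = running.get(category, set())
        saved = persistent.get(category, set())
        lost = live - saved      # present now, gone after a reboot
        revived = saved - live   # absent now, back after a reboot
        if lost or revived:
            drift[category] = {"lost_on_reboot": lost, "returns_on_reboot": revived}
    return drift


# Hypothetical inventory: what is live now vs. what the startup config contains.
running_state = {
    "services": {"httpd", "mysqld", "sshd"},
    "ip_addresses": {"192.0.2.10", "192.0.2.11"},
}
boot_config = {
    "services": {"httpd", "sshd"},
    "ip_addresses": {"192.0.2.10"},
}

report = find_reboot_drift(running_state, boot_config)
for category, diff in report.items():
    print(category, "lost on reboot:", sorted(diff["lost_on_reboot"]))
```

In this made-up inventory, the database service and the second IP address were added by hand and would both vanish on the next reboot.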

Read the full post »

How to Manage Sessions in a High Availability Infrastructure

October 7th, 2009 by Chris G.

Session management is relatively easy for a simple website handled by a single web server. Session information is typically stored in memory or on disk and all is well. What about when you have a large website that’s served by a number of web servers? You could store sessions in memory or on disk in the same manner, but that creates a problem: the session information will not be accessible to the other web servers. Since HTTP is stateless, it is your responsibility to maintain sessions as HTTP requests are spread among the various web servers.
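
To make the problem concrete, here is a small simulation, not a real web stack. The `WebServer` class, the server names, and the session ID are invented for illustration, and a simple round-robin rotation stands in for the load balancer.

```python
import itertools


class WebServer:
    """Toy web server that keeps sessions in its own local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> cart contents, local to this server

    def handle(self, session_id, item=None):
        cart = self.sessions.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return list(cart)


servers = [WebServer("web1"), WebServer("web2")]
round_robin = itertools.cycle(servers)  # stand-in for the load balancer

first = next(round_robin).handle("abc123", item="book")  # web1 stores the cart
second = next(round_robin).handle("abc123")              # web2 has never seen it
print(first, second)
```

The second request carries the same session ID, but it lands on a server whose memory has no record of it, so the cart comes back empty.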

What about “sticky sessions”? Most hardware load balancers support an option called sticky sessions. Sticky sessions can be enabled to keep all HTTP requests from a given user on the same web server. This avoids some of the session management problems, but it introduces potentially more serious problems. To name a few:

  • The distribution of load between web servers can become uneven.
  • Scheduled maintenance is more difficult since you cannot simply remove a web server from the load balancer without impacting users.
  • High availability is impacted since users will lose their session information if the web server they are “stuck” to crashes. When they are directed to another web server, they will have lost the shopping cart they spent an hour filling. Even worse, they might not go through the effort to refill it!
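
The crash scenario in the last bullet can be sketched in a few lines. This is a toy model, not how any particular load balancer works: the `StickyBalancer` class is invented for illustration, and choosing the server with the fewest sessions for a new user is my assumption.

```python
class StickyBalancer:
    """Toy load balancer that pins each user to one server."""

    def __init__(self, server_names):
        self.sessions = {name: {} for name in server_names}  # per-server stores
        self.pinned = {}  # user -> name of the server they are stuck to

    def route(self, user):
        if self.pinned.get(user) not in self.sessions:
            # New user, or their server is gone: pin to the server with the
            # fewest sessions (an assumption made for this sketch).
            self.pinned[user] = min(self.sessions, key=lambda s: len(self.sessions[s]))
        server = self.pinned[user]
        return server, self.sessions[server].setdefault(user, [])

    def crash(self, server):
        del self.sessions[server]  # every session held in its memory is lost


lb = StickyBalancer(["web1", "web2"])
server, cart = lb.route("alice")
cart.append("book")                      # alice fills her cart on her server
lb.crash(server)                         # that server goes down
new_server, new_cart = lb.route("alice")
print(server, "->", new_server, new_cart)
```

After the crash the user is re-pinned to the surviving server, and the cart she built is gone with the old one.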

How are these problems avoided?

Read the full post »

How to Get True Firewall High Availability

September 17th, 2009 by Jason B.

High availability is a necessity for any high-performing web application today. Consumers want access to their information anytime, and without high availability your potential clients will get frustrated or go somewhere else for their needs.

One of the first areas to consider for a high availability network is the firewall. After all, it is usually your first line of defense, and when it doesn’t work, it leaves your business down. Many people think the way to fix this is to add a second firewall and call it a day, but there are many options within the configuration that can improve your clients’ experience and site availability. Here are two important considerations:

Current Sessions

The first things to think about are your current sessions. Most firewalls can perform what is known as stateful failover. In a regular failover, all connections are dropped. Clients then need to re-establish their connections when the other firewall takes over. In a stateful failover, however, the active unit shares connection information with its peer, so in the event of a failover the other firewall already has the connection information. This may seem like a trivial issue, but it decides whether the client sees the failure or not.
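
Here is a minimal sketch of the difference, assuming a simple two-unit pair. The `FirewallPair` class and the connection tuple are invented for illustration; real firewalls replicate far more state than a set of connections.

```python
class FirewallPair:
    """Toy active/standby pair tracking each unit's connection table."""

    def __init__(self, stateful):
        self.stateful = stateful
        self.active_table = set()
        self.standby_table = set()

    def open_connection(self, conn):
        self.active_table.add(conn)
        if self.stateful:
            # The active unit shares connection information with its peer.
            self.standby_table.add(conn)

    def fail_over(self):
        """Standby takes over; return the connections that survive."""
        self.active_table, self.standby_table = self.standby_table, set()
        return set(self.active_table)


# Hypothetical connection: (client IP, client port, server IP, server port)
conn = ("198.51.100.7", 51000, "192.0.2.10", 443)

regular = FirewallPair(stateful=False)
regular.open_connection(conn)
survived_regular = regular.fail_over()

stateful = FirewallPair(stateful=True)
stateful.open_connection(conn)
survived_stateful = stateful.fail_over()

print(len(survived_regular), len(survived_stateful))
```

In the regular pair the standby takes over with an empty table, so the client has to reconnect. In the stateful pair the connection is already known to the new active unit.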

Typically, entry-level models do not have stateful failover, so this is a business decision that needs to be made before purchasing or upgrading firewalls. Also, you will need to make sure your application qualifies for stateful failover. HTTP is sometimes not enabled for stateful failover by default, since HTTP connections are typically not long-lived.

Redundant Links

Another item to think about is redundant links. Failover should only happen if the primary firewall is unresponsive, because the failover process takes some time to happen. Depending on firewall models, failover times can vary. I typically see Cisco ASA firewalls reliably fail over in under 10 seconds with default settings, but that still leaves a window of downtime. Redundant links are a technology I set up for clients who want added HA by reducing the chance of failover. With redundant links, there are two cables used per interface rather than one.

So, picture a circumstance like this:

Read the full post »

INetU Labs takes on the Dell MD3000i: Is it an Enterprise-capable workgroup SAN?

September 2nd, 2009 by Andy B.

Recently INetU Labs put Dell’s low-cost workgroup SAN through its paces to see how it compares to the more robust (and costly) Equallogic and EMC offerings. The results are in, and it seems that, correctly configured, the MD3000i is a great product with plenty of bang for your buck.

Configuration

For testing we used an MD3000i populated with a mix of 146GB SAS and 500GB SATA drives. The SAN shipped with a single controller but a second was added to test failover. A word of warning here – Dell configures the duplex mode based on how the SAN is ordered; if you add a second controller later you’ll need to use the command line tool to enable it, a process that’s not stated as a clear requirement and takes a little digging on the Internet to find documentation for. That being said, once you find the docs you’ll have it set in no time. Our test unit was a major firmware revision behind, and bringing it up to date took a good twenty minutes. Minor revision updates probably won’t take as long, but this is something to keep in mind if you’re striving for multiple nines of availability.
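
To put that twenty minutes in perspective, here is a quick back-of-the-envelope calculation of the yearly downtime budget for a few availability targets. It assumes the update keeps the SAN unavailable for the whole twenty minutes, which may not match your setup, and it uses a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability_percent):
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


update_minutes = 20  # the firmware update time from our test unit

for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    share = update_minutes / budget
    print(f"{target}% allows {budget:.1f} min/year; the update uses {share:.0%} of it")
```

Under those assumptions, a single update like this fits comfortably within three nines, uses more than a third of a four-nines budget (about 52.6 minutes a year), and is several times larger than the entire five-nines budget (about 5.3 minutes a year).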

Once the hardware was configured and updated, the software install was a snap. The management software is somewhat cumbersome but gets the job done, and configuring the LUNs is a simple process. We were testing multipath (MPIO), and Dell requires a specific version of the iSCSI initiator on Windows servers, so be careful here, too. Fortunately, the supplied driver CD made sure the right version was installed.

Benchmarking

Read the full post »

©1996-2010 INetU Inc, All rights reserved.