Understanding High Availability
According to TechTarget, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing."
High availability can be achieved within an environment, whether on-premises or in the cloud, by:
- Elimination of single points of failure. This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.
- Reliable crossover. In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover.
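- Detection of failures as they occur. If the two principles above are observed, then a user may never see a failure, which is exactly why monitoring must catch it so the failed component can be repaired before its backup also fails.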
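The effect of the first two principles can be illustrated with a little arithmetic. The sketch below assumes that components fail independently, which real systems only approximate, and every availability figure in it is a hypothetical value chosen for illustration. It shows how a redundant pair improves on a single component, and how a weak crossover mechanism can cancel that gain.

```python
from math import prod


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of redundant copies when any single copy is enough.

    Assumes the copies fail independently of one another.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(*availabilities: float) -> float:
    """Availability of a chain in which every element must be up."""
    return prod(availabilities)


# Hypothetical figures, for illustration only.
single = 0.99                                    # one server on its own
pair = parallel_availability(single, copies=2)   # two redundant servers

# The crossover (for example, a load balancer) sits in series with the pair.
with_weak_crossover = series_availability(pair, 0.99)
with_strong_crossover = series_availability(pair, 0.9999)

print(f"single server:           {single:.4%}")
print(f"redundant pair:          {pair:.4%}")
print(f"pair + weak crossover:   {with_weak_crossover:.4%}")
print(f"pair + strong crossover: {with_strong_crossover:.4%}")
```

Under these assumptions, the pair behind a crossover that is only as reliable as a single server ends up slightly less available than the single server alone, which is why reliable crossover is listed as a principle in its own right.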
Why is High Availability Important?
If mission-critical data and/or applications become unavailable, the enterprise is placed in jeopardy.
According to a report issued by the Ponemon Institute, the average cost of a data center outage rose from $690,204 in 2013 to $740,357 in 2016. Costs on that scale can damage the health of your business. When your data center suffers an incident and your data is not accessible, your business operations are disrupted.
Your data center's ability to prevent and recover from unforeseen incidents is critical to your organization's success. Downtime can also result in idle employees, disgruntled customers, and bad publicity.
Ensuring High Availability with Your Service Provider
According to Dusten Tornow, Director of Product Development at OneNeck, housing data and applications in the cloud could enhance availability and reliability. However, choosing the right IT service provider with the right service level agreement (SLA) and determining which parts of your IT infrastructure and applications need to be covered is critical.
In its simplest form, an SLA is a contract between a service provider and a customer that specifies, in measurable terms, what services the service provider agrees to provide in exchange for a fee.
No one wants to think about being without service or having normal processes interrupted, but things happen. That is why a solid SLA that protects both the business and the service provider is critical. To be effective, it must be created when the contract is first signed.
Who needs an SLA?
Any business that outsources any part of their IT infrastructure should have an SLA in place. Usually, SLAs are between companies and external suppliers. It’s a contract that protects both parties — the business and the service provider.
What’s the purpose of an SLA?
At their core, all SLAs guarantee that the service provider will deliver on something. It’s the “something” that truly sets SLAs and providers apart. Understanding and establishing what that “something” is remains one of the most critical steps. Another significant factor is the compensation a company will receive should the service provider fail to deliver on the agreed-upon SLA.
Oftentimes, new-to-market providers will wrap an SLA around response time. However, businesses experiencing an outage or interruption in service rarely find this satisfactory in the long run, because responding within the parameters established in an SLA is quick and easy (and often automated).
On the other hand, an SLA based on the availability of a system or application is far more meaningful — and much more complex for a service provider to deliver on. Only a mature service provider will have the people and processes to deliver on an availability SLA.
Take, for example, a credit card processing company that is unable to process transactions. If the interruption lasts a few minutes, it’s a big deal. What if it lasts an hour, half a day or even longer? It quickly adds up to significant financial loss and can lead to greater consequences. If the SLA is based solely on the service provider responding within a certain amount of time, how does that help the business recover? In this example, is the service provider incentivized to resolve the issue ASAP? Not really. A service provider that guarantees availability of the down application, however, has every reason to do everything possible, as quickly as possible, to resolve the issue.
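The difference between the two kinds of SLA can be made concrete with a small sketch. All names and thresholds here (the `Incident` record, a 15-minute response target, a 99.9% availability target, a 30-day month) are hypothetical assumptions for illustration, not terms taken from any real contract.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    minutes_to_first_response: float  # how long until the provider acknowledged
    minutes_of_outage: float          # how long the service was actually down


def meets_response_sla(incident: Incident, max_response_minutes: float) -> bool:
    """A response-time SLA only checks how fast the provider reacted."""
    return incident.minutes_to_first_response <= max_response_minutes


def meets_availability_sla(incidents: list[Incident], target: float,
                           period_minutes: float) -> bool:
    """An availability SLA checks how much of the period the service was up."""
    downtime = sum(i.minutes_of_outage for i in incidents)
    measured = 1.0 - downtime / period_minutes
    return measured >= target


# Hypothetical incident: acknowledged in 5 minutes, resolved after 4 hours.
incident = Incident(minutes_to_first_response=5, minutes_of_outage=240)

response_ok = meets_response_sla(incident, max_response_minutes=15)
availability_ok = meets_availability_sla(
    [incident], target=0.999, period_minutes=30 * 24 * 60
)

print("response-time SLA met:", response_ok)
print("availability SLA met: ", availability_ok)
```

In this scenario the provider satisfies the response-time SLA even though the business was down for four hours, while the same incident breaches the availability target.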
The magnitude of the issue and the potential impact on a business could (and should) be factored into the SLA. Although the definition and concept of an SLA are widely agreed upon within the industry, providers differ considerably in how they define their SLAs. It’s important to establish the parameters ahead of time.
Six segments to understand when discussing SLAs:
- Definition of what the SLA intends to provide to both parties (business and service provider).
- Clear identification of each system or application that’s covered (e.g., by hostname, IP address or serial number).
- Definition of an SLA violation, including the metrics by which the service is measured. If an outage occurs, what are the established time increments?
- Outline the level of service expected by the business from the provider. Do you expect them to call within a certain amount of time? How frequently do you expect updates as the situation is being escalated and resolved?
- Remedies / penalties should the service levels not be achieved. How will your business be compensated? Service providers typically offer service credits to customers should performance be compromised.
- The expectations of the provider. What does the service provider expect from the customer? If the customer’s staff is unavailable, does that void the SLA?
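To show how the violation metrics and remedies above might fit together, here is a minimal sketch of a service-credit calculation. The credit tiers, the monthly fee, and the outage durations are all hypothetical; actual thresholds and remedies are whatever the signed SLA specifies.

```python
def measured_availability(outage_minutes: list[float],
                          period_minutes: float) -> float:
    """Fraction of the period during which the covered system was up."""
    return 1.0 - sum(outage_minutes) / period_minutes


# Hypothetical tiers: (minimum availability, fraction of the fee credited).
# Ordered from best to worst; the first tier that is satisfied applies.
CREDIT_TIERS = [
    (0.999, 0.00),  # target met, no credit
    (0.990, 0.10),  # moderate shortfall, 10% credit
    (0.000, 0.25),  # severe shortfall, 25% credit
]


def service_credit(availability: float, monthly_fee: float,
                   tiers=CREDIT_TIERS) -> float:
    """Credit owed to the customer for the measured availability."""
    for minimum, fraction in tiers:
        if availability >= minimum:
            return monthly_fee * fraction
    return monthly_fee * tiers[-1][1]  # fallback for out-of-range input


# Hypothetical month: two unscheduled outages of 30 and 50 minutes.
month_minutes = 30 * 24 * 60
availability = measured_availability([30, 50], month_minutes)
credit = service_credit(availability, monthly_fee=10_000)

print(f"measured availability: {availability:.4%}")
print(f"service credit owed:   ${credit:,.2f}")
```

Writing the tiers down as data makes it easy to check, before signing, what a given outage would actually be worth in credits.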
Another important element to understand when establishing an SLA with your service provider is that availability is typically measured in nines. Essentially, the more nines, the less downtime the service provider is allowed annually.
Downtime Calculation Key
~ 7.5 cumulative hours of unscheduled downtime per month
~ 45 cumulative minutes of unscheduled downtime per month
~ 6.0 cumulative minutes of unscheduled downtime per month
~ 0 minutes of unscheduled downtime per month
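The arithmetic behind a key like this is simple to reproduce. The sketch below assumes an average month of 43,800 minutes (one twelfth of a 365-day year), and the availability levels of 99%, 99.9%, 99.99% and 100% are also an assumption, since the key itself does not state percentages. Providers may round or define the measurement period differently, so the results will not match every published figure exactly.

```python
# Assumption: an average month is one twelfth of a 365-day year.
MINUTES_PER_MONTH = 365 * 24 * 60 / 12


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Unscheduled downtime permitted in the period at a given availability."""
    return (1.0 - availability_percent / 100.0) * period_minutes


# Assumed availability levels, from "two nines" up to 100 percent.
for percent in (99.0, 99.9, 99.99, 100.0):
    minutes = allowed_downtime_minutes(percent)
    print(f"{percent}% availability allows about {minutes:.1f} "
          f"minutes ({minutes / 60:.2f} hours) of downtime per month")
```

Each additional nine cuts the permitted downtime by a factor of ten, which is why moving from 99.9% to 99.99% is a far bigger commitment for a provider than the extra digit suggests.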
Now that your SLA is in place, what’s next?
- Customers should review their service provider SLA any time a contract change is made. Does the SLA still align with the business needs?
- Monitor and manage the SLA. Is it your responsibility to notify the service provider of an outage?
- Customers should set appropriate expectations with their IT staff and business units to align with SLA expectations. This is often a gap.
OneNeck delivers a comprehensive suite of tailored, end-to-end enterprise-class hybrid IT solutions, including engineering and managing IT infrastructure and enterprise application management services — with a 100 percent uptime availability SLA. Visit www.oneneck.com