HDI - Implementing Availability Management

10.1 Overview

10.1.1 Description

Availability Management is the control (and continuous improvement) of the availability and reliability of IT services and the supporting IT infrastructure and organization. Availability Management ensures that the requirements of the business are met.

Availability Management entails systematically undertaking preventative and corrective maintenance of IT services, within justifiable cost. Technical, organizational, procedural, security and contractual aspects have an important role in this process.

This chapter examines Availability Management from the perspective of the Support Center. It is not a guide to traditional Availability Management; nor is it definitive in scope (as an example, none of the statistical analyses that are essential to Availability Management are discussed or included).

Availability Management is a complex, technology-led process that underpins much of IT Service Management. This chapter focuses on the issues that should be known to the Support Center and discusses interfaces that the Support Center could be integral to facilitating.

10.1.2 Relationships with other processes

Availability Management is the process responsible for ensuring that the data required within an IT service is available to end-users. The organizational function that actually carries out the tasks involved can vary from a representative of end-users to the Operations unit.

10.1.3 Key inputs and outputs to the process

Figure 10.1 - Key relationships

Availability Management is at the center of a spider's web of activities, as described below.

Configuration Management

Network Services Management

Computer Operations Management

Support Center

Problem Management

Procurement

Change Management

Capacity Management

Finance

IT Contingency Planning (and/or Business Continuity Management)

Operations

Development

Testing

Security

Description                                            Source              Importance
INPUTS
Business requirements                                  Customer            High
Impact assessment of requirements                      Availability Mgt    High
IT requirements (e.g. reliability, maintainability)    Various             High
Incident, problem, change and config. data             Various             High
Monitoring event data                                  Systems             Medium
SLA                                                    SLM                 High
OUTPUTS
Design criteria for recovery                           SCM                 High
Availability Management techniques                     Availability Mgt    High
Availability Management targets                        Availability Mgt    High
Monitoring requirements                                Availability Mgt    High
Availability Management plan                           Availability Mgt    High

10.1.4 Availability Manager

The Availability Manager calculates the actual IT service availability (using service targets), correlates system-detected errors and errors reported through incident records, and validates IT service availability depending on which data source provided the availability information.
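As a rough illustration of the calculation described above, the following Python sketch merges outage intervals from two sources (system-detected errors and incident records) so that overlapping downtime is counted once, then derives availability as (agreed service time - downtime) / agreed service time. The function names and the data layout are assumptions made for this example, and it assumes every outage falls inside the agreed service time.

```python
from datetime import datetime, timedelta


def merge_intervals(intervals):
    """Merge overlapping (start, end) outage intervals so downtime is counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous outage: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def availability_percent(agreed_service_time, outages):
    """Availability = (agreed service time - downtime) / agreed service time * 100."""
    downtime = sum(
        (end - start for start, end in merge_intervals(outages)), timedelta()
    )
    return 100.0 * ((agreed_service_time - downtime) / agreed_service_time)
```

For instance, a system-detected outage from 09:00 to 10:00 and an incident record covering 09:30 to 11:00 on the same day merge into one two-hour outage; against ten agreed service hours that gives 80% availability.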

The Availability Manager's responsibilities are to:

Role implementation
The Availability Manager role can be taken by a single individual, or by a team of individuals, not necessarily organizationally collected into one unit.

It is possible to combine the roles of Availability Manager (AM) and Service Level Manager (SLM), but the roles of (proactive) Problem Manager and Availability Manager should not be merged. Note that the above does not describe the role of the Support Center Manager (SCM), but it is included to clarify that the SCM role and AM role should be properly delineated.

10.1.5 Possible problems and issues

Possible problems
Commitment: it is rare to find that Availability Management goals are shared, and conflicting priorities do occur. Senior management commitment should be sought at an early stage.

Tools: specific tools are hard to find and not always as described. Research the tools market carefully to ensure that the range of tools needed to support the function and to interface with other disciplines is available and cost effective.

Supplier dependency: serviceability requirements may not have been defined because suppliers are reluctant to commit to providing data. Make sure all new contracts include serviceability in the requirements specification.

Quick wins
Try to establish Availability Management through evidence of rapid improvement. Problem Management and other similar roles will be the best source of such data.

Quality issues
Availability Management is a process that underpins good quality provision of IT services. By its very nature (analysis of points of vulnerability, risk assessment and building in of redundancy), Availability Management ensures that customers are provided with first-rate service.

Security issues
Confidentiality, Integrity and Availability (CIA) are the fundamental building blocks of IT security. IT security was defined by OGC as 'balanced security in depth'; justifiable countermeasures are in place to ensure continued IT service within secure parameters. The Availability Management function has a closer relationship than most with the IT security management function.

The major security issues of Availability Management are:

10.2 - Implementation

10.2.1 The implementation process

The major support function to Availability Management arises in the activities of the Support Center (SC) team. The SCM has a more limited role.

Figure 10.2 - Availability Management Implementation

In Figure 10.2 it is assumed that the SCM and team are peripheral to both Availability Management and SLM, and are coordinating activities.

10.2.2 Support Center Manager's role

Responsibilities and activities
Unless the SCM also fills the Availability Management role, their contribution is restricted to defining appropriate SLA and SLR criteria with the customer community and creating a monitoring function to ensure compliance with agreed availability targets. The SCM should ensure that Incident Management processes are followed to contribute ticket data effectively to Availability Management.

A role to assist with design of policy and procedures and to contribute to ARCI/RACI matrices is recommended.

Deliverables

Competencies
The SCM does not require specific competencies in order to liaise with the Availability Management function.

Key Performance Indicators (KPIs)

10.2.3 Support Center Function's role

Responsibilities and activities
Availability Management covers the entire lifecycle of IT service components, from initial design to decommissioning, and meets the availability requirements stipulated by the business.

Support Center activities to be carried out in support of Availability Management include:

Availability of an IT service could be monitored up to 24 hours per day, seven days per week or according to Availability Management and Service Level requirements in place. From a security perspective, it is important that the IT service is only available according to required specifications to the end-users specified in agreements between IT and the business (and to those involved in the Availability Management process and software development process), and to representatives of the end-users.

Deliverables
The information that has been collected by Availability Management is now examined and evaluated to identify ways in which availability can be improved (for example weak Configuration Items identified, changes to procurement policy or the IT infrastructure initiated etc.).

A well-documented Availability Management plan is a key deliverable.

Competencies required
The SC does not require specific competencies in order to liaise with the Availability Management function.

10.2.4 Other key roles and functions in the implementation process

A detailed list of the functions (and therefore the roles) having major impact on Availability Management was included under 'Key inputs and outputs to the process' (section 10.1.3), together with information about activities and deliverables. Note that with regard to Availability Management, the SCM role is one of coordination rather than a specific task.

10.2.5 Planning for implementation

Steps to take
  1. Obtain management commitment. Before anything else, ensure that senior management is committed to the project. Availability Management is neither cheap nor quick to get underway.

  2. Develop an Implementation Plan. As with any IT Infrastructure Library discipline, planning for implementation of Availability Management is vital. It is recommended that you use a recognized method such as PRINCE2 or PMI. The principal tasks are: project design, project plan, resource allocation, development of cost models, monitoring and plans for future review.

  3. Determine the Availability Management requirements. These are derived from business requirements. Processes and procedures must be in place to obtain all relevant requirements of all the IT services required. These requirements must be agreed before full scale planning takes place.

  4. Design for Availability Management. The primary task is to ensure that availability of IT services does not fall below the management requirements, as Availability Management is integral to the change process and to the IT development processes.

  5. Design for security. Apply the Confidentiality, Integrity and Availability (CIA) principles described under 'Security issues' in section 10.1.5.

  6. Produce Availability Management plan. The plan should be produced and updated periodically and should focus on changes in Availability Management requirements, IT architecture, technology and demand.

Groups to contact
If the SCM is coordinating the Availability Management implementation, the following groups must be contacted:

10.2.6 Support Center Manager's role

A liaison role is the only requirement if the SCM is not coordinating activities.

Necessary resources and relationships

Necessary information and data

Measurements that should be in place

10.2.7 Implementing key process activities: hints and tips

What to implement first

To implement an Availability Management function successfully, there are two main elements that should be developed concurrently:

A number of process/procedural components must be in place (covered in Annex A10.1 in more detail):

Things that always work
If senior management is committed, then implementation will be (relatively) simple. Keep everyone involved but keep the decision-making apparatus simple. Work assiduously to persuade your critics and your managers that 'staying the course' is the only way that the project will be successful. Do not underestimate the tendency of one or more participants to find the process complex and time consuming.

If you do only one thing 'by-the-book' make sure it is project management.

Little things that deliver big returns
Train everyone involved. It is often overlooked! Be sure to keep management informed, especially when things are going well; even bad news, so long as it is advance knowledge and not a surprise, can work to your advantage if delivered in the right way.

Little things that always get forgotten
Make sure you have enough time to carry out the activities. Even in a small organization, Availability Management data will have similar volume and complexity to Configuration Management data. Determination of business requirements alone can take many weeks.

10.2.8 Methods and techniques

Other than the tried and tested methods of managing projects, communicating and obtaining commitment, the rest is down to skill, or luck. Availability Management is often under-resourced and underrated; successful implementations cost a lot of time and money. If you can find a friendly organization that will offer a site visit, you can achieve much by taking along your executive sponsors to find out the benefits first hand.

10.2.9 Audits for effectiveness

Availability Management is unusual in that three types of audit are necessary:

A project evaluation review should be carried out once Availability Management has been implemented to determine whether budgets and timescales were adhered to; then prepare a more formal post implementation report (PIR).

The PIR should represent the final phase of the project and cover whether the objectives were achieved as well as lessons learned.

Reviewers should check that the following indicators of an effective Availability Management function have been met:

Other items to be included are more general and include the number of service failures resulting in downtime, quality of products and services from external suppliers, as well as customer and management satisfaction, all of which are indicators of project success, or otherwise.

10.3 - Ongoing Operation

10.3.1 The ongoing process

The SCM and team are at the center of activities in the Availability Management process as described in this chapter. They are not actively involved in Availability Management, unless the Availability Management role has been assigned to them.

10.3.2 Support Center Manager's role

Responsibilities and activities
Unless the SCM is coordinating Availability Management activities, the role is restricted to managing the SC team and liaising with the Availability Manager to ensure service levels are maintained.

Deliverables
None.

Competencies required
None, other than the generally expected management competencies.

KPIs
Compliance with agreed service levels.

10.3.3 Support Center Function's role

Responsibilities and activities
Depending on where the Availability Management role is located (inside or outside SCM), the responsibilities and activities to be carried out by the appropriate person are:

Deliverables
Management reports and updates, documentation, and a continually updated database (or databases) are the deliverables of these activities.

Competencies required

KPIs

Steps and tips for maintaining this process
  1. Establish a Configuration Management Database. Detailed knowledge of the IT infrastructure is the most important maintenance tip; ensure a Configuration Management Database is in place. If not, take stock of all Configuration Items (CIs), determine the relationships between them and record all information about component availability.

  2. Determine a process for registering incidents. The processing paths and the connections between CIs need to be determined, so that it is understood how a single failure can create a 'domino' effect. Registration of incidents and problems must then be set up and documented.

  3. Regular management updates. Positive results following the introduction of Availability Management often take time to show; therefore keep management apprised of any improvements through regular reporting.
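As an illustration of steps 1 and 2, the sketch below stores CI relationships as a simple mapping and walks it breadth-first to list everything that a single failing CI could affect. The CI names and the `DEPENDENTS` structure are hypothetical, chosen only for this example; a real Configuration Management Database will hold these relationships in its own form.

```python
from collections import deque

# Hypothetical CI relationships: each key is a CI, each value lists
# the CIs that depend directly on it.
DEPENDENTS = {
    "power-supply-1": ["db-server-1"],
    "db-server-1": ["order-app", "reporting-app"],
    "order-app": ["web-shop-service"],
    "reporting-app": [],
    "web-shop-service": [],
}


def impacted_by(failed_ci, dependents):
    """Return every CI affected by the failure of failed_ci, in breadth-first order."""
    seen = {failed_ci}
    queue = deque([failed_ci])
    impacted = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # guard against cycles and shared dependencies
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impacted_by("power-supply-1", DEPENDENTS))
```

With the sample data, a failure of `power-supply-1` reaches the database server, both applications and finally the customer-facing service, which is the kind of chain that incident registration should make visible.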

10.4 - Optimization

10.4.1 The optimization process

Optimization (CMM level five) of processes will occur only when all the ITIL disciplines are being practiced to a level of integration at least to CMM level four. Some experts argue that optimization can never be achieved because best practices are themselves continually updated and improved. A good example is the gradual 'standardization' of the ITIL best practices into the BS 15000 organizational standard. As ITIL changes, so will BS 15000, though clearly over a different timescale; either way, organizations will find that they are constantly updating their own best practices and such updating will be easier if all processes are under control (CMM level three).

Full integration of the ITIL processes is time consuming, expensive and often difficult to cost-justify. You must think hard about the value of the investment if the desire is to go beyond full control into integration and optimization.

10.4.2 Support Center Manager's role

The demarcation lines between SCM and Availability Management must be clear; then the appropriate person can be assigned to one or more (or all) of the following tasks:

All of the above must be planned tasks; to achieve optimization, nothing should be reactive to events. Whether the tasks are carried out by the SCM or by the team is immaterial so long as overall accountability and responsibilities are clear.

All contracts must be robust in terms of serviceability, reliability and maintainability clauses. The tools infrastructure must be unimpeachable in its ability to monitor and flag issues in advance of problems. It is likely to be the integration of the tools architecture, particularly for global organizations, that will hinder full optimization, because of the cost and because of the sheer scale of change needed.

10.4.3 Other key roles and functions in the optimization process

It is impossible to achieve optimization without involving every one of the other ITIL roles/managers in the work. Of all the ITIL disciplines, optimization of Availability Management in particular is beyond the reach of any organization unless every other process is perfectly harmonized and managed.

10.4.4 Future impact of this process on the Support Center

A successful implementation will cause the business community to align more closely with IT because they will depend on the high availability achieved. The SC will find that high availability will be a double-edged sword of opportunity, where ever-higher availability levels will be sought yet the business will not expect to pay more for the privilege.

10.5 - Measurement, costing and management reporting

10.5.1 Implementing: benefits and costs

Why implement this process and what can be gained
Without Availability Management, it is unlikely that IT services with a required level of availability will be delivered and managed to meet the target. Availability Management helps to deliver IT services, at a known and justified cost, to a predetermined level of quality and security that is in line with business requirements.

Without Availability Management, it is nearly impossible to underpin SLAs that are measurable, comprehensible and relevant. Contractual serviceability conditions will not be monitored unless Availability Management is in place. There are gains in:

Cost elements for implementation
Besides the usual people cost, the principal costs arise in selecting and installing the spectrum of software tools needed to support the function; as mentioned throughout, a single tool is not yet available and custom solutions are generally required.

You may need to produce a detailed cost plan (and it is strongly recommended that you do this so that everyone is aware in advance of the investment required). Consult the cost manager (or financial manager if there is no IT cost manager), for advice about how to go about preparing cost data, to ensure that Availability Management financial plans follow the same standards, depreciation criteria and same terms as other organization financial plans. The cost manager may have templates, or perhaps may have collected data that will support Availability Management planning or reduce the overhead of collecting information again.

Costs will break down into the usual categories of material, labor and overhead. Make sure that the labor cost covers all of the time needed by all of the people needed throughout the planning and implementation phases and that the operations phase is properly costed to illustrate ongoing cost. The materials should cover equipment (hardware) and software and overheads should at least cover accommodation and costs that can be transferred (i.e. goods or services that can be attributed and transferred from one functional group or department to another). It is another strong recommendation that the cost of project management is clearly and unambiguously defined, and where possible, the labor costs are clearly defined by phase and milestone.

Making the business case to implement
Availability Management is no different from any of the ITIL processes. Hard data on cost and cost benefit is lacking. In the case of Availability Management, evidence is possibly even harder to find than it is for other processes such as service desk implementations, or Problem Management. This is because service delivery processes generally are 'second wave'; most organizations begin with Service Support processes and the available evidence is based on those early implementations.

The business case should focus on what Availability Management can do to support the business; lack of availability will cost money, it will cost customers, it will ultimately lead to the business challenging the ability of the IT department to deliver.

Metrics and Key Performance Indicators
The measurements and KPIs put in place should be relevant beyond implementation, throughout ongoing operations. Depending on your organization you may want to combine component availability metrics with customer metrics.

For example, percentage measures of availability (or unavailability) are commonly used: '99.8%' available or, as the opposite side of the coin, '0.2%' unavailable. The percentages can be converted into actual time. However, the true impact of unavailability may not be obvious. Customer-type metrics would typically include the frequency and impact of failures (hence a measure of reliability), and the duration, impact and scope of downtime. A failure that inconvenienced one customer for six hours may actually be worse than one that the entire organization suffered for six minutes.
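The conversion from a percentage into actual time can be sketched in a few lines of Python. The function name and the choice of hours as the unit are assumptions made for this example; the period should be the agreed service time for the service in question.

```python
def downtime_hours(availability_percent, period_hours):
    """Convert an availability percentage into the downtime it implies for a period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (100.0 - availability_percent) / 100.0


# A 30-day month of round-the-clock service is 720 hours.
print(round(downtime_hours(99.8, 720), 2))   # hours of downtime per month
print(round(downtime_hours(99.8, 8760), 2))  # hours of downtime per 365-day year
```

At 99.8% availability this works out to roughly 1.44 hours (about 86 minutes) in a 30-day month and about 17.5 hours in a year, which is often a more meaningful figure for customers than the percentage alone.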

Some organizations require an analysis of impact on business transactions that could not be processed, or overtime that had to be worked in order to 'catch up'. Make sure the business requirements have been met when deciding what to measure.

Management reporting
The quality and effectiveness of the Availability Management function depends on the reports produced. Typically, reports include information to illustrate non-contractual causes of unavailability, compliance with SLAs and compliance of suppliers to serviceability criteria especially regarding their contribution to downtime.

Regular reporting should be incorporated in service level reporting (possibly via the SCM); an exception reporting procedure should be agreed with customers, the purpose of which is to inform them of substantial deviations from the agreed requirements.

The Availability Management plan should be reviewed annually. Also consider the frequency of reporting; it may be useful to have weekly, monthly and perhaps quarterly reports that summarize things in different ways for different audiences.

It is advisable to limit the recipients in order to be able to establish the need for detailed reports and to guarantee the quality of reports.

10.5.2 Ongoing operations

Cost elements for ongoing operations
Apart from the cost of managing the project and any capital investment, which no longer apply, the advice provided earlier still holds. See the section on implementation.

Metrics and Key Performance Indicators
See the section on implementation. However, keep in mind that improvements should be seen over time.

Management reporting
See the section on implementation. However, keep in mind that improvements should be seen over time.

10.5.3 Optimization: benefits and costs

The benefit of full optimization is an IT infrastructure that runs forever, never breaks down and has no outages, scheduled or otherwise, that affect customers. It is also likely to be very costly and complex. Continuous improvement using British Standard 15000 as the underpinning model is a more modest and achievable goal and a more pragmatic target.

Cost elements for optimization
It is beyond the scope of an overview of Availability Management to collect together all of the elements (and issues) that would need to be discussed.

Making the business case to optimize
Optimization of a single process is not possible because of the inter-dependent nature of all support processes. Lack of availability of an IT service may be related to loss of revenue and/or lost productivity, and both may result in increased costs to the organization. The cost required to optimize must be weighed against the benefits it will provide to the organization. Consideration must be given to how optimizing this area meets (or does not meet) strategic business objectives.

10.5.4 Tools

Tools for Availability Management are not generally available as unique, standalone items. For the most part, organizations use a combination of performance management software used in capacity planning, data collection, performance tracking and simulation software, resource accounting and utilization software, Service Level Management/monitoring software and reporting software.

The performance software is interfaced to event automation tools that automate system response to non-scheduled system and application events. These include event-action engines, global event management applications and console automation products.

It is not uncommon to find tailored combinations of software that, for example, link a suite of automated applications, which report on the availability of individual systems, to a monitor program. That program may be configured to categorize and analyze system crashes, sending the data to a central system; the central system then accumulates the data, processes it graphically and in text format and presents it for management consumption.

Annex A10.1 Availability Management sub-processes checklist
Use these annex materials to determine what needs to be in place for each of the Availability Management sub-processes and what data should be collected by the toolset.

Annex Documents

Annex A10.1 - Availability Management sub-processes checklist

A10.1.1 Record Configuration Item failures

Objective: To record Configuration Item failures in order to identify unreliable components.

Associated sub-processes

Data stores

Tool specification implications

Management information

A10.1.2 Monitor availability

Objective: This process monitors the actual availability of IT services provided to end-users.

Associated sub-processes

Data Stores

Tool specification implications
Specific Availability Management software is not always what it appears to be. Availability Management needs can generally be met using a combination of tools for other process domains (e.g. Problem Management) and in-house developments based on proprietary software and obtaining availability data from systems monitors.

Management information
Service availability report: actual service availability against target.
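A service availability report of this kind can be as simple as one line per service. The sketch below is only an illustration: the service names, the report layout and the `MET` / `EXCEPTION` / `NO DATA` labels are assumptions for this example, not part of any particular tool.

```python
def availability_report(targets, actuals):
    """Return one report line per service comparing actual availability with target."""
    lines = []
    for service in sorted(targets):
        target = targets[service]
        actual = actuals.get(service)
        if actual is None:
            # No monitoring data was received for this service.
            lines.append(f"{service}: target {target:.2f}%, actual n/a, NO DATA")
            continue
        status = "MET" if actual >= target else "EXCEPTION"
        lines.append(f"{service}: target {target:.2f}%, actual {actual:.2f}%, {status}")
    return lines


# Hypothetical targets and measured values, in percent.
for line in availability_report(
    {"email": 99.5, "payroll": 99.9},
    {"email": 99.7, "payroll": 99.2},
):
    print(line)
```

Lines marked `EXCEPTION` are natural candidates for the exception reporting agreed with customers.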

A10.1.3 Analyze service availability

Objective: To analyze the registered availability and failure rates on Configuration Items to identify where improvements can be made in IT system availability.
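One possible way to analyze registered failures is sketched below. It assumes, purely for illustration, that each failure is recorded as a (CI name, downtime in hours) pair over a known observation period, and it uses the common definitions of mean time to repair (total downtime divided by number of failures) and mean time between failures (total uptime divided by number of failures). CIs with no recorded failures do not appear in the result.

```python
from collections import defaultdict


def ci_reliability(failures, observed_hours):
    """Summarize failures per CI.

    failures: iterable of (ci_name, downtime_hours) records.
    observed_hours: length of the observation period in hours.
    """
    totals = defaultdict(lambda: [0, 0.0])  # ci -> [failure count, total downtime]
    for ci, downtime in failures:
        totals[ci][0] += 1
        totals[ci][1] += downtime

    stats = {}
    for ci, (count, downtime) in totals.items():
        stats[ci] = {
            "failures": count,
            "mttr_hours": downtime / count,                     # mean time to repair
            "mtbf_hours": (observed_hours - downtime) / count,  # mean uptime between failures
        }
    return stats


# Hypothetical failure records for a 720-hour month.
records = [("router-a", 2.0), ("router-a", 4.0), ("db-server-1", 1.0)]
stats = ci_reliability(records, 720)
weakest = min(stats, key=lambda ci: stats[ci]["mtbf_hours"])
print(weakest, stats[weakest])
```

Sorting or ranking CIs by the lowest mean time between failures is one simple way to point at where improvements are most likely to pay off.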

Associated sub-processes

Data Stores

Tool Specification Implications
See 'Monitor availability' above.

Management information

A10.1.4 Monitor contracted service support

Objective: This process monitors the performance of suppliers who have an IT service contract to ensure that they are meeting their contractual obligations.

Associated sub-processes

Data Stores

Tool Specification Implications
No specific requirements.

Management information
For Configuration Items serviced by external suppliers:

A10.1.5 Manage Availability

Objective: This process initiates changes that are intended to improve the availability of an IT service.

Associated sub-processes
See 'Analyze service availability'.

Data Stores

Required Data Stores

Tool specification implications
No specific requirements.

Management information

A10.1.6 Forecast service availability

Objective: This process examines Configuration Item availability (reliability) and its relationship with IT services to determine the ranges of availability possible. Service Level Management can use these ranges to negotiate service level agreements, and they form the basis of an availability plan that describes actions to ensure future improvements in availability.
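To give a feel for how CI availability relates to service availability, the sketch below uses the standard series and parallel combination rules. It assumes that component failures are independent and that availabilities are expressed as fractions between 0 and 1; the function names and figures are illustrative only.

```python
from math import prod


def serial_availability(components):
    """Service needs every component: the availabilities multiply."""
    return prod(components)


def parallel_availability(components):
    """Service needs any one component: it fails only if all of them fail."""
    return 1.0 - prod(1.0 - a for a in components)


# Hypothetical figures: one server (0.99) behind one network link (0.98).
single = serial_availability([0.99, 0.98])

# The same service with a second, redundant server of equal availability.
redundant = serial_availability([parallel_availability([0.99, 0.99]), 0.98])

print(f"single server: {single:.4f}, redundant servers: {redundant:.4f}")
```

Comparing the two results shows the range within which a service level could realistically be negotiated, and how much a resilience change might add before its cost is considered.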

Associated sub-processes

Data Stores

Tool specification implications

Management information

A10.1.7 Improve IT system resilience

Objective: This process examines the current IT infrastructure to identify cost-justified changes, which would improve the availability of IT services through improving the IT infrastructure resilience.

Associated sub-processes

Data Stores

Tool specification implications

Management information

A10.1.8 Manage data backup and recovery

Objective: This process manages the backing up and recovery of corporate data to ensure business continuity in the event of an IT contingency.

Associated sub-processes

Data Stores

Tool specification implications
No specific requirements.

Management information

A10.1.9 Maintain security

Objective: Security comprises three major aspects: availability, integrity and confidentiality. The purpose of this process is to maintain the security of the IT services and infrastructure in order to ensure the availability of the IT services.

Associated sub-processes

Data Stores

Tool specification implications

Management information