HDI - Implementing Availability Management
10.1 Overview
10.1.1 Description
Availability Management is the control (and continuous improvement) of the availability and reliability of IT services and the supporting IT infrastructure and organization. Availability Management ensures that the requirements of the business are met.
Availability Management entails systematically undertaking preventative and corrective maintenance of IT services, within justifiable cost. Technical, organizational, procedural, security and contractual aspects have an important role in this process.
This chapter examines Availability Management from the perspective of the Support Center. It is not a guide to traditional Availability Management; nor is it definitive in scope (as an example, none of the statistical analyses that are essential to Availability Management are discussed or included).
Availability Management is a complex, technology-led process that underpins much of IT Service Management. This chapter focuses on the issues that should be known to the Support Center and discusses interfaces that the Support Center could be integral to facilitating.
10.1.2 Relationships with other processes
Responsibility for ensuring that the data required within an IT service is available to end-users lies with the Availability Management process. The organizational function that actually carries out the tasks involved can vary from a representative of end-users to the Operations unit.
10.1.3 Key inputs and outputs to the process

Availability Management is at the center of a spider's web of activities, as described below.
Configuration Management
- Manages the Configuration Management Database where information on Configuration Item failures is stored
- Obtains and provides information on the current IT infrastructure Configuration Items and mean-time between failures
Network Services Management
- Records incidents against Configuration Items
Computer Operations Management
- Records incidents against Configuration Items
- Issues requests for preventative maintenance
Support Center
- Records user-reported Configuration Item failures
- Records incidents against Configuration Items
- Provides information on incidents, problems and Configuration Items, which are the root cause of the failure
- Provides information on end-user complaints of IT service and non-availability
- Restores data as a measure to bring the IT service back after failure
- Obtains information on Configuration Items and mean time between failures
- Issues request to restore data as a measure to bring the IT service back after failure
Problem Management
- Identifies which Configuration Item is the root cause of the incident.
- Provides information on IT service downtime.
- Obtains information on Configuration Items and mean time between failures.
- Communicates the need for change, or for preventative maintenance, as a pro-active measure
- Provides information on incidents, problems and Configuration Items, which are the root cause of the failure
- Issues request to restore data as a measure to bring the IT service back after failure
Procurement
- Identifies when service contracts are not being met
Change Management
- Issues a request for change to satisfy recommendations for improvements in IT service availability
- Processes Request For Change
- Evaluates proposed changes
Capacity Management
- Ensures that the availability plan takes into account trends in system usage
- Ensures that the system monitors record Configuration Item failures automatically
Finance
- Provides financial authorization
- Provides charging information
IT Contingency Planning (and /or Business Continuity Management)
- Ensures continued availability, or at least minimal interruption and proper restoration of IT services (either on or off site), in the case of an extended outage or disaster.
Operations
- Covers all relevant procedures including backup, restoration and security
Development
- Ensures that IT service availability is an issue considered within the development lifecycle
Testing
- Ensures that software Availability Management requirements are being met
Security
- Establishes and maintains physical and logical security
| Description | Source | Importance |
|---|---|---|
| INPUTS | | |
| Business requirements | Customer | High |
| Impact assessment of requirements | Availability Mgt | High |
| IT requirements (e.g. reliability, maintainability) | Various | High |
| Incident, problem, change and config. data | Various | High |
| Monitoring event data | Systems | Medium |
| SLA | SLM | High |
| OUTPUTS | | |
| Design criteria for recovery | SCM | High |
| Availability Management techniques | Availability Mgt | High |
| Availability Management targets | Availability Mgt | High |
| Monitoring requirements | Availability Mgt | High |
| Availability Management plan | Availability Mgt | High |
10.1.4 Availability Manager
The Availability Manager calculates the actual IT service availability (using service targets), correlates system-detected errors and errors reported through incident records, and validates IT service availability depending on which data source provided the availability information.
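As a rough illustration of the calculation described above, the following Python sketch expresses achieved availability as the share of agreed service time during which the service was actually up, and compares it with a service target. The function names, the 99.5% target and the sample figures are assumptions made for illustration; they are not taken from this chapter.

```python
def availability_pct(agreed_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the agreed service time during which the service was available."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_minutes:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return 100.0 * (agreed_minutes - downtime_minutes) / agreed_minutes


def meets_target(agreed_minutes: float, downtime_minutes: float, target_pct: float) -> bool:
    """True when the achieved availability is at or above the service target."""
    return availability_pct(agreed_minutes, downtime_minutes) >= target_pct


if __name__ == "__main__":
    # Hypothetical month: 20 working days of 10 agreed service hours each.
    agreed = 20 * 10 * 60
    downtime = 90  # minutes of recorded downtime (made-up figure)
    print(f"achieved: {availability_pct(agreed, downtime):.2f}%")
    print("target met:", meets_target(agreed, downtime, target_pct=99.5))
```

Only downtime that falls inside the agreed service hours is counted here; whether planned maintenance counts as downtime is a matter for the SLA, not for the formula.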
The Availability Manager's responsibilities are to:
- plan customer awareness campaign for Availability Management
- define and obtain agreement on the scope and objectives of the Availability Management function within IT, and its integration and interface with business and technical groups
- define and obtain agreement on the interface between Availability Management and other processes e.g. Change Management, Problem Management, security, Service Level Management etc
- propose, introduce and oversee standards and procedures to meet availability requirements
- create availability plan
- decide on availability monitoring and reporting structure
- set up communication structure for availability reviews
- conduct post-implementation reviews of Availability Management implementation.
Role implementation
The Availability Manager role can be taken by a single individual, or by a team of individuals, not necessarily organizationally collected into one unit.
It is possible to combine the roles of Availability Manager (AM) and Service Level Manager (SLM), but the roles of (proactive) Problem Manager and Availability Manager should not be merged. Note that the above does not describe the role of the Support Center Manager (SCM), but it is included to clarify that the SCM role and AM role should be properly delineated.
10.1.5 Possible problems and issues
Possible problems
Commitment: it is rare to find that Availability Management goals are shared, and conflicting priorities do occur. Senior management commitment should be sought at an early stage.
Tools: specific tools are hard to find and are not always as described. Research the tools market carefully to ensure that the range of tools needed to support the function and to interface with other disciplines is available and cost effective.
Supplier dependency: serviceability requirements may not have been defined because suppliers are reluctant to commit to providing data. Make sure all new contracts include serviceability in the requirements specification.
Quick wins
Try to establish Availability Management through evidence of rapid improvement. Problem Management and other similar roles will be the best source of such data.
Quality issues
Availability Management is a process that underpins good quality provision of IT services. By its very nature, analysis of points of vulnerability, risk assessment and building-in of redundancy, Availability Management ensures that customers are provided with first-rate service.
Security issues
Confidentiality, Integrity and Availability (CIA) are the fundamental building blocks of IT security. IT security was defined by OGC as 'balanced security in depth'; justifiable countermeasures are in place to ensure continued IT service within secure parameters. The Availability Management function has a closer relationship than most with the IT security management function.
The major security issues of Availability Management are:
- products, data and services only available to authorized personnel
- products, data and services must be recoverable within acceptable (and secure) parameters following failure
- service contracts must adhere to security policy
- countermeasures must be available to meet identified risks.
10.2 Implementation
10.2.1 The implementation process
The major support function to Availability Management arises in the activities of the Support Center (SC) team. The SCM has a more limited role.

In Figure 10.2 it is assumed that the SCM and team are peripheral to both Availability Management and SLM, and are coordinating activities.
10.2.2 Support Center Manager's role
Responsibilities and activities
Unless the SCM also fills the Availability Management role, their contribution is restricted to defining appropriate SLA and SLR criteria with the customer community and creating a monitoring function to ensure compliance to agreed targets of availability. The SCM should ensure that Incident Management processes are followed to contribute ticket data effectively to Availability Management.
A role to assist with design of policy and procedures and to contribute to ARCI/RACI matrices is recommended.
Deliverables
- SLA(s)
- SLRs
- Service plans for the education of the Availability Management function
Competencies
The SCM does not require specific competencies in order to liaise with the Availability Management function.
Key Performance Indicators (KPIs)
- Compliance with SLA targets
- Compliance with SLR criteria
- Positive customer feedback
10.2.3 Support Center Function's role
Responsibilities and activities
Availability Management covers the entire lifecycle of IT service components, from initial design to decommissioning, and meets the availability requirements stipulated by the business.
Support Center activities to be carried out in support of Availability Management include:
- establishing the availability ranges of which the IT infrastructure is technically capable (and hence scoping the service level management negotiations with the end-user representative / customer)
- planning for IT service availability
- monitoring and reporting on IT service availability
- monitoring adherence to contracts by suppliers and maintainers.
Availability of an IT service could be monitored up to 24 hours per day, seven days per week or according to Availability Management and Service Level requirements in place. From a security perspective, it is important that the IT service is only available according to required specifications to the end-users specified in agreements between IT and the business (and to those involved in the Availability Management process and software development process), and to representatives of the end-users.
Deliverables
The information that has been collected by Availability Management is now examined and evaluated to identify ways in which availability can be improved (for example weak Configuration Items identified, changes to procurement policy or the IT infrastructure initiated etc.).
A well-documented Availability Management plan is a key deliverable.
Competencies required
The SC does not require specific competencies in order to liaise with the Availability Management function.
KPIs
- Compliance with SLA targets
- Compliance with SLR criteria
- Positive customer feedback
10.2.4 Other key roles and functions in the implementation process
A detailed list of the functions (and therefore the roles) having major impact on Availability Management was included under 'inputs and outputs', together with information about activities and deliverables. Note that with regard to Availability Management, the SCM role is one of coordination rather than a specific task.
10.2.5 Planning for implementation
Steps to take
- Obtain management commitment. Before anything else, ensure that senior management is committed to the project. Availability Management is neither cheap nor quick to get underway.
- Develop an Implementation Plan. As with any IT Infrastructure Library discipline, planning for implementation of Availability Management is vital. It is recommended that you use a recognized method such as PRINCE2 or PMI. The principal tasks are: project design, project plan, resource allocation, development of cost models, monitoring and plans for future review.
- Determine the Availability Management requirements. These are derived from business requirements. Processes and procedures must be in place to obtain all relevant requirements of all the IT services required. These requirements must be agreed before full scale planning takes place.
- Design for Availability Management. The primary task is to ensure that availability of IT services does not fall below the management requirements, as Availability Management is integral to the change process and to the IT development processes.
- Design for security. As mentioned earlier, see CIA.
- Produce Availability Management plan. The plan should be produced and updated periodically and should focus on changes in Availability Management requirements, IT architecture, technology and demand.
Groups to contact
If the SCM is coordinating the Availability Management implementation, the following groups must be contacted:
- Configuration Management
- Network Services Management to record incidents against Configuration Items
- Computer Operations Management
- Support Center
- Problem Management
- Procurement
- Change Management
- Capacity Management
- IT Finance
- Development
- Security
- IT Contingency Planning (and /or Business Continuity Management)
- Operations
- Testing.
10.2.6 Support Center Manager's role
A liaison role is the only requirement if SCM is not coordinating activities.
Necessary resources and relationships
Necessary information and data
- Monitor IT service availability
- Monitor supplier compliance to serviceability requirements
- Assess reliability and maintainability of components produced or maintained by IT services
- Assess the effects of changes on the IT infrastructure
- Compare planned availability with actual results
- Downtime data for any item with distinct contractual conditions
Measurements that should be in place
- The key inputs and outputs to the process along with the necessary information and data elements above should form part of the measurement system.
- The SLA(s) will provide the architecture for a full and comprehensive list.
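One of the data elements listed above is the comparison of planned availability with actual results. A minimal sketch of such a comparison is shown below, assuming that planned and actual figures are already held as percentages per service. The `ServiceAvailability` record, the service names and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceAvailability:
    service: str
    planned_pct: float  # availability agreed in the plan or SLA
    actual_pct: float   # availability actually measured

    @property
    def shortfall(self) -> float:
        """Percentage points by which the service missed its plan (0 if it met it)."""
        return max(0.0, self.planned_pct - self.actual_pct)


def below_plan(records):
    """Return the services that missed their planned availability, worst first."""
    missed = [r for r in records if r.actual_pct < r.planned_pct]
    return sorted(missed, key=lambda r: r.shortfall, reverse=True)


if __name__ == "__main__":
    sample = [
        ServiceAvailability("email", 99.5, 99.1),
        ServiceAvailability("payroll", 99.0, 99.4),
        ServiceAvailability("intranet", 98.0, 96.5),
    ]
    for record in below_plan(sample):
        print(f"{record.service}: {record.shortfall:.1f} points below plan")
```

Sorting by shortfall puts the services most in need of attention at the top of a management report.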
10.2.7 Implementing key process activities: hints and tips
What to implement first
To implement an Availability Management function successfully, there are two main elements that should be developed concurrently:
- procedures, because the majority of the work is to be performed regularly
- support tools to support the function.
A number of process/procedural components must be in place (covered in Annex A10.1 in more detail):
- record Configuration Item failures: this process records Configuration Item failures in order to identify unreliable components
- monitor availability: this process monitors the actual availability of IT services provided to end-users
- analyze service availability: this process analyzes the registered availability and failure rates on Configuration Items to identify where improvements can be made in IT system availability
- monitor contracted service support: this process monitors the performance of suppliers who have an IT service contract, to ensure that they are meeting their contractual obligations
- manage availability: this process initiates changes that are intended to improve the availability of an IT service
- forecast service availability: this process examines Configuration Item availability (reliability) and the relationship with IT services to determine the ranges of availability possible. This can be used by Service Level Management to negotiate service level agreements; it forms the basis of an availability plan wherein actions to ensure future improvements in availability are described
- improve IT system resilience: this process examines the current IT infrastructure to identify cost-justified changes that would improve the availability of IT services through improving the IT infrastructure resilience
- manage data backup and recovery: this process manages the back-up and recovery logistics of corporate data to ensure business continuity in the event an IT contingency is required
- maintain security: security consists of three major aspects: availability, integrity and confidentiality. The purpose of this process is to maintain the security of the IT services and infrastructure in order to ensure the availability of the IT services.
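The first of the components above, recording Configuration Item failures in order to identify unreliable components, can be sketched as follows. This is only an illustration: it assumes each recorded failure is available as a CI identifier, and it approximates mean time between failures as observed hours divided by the number of failures (a stricter calculation would use operating time only). The CI names are invented.

```python
from collections import Counter


def mtbf_hours(observed_hours: float, failure_count: int):
    """Approximate mean time between failures; None if no failure was recorded."""
    if failure_count == 0:
        return None
    return observed_hours / failure_count


def least_reliable(failures, observed_hours: float, top: int = 3):
    """Rank CIs by number of recorded failures over the observation period.

    failures: iterable of CI identifiers, one entry per recorded failure.
    Returns a list of (ci, failure_count, approximate_mtbf_hours).
    """
    counts = Counter(failures)
    # Most failures first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ci, n, mtbf_hours(observed_hours, n)) for ci, n in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical failure log for a 30-day (720-hour) period.
    log = ["srv-01", "router-02", "srv-01", "disk-07", "srv-01", "router-02"]
    for ci, count, mtbf in least_reliable(log, observed_hours=720):
        print(f"{ci}: {count} failures, MTBF about {mtbf:.0f} h")
```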
Things that always work
If senior management is committed, then implementation will be (relatively) simple. Keep everyone involved but keep the decision-making apparatus simple. Work assiduously to persuade your critics and your managers that 'staying the course' is the only way that the project will be successful. Do not underestimate the tendency of one or more participants to find the process complex and time consuming.
If you do only one thing 'by-the-book' make sure it is project management.
Little things that deliver big returns
Train everyone involved. It is often overlooked! Be sure to keep management informed, especially when things are going well; even bad news, so long as it is advance knowledge and not a surprise, can work to your advantage if delivered in the right way.
Little things that always get forgotten
Make sure you have enough time to carry out the activities. Even in a small organization, Availability Management data will have similar volume and complexity to Configuration Management data. Determination of business requirements alone can take many weeks.
10.2.8 Methods and techniques
Other than the tried and tested methods of managing projects, communicating and obtaining commitment, the rest is down to skill, or luck. Availability Management is often under-resourced and underrated; successful implementations cost a lot of time and money. If you can find a friendly organization that will offer a site visit, you can achieve much by taking along your executive sponsors to find out the benefits first hand.
10.2.9 Audits for effectiveness
Availability Management is unusual in that three types of audit are necessary:
- for efficiency and effectiveness of IT Availability processes
- for compliance of IT Availability Management processes with procedures
- to ensure compliance with security policy.
A project evaluation review should be carried out once Availability Management has been implemented to determine if budgets/timescales were adhered to - then prepare a more formal post implementation report (PIR).
The PIR should represent the final phase of the project and cover whether the objectives were achieved as well as lessons learned.
Reviewers should check that the following indicators of an effective Availability Management function have been met:
- reports on deviation from agreed contractual terms regarding serviceability are correct and on time
- RFCs are promptly and correctly assessed for Availability Management impact
- SLAs regarding availability are met
- forecasts on Availability Management are correct
- Availability Management compliance to procedures
- the Availability Management plan is published and appropriate
- actual Availability Management data is collected and recorded according to procedures
- interfaces to other functions are effective
- Availability Management reports are timely
- compliance with security policy.
Other items to be included are more general and include the number of service failures resulting in downtime, quality of products and services from external suppliers, as well as customer and management satisfaction, all of which are indicators of project success, or otherwise.
10.3 Ongoing Operation
10.3.1 The ongoing process
The SCM and team are at the center of activities in the Availability Management process as described in this chapter. They are not actively involved in Availability Management, unless the Availability Management role has been assigned to them.
10.3.2 Support Center Manager's role
Responsibilities and activities
Unless the SCM is coordinating Availability Management activities, the role is restricted to managing the SCM team and liaising with the Availability Manager to ensure service levels are maintained.
Deliverables
None.
Competencies required
None, other than the generally expected management competencies.
KPIs
Compliance with agreed service levels.
10.3.3 Support Center Function's role
Responsibilities and activities
Depending on where the Availability Management role is located (inside or outside SCM), the responsibilities and activities to be carried out by the appropriate person are:
- calculate the actual IT service availability using service targets
- correlate system-detected errors and errors reported through incident records
- validate IT service availability depending on which data source provided the availability information
- note that the KPI here is to analyze the registered availability and failure rates on Configuration Items to identify where improvements can be made in IT system availability
- examine the IT service and item (Configuration Item) availability data and identify weak Configuration Items, and trends in IT service availability
- identify measures which can be taken to improve IT service availability
- review information from support groups such as Operations, Network Management, Support Center and Problem Management on the actual performance of the contracted external organization and compare it to the contract with that organization
- calculate the probable mean time between failures of each of the IT services, based on the service's dependency on resources and Configuration Items, and the reliability of those Configuration Items
- use data on expected 'failure to restore' times (i.e. maintainability), along with information on alternative resources, to calculate the expected availability. Maintainability covers automated failure detection, call-handling procedures, diagnostic scripts and recovery procedures to reduce restoration time. Serviceability is also relevant, e.g. an engineer from a vendor or supplier is contracted to be on site within a certain time period to begin diagnosis of a failure
- use cost / benefit analysis to evaluate various scenarios of IT infrastructure configuration, procedure changes and supplier contracts for benefit to Availability Management
- publish chosen proposals as an availability plan
- examine the current IT infrastructure using information on incidents and problems that are affecting the business, in order to identify potential changes in the IT infrastructure configuration (e.g. disk mirroring, duplexing communication lines) which would improve the resilience of the IT infrastructure to failure
- make a case for an investment in and/or change to the IT infrastructure
- forward a Request For Change to Change Management for evaluation and authorization before implementation
- establish data backup requirements for services for which he or she is accountable as part of security maintenance
- arrange for the backup data to be restored periodically, in a test of contingency procedures
- as part of the Problem Management process, in the event of a service becoming unavailable or otherwise impaired, the Support Center Manager assists with direction of data recovery actions
- interface with software and system development (generally via operability standards) to ensure that when new Configuration Items are handed over into the live environment, they have been designed and tested to meet the availability criteria required as described in the plan
- evaluate changes planned for the live environment with a view to availability of IT services
- maintain records covering which user has what level of access to which services. Procedures for this task cover authorization and creation of new user IDs, blocking user IDs for which security violations have occurred and deleting user IDs that are no longer required according to established security procedures
- pass user information to Cost Management for charging purposes, and to Capacity Management for identifying usage trends
- establish and maintain the logical security within the IT infrastructure (network access, modem dial-back, application access, terminal/PC access, and logical security levels / structures within applications) by liaising with technical domains (e.g. technical specialists, software designers, external suppliers, end-users and appropriate Technical Support Partners (TSPs)).
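Two of the activities above call for calculating expected availability from mean time between failures, restoration times and alternative resources. The sketch below uses the standard steady-state formulas for this purpose. This chapter does not prescribe a method, so treat it as one common approach under stated assumptions: failures are independent, a service depending on several Configuration Items needs all of them (series), and an alternative resource is modeled as redundancy (parallel). All figures are invented.

```python
from math import prod


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected availability of one CI: MTBF / (MTBF + mean time to restore)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities) -> float:
    """Service needs every CI to be up: multiply the individual availabilities."""
    return prod(availabilities)


def parallel(availabilities) -> float:
    """Service needs at least one of the redundant CIs to be up."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    server = steady_state_availability(mtbf_hours=990, mttr_hours=10)
    line = steady_state_availability(mtbf_hours=450, mttr_hours=50)
    # Without an alternative resource the service depends on server AND line.
    print(f"single line:     {series([server, line]):.4f}")
    # With a duplexed communication line, either line will do.
    print(f"duplexed lines:  {series([server, parallel([line, line])]):.4f}")
```

Comparing the two printed values gives a first estimate of what a resilience measure, such as the duplexed communication lines mentioned in the list above, would buy in availability terms before its cost is weighed.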
Deliverables
Management reports and updates, documentation, and a continually updated database (or databases) are the deliverables of these activities.
Competencies required
- Statistical analysis
- Communication skills
- IT expertise (the Availability Management role is suited to a person with deep technical understanding of hardware)
- Some understanding of contractual issues is desirable, as is an understanding of how IT and business are aligned; business understanding can be supplied through the liaison role offered by the SCM and team.
KPIs
- Compliance with agreed SLAs
Steps and tips for maintaining this process
- Establish a Configuration Management Database. Detailed knowledge of the IT infrastructure is the most important maintenance tip; ensure a Configuration Management Database is in place. If not, take stock of all Configuration Items (CIs), determine the relationships between them and record all information about component availability.
- Determine a process for registering incidents. The processing paths and the connections between CIs need to be determined, so that it is understood how a single failure can create a 'domino' effect. Registration of incidents and problems must then be set up and documented.
- Regular management updates. Positive results following introduction of Availability Management often take time to reflect; therefore keep management apprised of any improvements through regular reporting.
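The 'domino' effect mentioned in the steps above can be explored once the relationships between CIs are recorded. The following sketch assumes those relationships are available as a simple mapping from each CI to the CIs or services that directly depend on it. It then walks the mapping breadth-first to find everything a single failure could affect. The mapping format and the CI names are hypothetical, not a feature of any particular Configuration Management Database.

```python
from collections import deque


def affected_by(failed_ci: str, dependents: dict) -> set:
    """Return every CI or service that depends, directly or indirectly, on failed_ci.

    dependents maps a CI to the list of items that directly depend on it.
    """
    seen = set()
    queue = deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for item in dependents.get(ci, ()):
            # Skip items already visited so that cyclic records cannot loop forever.
            if item not in seen and item != failed_ci:
                seen.add(item)
                queue.append(item)
    return seen


if __name__ == "__main__":
    relationships = {
        "san-01": ["db-01"],
        "db-01": ["payroll", "hr-portal"],
        "switch-03": ["db-01", "mail"],
    }
    print(sorted(affected_by("san-01", relationships)))
```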
10.4 Optimization
10.4.1 The optimization process
Optimization (CMM level five) of processes will occur only when all the ITIL disciplines are being practiced to a level of integration at least to CMM level four. Some experts argue that optimization can never be achieved because best practices are themselves continually updated and improved. A good example is the gradual 'standardization' of the ITIL best practices into the BS 15000 organizational standard. As the ITIL changes, so will BS 15000, though clearly over a different timescale; either way, organizations will find that they are constantly updating their own best practices and such updating will be easier if all processes are under control (CMM level three).
Full integration of the ITIL processes is time consuming, expensive and often difficult to cost-justify. You must think hard about the value of the investment if the desire is to go beyond full control into integration and optimization.
10.4.2 Support Center Manager's role
The demarcation lines between SCM and Availability Management must be clear; then the appropriate person can be assigned to one or more (or all) of the following tasks:
- maintain availability plan: review, assess and revise
- provide input to the design and development of new IT services to ensure availability requirements are met
- provide input to the Change Management process to ensure that proposed changes to IT infrastructure and procedures are not detrimental to IT service availability
- provide input to maintenance of security policy and procedures
- establish what range of availability is available at what cost (e.g. Help Desk can open at a time between 06:30 and 09:00, and can close at a time between 17:00 and 24:00) and provide this information to the Service Level Management process for negotiation with the customer
- monitor compliance to the availability requirements as expressed in service level agreements (e.g. data from Help Desk and system monitors)
- monitor vendor compliance to serviceability criteria in (underpinning) contracts
- collect data from Configuration Management on Configuration Item reliability and from Problem Management on downtime for analysis, possibly leading to the generation of a change request for the IT infrastructure.
All of the above must be planned tasks; to achieve optimization, nothing should be reactive to events. Whether the tasks are carried out by the SCM or by the team is immaterial so long as overall accountability and responsibilities are clear.
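The Help Desk opening-hours example in the task list above can be turned into a small calculation of the range of daily service hours on offer, which is the kind of figure Service Level Management needs for negotiation. The sketch below takes the times from that example. The cost per staffed hour is a made-up assumption, included only to show how a cost range could be attached to the availability range.

```python
def hours(hhmm: str) -> float:
    """Convert an 'HH:MM' string (24:00 allowed) to hours since midnight."""
    h, m = hhmm.split(":")
    return int(h) + int(m) / 60


def service_window_range(earliest_open, latest_open, earliest_close, latest_close):
    """Return (shortest, longest) possible daily service window in hours."""
    shortest = hours(earliest_close) - hours(latest_open)
    longest = hours(latest_close) - hours(earliest_open)
    return shortest, longest


def cost_range(window_range, cost_per_hour: float):
    """Daily cost of the shortest and longest window at a flat hourly rate."""
    shortest, longest = window_range
    return shortest * cost_per_hour, longest * cost_per_hour


if __name__ == "__main__":
    window = service_window_range("06:30", "09:00", "17:00", "24:00")
    print("service hours per day:", window)
    # 40.0 per staffed hour is a hypothetical rate, not a figure from this chapter.
    print("daily cost range:", cost_range(window, cost_per_hour=40.0))
```

A flat hourly rate is a simplification; evening and night cover will often cost more per hour than daytime cover.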
All contracts must be robust in terms of serviceability, reliability and maintainability clauses. The tools infrastructure must be unimpeachable in its ability to monitor and flag issues in advance of problems. It is likely to be the integration of the tools architecture, particularly for global organizations, that will hinder full optimization, because of the cost and because of the sheer scale of change needed.
10.4.3 Other key roles and functions in the optimization process
It is impossible to achieve optimization without involving every one of the other ITIL roles/managers in the work. Optimization of Availability Management in particular of the ITIL disciplines is beyond the scope of any organization unless every other process is perfectly harmonized and managed.
10.4.4 Future impact of this process on the Support Center
A successful implementation will cause the business community to align more closely with IT because they will depend on the high availability achieved. The SC will find that high availability will be a double-edged sword of opportunity, where ever-higher availability levels will be sought yet the business will not expect to pay more for the privilege.
10.5 Measurement, costing and management reporting
10.5.1 Implementing: Benefits and Costs
Why implement this process and what can be gained
Without Availability Management, it is unlikely that IT services with a required level of availability will be delivered and managed to meet the target. Availability Management helps to deliver IT services, at a known and justified cost, to a predetermined level of quality and security that is in-line with business requirements.
Without Availability Management, it is nearly impossible to underpin SLAs that are measurable, comprehensible and relevant. Contractual serviceability conditions will not be monitored unless Availability Management is in place.
There are gains in:
- service quality
- cost effectiveness
- manageability
- security
- overall planning for the IT infrastructure.
Cost elements for implementation
Besides the usual people cost, the principal costs arise in selecting and installing the spectrum of software tools needed to support the function; as mentioned throughout, a single tool is not yet available and custom solutions are generally required.
You may need to produce a detailed cost plan, and it is strongly recommended that you do so, so that everyone is aware in advance of the investment required. Consult the cost manager (or the financial manager if there is no IT cost manager) for advice about preparing cost data, to ensure that Availability Management financial plans follow the same standards, depreciation criteria and terms as the organization's other financial plans. The cost manager may have templates, or may already have collected data that will support Availability Management planning or reduce the overhead of collecting information again.
Costs will break down into the usual categories of material, labor and overhead. Make sure that the labor cost covers all of the time needed by all of the people needed throughout the planning and implementation phases and that the operations phase is properly costed to illustrate ongoing cost. The materials should cover equipment (hardware) and software and overheads should at least cover accommodation and costs that can be transferred (i.e. goods or services that can be attributed and transferred from one functional group or department to another). It is another strong recommendation that the cost of project management is clearly and unambiguously defined, and where possible, the labor costs are clearly defined by phase and milestone.
Making the business case to implement
Availability Management is no different from any other ITIL process: hard data on cost and cost benefit is lacking. In the case of Availability Management, evidence is possibly even harder to find than it is for other processes such as service desk implementations or Problem Management. This is because service delivery processes are generally 'second wave'; most organizations begin with Service Support processes, and the available evidence is based on those early implementations.
The business case should focus on what Availability Management can do to support the business; lack of availability will cost money, it will cost customers, it will ultimately lead to the business challenging the ability of the IT department to deliver.
Metrics and Key Performance Indicators
The measurements and KPIs put in place should be relevant beyond implementation, throughout ongoing operations. Depending on your organization you may want to combine component availability metrics with customer metrics.
For example, percentage measures of availability (or unavailability) are commonly used: '99.8%' available or, the opposite side of the coin, '0.2%' unavailable. The percentages can be converted into actual time. However, the true impact of unavailability may not be obvious from these figures alone. Customer-type metrics would typically include the frequency and impact of failures (hence a measure of reliability), and the duration, impact and scope of downtime. A failure that inconvenienced one customer for six hours may actually be worse than one that the entire organization suffered for six minutes.
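The conversion from a percentage into actual time can be sketched as follows. This is a minimal illustration, not part of the original guidance: the function name `allowed_downtime` is hypothetical, and the 30-day, 24-hour service period is an assumption; in practice the period would be the agreed service hours in the SLA.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the downtime implied by an availability percentage over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


# Assumption: the service is agreed to run 24 hours a day over a 30-day month.
month = timedelta(days=30)
downtime = allowed_downtime(99.8, month)
print(f"99.8% over 30 days allows about "
      f"{downtime.total_seconds() / 60:.1f} minutes of downtime")
```

Under these assumptions, 0.2% unavailability corresponds to roughly 86 minutes in the month, which customers may find easier to relate to than the percentage.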
Some organizations require an analysis of impact on business transactions that could not be processed, or overtime that had to be worked in order to 'catch up'. Make sure the business requirements have been met when deciding what to measure.
Management reporting
The quality and effectiveness of the Availability Management function depends on the reports produced. Typically, reports include information to illustrate non-contractual causes of unavailability, compliance with SLAs and compliance of suppliers to serviceability criteria especially regarding their contribution to downtime.
Regular reporting should be incorporated in service level reporting (possibly via the SCM); an exception reporting procedure should be agreed with customers, the purpose of which is to inform them of substantial deviations from the agreed requirements.
The Availability Management plan should be reviewed annually. Also consider the frequency of reporting; it may be useful to have weekly, monthly and perhaps quarterly reports that summarize things in different ways for different audiences.
It is advisable to limit the recipients in order to be able to establish the need for detailed reports and to guarantee the quality of reports.
10.5.2 Ongoing operations
Cost elements for ongoing operations
The advice given for implementation still applies, except that the cost of managing the project and any capital investment no longer apply. See the section on implementation.
Metrics and Key Performance Indicators
See the section on implementation. However, keep in mind that improvements should be seen over time.
Management reporting
See the section on implementation. However, keep in mind that improvements should be seen over time.
10.5.3 Optimization: benefits and costs
The benefit of full optimization is an IT infrastructure that runs forever, never breaks down and has no outages, scheduled or otherwise, that affect customers. It is also likely to be very costly and complex. Continuous improvement using British Standard 15000 as the underpinning model is a more modest and achievable goal and a more pragmatic target.
Cost elements for optimization
It is beyond the scope of an overview of Availability Management to collect together all of the elements (and issues) that would need to be discussed.
Making the business case to optimize
Optimization of a single process is not possible because of the inter-dependent nature of all support processes. Lack of availability of an IT service may be related to loss of revenue and/or lost productivity, and both may result in increased costs to the organization. The cost required to optimize must be weighed against the benefits it will provide to the organization.
Consideration must be given to how optimizing this area meets (or does not meet) strategic business objectives.
10.5.4 Tools
Tools for Availability Management are not generally available as unique, standalone items. For the most part, organizations use a combination of performance management software used in capacity planning, data collection, performance tracking and simulation software, resource accounting and utilization software, Service Level Management/monitoring software and reporting software.
The performance software is interfaced to event automation tools that automate system response to non-scheduled system and application events. These include event-action engines, global event management applications and console automation products.
It is not uncommon to find tailored combinations of software. For example, a suite of automated applications that report on the availability of individual systems may be combined with a monitor program. That program may be configured to categorize and analyze system crashes and send the data to a central system; the central system then accumulates the data, processes it graphically and in text format, and presents it for management consumption.
Annex A10.1 Availability Management sub-processes checklist
Use these annex materials to determine what needs to be in place for each of the Availability Management sub-processes and what data should be collected by the toolset.
Annex Documents
Overview
Annex A10.1 - Availability Management Sub-process Checklist
A10.1.1 Record Configuration Item failures
Objective: To record Configuration Item failures in order to identify unreliable components.
Associated sub-processes
- Collect Details. The purpose of this process is to collect details of an incident, not of a Configuration Item failure for the purpose of an availability analysis.
- Plan Resolution. This process does not formally record Configuration Item failures, but data collated is theoretically sufficient for Availability Management purposes.
Data stores
- Configuration Management Records
- Configuration Item Availability
- Known Error Records
Tool specification implications
- System management: automatic failure detection and alerts.
- Ability to extract information on Configuration Items from Configuration Management Database / incident log / problem log / known error log:
- failure time, restoration time.
- number of failures in a time period.
- A link to service catalogue is both possible and desirable.
Management information
- Knowledge of the dependency of the IT service on the operational status of its components (service catalogue linked to Configuration Management Database).
- Downtime data on individual components: time of component failure; time of component restoration.
- Number of failures per component in a given time period.
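The management information listed above can be derived from failure and restoration times. The sketch below is illustrative only: the record layout, the Configuration Item names and the timestamps are hypothetical sample data, not an extract from any real Configuration Management Database.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical failure records: (CI identifier, time of failure, time of restoration).
failures = [
    ("mail-server", datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 45)),
    ("mail-server", datetime(2024, 3, 12, 14, 0), datetime(2024, 3, 12, 16, 0)),
    ("core-router", datetime(2024, 3, 5, 2, 30), datetime(2024, 3, 5, 2, 40)),
]


def summarize_failures(records, period_start, period_end):
    """Return {ci: (failure_count, total_downtime)} for failures that began in the period."""
    counts = defaultdict(int)
    downtime = defaultdict(timedelta)
    for ci, failed_at, restored_at in records:
        if period_start <= failed_at < period_end:
            counts[ci] += 1
            downtime[ci] += restored_at - failed_at
    return {ci: (counts[ci], downtime[ci]) for ci in counts}


summary = summarize_failures(failures, datetime(2024, 3, 1), datetime(2024, 4, 1))
for ci, (count, total) in sorted(summary.items()):
    print(f"{ci}: {count} failure(s), {total} total downtime")
```

Counting a failure in the period in which it began is a simplifying assumption; an outage that spans a period boundary may need to be split, depending on how reporting periods are agreed.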
A10.1.2 Monitor availability
Objective: This process monitors the actual availability of IT services provided to end-users.
Associated sub-processes
- Collect Availability Data
- Analyze Data and issue report
Data Stores
- Service Catalog
- Service Target
- Service Availability Report
- Problem Management Records
- Availability (Performance) Database
Tool specification implications
Specific Availability Management software is not always what it appears to be. Availability Management needs can generally be met using a combination of tools for other process domains (e.g. Problem Management) and in-house developments based on proprietary software and obtaining availability data from systems monitors.
Management information
Service availability report: actual service availability against target.
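A report line comparing actual availability against target can be sketched as below. The function names, the service name and the figures are hypothetical; the calculation assumes availability is expressed as agreed service time minus downtime, divided by agreed service time.

```python
def service_availability(agreed_minutes: float, downtime_minutes: float) -> float:
    """Actual availability (%) over the agreed service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed_minutes must be positive")
    if not 0 <= downtime_minutes <= agreed_minutes:
        raise ValueError("downtime_minutes must lie within the agreed service time")
    return 100.0 * (agreed_minutes - downtime_minutes) / agreed_minutes


def report_line(service: str, target_pct: float,
                agreed_minutes: float, downtime_minutes: float) -> str:
    """Format one line of a service availability report: actual against target."""
    actual = service_availability(agreed_minutes, downtime_minutes)
    status = "met" if actual >= target_pct else "MISSED"
    return f"{service}: actual {actual:.2f}% against target {target_pct:.2f}% ({status})"


# Hypothetical example: 30 days of agreed 24-hour service, 175 minutes of downtime.
print(report_line("email", 99.8, 30 * 24 * 60, 175))
```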
A10.1.3 Analyze service availability
Objective: To analyze the registered availability and failure rates on Configuration Items to identify where improvements can be made in IT system availability.
Associated sub-processes
- Reliability: the capability of an IT component to perform a required function under stated conditions for a stated period of time.
- Maintainability: the capability of an IT component or IT service to be retained in, or restored to, a state in which it can perform its required functions.
- Serviceability: a contractual term used to define the availability of IT components as agreed with external organizations supplying and maintaining these components.
- Security: providing access to IT components or IT services under secure conditions.
Data Stores
- CMDB or purpose built availability database.
- Service Availability Report
- Service Level Agreement
- Configuration Management Records
- Item Availability - Configuration Item Failure Rate
Tool Specification Implications
See 'Monitor availability' above.
Management information
- Technical evaluation of failing Configuration Items.
- Evaluation of impact of preventative measures.
- Technical and procedural recommendations for improvements in IT service availability.
A10.1.4 Monitor contracted service support
Objective: This process monitors the performance of suppliers who have an IT service contract to ensure that they are meeting their contractual obligations.
Associated Sub Processes
- Manage Network Vendors
- Manage Suppliers
- Manage Underpinning Contracts
Data Stores
- Call details
- Help Desk Procedures
- Vendor Contract
- Reports
- Configuration Management Records - IT infrastructure maintenance records
- Problem Management Records - incident data relevant to service contract
- Serviceability Contracts
Tool Specification Implications
No specific requirements.
Management information
For Configuration Items serviced by external suppliers:
- call-out time
- time of component restoration
- contractual information e.g. time of arrival of engineer
- call out frequency
- maintenance schedules.
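Call-out and restoration times can be checked against the contract along the following lines. This is a sketch under stated assumptions: the four-hour response and eight-hour fix limits, the function name and the timestamps are all hypothetical, and real serviceability clauses may be measured differently (for example, within contracted support hours only).

```python
from datetime import datetime, timedelta

# Hypothetical contractual limits taken from an underpinning contract.
CONTRACT_RESPONSE = timedelta(hours=4)  # call-out to arrival of engineer
CONTRACT_FIX = timedelta(hours=8)       # call-out to component restoration


def check_call_out(called_at: datetime, engineer_arrived_at: datetime,
                   restored_at: datetime) -> dict:
    """Compare one supplier call-out against the contracted response and fix times."""
    response_time = engineer_arrived_at - called_at
    fix_time = restored_at - called_at
    return {
        "response_time": response_time,
        "fix_time": fix_time,
        "response_met": response_time <= CONTRACT_RESPONSE,
        "fix_met": fix_time <= CONTRACT_FIX,
    }


# Hypothetical call-out: engineer arrives within the limit, restoration is late.
result = check_call_out(
    called_at=datetime(2024, 3, 5, 9, 0),
    engineer_arrived_at=datetime(2024, 3, 5, 12, 30),
    restored_at=datetime(2024, 3, 5, 18, 15),
)
print(result)
```

Counting such results per supplier over a reporting period gives the call-out frequency and the supplier's contribution to downtime.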
A10.1.5 Manage Availability
Objective: This process initiates changes that are intended to improve the availability of an IT service.
Associated sub-processes
See 'Analyze service availability'.
Data Stores
Required Data Stores
- Service Availability Report
- Computer Schedule
Tool specification implications
No specific requirements.
Management information
- Number and type of change requests issued.
- Details of preventative maintenance.
A10.1.6 Forecast service availability
Objective: This process examines Configuration Item availability (reliability) and its relationship with IT services to determine the ranges of availability possible. Service Level Management can use these ranges to negotiate service level agreements, and they form the basis of an availability plan that describes the actions needed to ensure future improvements in availability.
Associated sub-processes
Data Stores
- Service Availability Report
- Service / Configuration Item Relationship
Tool specification implications
- Performance monitoring tools are necessary.
Management information
- Probable mean time between failures for critical components.
- Probable availability per IT service.
- Availability plan.
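One common way to turn mean time between failures into a probable availability figure is sketched below. It is not prescribed by this annex: it uses the standard steady-state approximation MTBF / (MTBF + MTTR), introduces mean time to repair (MTTR) as an additional input, and assumes a service needs all of its components, whose failures are independent. The component names and figures are hypothetical.

```python
from math import prod


def component_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def serial_service_availability(component_availabilities) -> float:
    """Availability of a service that needs every component (independent failures assumed)."""
    return prod(component_availabilities)


# Hypothetical critical components: (MTBF in hours, MTTR in hours).
components = {
    "server": (2000.0, 4.0),
    "network": (5000.0, 2.0),
    "database": (1000.0, 5.0),
}

per_component = {name: component_availability(mtbf, mttr)
                 for name, (mtbf, mttr) in components.items()}
service = serial_service_availability(per_component.values())

for name, value in per_component.items():
    print(f"{name}: {value:.4%}")
print(f"probable service availability: {service:.4%}")
```

Because the service figure is a product, it is always lower than that of the weakest component; redundant (parallel) components would need a different calculation.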
A10.1.7 Improve IT system resilience
Objective: This process examines the current IT infrastructure to identify cost-justified changes, which would improve the availability of IT services through improving the IT infrastructure resilience.
Associated sub-processes
- Provide IT System Resilience
Data Stores
- Infrastructure Data
- Technical Infrastructure
Tool specification implications
- Performance monitoring tools are necessary.
Management information
- Proposed changes to the IT infrastructure.
- Cost-benefit justification for the proposed changes.
A10.1.8 Manage data backup and recovery
Objective: This process manages the backing up and recovery of corporate data to ensure business continuity in the event of an IT contingency.
Associated sub-processes
- Back Up Data
- Recover Data
Data Stores
- Server File store
- Data Vault
- Mainframe File store
- Computer Schedule
Tool specification implications
No specific requirements.
Management information
- Reports on (the success of) back-up schedules.
- Reports on (the success of) test restore schedules.
- Requests for data restore following contingency.
- Reference to incident / problem.
- Configuration Item details of associated system.
- Details of (success of) restoration.
- Number of requests per time period.
A10.1.9 Maintain security
Objective: Security comprises three major aspects: availability, integrity and confidentiality. The purpose of this process is to maintain the security of the IT services and infrastructure in order to ensure the availability of the IT services.
Associated sub-processes
- Provide Physical Security
- Provide Logical Security
Data Stores
- Security Policy
- Critical Data Register
- Service Priority Register
Tool specification implications
- Authorization control.
- Authentication control.
- Control/audit logging of access to security information.
- Control/audit logging of access to directories and information databases.
- Control setting of threshold levels and accounting tables.
- Prioritized access to requested network resources.
- Event logging.
- Monitoring usage and users of security related resources.
- Control of distribution of information.
- Control of printing of classified information (application level security?).
- Maintaining user profiles, usage profiles for specific resources.
- Reporting security violations.
- Virus detection.
- Modeling tools, ability to judge impact of new development / system on current availability levels.
Management information
- Number of security breaches (physical, logical) in a specified period.
- Report confirming that all changes are tested and verified against availability criteria before being introduced into the live environment.