Continual Service Improvement
CSI activities will require software tools to support the monitoring and reporting on IT services as well as to underpin the ITSM processes. These tools will be used for data gathering, monitoring, analysis and reporting for services and will also assist in determining the efficiency and effectiveness of IT service management processes. The longer-term benefits to be gained are cost savings and increased productivity, which in turn can lead to an increase in the quality of the IT service provision.
From a service perspective the use of tools enables an organization to gain the ability to understand the health of its services from an end-to-end perspective. Even if an organization is not able to monitor end-to-end services it should be able to monitor, identify trends and perform analyses on the key components that make up an IT service.
From a process perspective the use of tools enables centralization of key processes and automation and integration of core service management processes. The raw data collected in the databases can be analysed resulting in the identification of trends. Preventive measures can then be implemented thereby increasing the stability, reliability and availability of the IT infrastructure.
The ITSM software tools of today have expanded their scope from mere 'point' solutions focusing on the Service Desk or Change Management to complete, fully integrated solution suites. Current tools represent a paradigm shift into the new era of ERP (enterprise resource planning) systems for IT. For decades, IT has provided systems to run the business; now there are systems to run IT.
7.1 TOOLS TO SUPPORT CSI ACTIVITIES
As part of the assessment of 'Where do we want to be?' the requirements for enhancing tools need to be addressed and documented. These requirements vary depending on both the process and technology maturity. Technology specifically means systems and service management toolsets used for both monitoring and controlling the systems and infrastructure components and for managing process-based workflows, such as Incident Management.
Without question, service management tools are indispensable. However, good people, good process descriptions, and good procedures and working instructions are the basis for successful service management. The need and the sophistication of the tools required depend on the business need for IT services and, to some extent, the size of the organization.
In a very small organization a simple in-house developed database system may be sufficient for logging and controlling incidents. However, in large organizations, very sophisticated distributed and integrated service management tools may be required, linking all the processes with systems management toolsets. While tools can be important assets, in today's IT-dependent organizations, they are a means, not an end in themselves. When implementing service management processes, look at the way current processes work. Each organization's unique need for management information should always be its starting point. This will help define the specifications for the tools best suited to that organization.
There are many tools that support the core ITSM processes and others that support IT governance as a whole which will require integration with the ITSM tools. Information from both of these toolsets typically needs to be combined, collated and analysed collectively to provide the overall business intelligence required to effectively improve on the overall IT service provision.
These tools can be grouped into broad categories that support and annotate different aspects of the systems and service management domains.
7.1.1 IT Service Management Suites
|Figure 7.1 Configuration Management System|
The success of ITIL within the industry has encouraged software vendors to provide tools and suites of tools that are very compatible with the ITIL process framework providing significant levels of integration between the processes and their associated record types. This functionality creates a rich source of data and creates many of the inputs to CSI including:
- Incidents that capture the service or the Configuration Item (CI) affected are a prime input to CSI enabling an understanding of the issues that are affecting the overall service provision and related support activities. Incident matching functionality allows the Service Desk to quickly relate like issues and create master records that highlight common situations that are affecting the users with associated resolution data to enhance problem identification and reduce the mean time to restore service (MTRS).
- Problems are defined with integrated links to the associated incidents that confirmed their existence. Using the configuration data from the CMS to understand the relationships, Problem Management now has a source of related data to enable the Root Cause Analysis process including change and Release history of the affected CI or service.
- Changes are often the first area of investigation following a service failure. Again using the integration capabilities of the ITSM tool suite, it can be easier to trace the changes that have been made to a service or a CI. The Change Schedule and projected service availability can be automated using calendaring capabilities to ensure visibility of changes and calculated impacts to the Service Level Agreements. Recent improvements in the ITSM tools now allow for automated risk assessment and prioritization of changes, highlighting potential conflicts and reducing the administrative overhead for the Change Advisory Board.
Tool functionality in support of Configuration Management and the CMS has never been more advanced with extensive discovery and service dependency mapping capabilities. The CMS is the foundation for the integration of all ITSM tool functionality and is a critical data source for the CSI mission. While the service provider must still define the overall Configuration Management process and create the data model associated with their specific environment, the tools to establish and manage the CMS and the overall service delivery architecture have become very powerful. Key functionality includes: discovery and reconciliation capabilities to capture CIs within the environment, visualization of the hierarchy and CI relationships for ease of understanding and support, audit tools to streamline the verification activities and the ability to federate data sources where appropriate.
The ability to coordinate Releases and manage their contents is also more mature, with native support for the definitive libraries and key integration points to the CMS and to specialized version control software packages. Functionality typically includes support for Release records that consolidate and contain Release contents enabling the attachment of related objects and documents pertaining to the Release.
Integration is normally provided to enable hyperlinking to the associated Change records that are part of a Release and the related Incident, Problem or Service Request records that were the catalyst for the original RFC. Release versions are also supported with predefined naming and numbering standards that enhance the understanding of the overall process. Overall reporting of Release status and associated performance metrics is required as an input to CSI, ensuring that the deployment of new services is of the highest possible quality.
Service Level Management functionality is also well supported within the ITSM tool suites of today enabling the linkage of incidents, problems, changes and releases to associated Service Level Management records such as SLAs, OLAs and UCs. Most tool suites support automated SLAM charts (Service Level Agreement Monitoring) highlighting which agreements are within tolerance, are threatened or have been broken. This automation is driven by the ability to define key SLA criteria and use related operational support records to trigger thresholds (e.g. a priority one incident is about to break the one-hour resolution target time or a change has caused a longer downtime than was agreed). CMS functionality can also support the concept of prioritized CIs that underpin specific service levels highlighting a greater impact if a failed component supports a critical service or business process. Some suites also provide the ability to trigger Availability impacts to SLAs by capturing incident data related to service outages. Many of the suites also facilitate the definition of the service portfolio and the Service Catalogue while managing the workflow associated with the fulfilment of Service Requests. Some standalone point solutions support specialized functionality in this area (see below).
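The threshold logic behind a SLAM chart can be illustrated with a minimal sketch. The function name `slam_status`, the 80 per cent warning fraction and the sample times are assumptions made for illustration only; real tool suites define their own SLA criteria and record structures.

```python
from datetime import datetime, timedelta

def slam_status(opened, now, target, warn_fraction=0.8):
    """Classify an open record against its SLA resolution target.

    warn_fraction is an assumed setting: the share of the target time
    after which the agreement is reported as threatened.
    """
    elapsed = now - opened
    if elapsed >= target:
        return "breached"
    if elapsed >= target * warn_fraction:
        return "threatened"
    return "within tolerance"

# A priority one incident with a one-hour resolution target.
opened = datetime(2024, 1, 1, 9, 0)
target = timedelta(hours=1)
for minutes in (20, 50, 65):
    now = opened + timedelta(minutes=minutes)
    print(minutes, slam_status(opened, now, target))
```

With these assumed values the incident is within tolerance at 20 minutes, threatened at 50 minutes and breached at 65 minutes.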
Reporting is one of the key benefits of an integrated ITSM suite with the ability to provide management information in a common format utilizing the combined data from all operational areas of the service lifecycle. This is of significant benefit enabling analysis of the relationships between service management events (e.g. incidents that result in problems, changes that cause incidents, releases that encapsulate certain changes) and all of the associated performance metric data that will feed the overall CSI initiatives.
7.1.2 Systems And Network Management
These tools are typically specific to the technology platforms that are under management and are used to administer the various domains but can provide a wide variety of data in support of the service management mission. These tools generate error messages for event management and correlation that ultimately feed the Incident Management and Availability Management processes. Utilization data from these platforms is the prime source for Capacity and Performance Management and the most accurate method for establishing true availability of components that will support improvements in the area of MTRS and MTBF. As the dynamic, real-time view of the current state of the service delivery chain this information can be integrated with the known service dependencies within the CMS to give enhanced visibility into the service provision to the end-user. Many of these tools also support technology proprietary methods for software deployment within their domains (e.g. Release of patches, pushing of firmware upgrades to remote components on the network) and, as such, can provide metric data in support of CSI for Change and Release Management and dynamic updates to the CMS.
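As a rough illustration of how component availability, MTRS and MTBF might be derived from outage data, consider the sketch below. It assumes one common convention (MTRS as average downtime per failure, MTBF as average uptime per failure); the function name and the sample figures are hypothetical.

```python
def reliability_metrics(period_hours, outage_hours):
    """Return (availability, MTRS, MTBF) for one component.

    period_hours: length of the agreed service period.
    outage_hours: duration of each outage within that period.
    Assumed convention: MTRS = downtime / failures,
    MTBF = uptime / failures.
    """
    failures = len(outage_hours)
    downtime = sum(outage_hours)
    uptime = period_hours - downtime
    availability = uptime / period_hours
    mtrs = downtime / failures if failures else 0.0
    mtbf = uptime / failures if failures else float("inf")
    return availability, mtrs, mtbf

# A 720-hour month with three outages of 2, 1 and 3 hours.
availability, mtrs, mtbf = reliability_metrics(720, [2.0, 1.0, 3.0])
print(f"availability={availability:.4f} MTRS={mtrs}h MTBF={mtbf}h")
```

For the sample month this gives an MTRS of 2 hours and an MTBF of 238 hours.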
7.1.3 Event Management
Events are status messages that are generated from systems, network and application management platforms. These events are created when one of the above tools senses a threshold has been met or an error condition is discovered. The major issue with this capability is the significant volume of messages that are created from both the actual event and the up- and down-stream impact which can make it difficult to determine the real issue.
Specialized event management software can perform event correlation, impact analysis and root cause analysis to separate out these false-positive messages. Events are captured and assessed by rules-based, model-based and policy-based correlation technologies that can interpret a series of events and derive, isolate and report on the true cause and impact. These technologies support the CSI mission by providing information regarding availability impacts and performance thresholds that have been exceeded related to capacity or utilization. Well-correlated event management data provides a cost-effective method to improve the reliability, efficiency and effectiveness of the cross-domain IT infrastructure that supports the provision of business services.
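A very small rules-based correlation step can be sketched as follows. The single rule used here, that a failed component is a probable cause only if none of the components it directly depends on has also failed, is a simplifying assumption, and the component names and dependency map are hypothetical. Commercial correlation engines apply far richer rules, models and policies.

```python
def probable_causes(failed, depends_on):
    """Return failed components with no failed direct dependency.

    failed: names of components that raised an error event.
    depends_on: maps a component to the components it relies on.
    Only direct dependencies are checked in this sketch.
    """
    failed = set(failed)
    causes = []
    for component in sorted(failed):
        upstream = depends_on.get(component, [])
        if not any(dep in failed for dep in upstream):
            causes.append(component)
    return causes

depends_on = {
    "web-app": ["app-server"],
    "app-server": ["db-server", "core-switch"],
    "db-server": ["core-switch"],
}
events = ["web-app", "app-server", "db-server", "core-switch"]
print(probable_causes(events, depends_on))
```

Four events are reduced to one probable cause, the core switch, with the other three treated as downstream impact.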
7.1.4 Automated Incident/Problem Resolution
There are many products in the marketplace that support the automation of the traditional manual, labour-intensive and error-prone process of incident and problem discovery and resolution. Utilizing data from proactive detection monitors, any component or service outage generates an alert that automatically triggers diagnosis and repair procedures. These procedures then identify the root cause and resolve the issue using pre-programmed and scripted self-healing techniques reducing the MTRS of many common causes of incidents and in some cases preventing service outages completely. These tools also document audit-related information within the incident or problem record for future analysis and identification of other potential proactive CSI opportunities.
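The alert-to-repair flow described above might be sketched as a lookup from alert type to a scripted procedure, with every outcome written to the incident record. All names here (`RUNBOOKS`, `handle_alert`, the alert types and the components) are hypothetical, and the repair functions are placeholders that only return a description of what a real script would do.

```python
def restart_service(alert):
    # Placeholder: a real procedure would call the platform's own interface.
    return f"restarted {alert['component']}"

def purge_temp_files(alert):
    # Placeholder repair step for a full disk.
    return f"purged temporary files on {alert['component']}"

# Assumed mapping from alert type to scripted repair procedure.
RUNBOOKS = {
    "service_down": restart_service,
    "disk_full": purge_temp_files,
}

def handle_alert(alert, incident_record):
    """Run the matching repair, or escalate, and log the outcome."""
    repair = RUNBOOKS.get(alert["type"])
    if repair is None:
        outcome = "escalated for manual diagnosis"
    else:
        outcome = repair(alert)
    # Audit trail kept for later analysis.
    incident_record.append((alert["type"], alert["component"], outcome))
    return repair is not None

incident_record = []
handled = handle_alert(
    {"type": "service_down", "component": "mail-gateway"}, incident_record
)
escalated = handle_alert(
    {"type": "fan_failure", "component": "rack-12"}, incident_record
)
print(incident_record)
```

Alerts with no matching procedure are escalated rather than dropped, so the record still captures them for later review.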
7.1.5 Knowledge Management
There are specialist tools available that support and streamline the discipline of Knowledge Management. Providing efficient and accurate access to previous cases with proven resolution data, these tools address the symptoms associated with the current incident or problem. Capturing data throughout the Incident and Problem Management lifecycles enables a Knowledge Management engine to assign related keywords and service relationships that will enhance the search process providing a high percentage of hits, thus speeding up the overall resolution process. KM tools also generate significant metrics aimed at measuring the improvement process itself. Key CSI data adds transparency to incident recurrence and frequency, utilization rates, the effectiveness of the stored resolutions and the impact KM has on the efficacy of the overall support function.
7.1.6 Service Request and Fulfilment (Service Catalogue and workflow)
As mentioned in the ITSM section, there are specialized tools that deal with Service Catalogue definition, request management and the workflow associated with the fulfilment of these requests. Some of these tools provide the workflow engines and some rely heavily on the capabilities of the companion ITSM suite. These tools provide the technology required to define the services within a catalogue structure in conjunction with the business customers and create a service portal (normally web-based) that allows users to request services. The request is then managed through the workflow engine assigning resources according to a defined process of tasks and related activities for each request type. These tools typically also capture related cost information to be fed to the financial systems for later charging activities. This functionality does much to support IT's integration with the business, defining services that underpin their mission and streamlining the delivery of commodity services that so often become a source of customer frustration. As in other tools, the true CSI benefit is the data that is collected and reported on the quality of the services delivered, any bottlenecks encountered, and the ability to track the achievement of related service levels.
7.1.7 Performance Management
Performance management tools allow for the collection of availability, capacity and performance data from a multitude of domains and platforms within the IT infrastructure environment. This data is used to populate the Availability and Capacity Management Information Systems (AMIS and CMIS) giving IT organizations a historical, current and future view of performance, resource and service usage for offline analysis and modelling activities. Capabilities of these tools generally include:
- Analysis of responsiveness, transaction and traffic throughput and utilization levels supporting the balancing of resources to optimize performance of the IT services
- Workload assessment with predictive trend analysis of future growth and required capacity for each of the IT services being provided
- The construction of performance, resource and data usage profiles enabling the comparison of actual utilization with planned models
- Predictive performance technology enabling the evaluation of tuning alternatives for systems, networks, databases and applications that support modelling of the expected outcomes
- Generation of the data required to report on SLAs and provide input to Service Improvement Plans.
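Predictive trend analysis of the kind listed above can be illustrated with a straight-line fit to historical utilization. This is a minimal sketch that assumes growth is roughly linear and that samples are evenly spaced; the function names, the 80 per cent threshold and the sample data are hypothetical, and real tools use more sophisticated models.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def periods_until(values, threshold):
    """Periods after the last sample until the trend meets threshold.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))

# Monthly disk utilization in per cent (illustrative figures).
utilization = [50, 52, 54, 56, 58, 60]
print(periods_until(utilization, threshold=80))
```

With utilization growing by two points per month from 60 per cent, the fitted trend reaches the assumed 80 per cent threshold ten months after the last sample.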
There are many tools in the marketplace that support the overall CSI initiative across many aspects of performance management including business, service and resource capacity planning, feasibility analysis, modelling, solution development, implementation, management and on-going monitoring of the IT service provision.
7.1.8 Application And Service Performance Monitoring
There has always been a challenge related to understanding the true user experience related to service provision. Recognizing this need, many vendors provide tools that monitor the end-to-end delivery of services, using either active or passive technologies, to fully instrument and probe the many components of the service delivery chain. The software provides key metrics such as availability, transaction throughput, transaction response time, network latency, server efficiency, database I/O and SQL effectiveness. This data provides system, application, Availability and Capacity Managers and Service Owners the ability to analyse the delivery of services at all key points in the chain and look for potential improvements to streamline the overall delivery mechanisms. Usage trend data is vital for the Availability and Capacity Management processes providing the information required to assess current performance and plan for future growth. This capability also enhances Service Level Management's ability to accurately track conformance to SLAs and identify candidates for the service improvement process.
7.1.9 Statistical Analysis Tools
Most of the tools that are available to support the service management and systems management environments provide reporting capabilities but this is typically not enough to support robust Availability and Capacity Management capabilities. Raw data from many of the above tools needs to be captured into a single repository for collective analysis. This is the data that will provide input to the Availability and Capacity processes and support the analysis of MTRS, MTBF, SFA, Demand Management, workload analysis, service modelling, application sizing and their related opportunities for improvement. This type of software provides the functionality to logically group data, model current services and enable predictive models to support future service growth utilizing a wide array of analysis techniques.
7.1.10 Software Version Control/Software Configuration Management
These tools support the control of all mainframe, open systems, network and applications software providing a Definitive Media Library type repository for the development environment. Version information must seamlessly integrate with the CMS and Release Management.
7.1.11 Software Test Management
These tools support the testing activities of Release Management and deployment activities providing development, regression testing, user acceptance testing and pre-production QA testing environments. Typically, there is additional functionality to support testing of specific functional requirements that were captured early in the development lifecycle. These tools should integrate with Incident Management to capture testing-related incidents that may affect the production version of the same software.
7.1.12 Security Management
These tools support and protect the integrity of the network, systems and applications, guarding against intrusion and inappropriate access and usage. As in the systems and network management area, all security-related hardware and software solutions should generate alerts that will trigger the auto-generation of incidents for management through the normal processes.
7.1.13 Project And Portfolio Management
These tools support the registration, decision support, costing, resource management, portfolio visibility and project management of new business functionality and the services and systems that underpin them. These tools are typically used to manage the business-related aspects of IT. Integration points generally include: task assignments for development activities, change and release build information based on the agreed portfolio, capture of resource data from ITSM, TCO of portfolio and resource utilization data to Financial Management, request management linkage to ITSM etc. These tools are typically utilized to underpin the Management Board approval process related to strategic or major change projects.
7.1.14 Financial Management
Financial Management is a critical component of the IT services mission to ensure that there are enough financial resources to maintain and develop the IT infrastructure and professional capabilities in support of the current and future needs of the business. A balanced budget in IT through the recovery of IT costs, with a solid understanding of the fiscal aspects of their operations, will enable IT executives to justify their expenses in terms of the business services being supported.
In an increasing number of IT organizations this requires keeping track of resource and service utilization for the purpose of billing and chargeback of the shared IT resources. The costing and resource consumption measurement becomes critical to effectively and accurately charge business customers in an equitable, visible and auditable way.
Financial Management tools collect raw metering data from a variety of sources including operating systems, databases, middleware and applications, associating this usage with users of services from specific departments. Data collectors gather critical usage metrics for each of the technologies being measured and link in the costing information from accounting software. The tools then report, analyse and allocate costs, enabling customers to evaluate the information in many dimensions.
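One simple way to allocate a shared cost from metered usage is in proportion to each department's consumption. The sketch below assumes that single allocation rule; the function name, departments and figures are hypothetical, and real chargeback models are usually agreed with the business and may weight usage differently.

```python
def allocate_costs(total_cost, usage_by_department):
    """Split a shared cost in proportion to metered usage.

    usage_by_department: maps department name to metered units
    (e.g. CPU hours). Amounts are rounded to two decimal places,
    so the shares may differ from the total by a rounding remainder.
    """
    total_usage = sum(usage_by_department.values())
    if total_usage == 0:
        raise ValueError("no usage recorded for this period")
    return {
        department: round(total_cost * units / total_usage, 2)
        for department, units in usage_by_department.items()
    }

# Metered CPU hours per department for one month (illustrative).
usage = {"Finance": 500, "Sales": 300, "HR": 200}
charges = allocate_costs(10000.00, usage)
print(charges)
```

Because each charge is traceable to a metered quantity, the result can be audited against the raw usage data.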
Most tools will interface with the CMS to manage costs for each CI and resource to generate data related to billing, reporting, chargeback and cost analysis. These tools will typically federate with the organization's Financial Management applications and ERP system to acquire and share aggregate costs. Interfaces are also normally supported with project and portfolio management tools to facilitate the overall portfolio of investments.
Effective cost management is a basic requirement for the IT organizations of today. Financial Management tools will be required to ensure that customers can not only understand the IT costs of their business operations, but also budget more accurately, and to enable IT to evaluate the overall effectiveness of the service provision. Successful implementation and usage of these tools will support the continual improvement of cost management and drive ever-increasing IT value to the business.
7.1.15 Business Intelligence/Reporting
In addition to the statistical analysis environment that requires a toolset to support technical data, there is also a need for a common repository of all service information and business-related data. Often these tools are provided by the same vendors who support the statistical analysis software but the focus in this instance is on providing business-related data from all of the above toolsets representing a guide to direct the activities of IT as a whole in support of the business customer.
As the technology used to deliver IT services becomes increasingly complex, the distribution of services expands and the amount of centralized control we can apply is diminished, there will be a growing reliance on tools and software functionality to administer, manage, improve and ensure overall governance of IT service provision. As stated earlier, best-practice process should determine what support functionality is required but we can be assured that the software industry will continue to develop a wide and varied set of tools that can reduce the administrative overhead of managing our processes and improve the overall quality of IT service provision.
|Figure 7.2 Service-centric view of the IT enterprise|
For effective CSI it is important for organizations to view their tool requirements from an enterprise perspective as shown in Figure 7.2. Tools for CSI should support the key operational activities of the 7-Step Improvement Process: data gathering, data processing, data analysis and data presentation. Tools must provide for monitoring of each level of the service hierarchy: services, systems and components, as well as support the reporting activities for SLAs, OLAs and UCs.