Service Operations

7. Technology Considerations

Each function and process is defined in the relevant section in Chapters 4 and 6. This chapter brings all technology requirements together to define the overall requirement of an integrated set of Service Management technology for Service Operation.

The same technology, with some possible additions, should be used for the other phases of ITSM - Service Strategy, Service Design, Service Transition and Continual Service Improvement - to give consistency and allow an effective ITSM Lifecycle to be properly managed.

The main requirements for Service Operation are as set out in this chapter.

7.1 Generic Requirements

An integrated ITSM technology (or toolset, as some suppliers sell their technology as 'modules' whereas some organizations may choose to integrate products from alternative suppliers) is needed that includes the following core functionality.

7.1.1 Self-Help

Many organizations find it beneficial to offer 'Self-Help' capabilities to their users. The technology should therefore support this capability with some form of web front-end allowing web pages to be defined offering a menu-driven range of Self-Help and Service Requests - with a direct interface into the back-end process-handling software.

7.1.2 Workflow Or Process Engine

A workflow or process control engine is needed to allow the pre-definition and control of defined processes such as an Incident Lifecycle, Request Fulfilment Lifecycle, Problem Lifecycle, Change Model, etc.

This should allow responsibilities, activities, timescales, escalation paths and alerting to be pre-defined and then automatically managed.

7.1.3 Integrated CMS

The tool should have an integrated CMS to allow the organization's IT infrastructure assets, components, services and any ancillary CIs (such as contracts, locations, licences, suppliers etc. - anything that the IT organization wishes to control) to be held, together with all relevant attributes, in a centralized location - and to allow relationships between each to be stored and maintained, and linked to Incident, Problem, Known Error and Change Records as appropriate.
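As a rough illustration of what holding CIs, their relationships and their linked records in one place can look like, the following minimal sketch uses an in-memory store. The class names, field names, CI identifiers and the record reference are all hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """One CI with its attributes (all field names are illustrative)."""
    ci_id: str
    ci_type: str                      # e.g. "service", "server", "contract"
    attributes: dict = field(default_factory=dict)


class CMS:
    """Toy in-memory store of CIs, relationships and linked records."""

    def __init__(self):
        self.items = {}               # ci_id -> ConfigurationItem
        self.depends_on = {}          # ci_id -> set of ci_ids it depends on
        self.records = {}             # ci_id -> list of record references

    def add(self, ci):
        self.items[ci.ci_id] = ci
        self.depends_on.setdefault(ci.ci_id, set())
        self.records.setdefault(ci.ci_id, [])

    def relate(self, dependent_id, dependency_id):
        self.depends_on[dependent_id].add(dependency_id)

    def link_record(self, ci_id, record_ref):
        self.records[ci_id].append(record_ref)

    def impacted_by(self, ci_id):
        """Return ids of every CI that directly or indirectly depends on ci_id."""
        impacted, frontier = set(), [ci_id]
        while frontier:
            current = frontier.pop()
            for other, deps in self.depends_on.items():
                if current in deps and other not in impacted:
                    impacted.add(other)
                    frontier.append(other)
        return impacted


cms = CMS()
for ci in (ConfigurationItem("srv-01", "server"),
           ConfigurationItem("db-01", "database"),
           ConfigurationItem("payroll", "service")):
    cms.add(ci)
cms.relate("db-01", "srv-01")          # database runs on the server
cms.relate("payroll", "db-01")         # service uses the database
cms.link_record("srv-01", "INC-1001")  # hypothetical Incident Record reference

print(sorted(cms.impacted_by("srv-01")))
```

Walking the stored relationships in this way is what lets a record raised against one component be traced to the services it affects.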

7.1.4 Discovery/Deployment/Licensing Technology

In order to populate or verify the CMS data and to assist in Licence Management, discovery or automated audit tools will be required. Such tools should be capable of being run from any location on the network and allow interrogation and recovery of information relating to all components that make up, or are connected to, the IT Infrastructure.

Such technology should allow 'filtering' so that the data being carried forward can be vetted and only required data extracted. It is also very helpful if 'changes only' since the last audit can be extracted and reported upon.
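The filtering and 'changes only' ideas can be sketched as a comparison of two audit snapshots. This is a minimal sketch only: the snapshot layout, attribute names and component identifiers are assumptions made for the example.

```python
def audit_changes(previous, current, wanted_keys):
    """Compare two audit snapshots and return only what changed.

    previous/current map a component id to a dict of discovered attributes.
    wanted_keys is the filter: attributes outside it are ignored.
    """
    def filtered(snapshot):
        return {cid: {k: v for k, v in attrs.items() if k in wanted_keys}
                for cid, attrs in snapshot.items()}

    old, new = filtered(previous), filtered(current)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(cid for cid in new.keys() & old.keys()
                           if new[cid] != old[cid]),
    }


last_audit = {
    "pc-001": {"os": "10.1", "ram_gb": 8, "uptime_h": 12},
    "pc-002": {"os": "10.1", "ram_gb": 8, "uptime_h": 400},
}
this_audit = {
    "pc-001": {"os": "10.2", "ram_gb": 8, "uptime_h": 3},
    "pc-002": {"os": "10.1", "ram_gb": 8, "uptime_h": 424},
    "pc-003": {"os": "10.2", "ram_gb": 16, "uptime_h": 1},
}

# Uptime is volatile and not wanted in the CMS, so it is filtered out.
changes = audit_changes(last_audit, this_audit, wanted_keys={"os", "ram_gb"})
print(changes)
```

Because uptime is excluded by the filter, pc-002 is not reported even though its raw audit data differs.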

The same technology can often be used to deploy new software to target locations - this is an essential requirement for all Service Operation teams or departments, to allow patches, transports etc. to be distributed to the correct users. An interface to 'Self Help' capabilities is desirable to allow approved software downloads to be requested in this way but automatically handled by the deployment software.

Tools that allow automatic comparison of software licences' details held (in the CMS, ideally) and actual licence numbers deployed - with reporting of any discrepancies - are extremely desirable.
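A licence comparison of this kind can be sketched in a few lines. The product names and counts below are invented for illustration, and a real tool would read both sides from the CMS and the discovery data rather than from literals.

```python
def licence_discrepancies(entitled, deployed):
    """Return {product: (entitled, deployed)} wherever the two counts differ."""
    report = {}
    for product in sorted(set(entitled) | set(deployed)):
        held = entitled.get(product, 0)
        installed = deployed.get(product, 0)
        if held != installed:
            report[product] = (held, installed)
    return report


# Hypothetical data: licences recorded in the CMS vs. installs found by audit.
cms_licences = {"OfficeSuite": 100, "CADTool": 10, "PDFEditor": 25}
discovered = {"OfficeSuite": 112, "CADTool": 10, "DiagramApp": 3}

report = licence_discrepancies(cms_licences, discovered)
for product, (held, installed) in report.items():
    status = "over-deployed" if installed > held else "under-used"
    print(f"{product}: {held} held, {installed} deployed ({status})")
```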

7.1.5 Remote Control

It is often helpful for the Service Desk Analysts and other support groups to be able to take control of the user's desk-top (under properly controlled security conditions) so as to allow them to conduct investigations or correct settings, etc. Facilities to allow this level of remote control will be needed.

7.1.6 Diagnostic Utilities

It could be extremely useful for the Service Desk and other support groups if the technology incorporated the capability to create and use diagnostic scripts and other diagnostic utilities (such as, for example, case-based reasoning tools) to assist with earlier diagnosis of incidents. Ideally, these should be 'context sensitive' and presentation of the scripts automated so far as possible.

7.1.7 Reporting

There is no use in storing data unless it can be easily retrieved and used to meet the organization's purposes. The technology should therefore incorporate good reporting capabilities, as well as allow standard interfaces which can be used to input data to industry-standard reporting packages, dashboards, etc. Ideally, instant, onscreen as well as printed reporting can be provided through the use of context-sensitive 'top ten' reports.

7.1.8 Dashboards

Dashboard-type technology is useful to allow 'see at a glance' visibility of overall IT service performance and availability levels. Such displays can be included in management-level reports to users and customers - but can also give real-time information for inclusion in IT web pages to give dynamic reporting, and can be used for support and investigation purposes. Capabilities to support customized views of information to meet specific levels of interest can be particularly useful.

However, they sometimes represent a technical rather than service view of the infrastructure and in such cases they may be of less interest to customers and users.

7.1.9 Integration with Business Service Management

There is a trend within the IT industry to try to bring together business-related IT with the processes and disciplines of IT Service Management - some call this Business Service Management. To facilitate this, business applications and tools need to be interfaced with ITSM support tools to give the required functionality. This can be illustrated by this example:

An Eastern European telecoms company was able to interface its telephone cell-net monitoring and billing system to its Event Management, Incident Management and Configuration Management processes. In this way it was able to detect any unusual usage/billing patterns and interpret these such that it could identify, with a high degree of certainty, that a telephone had been stolen and was being used to make illicit calls.

It was able to raise events for such patterns and automate actions to suspend usage of the mobile phone devices and, in parallel, identify the exact location of the illicit user (using GPRS technology) and raise incidents so that the police had the capability of finding the suspected thief and recovering the device.

More advanced tools integration capabilities are needed to allow greater exploitation of this sort of business and IT integration.

7.2 Event Management

The following features are desirable for any Event Management technology:

Multi-environmental, open interface to allow monitoring and alerting across heterogeneous services and an organization's entire IT Infrastructure.
Easy to deploy, with minimal set up costs.
'Standard' agents to monitor most common environments/components/systems.
Open interfaces to accept any standard (e.g. SNMP) event input and to generate multiple alerts.
Centralized routing of all events to a single location, programmable to allow different location(s) at various times.
Support for design/test phases - so that new applications/services can be monitored during design/test phases and results fed back into the design and transition.
Programmable assessment and handling of alerts depending upon symptoms and impact.
The ability to allow an operator to acknowledge an alert, and if no response is entered within a defined timeframe, to escalate the alert.
Good reporting functionality to allow feedback into design and transition phases as well as meaningful management information and business user 'dashboards'.

Such technology should allow a direct interface into the organization's Incident Management processes (via entry into the Incident Log), as well as the capability to escalate to support staff, third-party suppliers, engineers etc. via email, SMS messaging, etc.
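The acknowledge-or-escalate behaviour listed above can be sketched as follows. The 15-minute acknowledgement window, the class and the function names are assumptions chosen for the example. A real tool would make the window configurable and would send the escalation via email, SMS, etc.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    alert_id: str
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None
    escalated: bool = False


def acknowledge(alert, when):
    """Record that an operator has acknowledged the alert."""
    alert.acknowledged_at = when


def check_escalation(alert, now, ack_window=timedelta(minutes=15)):
    """Escalate once if nobody acknowledged the alert within ack_window."""
    overdue = now - alert.raised_at > ack_window
    if alert.acknowledged_at is None and overdue and not alert.escalated:
        alert.escalated = True
        return True
    return False


start = datetime(2024, 1, 1, 9, 0)
seen = Alert("EVT-1", raised_at=start)
ignored = Alert("EVT-2", raised_at=start)
acknowledge(seen, start + timedelta(minutes=5))

later = start + timedelta(minutes=20)
results = (check_escalation(seen, later), check_escalation(ignored, later))
print(results)
```

Only the unacknowledged alert is escalated, and the `escalated` flag prevents it from being escalated again on the next check.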

Specialist facilities, or perhaps separate specialist tools, will be required for website monitoring. Such facilities must be able to simulate customer traffic onto the website and to report on availability and performance in relation to the 'customer experience'.

7.3 Incident Management

7.3.1 Integrated ITSM Technology

Integrated ITSM technology is required that has the following functionality:

An integral CMS to allow automated relationships to be made and maintained between incidents, service requests, problems, Known Errors and all other configuration items.
A CMS that can be used to assist in determining priority and to aid investigation and diagnosis.
A process flow engine to allow processes to be predefined (including pre-defined incident models, see paragraph 3.2.1.5) and automatically controlled - with flexible internal routing to all relevant support groups and external e-mail/SMS interfaces.
Automated alerting and escalation capabilities to prevent an incident being overlooked or delayed.
Open interfacing to Event Management tools, so that any failures can be automatically raised as incidents.
A web interface to allow self-help and service requests to be input via Internet/Intranet screens.
An integrated KEDB so that diagnosed and/or resolved incident/problems can be recorded and searched to help in speeding future incident resolution.
Easy-to-use reporting facilities to allow incident metrics to be produced and to facilitate incident analysis for Problem Management and Availability Management purposes.
Diagnostic tools (either integrated or interfaced to separate products), as already mentioned under Service Desk.

7.3.2 Workflow And Automated Escalation

The target times should be included in support tools, which should be used to automate the workflow control and escalation paths.

If for example a second-line support group has not resolved an incident within a 60-minute agreed target, the incident must be automatically routed to the appropriate (determined by incident categorization) third-line support group - and any necessary hierarchic escalation should be automatically undertaken (e.g. SMS message to the Service Desk Manager, Incident Manager and/or IT Services Manager and perhaps to the user, if appropriate). The second-line support group must be informed of the escalation action as part of the automated process.
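The escalation rule in this example might be automated along the following lines. This is a sketch under stated assumptions: the routing table, the group names, the incident fields and the `notify` callback are hypothetical, and only the 60-minute target comes from the text.

```python
from datetime import datetime, timedelta

# Hypothetical routing table: incident category -> third-line support group.
THIRD_LINE_BY_CATEGORY = {"network": "Network Engineering",
                          "database": "DBA Team"}
SECOND_LINE_TARGET = timedelta(minutes=60)


def escalate_if_overdue(incident, now, notify):
    """Route an unresolved second-line incident to third line after the target.

    incident is a dict; notify(recipient, message) is supplied by the caller
    (for example an SMS or e-mail gateway).
    """
    overdue = now - incident["assigned_at"] > SECOND_LINE_TARGET
    if incident["resolved"] or incident["tier"] != 2 or not overdue:
        return False
    previous_group = incident["group"]
    incident["group"] = THIRD_LINE_BY_CATEGORY[incident["category"]]
    incident["tier"] = 3
    incident["assigned_at"] = now
    message = f"{incident['id']} escalated to {incident['group']}"
    # Hierarchic escalation, plus informing the second-line group.
    for recipient in ("Service Desk Manager", previous_group):
        notify(recipient, message)
    return True


sent = []
incident = {"id": "INC-2001", "category": "network", "tier": 2,
            "group": "Network Support", "resolved": False,
            "assigned_at": datetime(2024, 1, 1, 9, 0)}
escalated = escalate_if_overdue(incident, datetime(2024, 1, 1, 10, 5),
                                notify=lambda who, msg: sent.append((who, msg)))
print(escalated, incident["group"], sent)
```

Passing `notify` in as a parameter keeps the rule independent of whichever messaging interface the toolset provides.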

7.4 Request Fulfilment

Integrated ITSM technology is needed so that Service Requests can be linked to the incidents or events that have initiated them (and stored in the same CMS, which can be interrogated to report against SLAs). Some organizations will be content to use the Incident Management element of such tools and to treat Service Requests as a subset and defined category of incidents. Where an organization chooses to raise separate Service Requests, it will require a tool which allows this capability. Front-end Self-Help capabilities will be needed to allow users to submit requests via some form of web-based, menu-driven selection process.

In all other respects the facilities needed to manage Service Requests are very similar to those for managing incidents: pre-defined workflow control of Request Models, priority levels, automated escalation, effective reporting, etc.

7.5 Problem Management

7.5.1 Integrated Service Management Technology

An integrated ITSM tool is needed that differentiates between incidents and problems - so that separate Problem Records can be raised to deal with the underlying causes of incidents, but linked to the related incidents. The functionality of Problem Records should be similar to that needed for Incident Records and should also allow for multiple incident matching against Problem Records.

7.5.2 Change Management

Integration with Change Management is very important, so that Request, Event, Incident and Problem Records can be related to RFCs that have caused problems. This is to evaluate the success of the Change Management process - as well as Incident and Known Error Records - and so that RFCs can be readily raised to control the activities needed to overcome problems that have been identified through Root-Cause Analysis or Proactive Trend Analysis.

7.5.3 Integrated CMS

It is also important to have an integrated CMS which allows Problem Records to be linked to the components affected and the services impacted - and to any other relevant CIs.

Configuration Management forms part of a larger SKMS which includes linkages to many of the data repositories used in Service Operation. The processes and practices of Configuration Management and its underlying technology requirements are included in the Service Transition publication.

7.5.4 Known Error Database

An effective KEDB will be an essential requirement, which should allow easy storage and retrieval of Known Error data. Good reporting facilities are needed to ease the production of management reports, allowing the data to be incorporated automatically without the need for rekeying of data - and to allow drill-down capabilities for Incident and Problem Analysis.

Note: In some cases, components or systems being investigated by Problem Management may be provided by third-party vendors or manufacturers. To address this, vendors' support tools and/or KEDBs may also need to be used.

7.6 Access Management

Access Management uses a variety of technologies, mainly:

Human Resource Management technology, to validate the identity of users and to track their status
Directory Services Technology (see section 5.8 for a description of Directory Services). This technology enables technology managers to assign names to resources on a network and then provide access to those resources based on the profile of the user. Directory Services tools also enable Access Management to create roles and groups and to link these to both users and resources
Access Management features in Applications, Middleware, Operating Systems and Network Operating Systems
Change Management systems
Request Fulfilment technology (see section 7.4).

7.7 Service Desk

Adequate tools and technology support should be provided to enable Service Desk staff to perform their roles as efficiently and effectively as possible. This will include the following.

7.7.1 Telephony

Because a high percentage of incidents are likely to be raised by telephone calls from users, the Service Desk should be provided with good, modern telephony services. This should include:

An automated call distribution (ACD) system to allow a single telephone number (or numbers if a distributed or segmented Service Desk is the preferred option) and group pick-up capabilities. Warning: If options are offered via the ACD, via keypad or Interactive Voice Response (IVR) selection, do not use too many levels of options or offer ambiguous options. Also do not include any 'dead ends' or options which, once chosen, do not allow the caller to go back to previous menus.

Computer Telephony Interface (CTI) software to allow caller recognition (via the linked ACD) and automated population of the users' details into the incident record from the CMS.
VoIP - use of this technology can significantly reduce telephony costs when dealing with remote and international users
Statistical software to allow telephony statistics to be gathered and easily interrogated/printed for analysis - this should allow the following information to be obtained for any selected period:
Number of calls received, in total and broken down by any 'splits' (where call routing is offered through an IVR system or keypad response)
Call arrival profiles and answer times
Call abandon rates
Call handling rates by individual Service Desk call handlers
Average call durations
Hands-free headsets, with dual-user access capabilities (on at least some of the headsets) for use during training of new staff, etc.
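The telephony statistics listed above can be derived from simple call records. The sketch below assumes a hypothetical record layout (handler, seconds to answer, talk time) and invented sample data. A real ACD would supply these figures through its own reporting interface.

```python
from collections import Counter
from statistics import mean

# Each call: (handler, or None if abandoned; seconds to answer; talk seconds)
calls = [
    ("alice", 12, 240),
    ("bob", 30, 180),
    (None, 95, 0),       # caller hung up before being answered
    ("alice", 8, 300),
]

answered = [c for c in calls if c[0] is not None]
stats = {
    "calls_received": len(calls),
    "abandon_rate": (len(calls) - len(answered)) / len(calls),
    "avg_answer_seconds": mean(c[1] for c in answered),
    "avg_duration_seconds": mean(c[2] for c in answered),
    "calls_by_handler": dict(Counter(c[0] for c in answered)),
}
print(stats)
```

Restricting `calls` to a chosen time window before computing `stats` would give the same figures for any selected period.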

7.7.2 Support Tools

There is a range of free-standing Service Desk support tools available in the marketplace - and some organizations may choose to produce their own simple incident logging/management systems. If an organization seriously intends to implement ITSM then a fully integrated ITSM toolset will be required that has a CMS at the centre and provides integrated support for all the ITIL defined processes.

Specific elements of such a tool that will be particularly beneficial for the Service Desk include the following.

7.7.2.1 Known Error Database

An integrated KEDB should be used to store details of previous incidents/problems and their resolutions - so that any recurrences can be more quickly diagnosed and fixed.

To facilitate this, functionality is needed to categorize and quickly retrieve previous Known Errors, using pattern matching and key word searching against symptoms. Management of the KEDB is the responsibility of Problem Management, but the Service Desk will use it to help speed incident handling.
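Key word searching against symptoms can be sketched very simply. The Known Error records, identifiers and the word-overlap scoring below are assumptions for illustration only. Production tools typically use more capable indexing and matching.

```python
import re

# Hypothetical Known Error records: id, symptom text, documented workaround.
KEDB = [
    {"id": "KE-001", "symptoms": "printer offline after driver update",
     "workaround": "Roll back the printer driver."},
    {"id": "KE-002", "symptoms": "email client crashes when opening calendar",
     "workaround": "Start the client in safe mode and rebuild the profile."},
    {"id": "KE-003", "symptoms": "vpn disconnects after driver update",
     "workaround": "Disable power saving on the network adapter."},
]


def tokenize(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_known_errors(description, kedb=KEDB, min_overlap=2):
    """Rank Known Errors by how many words they share with the description."""
    words = tokenize(description)
    scored = [(len(words & tokenize(rec["symptoms"])), rec) for rec in kedb]
    hits = [(score, rec) for score, rec in scored if score >= min_overlap]
    hits.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [rec["id"] for _, rec in hits]


matches = search_known_errors("Printer shows offline since the driver update")
print(matches)
```

The `min_overlap` threshold is a crude guard against matches on a single common word. Tuning it trades missed matches against noise.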

7.7.2.2 Diagnostic Scripts

Multi-level diagnostic scripts should be developed, stored and managed to allow Service Desk staff to pinpoint the cause of failures. Specialist support groups and suppliers should be asked to provide details of the likely failures and the key questions to be asked to identify exactly what has gone wrong - and for details of the resolution actions to be taken. These details should then be included in context-sensitive scripts that should appear on-screen, dependent upon the multi-level categorization of the incident, and should be driven by the user's answers to diagnostic questions.
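A multi-level, context-sensitive script of this kind can be represented as a small decision tree selected by incident category. The category, questions and resolutions below are invented for the example. In practice they would be supplied by the specialist support groups and suppliers.

```python
# Hypothetical multi-level script, keyed by incident category.
# Each node is either a question with yes/no branches or a final resolution.
SCRIPTS = {
    "printing": {
        "question": "Is the printer's power light on?",
        "yes": {
            "question": "Does a test page print from the printer itself?",
            "yes": {"resolution": "Reinstall the print driver on the PC."},
            "no": {"resolution": "Log a hardware fault with the supplier."},
        },
        "no": {"resolution": "Check the power cable and switch the printer on."},
    },
}


def run_script(category, answers):
    """Walk the script for a category using the user's yes/no answers.

    Returns (questions_asked, resolution); resolution is None if the answers
    run out before a resolution node is reached.
    """
    node, asked = SCRIPTS[category], []
    for answer in answers:
        if "resolution" in node:
            break
        asked.append(node["question"])
        node = node[answer]
    return asked, node.get("resolution")


asked, resolution = run_script("printing", ["yes", "no"])
print(asked)
print(resolution)
```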

7.7.2.3 Self-Help Web Interface

It is often cost effective and expedient to provide some form of automated 'Self-Help' functionality, so users can seek and obtain assistance which will enable them to resolve their own difficulties. Ideally this should be via a 24/7 web interface that is driven by menu selection and might include, as appropriate:

Frequently asked questions (FAQs) and solutions.
'How to do' search capabilities - to guide users through a context-sensitive list of tasks or activities.
A bulletin-type service containing details of outstanding service issues/problems together with anticipated restoration times.
Password change capabilities - using secure password protection software to check identities, perform authorization and change passwords without the need for Service Desk intervention.
Software fix downloads (patches, service packs, bug fixes etc. where it is determined that the user has the wrong version or a fix is needed) - tools are available to automate the checking process, to compare the actual desktop image with the agreed 'standard' builds and to allow upgrades to be offered and accepted where necessary.
Software repairs - where it is detected that a corruption may have occurred, to allow software fixes, removal and/or re-installation.
Software removal requests - automatically completed with any licence being returned to the pool.
Downloads of additional software packages - tools are available to check a pre-defined software policy and to allow the download of additional software packages, if covered by the policy. This can include automated software licence checks and financial approvals as well as CMS updating.
Advanced notice of any planned downtime or services outages or degradations.

The self-help solution should include the capability for users to log incidents themselves, which can be used during periods that the Service Desk is closed (if not operating 24/7) and attended to by Service Desk staff at the start of the next shift. Some care has to be exercised to ensure that the Self-Help activities selected for inclusion are not too advanced for the average user, and that safeguards are included to prevent a 'little knowledge being a dangerous thing'! It may be possible to offer slightly more advanced Self-Help facilities to 'Super Users' who have had extra training. It is also necessary to be very careful about assumptions made when staffing a Service Desk about the amount of use that users will make of Self-Help facilities.

7.7.2.4 Remote Control

As already stated, but repeated here for completeness, it is often helpful for the Service Desk Analysts to be able to take control of the user's desktop so as to allow them to conduct investigations or correct settings, etc. Facilities to allow this level of remote control will be needed.

7.7.3 IT Service Continuity Planning for ITSM Support Tools

Organizations are likely to become quickly dependent upon their ITSM tools and will find it difficult to work without them. A full Business Impact Analysis and Risk Analysis should be performed and plans then developed to ensure appropriate IT Service Continuity and resilience levels.