Service Operations

3. Service Operation Principles


When considering Service Operation it is tempting to focus only on managing day-to-day activities and technology as ends in themselves. However, Service Operation exists within a far greater context. As part of the Service Management Lifecycle, it is responsible for executing and performing processes that optimize the cost and quality of services. As part of the organization, it is responsible for enabling the business to meet its objectives. As part of the world of technology, it is responsible for the effective functioning of components that support services. The principles in this chapter are aimed at helping Service Operation practitioners to achieve a balance between all of these roles and to focus on effectively managing the day-to-day aspects while maintaining a perspective of the greater context.

3.1 Functions, Groups, Teams, Departments And Divisions

The Service Operation publication uses several terms to refer to the way in which people are organized to execute processes or activities. There are several published definitions for each term and it is not the purpose of this publication to enter the debate about which definition is best. Please note that the following definitions are generic and not prescriptive. They are provided simply to define assumptions and to facilitate understanding of the material. The reader should adapt these principles to the organizational practices used in their own organization.

3.2 Achieving Balance In Service Operations

Service Operation is more than just the repetitive execution of a standard set of procedures or activities. All functions, processes and activities are designed to deliver a specified and agreed level of services, but they have to be delivered in an ever-changing environment.

This creates a conflict between maintaining the status quo and adapting to changes in the business and technological environments. One of Service Operation's key roles is therefore to deal with this conflict and to achieve a balance between conflicting sets of priorities.

This section of the publication highlights some of the key tensions and conflicts and identifies how IT organizations can recognize that they are suffering from an imbalance by tending more towards one extreme or the other. It also provides some high-level guidelines on how to resolve the conflict and thus move towards a best-practice approach. Every conflict therefore represents an opportunity for growth and improvement.

3.2.1 Internal IT View Versus External Business View
Figure 3.1 Achieving a balance between external and internal focus

The most fundamental conflict in all phases of the ITSM Lifecycle is between the view of IT as a set of IT services (the external business view) and the view of IT as a set of technology components (internal IT view).

  • The external view of IT is the way in which services are experienced by its users and customers. They do not always understand, nor do they care about, the details of what technology is used to manage those services. All they are concerned about is that the services are delivered as required and agreed.
  • The internal view of IT is the way in which IT components and systems are managed to deliver the services. Since IT systems are complex and diverse, this often means that the technology is managed by several different teams or departments - each of which is focused on achieving good performance and availability of 'its' systems.

Both views are necessary when delivering services. The organization that focuses only on business requirements without thinking about how they are going to deliver will end up making promises that cannot be kept. The organization that focuses only on internal systems without thinking about what services they support will end up with expensive services that deliver little value.

The potential for role conflict between the external and internal views is the result of many variables, including the maturity of the organization, its management culture, its history, etc. This makes a balance difficult to achieve, and most organizations tend more towards one role than the other. Of course, no organization will be totally internally or externally focused, but will find itself in a position along a spectrum between the two. This is illustrated in Figure 3.1:

Table 3.1 outlines some examples of the characteristics of positions at the extreme ends of the spectrum. The purpose of this table is to assist organizations in identifying to which extreme they are closer, not to identify real-life positions to which organizations should aspire.

Primary focus
  Extreme internal focus:
  • Performance and management of IT Infrastructure devices, systems and staff, with little regard to the end result on the IT service.
  Extreme external focus:
  • Achieving high levels of IT service performance with little regard to how it is achieved.

Metrics
  Extreme internal focus:
  • Focus on technical performance without showing what this means for services
  • Internal metrics (e.g. network uptime) reported to the business instead of service performance metrics.
  Extreme external focus:
  • Focus on external metrics without showing internal staff how these are derived or how they can be improved
  • Internal staff are expected to devise their own metrics to measure internal performance.

Customer/user experience
  Extreme internal focus:
  • High consistency of delivery, but only delivers a percentage of what the business needs
  • Uses a 'push' approach to delivery, i.e. prefers to have a standard set of services for all business units.
  Extreme external focus:
  • Poor consistency of delivery
  • 'IT consists of good people with good intentions, but cannot always execute'
  • Reactive mode of operation
  • Uses a 'pull' approach to delivery, i.e. prefers to deliver customized services upon request.

Operations strategy
  Extreme internal focus:
  • Standard operations across the board
  • All new services need to fit into the current architecture and procedures.
  Extreme external focus:
  • Multiple delivery teams and multiple technologies
  • New technologies require new operations approaches and often new IT Operations teams.

Procedures and manuals
  Extreme internal focus:
  • Focus purely on how to manage the technology, not on how its performance relates to IT services.
  Extreme external focus:
  • Focus primarily on what needs to be done and when, and less on how this should be achieved.

Cost strategy
  Extreme internal focus:
  • Cost reduction achieved purely through technology consolidation
  • Optimization of operational procedures and resources
  • Business impact of cost cutting often only understood later
  • Return on Investment calculations are focused purely on cost savings or 'payback periods'.
  Extreme external focus:
  • Budget allocated on the basis of which business unit is perceived to have the most need
  • Less articulate or vocal business units often have inferior services as there is not enough funding allocated to their services.

Training
  Extreme internal focus:
  • Training is conducted as an apprenticeship, where new Operations staff have to learn the way things have to be done, not why.
  Extreme external focus:
  • Training is conducted on a project-by-project basis
  • There are no standard training courses since operational procedures and technology are constantly changing.

Operations staff
  Extreme internal focus:
  • Specialized staff, organized according to technical specialty
  • Staff work on the false assumption that good technical achievement is the same as good customer service.
  Extreme external focus:
  • Generalist staff, organized partly according to technical capability and partly according to their relationship with a business unit
  • Reliance on 'heroics', where staff go out of their way to resolve problems that could have been prevented by better internal processes.

Table 3.1 Examples of extreme internal and external focus

This does not mean that the external focus is unimportant. The whole point of Service Management is to provide services that meet the objectives of the organization as a whole. It is critical to structure services around customers. At the same time, it is possible to compromise the quality of services by not thinking about how they will be delivered.

Building Service Operation with a balance between internal and external focus requires a long-term, dedicated approach reflected in all phases of the ITSM Service Lifecycle. This will require the following:

3.2.2 Stability Versus Responsiveness
Figure 3.2 Achieving a balance between focus on stability and responsiveness

No matter how good the functionality of an IT service is and no matter how well it has been designed, it will be worth far less if the service components are not available or if they perform inconsistently.

This means that Service Operation needs to ensure that the IT Infrastructure is stable and available as designed. At the same time, Service Operation needs to recognize that business and IT requirements change.

Some of these changes are evolutionary. For example, the functionality, performance and architecture of a platform may change over a number of years. Each change brings with it an opportunity to provide better levels of service to the business. In evolutionary changes, it is possible to plan how to respond to the change and thus maintain stability while responding to the changes.

Many changes, though, happen very quickly and sometimes under extreme pressure. For example, a Business Unit unexpectedly wins a contract that requires additional IT services, more capacity and faster response times. The ability to respond to this type of change without impacting other services is a significant challenge.

Many IT organizations are unable to achieve this balance and tend to focus on either the stability of the IT Infrastructure or the ability to respond to changes quickly.

Primary focus
  Extreme focus on stability:
  • Technology
  • Developing and refining standard IT management techniques and processes.
  Extreme focus on responsiveness:
  • Output to the business
  • Agrees to required changes before determining what it will take to deliver them.

Typical problems experienced
  Extreme focus on stability:
  • IT can demonstrate that it is complying with SOPs and Operational Level Agreements (OLAs), even when there is clear misalignment to business requirements.
  Extreme focus on responsiveness:
  • IT staff are not available to define or execute routine tasks because they are busy on projects for new services.

Technology growth strategy
  Extreme focus on stability:
  • Growth strategy based on analyzing existing demand on existing systems
  • New services are resisted and Business Units sometimes take ownership of 'their own' systems to get access to new services.
  Extreme focus on responsiveness:
  • Technology purchased for each new business requirement
  • Using multiple technologies and solutions for similar solutions, to meet slightly different business needs.

Technology used to deliver IT services
  Extreme focus on stability:
  • Existing or standard technology to be used; services must be adjusted to work within existing parameters.
  Extreme focus on responsiveness:
  • Over-provisioning. No attempt is made to model the new Service on the existing infrastructure. New, dedicated technology is purchased for each new project.

Capacity Management
  Extreme focus on stability:
  • Forecasts based on projections of current workloads
  • System performance is maintained at consistent levels through tuning and demand management, not by workload forecasting and management.
  Extreme focus on responsiveness:
  • Forecasts based on future business activity for each service individually and do not take into account IT activity or other IT services
  • Existing workloads not relevant.

Table 3.2 Examples of extreme focus on stability and responsiveness

Table 3.2 outlines some examples of the characteristics of positions at extreme ends of the spectrum. The purpose of this table is to assist organizations in identifying to which extreme they are closer, not to identify real-life positions to which organizations should aspire.

Building an IT organization that achieves a balance between stability and responsiveness in Service Operation will require the following actions:

3.2.3 Quality Of Service Versus Cost Of Service
Figure 3.3 Balancing service quality and cost

Service Operation is required consistently to deliver the agreed level of IT service to its customers and users, while at the same time keeping costs and resource utilization at an optimal level.

In Figure 3.3, an increase in the level of quality usually results in an increase in the cost of that service, and vice versa. However, the relationship is not always directly proportional:

While this may seem straightforward, many organizations are under severe pressure to increase the quality of service while reducing their costs. In Figure 3.3, the relationship between cost and quality is sometimes inverse. It is possible (usually inside the range of optimization) to increase quality while reducing costs. This is normally initiated within Service Operation and carried forward by Continual Service Improvement. Some costs can be reduced incrementally over time, but most cost savings can be made only once. For example, once a duplicate software tool has been eliminated, it cannot be eliminated again for further cost savings.

Achieving an optimal balance between cost and quality (shown between the dotted lines in Figure 3.3) is a key role of Service Management. There is no industry standard for what this range should be, since each service will have a different range of optimization, depending on the nature of the service and the type of business objective being met. For example, the business may be prepared to spend more to achieve high availability on a mission-critical service, while it is prepared to live with the lower quality of an administrative tool.

Determining the appropriate balance of cost and quality should be done during the Service Strategy and Service Design Lifecycle phases, although in many organizations it is left to the Service Operation teams - many of whom do not generally have all the facts or authority to be able to make this type of decision.

Unfortunately, it is also common to find organizations that are spending vast quantities of money without achieving any clear improvements in quality. Again, Continual Service Improvement will be able to identify the cause of the inefficiency, evaluate the optimal balance for that service and formulate a corrective plan.

Achieving the correct balance is important. Too much focus on quality will result in IT services that deliver more than necessary, at a higher cost, and could lead to a discussion on reducing the price of services. Too much focus on cost will result in IT delivering on or under budget, but putting the business at risk through substandard IT services.

Special note: just how far is too much?
Over the past several years, IT organizations have been under pressure to cut costs. In many cases this resulted in optimized costs and quality. But, in other cases, costs were cut to the point where quality started to suffer. At first, the signs were subtle - small increases in incident resolution times and a slight increase in the number of incidents. Over time, though, the situation became more serious as staff worked long hours to handle multiple workloads and services ran on ageing or outdated infrastructure.

There is no simple calculation to determine when costs have been cut too far, but recognizing these warning signs and symptoms can greatly enhance an organization's ability to correct the situation. Good Service Level Management (SLM) is crucial to making customers aware of the impact of cutting too far.
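The warning signs described in this note (rising incident resolution times and rising incident counts) can be watched with a simple trend comparison. The following Python sketch is illustrative only and is not taken from the publication: the metric names, the three-period window and the 10% tolerance are all assumptions, and it offers no answer to the question of when costs have been cut too far.

```python
from statistics import mean


def relative_change(baseline: list[float], recent: list[float]) -> float:
    """Fractional change of the recent average against the baseline average."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline average must be non-zero")
    return (mean(recent) - base) / base


def warning_signs(
    metrics: dict[str, list[float]],
    recent_periods: int = 3,   # assumed window; choose to suit reporting cycles
    tolerance: float = 0.10,   # assumed: flag increases above 10%
) -> dict[str, float]:
    """Return the metrics whose recent average has risen by more than the tolerance."""
    flagged = {}
    for name, series in metrics.items():
        if len(series) <= recent_periods:
            continue  # not enough history to compare against
        change = relative_change(series[:-recent_periods], series[-recent_periods:])
        if change > tolerance:
            flagged[name] = change
    return flagged


# Hypothetical monthly figures, oldest first.
history = {
    "mean_resolution_hours": [4.0, 4.0, 4.0, 5.0, 5.0, 5.0],
    "incident_count": [100, 100, 100, 102, 101, 103],
}
print(warning_signs(history))
```

With these made-up figures only the resolution time is flagged, since its recent average is 25% above the earlier one, while the incident count has risen by 2%. A flag of this kind is a prompt for discussion with the customer through SLM, not a verdict.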

Figure 3.4 Achieving a balance between focus on cost and quality

Service Level Requirements - together with a clear understanding of the business purpose of the service and the potential risks - will help to ensure that the service is delivered at the appropriate cost. They will also help to avoid 'over sizing' of the service just because budget is available, or 'under sizing' because the business does not understand the manageability requirements of the solution. Either result will cause customer dissatisfaction and even more expense when the solution is re-engineered or retro-fitted to the requirements that should have been specified during Service Design.

Table 3.3 outlines some examples of the characteristics of positions at extreme ends of the cost/quality spectrum. The purpose of this table is to assist organizations in identifying to which extreme they are closer, not to identify real-life positions to which organizations should aspire.

Achieving a balance will ensure delivery of the level of service necessary to meet business requirements at an optimal (as opposed to lowest possible) cost. This will require the following:

Primary focus
  Extreme focus on quality:
  • Delivering the level of quality demanded by the business regardless of what it takes.
  Extreme focus on cost:
  • Meeting budget and reducing costs.

Typical problems experienced
  Extreme focus on quality:
  • Escalating budgets
  • IT services generally deliver more than is necessary for business success
  • Escalating demands for higher quality services.
  Extreme focus on cost:
  • IT limits the quality of services based on its available budget
  • Escalations from the business to get more service from IT.

Financial Management
  Extreme focus on quality:
  • IT usually does not have a method of communicating the cost of IT services. Accounting methods are based on an aggregated method (e.g. cost of IT per user).
  Extreme focus on cost:
  • Financial reporting is done purely on budgeted amounts. There is no way of linking activities in IT to the delivery of IT services.

Table 3.3 Examples of extreme focus on quality and cost

3.2.4 Reactive Versus Proactive
A reactive organization is one which does not act unless it is prompted to do so by an external driver, e.g. a new business requirement, an application that has been developed or an escalation in complaints made by users and customers. An unfortunate reality in many organizations is that reactive management is mistakenly treated as the sole means of ensuring that services are highly consistent and stable, and proactive behaviour from operational staff is actively discouraged. The irony of this approach is that discouraging investment of effort in proactive Service Management can ultimately increase the effort and cost of reactive activities and put the stability and consistency of services at further risk.

A proactive organization is always looking for ways to improve the current situation. It will continually scan the internal and external environments, looking for signs of potentially impacting changes. Proactive behaviour is usually seen as positive, especially since it enables the organization to maintain competitive advantage in a changing environment. However, being too proactive can be expensive and can result in staff being distracted. A proper balance between reactive and proactive behaviour usually achieves the optimal result.

Generally, it is better to manage IT services proactively, but achieving this is not easily planned or achieved. This is because building a proactive IT organization is dependent on many variables, including:

Figure 3.5 Achieving a balance between being too reactive or too proactive

From a maturity perspective, it is clear that newer organizations will have different priorities and experiences from a more established organization - what is best practice for a mature organization may not suit a younger organization. Therefore an imbalance could result from an organization being either less or more mature. Consider the following:

While proactive behaviour in Service Operation is generally good, there are also times where reactive behaviour is needed. The role of Service Operation is therefore to achieve a balance between being reactive and proactive. This will require:

Table 3.4 outlines some examples of the characteristics of positions at extreme ends of the spectrum. The purpose of this table is to assist organizations in identifying to which extreme they are closer, not to identify real-life positions to which organizations should aspire.

Primary focus
  Extremely reactive:
  • Responds to business needs and incidents only after they are reported.
  Extremely proactive:
  • Anticipates business requirements before they are reported and problems before they occur.

Typical problems experienced
  Extremely reactive:
  • Preparing to deliver new services takes a long time because each project is dealt with as if it is the first
  • Similar incidents occur again and again, as there is no way of trending them
  • Staff turnover is high and morale is generally low, as IT staff keep moving from project to project without achieving a lasting, stable set of IT services.
  Extremely proactive:
  • Money is spent before the requirements are stated. In some cases IT purchases items that will never be used because they anticipated the wrong requirements or because the project is stopped
  • IT staff tend to have been in the organization for a long time and tend to assume that they know the business requirements better than the business does.

Capacity Planning
  Extremely reactive:
  • Wait until there are capacity problems and then purchase surplus capacity to last until the next capacity-related incident.
  Extremely proactive:
  • Anticipate capacity problems and spend money on preventing these - even when the scenario is unlikely to happen.

IT Service Continuity Planning
  Extremely reactive:
  • No plans exist until after a major event or disaster
  • IT Plans focus on recovering key systems, but without ensuring that the business can recover its processes.
  Extremely proactive:
  • Over-planning (and over-spending) of IT Recovery options. Usually immediate recovery is provided for most IT services, regardless of their impact or priority.

Change Management
  Extremely reactive:
  • Changes are often not logged, or logged at the last minute as Emergency Changes
  • Not enough time for proper impact and cost assessments
  • Changes are poorly tested and controlled, resulting in a high number of incidents.
  Extremely proactive:
  • Changes are requested and implemented even when there is no real need, i.e. a significant amount of work done to fix items that are not broken.

Table 3.4 Examples of extremely reactive and proactive behaviour

3.3 Providing Services

All Service Operation staff must be fully aware that they are there to 'provide service' to the business. They must provide a timely (rapid response and speedy delivery of requirements), professional and courteous service to allow the business to conduct its own activities - so that the commercial customer's needs are met and the business thrives.

It is important that staff are trained not only in how to deliver and support IT services, but also in the manner in which that service should be provided. For example, staff that are capable and deliver service effectively may still cause significant customer dissatisfaction if they are insensitive or dismissive. Conversely, no amount of being nice to a customer will help if the service is not being delivered.

A critical element of being a proficient service provider is placing as much emphasis on recruiting and training staff for competency in managing customer relationships and interactions as on technical competencies for managing the IT environment.

3.4 Operations Staff Involvement In Service Design And Service Transition

It is extremely important that Service Operation staff are involved in Service Design and Service Transition and potentially also in Service Strategy where appropriate.

One key to achieving balance in Service Operation is an effective set of Service Design processes. These will provide IT Operations Management with:

The nature of IT Operations Management involvement should be carefully positioned. Service Design is a phase in the Service Management Lifecycle using a set of processes, not a function independent of Service Operation. As such, many of the people who are involved in Service Design will come from IT Operations Management.

This involvement should be encouraged, and Service Operation staff should also be measured on their involvement in Service Design activities - such activities should be included in job descriptions and roles, etc. This will help to ensure continuity between business requirements and technology design and operation, and it will also help to ensure that what is designed can also be operated. IT Operations Management staff should also be involved during Service Transition to ensure consistency and to ensure that both stated business and manageability requirements are met.

Resources must be made available for these activities and the time required should be taken into account, as appropriate.

3.5 Operational Health

Many organizations find it helpful to compare the monitoring and control of Service Operation to health monitoring and control. In this sense, the IT Infrastructure is like an organism that has vital life signs that can be monitored to check whether it is functioning normally. This means that it is not necessary to monitor continuously every component of every IT system to ensure that it is functioning.

Operational Health can be determined by isolating a few important 'vital signs' on devices or services that are defined as critical for the successful execution of a Vital Business Function. This could be the bandwidth utilization on a network segment, or memory utilization on a major server. If these signs are within normal ranges, the system is healthy and does not require additional attention. This reduction in the need for extensive monitoring will result in cost reduction and operational teams and departments that are focused on the appropriate areas for service success.
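The 'vital signs' idea can be sketched in a few lines of Python. This is a minimal illustration rather than anything prescribed by the publication: the sign names, the normal ranges and the helper functions are all hypothetical, and real ranges would be agreed during Service Design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VitalSign:
    """One monitored measurement with its agreed normal range (values are made up)."""
    name: str
    low: float   # lower bound of the normal range, in percent
    high: float  # upper bound of the normal range, in percent

    def in_range(self, reading: float) -> bool:
        return self.low <= reading <= self.high


# Assumed vital signs for a single Vital Business Function.
VITAL_SIGNS = [
    VitalSign("network_segment_bandwidth_pct", 0.0, 70.0),
    VitalSign("major_server_memory_pct", 0.0, 85.0),
]


def out_of_range(readings: dict[str, float]) -> list[str]:
    """Names of vital signs that are missing or outside their normal range."""
    problems = []
    for sign in VITAL_SIGNS:
        reading = readings.get(sign.name)
        if reading is None or not sign.in_range(reading):
            problems.append(sign.name)
    return problems


def is_healthy(readings: dict[str, float]) -> bool:
    """Treat the system as healthy only if every vital sign is within range."""
    return not out_of_range(readings)
```

A missing reading is deliberately treated as unhealthy in this sketch, on the assumption that a vital sign that cannot be measured deserves attention in its own right.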

However, as with organisms, it is important to check systems more thoroughly from time to time, to check for problems that do not immediately affect vital signs. For example, a disk may be functioning perfectly, but it could be nearing its Mean Time Between Failures (MTBF) threshold. In this case the system should be taken out of service and given a thorough examination or 'health check'. At the same time, it should be stressed that the end result should be the healthy functioning of the service as a whole. This means that health checks on components should be balanced against checks of the 'end-to-end' service. What needs to be monitored, and what counts as healthy versus unhealthy, is defined during Service Design, especially in Availability Management and SLM.
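The disk example can be expressed as a simple scheduling rule. The sketch below is an assumption-laden illustration: the `Component` record, the figures and the 80% fraction are hypothetical, and using MTBF as a fixed threshold is a simplification that follows the wording of the text, since MTBF is a statistical average and not a guaranteed lifetime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A monitored component; all values used below are made up."""
    name: str
    hours_in_service: float
    mtbf_hours: float


def needs_health_check(component: Component, fraction: float = 0.8) -> bool:
    """True once hours in service reach the given fraction of the rated MTBF."""
    if component.mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return component.hours_in_service >= fraction * component.mtbf_hours


def due_for_health_check(components: list[Component], fraction: float = 0.8) -> list[str]:
    """Names of components that should be scheduled for a thorough check."""
    return [c.name for c in components if needs_health_check(c, fraction)]


disks = [
    Component("disk-a", hours_in_service=45_000, mtbf_hours=50_000),
    Component("disk-b", hours_in_service=10_000, mtbf_hours=50_000),
]
print(due_for_health_check(disks))  # prints ['disk-a']
```

A rule like this covers only component-level checks; it says nothing about the health of the end-to-end service, which still has to be checked separately.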

Operational Health is dependent on the ability to prevent incidents and problems by investing in reliable and maintainable infrastructure. This is achieved through good availability design and proactive Problem Management. At the same time, Operational Health is also dependent on the ability to identify faults and localize them effectively so that they have minimal impact on the service. This requires strong (preferably automated) Incident and Problem Management.

The idea of Operational Health has also led to a specialized area called 'Self Healing Systems'. This is an application of Availability, Capacity, Knowledge, Incident and Problem Management and refers to a system that has been designed to withstand the most severe operating conditions and to detect, diagnose and recover from most incidents and Known Errors. Self Healing Systems are known by different names, for example Autonomic Systems, Adaptive Systems and Dynamic Systems. Characteristics of Self Healing Systems include:

While the concept of Operational Health is not a core concept of Service Operation, it is often a helpful metaphor to assist in determining what needs to be monitored and how frequently to perform preventive maintenance.

What and when to monitor for operational health should be determined in Service Design, tested and refined during Service Transition and optimized in Continual Service Improvement, as necessary.

3.6 Communication

Good communication is needed with other IT teams and departments, with users and internal customers, and between the Service Operation teams and departments themselves. Issues can often be prevented or mitigated with appropriate communication. This section is aimed at summarizing the communication that should take place in Service Operation. This is not a separate process, but a checklist of the type of communication that is required for effective Service Operation.

An important principle is that all communication must have an intended purpose or a resultant action. Information should not be communicated unless there is a clear audience. In addition, that audience should have been actively involved in determining the need for that communication and what they will do with the information.

A detailed description of the types of communication typical in Service Operation is contained in Appendix B of this publication, together with a description of the typical audience and the actions that are intended to be taken as a result of each communication. These include:

Please note that there is no definitive medium for communication, nor is there a fixed location or frequency. In some organizations communication has to take place in meetings. Other organizations prefer to use e-mail or the communication inherent in their Service Management tools.

There should therefore be a policy around communication within each team or department and for each process. Although this should be formal, the policy should not be cumbersome or complex. For example, a manager might require that all communications regarding changes must be sent by e-mail. As long as this is specified in the department's SOPs (in whatever form they exist), there is no need to create a separate policy for it.

Although the typical content of communication is fairly consistent once processes have been defined, the means of communication are changing with every new introduction of technology. The list of alternatives is growing and, today, includes:

3.6.1 Meetings
Different organizations communicate in different ways. Where organizations are distributed, they will tend to rely on e-mail and teleconferencing facilities. Organizations that have more mature Service Management processes and tools will tend to rely on the tools and processes for communication (e.g. using an Incident Management tool to escalate and track incidents, instead of requesting e-mail or telephone calls for updates).

Other organizations prefer to communicate using meetings. However, it is important not to get into the mode whereby the only time work is done, or management is involved, is during a meeting. Also, face-to-face meetings tend to increase costs (e.g. travel, time spent in informal discussions, refreshments, etc.), so meeting organizers should balance the value of the meeting with the number and identity of the attendees and the time they will spend in, and getting to, the meeting.

The purpose of meetings is to communicate effectively to a group of people about a common set of objectives or activities. Meetings should be well controlled and brief, and the focus should be on facilitating action. A good rule is not to hold a meeting if the information can be communicated effectively by automated means.

A number of factors are essential for successful meetings. Although these may seem to be common sense, they are sometimes neglected:

Examples of typical meetings are given below:

The Operations Meeting
Operations meetings are normally held between the managers of the IT operational departments, teams or groups, at the beginning of each business day or week. The purpose of this type of meeting is to make staff aware of any issue relevant to Operations (such as change schedules, business events, maintenance schedules, etc.) and to provide an opportunity for staff to raise any issues of which they are aware. This is an opportunity to ensure that all departments in a data centre are synchronized.

In geographically dispersed organizations it may not be possible to have a single daily Operations meeting. In these cases it is important to coordinate the agenda of the meetings and to ensure that each meeting has two components:

1 The first part of the meeting will cover aspects that apply to the organization as a whole, e.g. new policies, changes that affect all regions and business events that span all regions.

2 The second part of the meeting will cover aspects that apply only to the local region, e.g. local operations schedules, changes to local equipment, etc.

The Operations meeting is usually chaired by the IT Operations Manager or a senior Operations Manager and attended by all managers and supervisors (except those whose shifts are not on duty). It is also helpful to have at least one representative from the Service Desk at the meeting so that they are aware of any situations that could give rise to incidents.

Opportunities to improve services or processes should be captured, if raised, and forwarded to the team responsible for Continual Service Improvement.

Department, Group Or Team Meetings
These meetings are essentially the same as the Operations meeting, but are aimed at a single IT department, group or team. Each manager or supervisor relays the information from the Operations meeting that is relevant to their team.

Additionally, these meetings will also cover the following:

Customer Meetings
From time to time it will be necessary to hold meetings with customers, apart from the regular Service Level Review meetings. Examples include:

3.7 Documentation

IT Operations Management and all of the Technical and Application Management teams and departments are involved in creating and maintaining a range of documents. These are detailed in Chapters 4, 5 and 6 of this publication and include the following:
