Capacity Management
  The continuing provision of consistent, acceptable service levels, at a known and controlled cost.

Capacity Management is the focal point for all IT performance and Capacity issues in the organization. Its goal is 'to ensure that cost justifiable IT Capacity always exists and that it is matched to the current and future identified needs of the business'. Capacity Management focuses on the procedures and systems, including specification, implementation, monitoring, analysis, and tuning of IT resources and their resulting service performance. Capacity requirements are based on qualitative and quantitative standards set by the service level management process and specified within the provisions of an SLA or OLA.

This process needs to be re-written for conformance with ITIL Version 3. This involves moving the Demand Management functions/sub-processes to the Service Strategy module.


Introduction to Capacity Management

Capacity Management is a process under Service Design in the ITIL Version 3 concept.


The capacity of an organization denotes its ability to perform work. This ability is traditionally delineated into five primary 'entities':

An organization will combine the above five types in different ways to get work done. Combining these entities into 'operations' creates new organizational competencies. For example, Labour capacity can be increased by having ready access to a knowledge base. In this case, the capacity of the individual is the same, but the capability and competencies have increased through the combination.

In Essentials of Capacity Management, Thomas Yu-Lee suggests that the five entities can combine three-fold to produce function groupings which turn them into a 'process' capable of performing some type of work. Yu-Lee further suggests that the quaternary, or operational, level brings together many processes to define the overall capabilities of the organization.

Thus we can speak of process capacity as a key ability of the organization to achieve competence in using its capacity wisely. This is particularly true with reference to the relationship of Capacity Management with Availability Management.

In at least one sense, the two processes have conflicting goals. Capacity Management aims to achieve an optimal usage of existing capacity. However, the need for high availability designs may lead the organization towards maintaining system redundancies and backups to ensure continuous or nearly-continuous operations. In this respect, this 'spare' capacity is like equipment 'inventory'. Business Continuity and Disaster Recovery Planning represent the most extreme form of this because they require the ability to re-create operations following a disaster - this 'duplicate', often unused capacity is a form of insurance for the organization.

In essence, high availability application designs create additional capacity. Variations within that design will affect the probability that undue stress on the applications will have availability or performance ramifications (i.e., the system fails or responds slowly). The correct design is essentially a function of four application requirements:

The goal is to seek an optimum solution between availability and capacity. This solution is based upon the specific objectives to be achieved taking into account constraints which will limit the achievement of the objective. Solving the problem involves applying the constraints to the objective functions and implementing the solution. Component Failure Impact Analysis (CFIA) is a technique which can be used to determine this.


Capacity Management


Objectives

The primary goal of Capacity Management is to define, track and control IT service capabilities on an environmental level to ensure service workloads meet demands of customers at agreed performance levels.

"Organizations are finally starting to say that if they are going to spend money, they want to have a good ROI and TCO, not 50% utilization."

Computerworld, April 12, 2004

By creating a single point of accountability, IT capacity planning and monitoring should be improved through greater expert usage of capacity techniques, consolidation of resource considerations and better risk assessments recorded in RFCMs and reflected in change management procedures.

The following are some key objectives for Capacity Management:

Critical Success Factors
The following factors expedite meeting the objectives for Capacity Management:


Process Coverage

Scope

The Capacity Management process should be the focal point for all IT performance and Capacity issues. Other technical domains, such as Network Support, may carry out the bulk of the relevant day-to-day duties, but overall responsibility lies with the Capacity Management process.
Usually In Scope
The process should encompass, for both the operational and the development environment:

The Capacity Plan is the annual definitive document outlining the organization's plans with regard to needed capacity. It is an offshoot of the business plan and reflects the strategic intent of the organization over the medium and long term. The plan documents current levels of resource utilization and service performance. After consideration of business requirements, it forecasts future requirements for resource for IT services that support the business. The Capacity Plan recommends resource levels required and changes to accomplish operating level objectives in support of the SLA. It includes their cost, benefit, reports of their compliance to IT SLA, their priority and impact to the overall business and the IT infrastructure.

Capacity information may be retained in a capacity management database (CDB - not to be confused with the CMDB - though it might, conceivably, be a logical part of the CMDB). This database needs to be a logical entity containing necessary technical, business, and service level management detail data.

Capacity Management includes:

Usually Excluded

Discretionary

Assumptions

Relationship to Other Processes

Financial Management (FM)
CapM creates upgrade plans that are included in the budgeting process. Accurate cost information is vital in order to budget capacity upgrades accurately. Planning for capacity management entails planning for new hardware and software. These costs should be incorporated into the annual budget. Costs may be the restraining factor in some decisions and affect SLA negotiation. By effectively estimating the cost of service availability and optimizing capacity, IT weighs risk versus cost to decide which countermeasures it can afford to implement and which are reserved as contingency plan scenarios. Sometimes the return on investment (ROI) for a requested change may need to be demonstrated. Capacity management must ensure the necessary resources are acquired and implemented in a cost-effective manner.

Service Level Management (SLM)
CapM helps define OLAs that result from service level objectives. IT must prioritize service alerts and countermeasures to prevent degradation of performance before it affects availability. CapM interacts closely with service level, availability, service continuity, and financial management staff to decide on the cost justified proactive measures to improve the "quality" of service.

Optimization of service performance implies monitoring the application's end-to-end response times. In mature organizations, performance levels are forecast and the monitoring system sets threshold alarms to trigger alerts before the customer of the service is aware of an issue. Automated monitoring tracks performance levels of an IT service. Threshold alarms allow response for out of range conditions.
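The threshold alarms described above can be illustrated with a minimal sketch. It assumes a hypothetical service with an agreed response time of 2.0 seconds and a warning threshold set below it at 1.5 seconds, so that an alert fires before the customer sees a breach. The function names and the threshold values are illustrative assumptions, not part of any particular monitoring tool.

```python
def classify_response_time(seconds, warning, breach):
    """Classify one end-to-end response time sample against two thresholds."""
    if seconds >= breach:
        return "breach"
    if seconds >= warning:
        return "warning"
    return "ok"


def threshold_alerts(samples, warning, breach):
    """Return (sample index, value, level) for every out-of-range sample."""
    alerts = []
    for index, seconds in enumerate(samples):
        level = classify_response_time(seconds, warning, breach)
        if level != "ok":
            alerts.append((index, seconds, level))
    return alerts


# Hypothetical response times in seconds; thresholds are assumed values.
samples = [0.8, 1.2, 1.7, 2.3, 1.1]
alerts = threshold_alerts(samples, warning=1.5, breach=2.0)
print(alerts)
```

Setting the warning level below the agreed target is the design choice that gives operations time to respond before the service level is actually missed.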

Availability Management (AM)
AM ensures optimal availability of IT services with the correct use of resources, methods, and technology. Capacity management has a very close tie to this process, since optimal use of IT resources to meet performance levels at a justifiable cost highly correlates with higher service availability. Shared reports should highlight trends indicating capacity and performance issues and management information tools will typically provide monitoring information required for both processes. The Availability Plan needs to be coordinated with the capacity planning process as the same technology solution can often meet the needs of both plans. Some solutions that cannot be cost-justified for one plan may be justified in combination with the other.

Service Continuity Management (SCM)
SCM copes with, and recovers from, unplanned situations in which the period of IT service disruption is considered unacceptable and normal availability countermeasures have not succeeded. There is a difference in the scope of effect and in the unacceptable nature of the disruption. Availability Management deals more practically with what IT can effectively deal with as part of its routine operation, but principles, approach, and concerns are similar. Both processes depend on CapM input to judge the level of performance when the countermeasures are enacted.

Change Management (CM)
CapM assesses the impact of changes on existing capacity and identifies additional resource requirements based on the change in demand. Changes required for capacity management are implemented typically through planning and recommendations that result from capacity, availability and service continuity management. Daily operational changes may surface as job scheduling accommodates the more routine capacity changes. In all cases, any change to the IT service environments is channeled through change management as an RFC.

Configuration Management (CFM)
Changes made to IT resources, also known as configuration items (CIs), and service level objectives for these resources need to be reflected in the configuration management database (CMDB). Service level agreement (SLA) availability and capacity data from the CMDB allows more proactive measurement of performance based on SLA compliance. This data is an important input to capacity management. Associated demand and workload requirements, resulting performance, and resource metrics are recorded in the capacity management database (CDB). Effective coordination and correlation of related elements between these logical databases are required for timely information and on-going capacity recommendation and planning.

Problem Management (PM)
Problem management deals with determining the root cause of problems. A problem is defined as one or more incidents exhibiting similar symptoms. Capacity management interfaces with problem management to investigate known errors that have affected performance levels of an IT service. CapM also provides a specialist infrastructure role to identify and diagnose capacity or performance related problems. Capacity management provides ongoing feedback and recommended changes resulting from incidents traced to known errors affected by or causing degraded performance levels of the service.

Service Desk
Incident frequency and statistics with respect to service performance levels may be reported through problem management, CMDB records, or the involvement of a capacity management specialist to identify and address known errors relating to performance and storage capacity. Ideally, IT resource performance is recorded and managed by the service desk and maintained by configuration management in the CMDB for historical retrieval and analysis. Workload, performance, and demand management activities may reference CMDB records resulting from incident escalation, failover and recovery capacity issues, or other tracked incident reports and trends.


Policies and Guidelines

Capacity Management should:


How the Process Scales

"True capacity planning is a process that requires a high level of IT maturity."

Computerworld, April 12, 2004

In most organizations, formal Capacity Management is largely guesswork. While some basic trend analysis may be performed to determine straight-line projections of future bandwidth, storage and mainframe CPU usage, attempts at more meaningful analytic models often involve complicated data extractions and predictive modeling beyond the capacity of the organization. Simpler methods and solutions may need to be used initially to demonstrate an initial Capacity Management benefit or to get the program started. The emphasis is on reduced effort and less available expertise. Simplistic approaches usually require the involvement of more staff with a diversity of knowledge and expertise, with the resulting additional cost of staff time for analysis and meetings.
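A straight-line projection of the kind mentioned above can be sketched in a few lines. This is only an illustration: it fits an ordinary least-squares line to equally spaced usage samples and estimates how many further periods remain before an assumed capacity limit is reached. The function names, the monthly storage figures and the 200-unit limit are all hypothetical.

```python
def fit_line(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_full(values, capacity):
    """Periods after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or falling, since the line never
    reaches the limit in that case.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    period_at_capacity = (capacity - intercept) / slope
    return max(0.0, period_at_capacity - (len(values) - 1))


# Hypothetical monthly storage usage (arbitrary units) and an assumed limit.
usage = [100, 110, 120, 130, 140]
remaining = periods_until_full(usage, capacity=200)
print(remaining)
```

A projection like this ignores seasonality and step changes from new applications, which is why it is best treated as a starting point rather than a forecast to budget against.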

Capacity Planning will usually originate at the Operational level as individual support units periodically assess the need for additional capacity. Most often, this exercise will be tied into the introduction of new workload associated with a new application's introduction or upgrade. Tactical Capacity Planning is often difficult until the organization has defined its services (in a Service Catalogue, in SLAs or in defined process repositories). Business Capacity analysis will often be undertaken annually following the release of a Business Plan, but its intent is more often to identify operational requirements than strategic IT investment or service delivery needs.

In early implementations there is a reliance on "snapshot" forecasts and resource predictions. There is no formal ongoing Capacity Management program and data and statistics used for other components, particularly performance, double for Capacity Management purposes.

As business volumes increase 'peak usage' periods can place operational parameters beyond normal limits subjecting parts of the infrastructure to unplanned stresses. Planning for these "spikes" means building in the needed capacities at acceptable costs. There is a need for more complicated analysis including postulates involving ROI calculations the organization may not yet have considered.

Organizationally, Capacity issues are considered within their respective infrastructure enclaves. There is unlikely to be a centrally controlled focal point for Capacity issues, though there may be consultations when capacity issues in one area affect those in another.


Concepts

Capacity Management, as a separate and distinct set of processes in IT Service delivery, can benefit an organization by:

These benefits can accrue to the organization at the strategic, the tactical or the operational level according to how the organization is viewed.

Organizational Views of Capacity Management

Capacity Management is all about managing the relationships between three interconnected variables: resources, workload and service levels. None of these elements can be altered without affecting at least one of the other two. It is essentially the job of Capacity Management to take any pair of variables and derive the third:

These considerations can be viewed from one of three distinct organizational vantage points:

These three levels are linearly related to each other - that is, business processes will direct services which, in turn, establish the boundaries for operational procedures. All three approaches should be covered in the Capacity Plan and the data elements should be rationalized within a Capacity database.

"The major difference between the sub-processes is in the data that is being monitored and collected, and the perspective from which it is analyzed. For example, the level of utilization of individual components in the Infrastructure is of interest in Resource Capacity Management, while the transaction throughput rates and response times are of interest in Service Capacity Management. For Business Capacity Management, the transaction throughput rates for the on-line service need to be translated into business volumes, for example, in terms of sales invoices raised or orders taken."

ITIL Service Delivery - Section 6.3

Business Capacity Planning
Why do IT folks have such a hard time reflecting required capacity in their budgets? Why is it so difficult for them to bridge the gap between business requirements and ensuing IT capacity needs?...

There is a failure to understand the relationship and the cascading effect from business requirements to capacity requirements.

In most companies, the budget is a carefully crafted political compromise. When established, it determines the capacity that is being used by the IT operations. The capacity then determines the services and the corresponding service levels.

The provided services then often fall short of the expectations and the needs of the business users. This is the world upside down! Capacity requirements should determine the budget, and not vice versa.

The Missing Link; Capacity Management and Business Requirements, ITSMWatch.com August 16, 2004

The driving force for Capacity Management should be the business requirements of the organization. Capacity Management has a close relationship with the business strategy and planning processes within an organization. On a regular basis, the long-term strategy of an organization is encapsulated in an update of the business plans. These plans encapsulate the organization's understanding of the environment in which the business operates. Capacity Management needs to understand these motivational factors and combine them with information on the latest ideas, trends and technologies being developed by the suppliers of computing hardware and software. The emphasis is on identifying technological 'drivers' for business success.

The organization's business plans dictate the specific IT/IS strategy and business plans, the contents of which Capacity Management needs to be familiar with, and to which Capacity Management needs to have had an input. In the IT/IS specific business plans, particular technologies, hardware and software are identified, together with some indication of the timescale in which they are to be implemented.

The Capacity Management process must be responsive to changing requirements for processing Capacity. New services will be required to underpin the changing business. Existing services will require modification to provide extra functionality. Old services will become obsolete, freeing up spare Capacity.

As a result, the ability to satisfy the Customers' requirements will be affected. Capacity Management needs to predict these Changes and adjust for them.

These new requirements may come to the attention of Capacity Management from many different sources and for many different reasons. They may be generated by the business or may originate from the Capacity Management process itself.

Service Capacity Management
Capacity Management also needs to understand IT Services, their use of resource, working patterns, peaks and troughs, and to contribute to the service meeting its SLA targets. The focus is on managing service performance, as determined by the targets contained in the SLAs or SLRs.

Once the business requirements for a service have come through the Business Capacity Management sub-process and the service has become operational, the Service Capacity Management sub-process is responsible for ensuring that it meets the agreed service targets. The monitored service provides data that can identify trends from which normal service levels can be established. By regular monitoring and comparison with these levels, exception conditions can be defined, identified and reported upon. In this way Capacity Management informs SLM of any service breaches or near misses.
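One simple way to turn monitored data into "normal service levels" and exception conditions is a statistical band around the historical mean. The sketch below assumes that normal means within three sample standard deviations of the mean; that rule, the function names and the sample figures are illustrative assumptions rather than anything the process prescribes.

```python
from statistics import mean, stdev


def normal_band(history, k=3.0):
    """Lower and upper bounds of 'normal', as mean +/- k standard deviations."""
    average = mean(history)
    spread = stdev(history)
    return average - k * spread, average + k * spread


def exception_samples(samples, history, k=3.0):
    """Samples that fall outside the normal band derived from history."""
    low, high = normal_band(history, k)
    return [value for value in samples if value < low or value > high]


# Hypothetical response times in seconds from an earlier monitoring period.
history = [1.0, 1.1, 0.9, 1.0, 1.0]
new_samples = [1.05, 1.9, 0.95, 0.5]
print(exception_samples(new_samples, history))
```

Unusually low values are flagged as well as high ones, since a sudden drop can indicate that work is failing or being routed elsewhere rather than that the service has improved.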

There will be occasions when Incidents and Problems are referred to Capacity Management from other Service Management processes, or it is identified that a service could fail to meet its SLA targets. On some of these occasions the cause of the potential failure may not be resolved by Resource Capacity Management. For example, when the failure is analyzed it may be found that there is no lack of resource, or no individual component is over-utilized. However the design or programming of the application is inefficient, and so the service performance needs to be managed, as well as individual hardware or software resources.

The key to successful Service Capacity Management is to preempt difficulties, wherever possible. So this is another sub-process that has to be proactive and anticipatory rather than reactive. However, there are times when it has to react to specific performance Problems. From a knowledge and understanding of the performance requirements of each of the services being run, the effects of Changes in the use of services can be estimated, and actions taken to ensure that the required service performance can be achieved.

Resource Capacity Management
Operational CapM oversees the utilization of each of the component parts in the IT Infrastructure to ensure their optimal usage. All hardware components and many software components in the IT Infrastructure have a finite Capacity which, when approached or exceeded, can lead to performance problems.

This sub-process is concerned with resources such as processors, memory, disks, network bandwidth, network connections etc. so information on resource utilization needs to be collected on an iterative basis. Monitors should be installed on the individual hardware and software components, and then configured to collect the necessary data.

As in Tactical Capacity Management the key to successful Operational Capacity Management is to pre-empt difficulties, wherever possible. Therefore this sub-process has to be proactive and anticipatory. However, there are times when it has to react to specific problems that are caused by a lack of resource, or the inefficient use of resource.

From a knowledge and understanding of the use of resource by each of the services being run, the effects of Changes in the use of services can be estimated. Then hardware or software upgrades can be budgeted and planned. Alternatively, services can be balanced across the existing resource to make most effective use of the resource currently available.


Capacity Activities

There are a number of activities which are undertaken within one or more of these views of the organization, as illustrated below.

Capacity Mgmt activities according to organizational views

There are four main activities in Capacity Management which support the creation and update of the Capacity Plan:

In addition, a fifth activity involves the maintenance and usage of key capacity information. The Capacity database is a data repository used for this purpose.


Roles and Responsibilities

Capacity Manager

Capacity Management Team

Senior Leadership

Operations Management

Line(s) of Business


Performance Measurement

Key Goal and Performance Indicators

The key metrics in capacity Management are:

Example Demand Calculation

Total demand = number of concurrent users x single user demand unit

Example Workload Calculation

Workload A = total demand x workload per demand unit for application A

Workload B = total demand x workload per demand unit for application B

Example Resource Calculation

MB storage needed = (workload A x MB per workload A unit) + (workload B x MB per workload B unit)

CPU power needed = (workload A x CPU power per workload A unit) + (workload B x CPU power per workload B unit)

Network bandwidth needed = (workload A x network bandwidth per workload A unit) + (workload B x network bandwidth per workload B unit)
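The demand, workload and resource calculations above chain together naturally, and a short sketch may make the arithmetic easier to follow. The class name, the field names and every conversion factor below are made-up illustrations; real factors would come from measurement or application sizing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WorkloadProfile:
    """Per-application conversion factors (all values are illustrative)."""
    workload_per_demand_unit: float  # workload units per demand unit
    mb_per_unit: float               # MB of storage per workload unit
    cpu_per_unit: float              # CPU power units per workload unit
    bandwidth_per_unit: float        # network bandwidth units per workload unit


def total_demand(concurrent_users: int, demand_per_user: float) -> float:
    """Total demand = number of concurrent users x single user demand unit."""
    return concurrent_users * demand_per_user


def resource_needs(demand: float, profiles: Dict[str, WorkloadProfile]) -> Dict[str, float]:
    """Sum storage, CPU and bandwidth needs over every application workload."""
    needs = {"storage_mb": 0.0, "cpu": 0.0, "bandwidth": 0.0}
    for profile in profiles.values():
        # Workload = total demand x workload per demand unit for the application
        workload = demand * profile.workload_per_demand_unit
        needs["storage_mb"] += workload * profile.mb_per_unit
        needs["cpu"] += workload * profile.cpu_per_unit
        needs["bandwidth"] += workload * profile.bandwidth_per_unit
    return needs


# Hypothetical applications A and B with assumed conversion factors.
profiles = {
    "A": WorkloadProfile(2.0, 5.0, 0.5, 0.25),
    "B": WorkloadProfile(1.0, 20.0, 0.25, 0.5),
}
demand = total_demand(concurrent_users=200, demand_per_user=1.5)
needs = resource_needs(demand, profiles)
print(demand, needs)
```

With these assumed factors, 200 concurrent users produce a demand of 300 units, workloads of 600 (A) and 300 (B), and a combined need of 9000 MB of storage, 375 CPU units and 300 bandwidth units.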

Measurement Issues

Repeatable, predictable results require a real-time knowledge of the resources available for the application. Without appropriate safeguards, the very automation that reduces the amount of human intervention required to improve the service can also result in over-utilization of key resources such as storage, CPU, and memory. Provisioning an organization with a given number of users with a specific service level requires a predictable amount of resources in the data center. The ability to monitor the available resources, model application, and associated service level objectives, and choose the most efficient set of resources based on these parameters is the role of resource management.


Processes

Capacity process Summary

Controls
  • Agreements
  • Business strategies and plans
  • Financial plans and budgets
  • IT strategies, plans and investments
Inputs
  • New technology suppliers
  • Performance caused Incident/problems
  • Service agreement achievements
  • Change Calendar
  • CMDB
  • Work schedules
Activities
  • Capacity Plan maintenance
  • Monitoring
  • Analysis
  • Tuning
  • Implementing additional capacity
  • CDB recording
  • Demand Management
  • Modelling
  • Application Sizing
Outputs
  • Upgrade needs
  • Verification of SLRs
  • Budgeting cash flow
  • RFCs
Mechanisms
  • Customer Relationship Management
  • Project Management
  • IT/Business Alignment
  • Measurement
  • Continuous Improvement
  • Risk Management
  • Change Management
  • Problem Management
  • Security Management
  • Supplier Management


Inputs


Controls


Mechanisms


Outputs


Process Activities

CapM1 - Business Capacity Management

A prime objective of the Business Capacity Management sub-process is to ensure that the future business requirements for IT Services are considered and understood, and that sufficient Capacity to support the services is planned and implemented in an appropriate timescale.

The Capacity Management process must be responsive to changing requirements for processing Capacity. New services will be required to underpin the changing business. Existing services will require modification to provide extra functionality. Old services will become obsolete, freeing up spare Capacity.

"Organizations are finally starting to say that if they are going to spend money, they want to have a good ROI and TCO, not 50% utilization." IT people are wrapped up in trying to get the upper-hand of their reactive behavior and attempting to get into a proactive mode. As a result, they are not following the logical decision-making path when thinking about capacity.

The way to break through this vicious cycle is to:

  • get a timely understanding of the business requirements by having IT management involved with or at least thoroughly informed of the business decisions;
  • break the business requirements down into the needed IT services by establishing a Service Catalog, have Service Level Management in place to correctly identify service levels, and have the IT architecture designed from a "services provided" point of view;
  • make sure good Configuration Management is in place with the proper linkages of the Configuration Items, so that the increase in service levels (e.g., availability and performance) are matched by a corresponding increase in capacity;
  • have the Financial Management process tie in to Configuration Management to show the immediate budget adjustments.

August 16, 2004, Jan Vromant, The Missing Link; Capacity Management and Business Requirements

As a result, the ability to satisfy the Customers' SLRs will be affected. It is the responsibility of Capacity Management to predict these Changes and cater for them. These new requirements may come to the attention of Capacity Management from many different sources and for many different reasons. They may be generated by the business or may originate from the Capacity Management process itself. Such examples could be a recommendation to upgrade to take advantage of new technology, or the implementation of a tuning activity to resolve a performance Problem.

CapM1.1 - IT Strategic Planning
The IT Strategic Plan (or the organization's equivalent document) sets the foundation for the Capacity Planning process. The plan reflects an understanding of how the Business utilizes I.T. Services to enable key business processes. With an understanding of the interrelationships among I.T. Services, Systems and business processes, CapM is in a good position to estimate capacity needs.

CapM translates these business requirements into capacity 'language', that is, service and component needs. This is done for each line of business and then aggregated in ways which identify commonalities and areas where resources might be efficiently shared without presenting unacceptable risks to a business area from that sharing.

CapM1.2 - Review Resource Performance
CapM ensures that the utilization of the underlying resources is monitored. All of the collected data is recorded, analyzed, and reported. As necessary, CapM ensures that the performance of the solutions meets the business requirements.

CapM1.3 - Review SLA Performance
The SLA should include details of the anticipated service throughputs and the performance requirements. Capacity Management provides SLM with targets that have the ability to be monitored and upon which the service design has been based. Confidence that the service design will meet the SLRs and provide the ability for future growth can be gained by using modelling.

CapM1.4 - Develop Capacity Plan
The production and update of a Capacity Plan should occur at pre-defined intervals. It is, essentially, an investment plan and should therefore be published annually, in line with the business or budget lifecycle, and completed before the start of negotiations on future budgets.

The Capacity Plan needs to show what capacity is needed in the future and at what cost. It should also predict what hardware upgrades or additional equipment would be needed to meet future service level objectives. It needs to also include information on sizing of any new systems proposed. It needs to reflect cost constraints and availability or reliability requirements.

The plan should discuss current utilization rates and service performance. It should be based upon business strategies and explicitly recognize business strategic and operational plans and forecasts in its estimates of future requirements. Recommendations should include estimates of necessary resources, relevant impacts, associated costs and benefits, etc.

A Capacity Plan template is included in the Appendix.

CapM1.5 - Review Capacity Plan
A quarterly re-issue of the updated plan may be necessary to take into account changes in business plans, to report on the accuracy of forecasts and to make or refine recommendations.

CapM1.6 - Review SLA and Service Catalogue
Capacity Management should assist SLM in understanding the Customers' Capacity requirements, for example in terms of required response times, expected throughput and pattern of usage, terminal population. Capacity Management should help in the negotiation process by providing possible solutions to a number of scenarios. For example, if the terminal population is less than 20 then response times can be guaranteed to be less than two seconds. If more than 20 Users connect then extra network bandwidth is needed to guarantee the required response time. Modelling or Application Sizing may be employed here.


CapM2 - Service Capacity Management

CapM2.1 - Review Service Catalogue and OLAs
The definitive descriptive source of services should be the organization's Service Catalogue and associated processes in each service area. Service Capacity Management will, with the assistance of Service Level Management (who have overall responsibility for the integrity of the Service Catalogue and Operational Level Agreements) review service chains and the respective Operational Level Objectives which drive overall performance targets. CapM reviews the associated metrics for each of the key service chain participants who contribute to overall service level objectives (SLOs).

The service catalogue should be reviewed for service capacity business impact analysis (BIA), return on investment (ROI) analysis, and capacity implications for IT service continuity planning, and as an initial baseline for workload-related issues and demand management feedback.

CapM2.2 - Demand Management
Long-term demand management may be required when it is difficult to cost-justify an expensive upgrade. For example, many applications have higher CPU usage for a few hours each day, typically in mid-morning and mid-afternoon. Within these periods, the processor may be overloaded for only one or two hours. After normal business hours, the same system may have very low overall CPU utilization, so the resource is under-utilized. It may not be possible to justify the cost of an upgrade if it provides additional resources that are needed for only a few hours of the day. IT can sometimes influence the demand and spread the requirement for resources throughout the day, thereby avoiding the need for the upgrade.
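As a rough illustration of the reasoning above, the following sketch checks in which hours demand exceeds capacity and whether the excess could, in principle, be absorbed by under-used hours. The function names, the hourly figures, and the assumption that excess work can be moved freely to any other hour are hypothetical and serve only to illustrate the idea.

```python
from typing import List


def overloaded_hours(hourly_demand: List[float], capacity: float) -> List[int]:
    """Return the hours (by index) in which demand exceeds the available capacity."""
    return [hour for hour, demand in enumerate(hourly_demand) if demand > capacity]


def can_level_demand(hourly_demand: List[float], capacity: float) -> bool:
    """Return True if the work above capacity would fit into spare capacity elsewhere.

    Assumes (hypothetically) that any excess work can be moved to any other hour.
    """
    excess = sum(max(demand - capacity, 0.0) for demand in hourly_demand)
    headroom = sum(max(capacity - demand, 0.0) for demand in hourly_demand)
    return excess <= headroom


# Hypothetical CPU demand for each of 24 hours, as a percentage of one processor.
demand = [10] * 9 + [70, 120, 110, 60, 65, 115, 105, 60] + [15] * 7

print("Overloaded hours:", overloaded_hours(demand, capacity=100))
print("Could be levelled without an upgrade:", can_level_demand(demand, capacity=100))
```

If the second check fails, spreading the load cannot remove the overload on its own, and the upgrade case would need to be examined on its merits.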

The influence on the services that are running could be exercised by:

CapM2.3 - Workload Management
To produce a set of forecasts that indicate estimated resource usage for the planning period.

Identifying trends is difficult without a large volume of statistics, so the data must be collected over a good length of time. Workloads are classified by type (online, batch, and network, for example) so that both current and proposed customer demand can be translated effectively into workloads. This classification of workload types is generally called the "workload catalogue." The next step is to analyze and understand the trends of each workload, discovering when peaks occur and why they happen. This investigation should encompass short, medium and long-term trends. The workload catalogue, peak load analysis, and operating level requirements all contribute to the production of the forecast report(s).
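The peak load analysis described above can be sketched as follows. This is a minimal example that assumes usage samples have already been tagged with a workload type and an hour of day; the sample layout, the function name, and the figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (workload type, hour of day, resource usage) - a hypothetical sample layout.
Sample = Tuple[str, int, float]


def peak_hour_by_workload(samples: Iterable[Sample]) -> Dict[str, int]:
    """Return, for each workload type, the hour with the highest average usage."""
    usage = defaultdict(lambda: defaultdict(list))
    for workload, hour, value in samples:
        usage[workload][hour].append(value)

    peaks = {}
    for workload, by_hour in usage.items():
        averages = {hour: sum(values) / len(values) for hour, values in by_hour.items()}
        peaks[workload] = max(averages, key=averages.get)
    return peaks


samples = [
    ("online", 10, 80.0), ("online", 10, 90.0), ("online", 15, 70.0),
    ("batch", 2, 95.0), ("batch", 14, 20.0),
]
print(peak_hour_by_workload(samples))
```

Knowing when each workload peaks is only the first half of the analysis; why the peak occurs still has to be established with the business.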

CapM3 - Operational Capacity Management

CapM3.1 - Maintenance - Monitoring
Ensure the optimum use of the hardware and software resources, that all agreed service levels can be achieved, and that business volumes are as expected.

Most monitoring tasks are near term in nature, and rely on underlying tools and principles for operations. The collected information must be recorded or sampled over a determined period. The amount of sampling and resources required to do so must be examined also. The capacity management database (CDB) should contain information points to identify historical trends and patterns.

Data needs to be gathered at the total resource utilization level, and also as a more detailed profile of the workload that each service places on each particular resource. This needs to be carried out across the whole infrastructure: host or server, network, local server, application, and client-side or workstation. Similarly, data needs to be collected for each service, for example, availability and user screen response time.

Part of the monitoring activity is to establish baselines or profiles of normal operating levels. If thresholds beyond the norm are exceeded, alarms are raised and exception reports produced. These thresholds and baselines are determined from the analysis of previously recorded data, and can be set on:

The operating system, applications management, associated hardware agent, and systems management tools may dictate which monitors are most readily available. Business rules can correlate element data to service levels in many cases. Many monitors are included as part of the operating system, or free as part of a hardware and software vendor solution, while others form part of a larger systems management tool set and need to be evaluated and purchased separately. It is important that the monitors can collect all the data required by the capacity management process, for a specific component or service.

CapM3.2 - Maintenance - Analysis
Identification of areas for capacity improvement

Data monitored and collected is analyzed to identify and adjust thresholds and alarms. In reactive organizations, breached thresholds trigger alarms and/or exception reports, which then need to be analyzed and reported upon, and corrective action taken. Ideally, all thresholds should be set below the level at which the resource is over-utilized, or below the targets in the OLA or layered OLO. This enables capacity management to take corrective action before the targets in the OLAs have been breached, or before the resource has become over-utilized and there has been a period of poor performance.

In proactive organizations, the data collected from the monitoring should be analyzed to identify trends from which the normal utilization and service level, or baseline, can be established. By regular monitoring and comparison with this baseline, exception conditions in the utilization of individual components or service thresholds can be defined, and breaches or near misses in the OLAs can be reported. In addition, the data can be used to predict future resource usage.
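One simple way to derive a baseline and an exception threshold from previously recorded data is sketched below. Setting the threshold at the mean plus two standard deviations is an assumption made for illustration, not a rule taken from this process; the function names and figures are likewise hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def baseline_and_threshold(history: List[float], k: float = 2.0) -> Tuple[float, float]:
    """Return (baseline, threshold) where threshold = mean + k * standard deviation."""
    baseline = mean(history)
    return baseline, baseline + k * pstdev(history)


def find_exceptions(readings: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (position, value) for every reading above the threshold."""
    return [(i, value) for i, value in enumerate(readings) if value > threshold]


# Hypothetical CPU utilization history (percent) and a new set of readings.
history = [40.0, 50.0, 60.0, 50.0, 40.0, 60.0]
baseline, threshold = baseline_and_threshold(history)

readings = [55.0, 70.0, 65.0, 90.0]
print("Baseline:", baseline)
print("Threshold:", round(threshold, 1))
print("Exceptions:", find_exceptions(readings, threshold))
```

In practice the threshold would also be checked against the OLA targets, so that an alarm is raised before a target is breached rather than after.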

Analysis of the data may identify issues of:

The use of each resource and service needs to be considered over the short, medium, and long-term, and the minimum, maximum and average utilization for these periods recorded. Over time, the trend in the use of the resource by the various IT services becomes apparent.

One key to determining whether a solution is operating at an acceptable level is latency, or the length of time a user has to wait for a response once a request for information is complete. Heavy workload on a server might create unacceptable wait times even though the server may be capable of handling every request. As a rule, try to isolate components that make a repeatable, high-percentage contribution to performance levels and report on them at varying workloads.
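The way wait times grow with workload can be illustrated with the textbook single-server queueing approximation (M/M/1), in which mean response time equals service time divided by (1 - utilization). The choice of this model and the figures used are assumptions for illustration only; this process does not prescribe a particular model, and real systems may behave differently.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: service_time / (1 - utilization).

    service_time: mean time to serve one request on an idle server (seconds).
    utilization:  fraction of time the server is busy, in the range [0, 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and less than 1")
    return service_time / (1.0 - utilization)


# Hypothetical server with a 0.2 second service time at increasing workloads.
for utilization in (0.5, 0.8, 0.95):
    seconds = mm1_response_time(0.2, utilization)
    print(f"utilization {utilization:.0%}: response time {seconds:.2f} s")
```

Under this model the server still completes every request at 95% utilization, but the response time is roughly ten times what it was at 50%, which is the effect described above.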

It is important to understand the utilization in each of these periods, so that changes in the use of any service can be related to predicted changes in the level of utilization of individual resources. The ability to identify the specific hardware or software resource on which a particular IT service depends is improved greatly by an accurate, up-to-date and comprehensive CMDB. Any relevant detailed performance information should be related to, or reside and be maintained in, the capacity database (CDB).

When the utilization of a particular resource is considered, it is important to understand both the total level of utilization and the utilization by individual services of the resource.

The analysis and tuning activities may also benefit from general observations and guidelines in the Guidelines for Effective Capacity Management and Designing Information Technology Solutions for Scalability sections in this document.

CapM3.3 - Maintenance - Tuning
Better utilization of the system resource or improvement to the performance of the particular component.

The analysis of the monitored data may identify areas of the configuration that could be tuned to better utilize the system resource or improve the performance of the particular service.

Tuning techniques that are of assistance include:

Regarding the efficient use of memory, note that a process may use resources more efficiently if data is read into memory and manipulated there rather than read sequentially from files. On the other hand, many processes may be contending for the memory resource, and the excessive demand may lead to increased CPU utilization and delays while pages are swapped in and out of memory.

Before implementing any of the recommendations arising from the tuning techniques, it may be appropriate to consider using one of the ongoing or ad hoc activities to test the validity of the recommendation. For example, 'Can Demand Management be used to avoid the need to carry out any tuning?' or 'Can the proposed Change be modelled to show its effectiveness before it is implemented?'

CapM3.4 - Component Upgrade
To introduce into the live operational services any Changes that have been identified by the monitoring, analysis and tuning activities.

The implementation of any Changes arising from these activities must be undertaken through a strict, formal Change Management process. System tuning changes can have major implications for the Customers of the service. The impact and risk associated with these changes are likely to be greater than those of other types of Change. Implementing the tuning Changes under formal Change Management procedures results in:

It is important that further monitoring takes place, so that the effects of the Change can be assessed. It may be necessary to make further Changes or to regress some of the original Changes.

CapM3.5 - Resource Demand Management
Influence the demand for computing resource and the use of that resource.

CapM identifies and quantifies resource usage. A good Capacity forecast will anticipate and plan for peak demand. Modeling assists in peak load analysis to create workload resource usage forecasts. Models of baseline versus actual data allow the informed reduction of the data to provide valuable information fed into the CDB for important outputs from capacity management.

To influence demand, the IT Provider may find it useful to implement charge back for IT services, so that different rates can be assessed to control demand and distribute resources more optimally. This activity can be carried out as a short-term requirement because there is insufficient current capacity to support the work being run, or, as a deliberate policy of IT management, to limit the required capacity in the long term.
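A differential charging scheme of the kind described above might be sketched as follows. The peak window, the rates, and the unit of usage (CPU seconds) are hypothetical values chosen for illustration; an actual scheme would be agreed with the financial management process.

```python
from typing import Dict


def chargeback(usage_by_hour: Dict[int, float], peak_hours: range,
               peak_rate: float, off_peak_rate: float) -> float:
    """Return the total charge for usage keyed by hour of day (0-23).

    Usage in a peak hour is charged at peak_rate, all other usage at off_peak_rate.
    """
    total = 0.0
    for hour, cpu_seconds in usage_by_hour.items():
        rate = peak_rate if hour in peak_hours else off_peak_rate
        total += cpu_seconds * rate
    return total


# Hypothetical rates: the same job costs four times as much in business hours.
business_hours = range(9, 17)
same_job_by_day = chargeback({10: 100.0}, business_hours, peak_rate=0.02, off_peak_rate=0.005)
same_job_by_night = chargeback({22: 100.0}, business_hours, peak_rate=0.02, off_peak_rate=0.005)
print("Charge when run at 10:00:", same_job_by_day)
print("Charge when run at 22:00:", same_job_by_night)
```

The difference between the two charges is the incentive offered to the Customer for moving deferrable work out of the peak period.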

Short-term demand management may occur when there has been a partial failure of a critical resource in the IT infrastructure. For example, if there has been a failure of part of the memory on a processor, it may not be possible to run the full range of services. However, a limited subset of the services could be run.

CapM3.6 - Application Sizing
Ensure new applications are configured in context with their expected growth and in the context of total infrastructure capacity needs

Application sizing exercises are initiated at the Project Initiation stage for new applications or when there is a major Change to an existing application, and are completed when the applications are accepted into the operational environment. During the initial systems analysis and design the required service levels must be specified. This enables the application development to employ the pertinent technologies and products, in order to achieve a design that meets the desired levels of service. It is much easier and less expensive to achieve the required service levels if the application design considers the required service levels at the very beginning of the application lifecycle, rather than at some later stage.

Other considerations in application sizing are the resilience aspects that it may be necessary to build into the design of the new application. Capacity Management is able to provide advice and guidance to the Availability Management process about the resources required to provide the required level of resilience.

The sizing of the application should be refined as the development process progresses. Modelling can be used within the application sizing process. The SLRs of the planned application developments should not be considered in isolation. The resources to be utilized by the application are likely to be shared with other services, and potential threats to existing SLA targets must be recognized and managed. When purchasing software packages from external suppliers it is just as important to understand the resource requirements needed to support the application. Often it can be difficult to obtain this information from the suppliers, and it may vary, depending on throughput. Therefore, it is beneficial to identify similar Customers of the product and to gain an understanding of the resource implications from them. It may be pertinent to benchmark or trial the product prior to purchase.

CapM3.7 - Maintaining Capacity Management Database
Easy availability of capacity information

The CDB is important because it is the central repository for capacity and performance related information. The database collects much of the data input listed earlier. It collects information about workload and performance (for example, how heavily a customer relationship management database is being used) and data that reveal trends for forecasting future growth in storage requirements. It provides information for producing reports and the capacity plan, monitoring performance, and managing resources and demand.

The inputs, outputs and sub-processes of capacity management were illustrated in the diagram in the introduction. Almost all inputs have associated detail that must be captured within the capacity management database in order to process and produce the required outputs.

Information retained in a CDB might include:

Capacity management issues can dramatically affect the business if they cause unplanned downtime of a vital business function. This requires capacity and availability management considerations to be intertwined and solution designs to be consistent. Service continuity management weighs risk against cost for scenarios outside the normal availability design. Its contingency planning relies on capacity forecasts and recommendations when documenting a chosen contingency measure. It follows that the clear and distinct requirements of each process should have correlated capacity and performance data identified and properly recorded in the CDB. It is important that detailed capacity data in the CDB relates to the relevant OLAs, and that the associated OLA and/or SLA information is tracked in the configuration management database (CMDB). Because there is an implied dependency of availability management information on the proper integration of performance and capacity measurement data, capacity and availability staff often share common monitoring tools and management solutions.

Appendix

Terms

Term: Definition
Availability: Ability of a component or service to perform its required function at a stated instant or over a stated period of time. It is usually expressed as the availability ratio, i.e. the proportion of time that the service is actually available for use by the Customers within the agreed service hours.
Category, Type and Item (CTI): Method for classification of a group.
Capacity Planning and Management: The process by which measurements of current resource utilization are combined with projections of future resource requirements to allow management decisions to be made as to what computer and data communications resources will be required in the future, and how best to allocate existing resources so that they are used in the most efficient and effective manner.
Capacity Database: A data repository containing the capacity characteristics of the infrastructure. A Capacity database may include the Performance database.
Configuration Item (CI): Component of an infrastructure - or an item, such as a Request for Change, associated with an infrastructure - that is (or is to be) under the control of Configuration Management. CIs may vary widely in complexity, size and type, from an entire system (including all hardware, software and documentation) to a single module or a minor hardware component.
Configuration Management Database (CMDB): A database that contains all relevant details of each CI and details of the important relationships between CIs.
Core Business Process: A process that relies on the unique knowledge and skills of the owner and that contributes to the owner’s competitive advantage.
Critical Success Factor (CSF): The most important issues or actions for management to achieve control over and within its IT processes.
Customer: Payer of a service; usually the Customer management has responsibility for the cost of the service, either directly through charging or indirectly in terms of demonstrable business need.
Demand Management: Influencing the use of IT capacity, perhaps by incentive or penalty, in circumstances where unmanaged demand is likely to exceed the ability to deliver. Demand Management is achieved by assigning resources according to priorities.
Environment: A collection of hardware, software, network and procedures that work together to provide a discrete type of computer service. There may be one or more environments on a physical platform, e.g. test, production. An environment has unique features and characteristics that dictate how it is administered in similar, yet diverse, manners.
Performance Database: A data repository with historical information on the performance of infrastructure components and their expected MTBF.
Process: A connected series of actions, activities, Changes etc. performed by agents with the intent of satisfying a purpose or achieving a goal.
Process Control: The process of planning and regulating, with the objective of performing a process in an effective and efficient way.
Release: A collection of new and/or changed CIs which are tested and introduced into the live environment together.
Request for Change (RFC): Form, or screen, used to record details of a request for a Change to any CI within an infrastructure or to procedures and items associated with the infrastructure.
Role: A set of responsibilities, activities and authorisations.
Service Level Agreement: A written agreement between a service provider and Customer(s) that documents agreed services and the levels at which they are provided at various costs.
Service Level Management: Disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to supported IT users in accordance with business priorities and at acceptable costs.
System: An integrated composite that consists of one or more of the processes, hardware, software, facilities and people, that provides a capability to satisfy a stated need or objective.
Utilization Statistics: Measures that record the amount of computer and data communications resources used to provide data processing support.

COBIT Performance and Capacity Maturity Variations

0 (Non-existent): Management has not recognized that key business processes may require high levels of performance from IT or that the overall business need for IT services may exceed capacity. There is no capacity planning process in place.
1 (Initial/Ad Hoc): Performance and capacity management is reactive and sporadic. Users often have to devise work-arounds for performance and capacity constraints. There is very little appreciation of the IT service needs by the owners of the business processes. IT management is aware of the need for performance and capacity management, but the action taken is usually reactive or incomplete. The planning process is informal.
2 (Repeatable but Intuitive): Business management is aware of the impact of not managing performance and capacity. For critical areas, performance needs are generally catered for, based on assessment of individual systems and the knowledge of support and project teams. Some individual tools may be used to diagnose performance and capacity problems, but the consistency of results is dependent on the expertise of key individuals. There is no overall assessment of the IT infrastructure’s performance capability or consideration of peak and worst-case loading situations. Availability problems are likely to occur in an unexpected and random fashion and take considerable time to diagnose and correct.
3 (Defined Process): Performance and capacity requirements are defined as steps to be addressed at all stages of the systems acquisition and deployment methodology. There are defined service level requirements and metrics that can be used to measure operational performance. It is possible to model and forecast future performance requirements. Reports can be produced giving performance statistics. Problems are still likely to occur and be time consuming to correct. Despite published service levels, end users will occasionally feel sceptical about the service capability.
4 (Managed and Measurable): Processes and tools are available to measure system usage and compare it to defined service levels. Up-to-date information is available, giving standardized performance statistics and alerting incidents such as insufficient capacity or throughput. Incidents caused by capacity and performance failures are dealt with according to defined and standardized procedures. Automated tools are used to monitor specific resources such as disk storage, network servers and network gateways. There is some attempt to report performance statistics in business process terms, so that end users can understand IT service levels. Users feel generally satisfied with current service capability and are demanding new and improved availability levels.
5 (Optimized): The performance and capacity plans are fully synchronized with the business forecasts and the operational plans and objectives. The IT infrastructure is subject to regular reviews to ensure that optimum capacity is achieved at the lowest possible cost. Advances in technology are closely monitored to take advantage of improved product performance. The metrics for measuring IT performance have been fine-tuned to focus on key areas and are translated into KGIs, KPIs and CSFs for all critical business processes. Tools for monitoring critical IT resources have been standardized, wherever possible, across platforms and linked to a single organization-wide incident management system. Monitoring tools increasingly can detect and automatically correct performance problems, e.g., allocating increased storage space or re-routing network traffic. Trends are detected showing imminent performance problems caused by increased business volumes, enabling planning and avoidance of unexpected incidents. Users expect 24x7x365 availability.

Process Implementation Recommendations

A clear set of objectives
A suggestion for an overall objective statement was given earlier: 'The continuing provision of consistent, acceptable service levels at a known and controlled cost.' This objective will need to be extended to incorporate the scope of the Capacity Management team activities, in particular, the range of systems and business processes that they will be required to cover.

Senior management commitment
This is an obvious, but vital step on the road. Without senior management commitment, the required personnel and tool resources will not be provided, and the organizational changes (particularly in terms of business information flow) will not be made.

Process/flow definition
This is a definition of the way in which the Capacity Management team will interface with the rest of the organization. It must include a definition of:

A realistic plan
The key word here is 'realistic'. While it would be nice to have a completely comprehensive, fully integrated Capacity Management function from day one, it just is not going to happen. The required organizational changes may well take a considerable amount of time to define and implement.

Recruit or retrain the right people
Despite the fact that mathematicians are amongst the most important and worthwhile people on the planet, capacity managers do not have to be mathematicians. Although a certain degree of numeric ability will be required, an ability to communicate with the business is equally important. Such abilities will be required not only to derive the required information from the organization, but also to communicate and present findings and recommendations effectively.

An effective capacity management team member really needs to have a foot in both the business and the technical communities. Remember that Capacity Management is a business discipline with technical implications - not the other way round!

Acquire the right toolset
The precise toolset will depend on the individual organization's circumstances, but a minimum starter set will include:

Walk before you start to run
Avoid the temptation to try and cover too many target systems or applications right from the outset. It will take some time for the organization to adapt to the additional disciplines required for Capacity Management and to start to benefit from its provision.

Iterative evolution
As I mentioned earlier, Capacity Management is an evolving process. Activities and achievements must continually be monitored and checked against the objectives and the plan. This is an ongoing iterative process of refinement and improvement.

Toolset Considerations

Monitoring Agents
There are many, many systems on the market today that claim to provide some level of "service level management." Most of these systems confine themselves to monitoring network infrastructure equipment. While this is a very valuable capability, it doesn't go far enough to be considered to be true SLM. It also doesn't lend itself to meaningful customer service level reporting.

Some types of problem tracking or call management systems market themselves as having service level management functionality. In some ways, this is correct: many businesses use standard help desk tools to report on network availability, overall system availability, and customer service levels based on the types of trouble tickets received. This kind of "service level management" is, by nature, purely reactive, relying as it does on after-the-fact information capture of service failures. These tools also provide no mechanism for tying performance to business goals.

Still other types of systems come a little closer to a mature implementation of SLM by using test transactions to simulate customer/end user activity. Essentially, these systems send "synthetic transactions" through standard customer systems, timing them and monitoring for difficulties. Obviously, these systems confine themselves to network and application service level management, although they can make some measure of business success by examining employee productivity (or lack thereof, due to system problems) or customer system experience.

Web monitoring systems are becoming more prevalent. These systems examine the service experienced by web site users, and may also use test transactions to measure web site access events. There are also service level management systems that focus on specific technologies, such as VOIP, or on particular industries (telecom, ISPs).

Management support tools
Effective capacity management requires the use of mature software to monitor and control the service solution and underlying platform. It is difficult to justify building this type of software internal to IT since there are software packages which leverage many years of vendor platform monitor and control elements. We call these IT management support tools. IT management support tools can be grouped by functionality based on three categories:

Enterprise management frameworks typically focus on infrastructure management; however, they often include comprehensive product suites that integrate tools and information across these categories and may also integrate with other vendor products or feed information into products that also provide application and service management. For more information see Service Monitoring and Control in the MOF Operations Guide. Capacity and availability management requirements for information and support tools are similar. Typically, application management tools directly address these needs, but the processes also leverage information supplied by the service management and infrastructure management support tool categories.

Service Management Tools
One of the main objectives of service management processes is the administration of information to manage the quality of IT services. The information is used to monitor and report on services and customers. The service management processes ensure that services are defined and changed when necessary and provide support during exploitation of services. These are the IT management processes in which contact between the IT service provider and the customer is involved (incident management, change management, service level management). Incident, problem, and change management also define the workflow within the IT service management department. Service management tools are usually rated by their ability to support the process requirements for: incident management, problem management, change management, configuration management, software control, service level management, and cost management.

Application Management Tools
Application management focuses on the application from the end user's perspective. It ensures that critical applications and data are always highly available and perform optimally. A representative of the end users and the IT service provider agree on the service levels to be realized. These service levels are monitored and reported by the IT service provider. Application management processes provide end-to-end information on the agreed and actual availability and performance of the application and the infrastructure resources used by this application. Application management tools are typically rated based on their ability to satisfy the needs of availability management, capacity management, and service level management.

Infrastructure Management Tools
Infrastructure management consists of systems, network, database and desktop management and software distribution. It is defined as management of the components or elements of the IT infrastructure, for example, servers, routers, hubs, databases, PCs and terminals. The infrastructure management tools typically fall into these categories: system management, network management, desktop management, software distribution, and database management.

Capacity Plan Template

Introduction
Any background information should be introduced in this section. This could include the organization's current levels of capacity, problems being experienced or anticipated due to over or under capacity, the degree to which service levels are being achieved, and what has changed since the last update and issue of the plan.

Scope of the plan
Ideally, all IT services and resources need to be outlined in the plan. This section needs to be specific in detail and outline what and how the elements of the IT infrastructure are to be addressed.

Methods used
The capacity plan uses information gathered by the sub-processes. This section therefore should contain details of how and when this information was obtained, for example business forecasts obtained from business plans, workload forecasts obtained from users, service level forecasts obtained by the use of modeling tools.

Management summary
The capacity plan necessarily contains technical detail that may not be of interest to all readers of the plan. The management summary needs to highlight the main issues, options, recommendations, and costs. It may be helpful to produce a separate executive summary document that contains the main points from each of the sections of the more detailed plan.

Strategic Business Scenarios
It is necessary to put the plan into the context of the current and future business environment. For example, a new CRM solution may currently be utilizing 60% of current processor and memory capacity for its back-end database. Capacity management is involved in monitoring the current system and is able to forecast the recommended additional CPU, memory and disk capacity to accommodate growth for the year. It is important to explicitly mention all known business forecasts so that readers can determine what is inside and outside the scope of the plan.
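A forecast of this kind can be sketched with a simple compound-growth calculation. The 60% starting point comes from the example above; the 3% monthly growth rate, the 80% planning threshold, and the function name are hypothetical values chosen only to show the arithmetic.

```python
import math


def months_until_threshold(current_utilization: float, monthly_growth: float,
                           threshold: float) -> int:
    """Whole months until utilization first reaches the threshold.

    Assumes utilization grows by the same percentage every month (compound growth).
    """
    if current_utilization >= threshold:
        return 0
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive to reach the threshold")
    months = math.log(threshold / current_utilization) / math.log(1.0 + monthly_growth)
    return math.ceil(months)


# CRM back-end database at 60% utilization, hypothetical 3% growth per month,
# hypothetical planning threshold of 80%.
print(months_until_threshold(0.60, 0.03, 0.80))
```

The result indicates how far ahead an upgrade would need to be planned under the stated growth assumption; a different business forecast would change the answer, which is why the plan should state its forecasts explicitly.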

Tactical summary
A service profile should be provided for each service delivered. This should include resource utilization for a given transaction response time or throughput rate. For example, usage levels for processor, memory, storage, and network, with short, medium, and long-term trends presented in this section.

Forecasted service levels
The business plans should provide the capacity manager with details of the new services planned, and the growth or contraction of existing services. This section should report on new services and the demise of legacy systems.

Operational summary
This section concentrates on the resulting resource usage by the services. It again reports on the short, medium, and long-term trends in resource usage, broken down by hardware platform. This information is gathered and analyzed by the sub-processes that manage resources and service performance, so it should be readily available.

Resource forecasts
This section forecasts the likely resource usage resulting from the service forecasts. Each business scenario mentioned above should be addressed here. For example, a new Internet storefront project plan may have a corresponding forecast of specific network bandwidth requirements, based on anticipated transaction levels and response times for secured debit transactions.
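A bandwidth forecast of this kind can be approximated from the expected transaction rate and the data moved per transaction. The sketch below is illustrative only: the function name, the target link utilization, and all figures are assumptions, and it ignores protocol overhead and burstiness that a real forecast would need to model.

```python
def required_bandwidth_mbps(transactions_per_second: float,
                            bytes_per_transaction: float,
                            target_utilization: float = 0.5) -> float:
    """Link capacity (Mbit/s) needed to carry the forecast load.

    The raw load is divided by `target_utilization` so that the link runs
    at that fraction of capacity, leaving headroom for peaks.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    load_mbps = transactions_per_second * bytes_per_transaction * 8 / 1_000_000
    return load_mbps / target_utilization


if __name__ == "__main__":
    # Hypothetical storefront forecast: 200 transactions/s at 50,000 bytes each,
    # with the link planned to run at no more than 50% utilization.
    print(required_bandwidth_mbps(200, 50_000, 0.5))
```

Keeping the target utilization as an explicit parameter makes the headroom assumption visible to readers of the plan instead of burying it in the result.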

Options for service improvement
Building on the results of the previous section, this section outlines the possible options for improving the effectiveness and efficiency of service delivery. It could contain options for merging different services on a single processor, upgrading the network to take advantage of technological advances, tuning resource usage or service performance, rewriting legacy systems, purchasing new hardware or software, and so on.

Cost model
The costs associated with these options should be documented here. In addition, the current and forecast cost of providing IT services needs to be included. In practice, the capacity manager obtains much of this information from the financial management process.
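One simple way to compare the costs of the options is to total the startup and ongoing costs of each over the planning horizon. The following is a minimal sketch under that assumption; the `Option` class, the option names, and every figure are hypothetical, and a real cost model would draw its inputs from the financial management process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    """A service improvement option with hypothetical cost figures."""
    name: str
    startup_cost: float
    annual_cost: float

    def total_cost(self, years: int) -> float:
        # Undiscounted total: startup cost plus ongoing cost for each year.
        return self.startup_cost + self.annual_cost * years


def cheapest(options: List[Option], years: int) -> Option:
    """Return the option with the lowest total cost over `years`."""
    return min(options, key=lambda option: option.total_cost(years))


if __name__ == "__main__":
    options = [
        Option("upgrade hardware", startup_cost=50_000, annual_cost=10_000),
        Option("tune existing systems", startup_cost=5_000, annual_cost=30_000),
    ]
    for years in (1, 3):
        print(years, cheapest(options, years).name)
```

With these illustrative figures the cheaper option changes with the horizon, which is why the plan should state the period over which costs are compared.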

Recommendations
The final section of the plan should contain a summary of the recommendations made in the previous plan and their status, for example rejected, planned, or implemented. Any new recommendations should be made here, for example, which of the options mentioned in the plan is preferred.

The recommendations should quantify the expected business benefits, the potential impact of carrying out the recommendations, the risks involved, the resources required, and both startup and ongoing costs.

Typical reports or capacity recommendations need to address the following areas:
