Capacity Management
The continuing provision of consistent, acceptable service levels, at a known and controlled cost.
Capacity Management is the focal point for all IT performance and Capacity issues in the organization. Its goal is 'to ensure that cost justifiable IT Capacity always exists and that it is matched to the current and future identified needs of the business'. Capacity Management focuses on the procedures and systems, including specification, implementation, monitoring, analysis, and tuning of IT resources and their resulting service performance. Capacity requirements are based on qualitative and quantitative standards set by the service level management process and specified within the provisions of an SLA or OLA.
This process needs to be re-written for conformance with ITIL Version 3. This involves moving the Demand Management functions/sub-processes to the Service Strategy module.
Introduction to Capacity Management
Capacity Management is a process under Service Design in the ITIL Version 3 concept.

The capacity of an organization denotes its ability to perform work. This ability is traditionally delineated into five primary 'entities':
- Space - the physical location availability in which to perform the work. For IT there must be sufficient space in which to perform work.
- Labour - the sum of the individuals in the organization and the hours that each individual works.
- Equipment - machines and machine technologies used to make products - usually measured at a component level by the top speed times the hours of operation available.
- Information Technology - the capacity of an organization's computing resources to perform various types of data and information-related functions.
- Material - the inventory that an organization has to meet anticipated demand.
An organization will combine the above five types in different ways to get work done. Combining these entities into 'operations' creates new organizational competencies. For example, Labour capacity can be increased by having ready access to a knowledge base. In this case, the capacity of the individual is the same, but the capability and competencies have increased through the combination.
In Essentials of Capacity Management, Thomas Yu-Lee suggests that the five entities can combine three-fold to produce function groupings which turn them into a 'process' capable of performing some type of work. Yu-Lee also suggests that the quaternary, or operational, level brings together many processes to define the overall capabilities of the organization. [Ref]
Thus we can speak of process capacity as a key ability of the organization to achieve competence in using its capacity wisely. This is particularly true with reference to the relationship of Capacity Management with Availability Management.
In at least one sense, the two processes have conflicting goals. Capacity Management aims to achieve an optimal usage of existing capacity. However, the need for high availability designs may lead the organization towards maintaining system redundancies and backups to ensure continuous or nearly-continuous operations. In this respect, this 'spare' capacity is like equipment 'inventory'. Business Continuity and Disaster Recovery Planning represent the most extreme form of this because they require the ability to re-create operations following a disaster - this 'duplicate', often unused capacity is a form of insurance for the organization.
In essence, high availability application designs create additional capacity. Variations within that design will affect the probability that undue stress on the applications will have availability or performance ramifications (i.e., the system fails or responds slowly). The correct design is essentially a function of four application requirements:
- how critical is the application to customers? - if it is critical they may seek a replacement or develop one of their own.
- how long can the customer tolerate unavailability? - can a short period of unavailability be tolerated by the customer without incurring significant costs or inconveniences?
- are there replacement products on the market? - can the customer find alternate sources, or does the business have an essential monopoly on the provision of the product?
- what are the cost determinants in achieving higher availability? - generally speaking, high availability designs are decreasing in cost in line with the decreasing costs of machinery.
The goal is to seek an optimum solution between availability and capacity. This solution is based upon the specific objectives to be achieved taking into account constraints which will limit the achievement of the objective. Solving the problem involves applying the constraints to the objective functions and implementing the solution. Component Failure Impact Analysis (CFIA) is a technique which can be used to determine this.
Capacity Management
The primary goal of Capacity Management is to define, track and control IT service capabilities on an environmental level to ensure service workloads meet demands of customers at agreed performance levels.
"Organizations are finally starting to say that if they are going to spend money, they want to have a good ROI and TCO, not 50% utilization."
Computerworld, April 12, 2004
By creating a single point of accountability, IT capacity planning and monitoring should be improved through greater expert usage of capacity techniques, consolidation of resource considerations and better risk assessments recorded in RFCMs and reflected in change management procedures.
The following are some key objectives for Capacity Management:
- Identify IT capacity requirements to meet current and projected workloads in sufficient time to justify and program requirements in the budget process and to acquire and install computing resources before service levels deteriorate.
- Ensure that existing computer and data communications resources are used in the most efficient and effective manner.
- Capacity is measured, and monitored to fully support service level management.
- Shortfalls in the provision of the required levels of capacity and performance are recognized and appropriate corrective actions identified and implemented.
- The frequency and duration of IT failures, due to capacity, is reduced over time.
- IT support organization mindset moves from error correction to service enhancement.
Critical Success Factors
The following factors expedite meeting the objectives for Capacity Management:
- Adherence to a discrete set of standards which simplify the complexity of the IT infrastructure
- High reliability of the IT Infrastructure components and environment
- Organizational maturity of the IT support organization to maintain and support the IT Infrastructure
- Quality maintenance provided by suppliers
- Deployment of operational process and procedures
- Bypass/circumvention and recovery/restart procedures exist for critical applications
Scope
The Capacity Management process should be the focal point for all IT performance and Capacity issues. Other technical domains, such as Network Support, may carry out the bulk of the relevant day-to-day duties but overall responsibility lies with the Capacity Management process.
Usually In Scope
The process should encompass, for both the operational and the development environment:
- hardware - from PCs, through file servers, up to mainframes and super-computers
- networking equipment (LANs, WANs, bridges, routers etc.)
- peripherals (bulk storage devices, printers etc.)
- software - operating system and network software, in-house developments and purchased packages
- human resources, but only where a lack of human resources could result in a delay in end-to-end response time (e.g. overnight data backups not completed in time because no operators were present to load tapes) - in general human resource management is a line management responsibility, though the staffing of a Service Desk might well use identical Capacity Management techniques.
The Capacity Plan is the annual definitive document outlining the organization's plans with regard to needed capacity. It is an offshoot of the business plan and reflects the strategic intent of the organization over the medium and long term. The plan documents current levels of resource utilization and service performance. After consideration of business requirements, it forecasts future requirements for resource for IT services that support the business. The Capacity Plan recommends resource levels required and changes to accomplish operating level objectives in support of the SLA. It includes their cost, benefit, reports of their compliance to IT SLA, their priority and impact to the overall business and the IT infrastructure.
Capacity information may be retained in a capacity management database (CDB - not to be confused with the CMDB - though it might, conceivably, be a logical part of the CMDB). This database needs to be a logical entity containing necessary technical, business, and service level management detail data.
Capacity Management includes:
- Demand Management - to ensure that the future business requirements for IT services are considered, planned, and implemented in a timely manner,
- Workload Management - for translating customer demands into workloads put upon components of IT solutions (the various applications used to create the actual solution), and,
- Resource Management - ensuring that all resources are acquired and implemented in a timely and cost-effective manner.
Usually Excluded
- issues of service continuity or availability are considered under separate processes.
- short-term 'fire fighting' activities (e.g., tuning, optimizing, debugging, tracing) or daily/weekly maintenance procedures. These are the responsibility of Operations.
Discretionary
- Major Incident Assistance - direct involvement when required in the restoration of Mission Critical services when capacity issues are deemed to be involved
- all resources in the organization ultimately have capacity elements associated with them. This includes staffing, facilities and materials. While some will clearly be excluded, there are areas which the organization might wish to explicitly cover under the umbrella offered by Capacity Management. The number of agents and the working conditions of the Service Desk are possible candidates.
Assumptions
- the organization has the competence to utilize capacity management.
Relationship to Other Processes
Financial Management (FM)
CapM creates upgrade plans that are included in the budgeting process. Accurate cost information is vital in order to budget capacity upgrades accurately. Planning for capacity management entails planning for new hardware and software. These costs should be incorporated into the annual budget. Costs may be the restraining factor in some decisions and may affect SLA negotiation. By effectively estimating the cost of service availability and optimizing capacity, IT weighs risk versus cost to decide which countermeasures it can afford to implement and which are reserved as contingency plan scenarios. Sometimes the return on investment (ROI) for a requested change may need to be demonstrated. Capacity management must ensure the necessary resources are acquired and implemented in a cost-effective manner.
Service Level Management (SLM)
CapM helps define OLAs that result from service level objectives. IT must prioritize service alerts and countermeasures to prevent degradation of performance before it affects availability. CapM interacts closely with service level, availability, service continuity, and financial management staff to decide on the cost justified proactive measures to improve the "quality" of service.
Optimization of service performance implies monitoring the application's end-to-end response times. In mature organizations, performance levels are forecast and the monitoring system sets threshold alarms to trigger alerts before the customer of the service is aware of an issue. Automated monitoring tracks performance levels of an IT service. Threshold alarms allow response for out of range conditions.
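The threshold alarms described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Threshold` class, the `classify` function and the millisecond values are hypothetical and are not taken from any particular monitoring product. The idea is that a warning level set below the agreed target raises an alert before the customer notices a problem.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    warning_ms: float  # hypothetical early-warning level, set below the agreed target
    breach_ms: float   # hypothetical agreed response-time target from the SLA


def classify(response_ms: float, t: Threshold) -> str:
    """Return 'ok', 'warning' or 'breach' for one end-to-end response-time sample."""
    if response_ms >= t.breach_ms:
        return "breach"
    if response_ms >= t.warning_ms:
        return "warning"
    return "ok"


# Illustrative samples (milliseconds); real values would come from the monitoring system.
samples_ms = [420.0, 610.0, 790.0, 1050.0]
threshold = Threshold(warning_ms=750.0, breach_ms=1000.0)

# Keep only the samples that are out of range, together with their severity.
alerts = [(s, classify(s, threshold)) for s in samples_ms
          if classify(s, threshold) != "ok"]
```

With these assumed values, the third sample produces a warning and the fourth a breach; the gap between the two levels is the time available to react.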
Availability Management (AM)
AM ensures optimal availability of IT services with the correct use of resources, methods, and technology. Capacity management has a very close tie to this process, since optimal use of IT resources to meet performance levels at a justifiable cost highly correlates with higher service availability. Shared reports should highlight trends indicating capacity and performance issues and management information tools will typically provide monitoring information required for both processes. The Availability Plan needs to be coordinated with the capacity planning process as the same technology solution can often meet the needs of both plans. Some solutions that cannot be cost-justified for one plan may be justified in combination with the other.
Service Continuity Management (SCM)
SCM copes with, and recovers from, unplanned situations in which the period of IT service disruption is considered unacceptable and normal availability countermeasures have not succeeded. The two processes differ in the scope of the effect and in how unacceptable the disruption is. Availability Management deals more practically with what IT can effectively handle as part of its routine operation, but the principles, approach, and concerns are similar. Both processes depend on CapM input to judge the level of performance when the countermeasures are enacted.
Change Management (CM)
CapM assesses the impact of changes on existing capacity and identifies additional resource requirements based on the change in demand. Changes required for capacity management are implemented typically through planning and recommendations that result from capacity, availability and service continuity management. Daily operational changes may surface as job scheduling accommodates the more routine capacity changes. In all cases, any change to the IT service environments is channeled through change management as an RFC.
Configuration Management (CFM)
Changes made to IT resources, also known as configuration items (CIs), and service level objectives for these resources need to be reflected in the configuration management database (CMDB). Service level agreement (SLA) availability and capacity data from the CMDB allows more proactive measurement of performance based on SLA compliance. This data is an important input to capacity management. Associated demand and workload requirements, resulting performance, and resource metrics are recorded in the capacity management database (CDB). Effective coordination and correlation of related elements between these logical databases are required for timely information and on-going capacity recommendation and planning.
Problem Management (PM)
Problem management deals with determining the root cause of problems. A problem is defined as one or more incidents exhibiting similar symptoms. Capacity management interfaces with problem management to investigate known errors that have affected performance levels of an IT service. CapM also provides a specialist infrastructure role to identify and diagnose capacity or performance related problems. Capacity management provides ongoing feedback and recommended changes resulting from incidents traced to known errors affected by or causing degraded performance levels of the service.
Service Desk
Incident frequency and statistics with respect to service performance levels may be reported through problem management, CMDB records, or the involvement of a capacity management specialist to identify and address known errors relating to performance and storage capacity. Ideally, IT resource performance is recorded and managed by the service desk and maintained by configuration management in the CMDB for historical retrieval and analysis. Workload, performance, and demand management activities may reference CMDB records resulting from incident escalation, failover and recovery capacity issues, or other tracked incident reports and trends.
Capacity Management should:
- provide leadership to analyze requirements into measurable transactions for each business unit and its delivered service
- provide regular updates of Demand, Workload, Performance and Resource Management and the Capacity Management Plan
- define measures for key business unit transactions and the required internal metrics to support those transactions
- provide business units with reports on their key transaction volumes
- provide historical data for both business and internal measures to support trend analysis and the forecasting of additional capacity needs
- provide technical support for the analysis of any measure which exceeds its limits
- provide leadership for Service Improvement Plans (SIPs) which identify opportunities to balance business capacity needs with their associated costs
In most organizations, formal Capacity Management is largely guesswork. While some basic trend analysis may be performed to determine straight-line projections of future bandwidth, storage and mainframe CPU usage, attempts at more meaningful analytic models often involve complicated data extractions and predictive modeling beyond the capacity of the organization. Simpler methods and solutions may need to be used initially to demonstrate an initial Capacity Management benefit or to get the program started. The emphasis is on reduced effort and less available expertise. Simplistic approaches usually require the involvement of more staff with a diversity of knowledge and expertise, with the resulting additional cost of staff time for analysis and meetings.
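A straight-line projection of the kind mentioned above needs nothing beyond an ordinary least-squares fit. The following is a minimal sketch using only the Python standard library; the function names, the monthly storage figures and the 60 TB limit are hypothetical values chosen for illustration, and a real forecast would need to account for seasonality and step changes that a straight line ignores.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until(ys, limit):
    """Periods after the last sample until the trend line reaches `limit`.

    Returns None when usage is flat or falling, so the limit is never reached.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_hit = (limit - intercept) / slope
    return max(0.0, x_hit - (len(ys) - 1))


# Hypothetical month-end storage usage in TB, oldest first.
monthly_storage_tb = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]
slope, intercept = fit_line(monthly_storage_tb)
months_left = periods_until(monthly_storage_tb, limit=60.0)
```

For this assumed series the fitted growth is 2 TB per month, so the 60 TB limit is reached five months after the last sample.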
Capacity Planning will usually originate at the Operational level as individual support units periodically assess the need for additional capacity. Most often, this exercise will be tied to the new workload associated with a new application's introduction or upgrade. Tactical Capacity Planning is often difficult until the organization has defined its services (in a Service Catalogue, in SLAs or in defined process repositories). Business Capacity analysis will often be undertaken annually following the release of a Business Plan, but its intent is more often to identify operational requirements than strategic IT investment or service delivery needs.
In early implementations there is a reliance on "snapshot" forecasts and resource predictions. There is no formal ongoing Capacity Management program and data and statistics used for other components, particularly performance, double for Capacity Management purposes.
As business volumes increase 'peak usage' periods can place operational parameters beyond normal limits subjecting parts of the infrastructure to unplanned stresses. Planning for these "spikes" means building in the needed capacities at acceptable costs. There is a need for more complicated analysis including postulates involving ROI calculations the organization may not yet have considered.
Organizationally, Capacity issues are considered within their respective infrastructure enclaves. There is unlikely to be a centrally controlled focal point for Capacity issues - though there may be consultations when capacity issues in one area affect those in another.
Capacity Management, as a separate and distinct set of processes in IT Service delivery, can benefit an organization by:
- Focusing awareness for proactive considerations of capacity matters.
- Distinguishing longer term, strategic activities from day-to-day operational tasks.
- Establishing resources with accountability for the Capacity issues.
- Grouping Capacity issues together in a manner which identifies the interdependencies amongst constituent elements and the trade-offs which facilitate optimal capacity within specified cost parameters.
These benefits can accrue to the organization at the strategic, the tactical or the operational level according to how the organization is viewed.
Organizational Views of Capacity Management
Capacity Management is all about managing the relationships between three inter-connected variables - resources, workload and service levels. None of these elements can be altered without affecting at least one of the other two. It is essentially the job of Capacity Management to take any pair of variables and derive the third:
- Given a requirement to support a given workload at a given service level, what are the required resources ?
- Given a set of resources and a workload level, what level of service can be provided ?
- Given a set of resources and a service level requirement what workloads can be supported ?
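The three questions above can be made concrete with a deliberately simple model. The sketch below assumes a single-server M/M/1 queue, which is an assumption introduced here for illustration and not a model the source prescribes. Under that assumption the mean response time is R = 1 / (mu - lambda), where mu is the service rate (the resource), lambda the arrival rate (the workload) and R the service level. All function names are hypothetical.

```python
def required_capacity(arrival_rate, target_response_s):
    """Resources: service rate (requests/s) needed to carry the workload at the target."""
    return arrival_rate + 1.0 / target_response_s


def achievable_response(service_rate, arrival_rate):
    """Service level: mean response time (s) for given resources and workload."""
    if arrival_rate >= service_rate:
        raise ValueError("workload meets or exceeds capacity; the queue is unstable")
    return 1.0 / (service_rate - arrival_rate)


def supportable_workload(service_rate, target_response_s):
    """Workload: highest arrival rate (requests/s) that still meets the target."""
    return max(0.0, service_rate - 1.0 / target_response_s)


# Hypothetical figures: 16 requests/s with a 0.25 s mean response-time target.
mu = required_capacity(arrival_rate=16.0, target_response_s=0.25)
r = achievable_response(service_rate=mu, arrival_rate=16.0)
lam = supportable_workload(service_rate=mu, target_response_s=0.25)
```

Each function fixes two of the variables and solves for the third, so feeding the result of one back into the others returns the original inputs. Real services rarely behave like a single M/M/1 queue, so the numbers are indicative only.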
These considerations can be viewed from one of three distinct organizational vantage points:
- Business (Strategic): Strategic capacity management is done when decisions to expand or contract the infrastructure are made due to expected changes in demand by the business (addition of new applications, new infrastructure, etc.).
- Service (Tactical): Tactical capacity management is done when new services are added into the infrastructure (how network devices are configured for services, who can use the capacity, etc.).
- Resource (Operational): Operational capacity management is done by real-time monitoring and adjustments to capacity as needed.
These three levels are linearly related to each other - that is, business processes will direct services which, in turn, establish the boundaries for operational procedures. All three approaches should be covered in the Capacity Plan and the data elements should be rationalized within a Capacity database.
"The major difference between the sub-processes is in the data that is being monitored and collected, and the perspective from which it is analyzed. For example, the level of utilization of individual components in the Infrastructure is of interest in Resource Capacity Management, while the transaction throughput rates and response times are of interest in Service Capacity Management. For Business Capacity Management, the transaction throughput rates for the on-line service need to be translated into business volumes, for example, in terms of sales invoices raised or orders taken."
ITIL Service Delivery - Section 6.3
Business Capacity Planning
Why do IT folks have such a hard time reflecting required capacity in their budgets? Why is it so difficult for them to bridge the gap between business requirements and ensuing IT capacity needs?...
There is a failure to understand the relationship and the cascading effect from business requirements to capacity requirements.
In most companies, the budget is a carefully crafted political compromise. When established, it determines the capacity that is being used by the IT operations. The capacity then determines the services and the corresponding service levels.
The provided services then often fall short of the expectations and the needs of the business users. This is the world upside down! Capacity requirements should determine the budget, and not vice versa.
The Missing Link; Capacity Management and Business Requirements, ITSMWatch.com
August 16, 2004
The driving force for Capacity Management should be the business requirements of the organization. Capacity Management has a close relationship with the business strategy and planning processes within an organization. On a regular basis, the long-term strategy of an organization is encapsulated in an update of the business plans. These plans encapsulate the organization's understanding of the environment in which the business operates. Capacity Management needs to understand these motivational factors and combine them with information on the latest ideas, trends and technologies being developed by the suppliers of computing hardware and software. The emphasis is on identifying technological 'drivers' for business success.
The organization's business plans dictate the specific IT/IS strategy and business plans, the contents of which Capacity Management needs to be familiar with, and to which Capacity Management needs to have had an input. In the IT/IS specific business plans, particular technologies, hardware and software are identified, together with some indication of the timescale in which they are to be implemented.
The Capacity Management process must be responsive to changing requirements for processing Capacity. New services will be required to underpin the changing business. Existing services will require modification to provide extra functionality, and old services will become obsolete, freeing up spare Capacity.
As a result, the ability to satisfy the Customers' requirements will be affected. Capacity Management needs to predict these Changes and adjust for them.
These new requirements may come to the attention of Capacity Management from many different sources and for many different reasons. They may be generated by the business or may originate from the Capacity Management process itself.
Service Capacity Management
Capacity Management also needs to understand IT Services, their use of resource, working patterns, peaks and troughs, and to contribute to the service meeting its SLA targets. The focus is on managing service performance, as determined by the targets contained in the SLAs or SLRs.
Once the business requirements for a service have come through the Business Capacity Management sub-process and the service has become operational, the Service Capacity Management sub-process is responsible for ensuring that it meets the agreed service targets. The monitored service provides data that can identify trends from which normal service levels can be established. By regular monitoring and comparison with these levels, exception conditions can be defined, identified and reported upon. Capacity Management can therefore inform SLM of any service breaches or near misses.
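One simple way to establish a normal service level from monitored data and to flag exceptions against it is a mean and standard deviation baseline. The sketch below is an illustration under stated assumptions: the three-sigma rule, the function names and the sample response times are all hypothetical choices, not something the process definition mandates.

```python
from statistics import mean, pstdev


def baseline(history):
    """Return (mean, population standard deviation) of past measurements."""
    return mean(history), pstdev(history)


def is_exception(value, history, k=3.0):
    """True when `value` lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline(history)
    return abs(value - mu) > k * sigma


# Hypothetical response times (ms) collected during normal operation.
history_ms = [200, 210, 190, 205, 195]

normal_sample = is_exception(215, history_ms)   # within the normal band
exception_sample = is_exception(230, history_ms)  # outside the normal band
```

With this history the baseline is 200 ms with a standard deviation of about 7 ms, so 215 ms is treated as normal and 230 ms is reported as an exception. The multiplier `k` controls how many near misses are reported.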
There will be occasions when Incidents and Problems are referred to Capacity Management from other Service Management processes, or it is identified that a service could fail to meet its SLA targets. On some of these occasions the cause of the potential failure may not be resolved by Resource Capacity Management. For example, when the failure is analyzed it may be found that there is no lack of resource, or no individual component is over-utilized. However the design or programming of the application is inefficient, and so the service performance needs to be managed, as well as individual hardware or software resources.
The key to successful Service Capacity Management is to preempt difficulties, wherever possible. So this is another sub-process that has to be proactive and anticipatory rather than reactive. However, there are times when it has to react to specific performance Problems. From a knowledge and understanding of the performance requirements of each of the services being run, the effects of Changes in the use of services can be estimated, and actions taken to ensure that the required service performance can be achieved.
Resource Capacity Management
Operational CapM oversees the utilization of each of the component parts in the IT Infrastructure to ensure their optimal usage. All hardware components and many software components in the IT Infrastructure have a finite Capacity which, when approached or exceeded, can lead to performance problems.
This sub-process is concerned with resources such as processors, memory, disks, network bandwidth, network connections etc. so information on resource utilization needs to be collected on an iterative basis. Monitors should be installed on the individual hardware and software components, and then configured to collect the necessary data.
As in Tactical Capacity Management, the key to successful Operational Capacity Management is to pre-empt difficulties wherever possible. Therefore this sub-process has to be proactive and anticipatory. However, there are times when it has to react to specific problems that are caused by a lack of resource, or the inefficient use of resource.
From a knowledge and understanding of the use of resource by each of the services being run, the effects of Changes in the use of services can be estimated. Then hardware or software upgrades can be budgeted and planned. Alternatively, services can be balanced across the existing resource to make most effective use of the resource currently available.
Capacity Activities
There are a number of activities which are undertaken within one or more of these views of the organization, as illustrated below.

There are four main activities in Capacity Management which support the creation and update of the Capacity Plan:
- Iterative Activities: the monitoring, analysis and tuning of devices and services,
- Demand Management: influencing the demand for and usage of computing resources
- Modeling: predicting the behaviour of IT Services under a given volume and variety of work.
- Application Sizing: estimating the resource requirements to support a proposed application Change or new application.
In addition, a fifth activity involves the maintenance and usage of key capacity information. The Capacity database is a data repository used for this purpose.
Capacity Manager
- Owns the Capacity Management process
- Ensures process procedures are followed
- Represents interests of Capacity Management at meetings and functions
- Receives and acquires Senior Leadership support as needed
- Verifies and adjusts parameters used in the planning process
- Adds and reviews system life cycle performance design criteria and production acceptance tests
- Recommends resource re-allocations
- Ensures that the Capacity Plan is current and that it tracks business growth and trends
Capacity Management Team
- Produces Capacity Plan with a forecast far enough ahead to take account of changes in IT capacity that comply with business plans and IT plans
- Documents the need for HW, SW and resource upgrades or additional equipment, based on SLRs, cost constraints, reliability and availability
- Produces regular management reports including current usage of resources, expected trends and forecasts
- Responsible for Capacity Management documentation
Senior Leadership
- Approve resources to meet capacity requirements
- Review and recommend capacity improvement recommendations
- Review summary performance information describing capacity
- Act as the final point of escalation for capacity issues
Operations Management
- provide ongoing maintenance of CIs which affects their operation and institute measures to keep them operating at peak performance
- assist in describing the service chains associated with service provisioning
- participate in discussions of capacity
- meet Operational Level Objectives and explain variances from targets.
Line(s) of Business
- negotiate capacity targets and sign-off on targets in Operational and/or Service Level Agreements
- review capacity performance against target and participate in preparation and recommendation of any remedial actions
- participate in preparation and review of Capacity Plan
- provide information on capacity
Key Goal and Performance Indicators
The key metrics in Capacity Management are:
- Response time
- Throughput
- Utilization
The capacity model has three principal categories of capacity use: productive, nonproductive and idle. Rated capacity is 100 per cent or 24 hours a day, seven days a week. If a particular resource is being measured, it is the full productive capacity without allowance for repair, setup and other downtime.
Rated capacity = productive capacity + non-productive capacity + idle capacity
- Productive capacity is the resource performing its function fully in support of the IT service.
- Non-productive capacity includes time not in one of the specific idle definitions. These activities are: installation, maintenance, stand-by or other "out of service" conditions.
- Idle capacity includes capacity that is unused, not required, or unavailable due to technical, contractual or business concerns.
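As a minimal sketch of the rated capacity identity above, the following Python snippet splits one week of rated hours into the three categories and derives idle capacity as the remainder. The `CapacityBreakdown` name and the sample hours are illustrative assumptions, not figures from this process.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 24 * 7  # rated capacity: 24 hours a day, seven days a week


@dataclass
class CapacityBreakdown:
    """Hypothetical record of how one resource's rated hours were used."""
    productive_hours: float
    non_productive_hours: float  # installation, maintenance, stand-by, out of service

    @property
    def idle_hours(self) -> float:
        # Rated = productive + non-productive + idle, so idle is the remainder.
        return HOURS_PER_WEEK - self.productive_hours - self.non_productive_hours

    def share(self, hours: float) -> float:
        """Return the given hours as a percentage of rated capacity."""
        return 100.0 * hours / HOURS_PER_WEEK


# Assumed sample values for one server over one week.
week = CapacityBreakdown(productive_hours=126.0, non_productive_hours=8.4)
productive_share = week.share(week.productive_hours)
idle_share = week.share(week.idle_hours)
```

Deriving idle hours from the other two categories keeps the three figures consistent with the rated total by construction.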
Example Demand Calculation
Total demand = number of concurrent users x single user demand unit
Example Workload Calculation
Workload A = total demand x workload per demand unit for application A
Workload B = total demand x workload per demand unit for application B
Example Resource Calculation
MB storage needed = (workload A x MB per workload A unit) + (workload B x MB per workload B unit)
CPU power needed = (workload A x CPU power per workload A unit) + (workload B x CPU power per workload B unit)
Network bandwidth needed = (workload A x network bandwidth per workload A unit) + (workload B x network bandwidth per workload B unit)
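The three example calculations chain together: demand feeds workload, and workload feeds resource needs. The sketch below expresses that chain in Python. The function names and all numeric inputs are assumptions chosen for illustration; real figures would come from measurement and from the SLRs.

```python
from typing import Mapping


def total_demand(concurrent_users: int, demand_per_user: float) -> float:
    """Total demand = number of concurrent users x single user demand unit."""
    return concurrent_users * demand_per_user


def workload(demand: float, workload_per_demand_unit: float) -> float:
    """Workload for one application = total demand x workload per demand unit."""
    return demand * workload_per_demand_unit


def resource_needed(workloads: Mapping[str, float],
                    resource_per_unit: Mapping[str, float]) -> float:
    """Sum of (workload x resource per workload unit) over all applications."""
    return sum(workloads[app] * resource_per_unit[app] for app in workloads)


# Illustrative inputs only (assumed values).
demand = total_demand(concurrent_users=200, demand_per_user=5)
workloads = {"A": workload(demand, 2.0), "B": workload(demand, 0.5)}
storage_mb = resource_needed(workloads, {"A": 0.25, "B": 2.0})
```

The same `resource_needed` function serves for storage, CPU power and network bandwidth; only the per-unit figures change.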
Measurement Issues
Repeatable, predictable results require a real-time knowledge of the resources available for the application. Without appropriate safeguards, the very automation that reduces the amount of human intervention required to improve the service can also result in over-utilization of key resources such as storage, CPU, and memory. Provisioning an organization with a given number of users at a specific service level requires a predictable amount of resources in the data center. The role of resource management is to monitor the available resources, model applications and their associated service level objectives, and choose the most efficient set of resources based on these parameters.
Capacity Process Summary
Controls
- Agreements
- Business strategies and plans
- Financial plans and budgets
- IT strategies, plans and investments
Inputs
- New technology suppliers
- Performance-related Incidents/Problems
- Service agreement achievements
- Change Calendar
- CMDB
- Work schedules
Activities
- Capacity Plan maintenance
- Monitoring
- Analysis
- Tuning
- Implementing additional capacity
- CDB recording
- Demand Management
- Modelling
- Application Sizing
Outputs
- Upgrade needs
- Verification of SLRs
- Budgeting cash flow
- RFCs
Mechanisms
- Customer Relationship Management
- Project Management
- IT/Business Alignment
- Measurement
- Continuous Improvement
- Risk Management
- Change Management
- Problem Management
- Security Management
- Supplier Management
Inputs
New technology suppliers
These are the capacity requirements of the various lines of business. The requirements will be based upon each line of business's usage of enterprise applications, the network support for its application base and its support of COTS software products. The requirements will be expressed in Service and Operational Level Agreements (SLAs, OLAs). When a service described in the service catalog is changed or a new service is devised, an assessment of the ongoing availability requirements for the service needs to be undertaken. This should form part of the definition of Service Level Requirements (SLRs). Changing business needs and consumer demand may require the levels of availability provided for an IT Service to be reviewed. Such reviews should form part of the regular service reviews with the business undertaken by SLM.
Incident and Problem Data
Incident and problem information as recorded in Online Services provides base information for the identification of trends in service and, through correlation analysis, assessments of troublesome products and devices (e.g., those with high failure rates).
Service Level Achievements
Service Level achievements provide summary information on how established Availability targets are faring over time. Shortfalls will trigger consideration by Service Level Management of the need for remedial action. Successes will be noted for consideration as best practices and for emulation in other service areas.
Change Calendar
Capacity Management will reference the Change Calendar for change events which may have a detrimental impact on capacity. Capacity Management should review this schedule with an eye to types of changes that carry risks, such as runaway programs which consume system and network resources and thereby degrade the response times of large communities or target audiences. Capacity Management should ensure that these risks are mitigated to the greatest feasible extent and that contingencies are developed as part of the Change Plan to call upon capacity reserves or contingency actions in the event of trouble with the implementation(s).
Configuration and Monitoring Data
Configuration and ancillary monitoring data provide important information on the architecture of a system. They permit the identification of single points of failure and performance trouble spots which can potentially result in service outages. Monitoring data provides base information for trend analysis to identify problem areas for more intense inspection.
Controls
Mechanisms
Customer Relationship Management
Capacity Management negotiates capacity requirements for corporate service support with Application Support and the lines of business, weighing the costs identified for charge-back against the agreed-upon response metrics. Negotiations are re-entered when capacity shortfalls are identified and capacity improvement initiatives are recommended.
Integrated Project Management
Many Capacity improvement activities are undertaken using structured Project Management methodologies. The maturity of the Project Management process will directly affect how well the Capacity Management process works. Supporting activities for this mechanism include:
- project management framework: a general project management framework which defines the scope and boundaries of managing projects, as well as the project management methodology to be adopted and applied to each project undertaken is established and periodically reviewed. The methodology should cover, at a minimum, allocation of responsibilities, task breakdown, budgeting of time and resources, milestones, check points and approvals. Develop project plans which:
- system quality assurance plan: the implementation of a new or modified system includes the preparation of a quality plan which is then integrated with affected project plans and formally reviewed and agreed to by all parties concerned,
- formal project risk management: a formal project risk management program for eliminating or minimizing risks associated with individual projects (i.e. identifying and controlling the areas or events that have the potential to cause unwanted change) is used.
IT-Business Alignment
This mechanism is used to ensure that IT directions and plans conform to business directions. Alignment is achieved through strategic planning processes. These processes should include:
- Information Technology as part of the organization's long- and short-range plans: Developing and implementing long- and short-range plans that fulfill the organization's mission and goals. Information technology issues as well as opportunities are adequately assessed and reflected in the organization's long- and short-range plans and conveyed consistently throughout the entire organization.
- Information Technology Long-Range Plan: Information technology long-range plans supporting the achievement of the organization's overall missions and goals are regularly updated. A structured long-range planning process is used.
- Information Technology Long-Range Plan Changes: A structured approach to the long-range planning process is applied, resulting in a high-quality plan which covers the basic questions of what, who and how. Aspects which need to be taken into account and adequately addressed during the planning process are organizational changes, technological evolution, regulatory requirements, business process re-engineering, staffing, in- or out-sourcing, etc. The plan will refer to other plans such as the organization quality plan and the information risk management plan.
- Short-Range Planning for the Information Services Function: The information technology long-range plan is regularly translated into information technology short-range plans. These short-range plans ensure that appropriate information services function resources are allocated on a basis consistent with the information technology long-range plan. The short-range plans are re-assessed periodically and amended as necessary in response to changing business and information technology conditions. The timely performance of feasibility studies should ensure that the execution of the short-range plans is adequately initiated.
- Assessment of Existing Systems: Existing information systems are assessed in terms of degree of business automation, functionality, stability, complexity, costs, strengths and weaknesses, in order to determine the degree to which the existing systems support the organization's business requirements.
Measurement and Reporting
This mechanism defines relevant performance indicators, reports performance systematically and in a timely manner, and acts promptly upon deviations to ensure the achievement of the performance objectives set for the IT processes. It includes:
- Collecting Monitoring Data: Define relevant capacity indicators (e.g., benchmarks) from both internal and external sources, and collect data for the creation of management information reports and exception reports regarding these indicators,
- Assessing Performance: Measure capacity and compare it with target levels. Perform assessments of capacity on a continuous basis,
- Assessing Customer Satisfaction: Measure customer satisfaction with response, workload, etc. at regular intervals to identify shortfalls in service levels and establish improvement objectives,
- Management Reporting: Provide management reports for senior management's review of the organization's progress toward capacity targets. Upon review, appropriate management action is initiated.
Continuous Improvement
Quality management standards and systems are maintained by providing for distinct development phases, clear deliverables and explicit responsibilities. Capacity needs to be continually advanced through processes of continuous improvement so that organizational learning can be captured for the benefit of the enterprise.
Six Sigma is a useful measurement-centric approach and methodology for eliminating defects in any process. DMAIC (define, measure, analyze, improve, control) is an improvement sub-methodology within Six Sigma which can be applied to existing processes to sponsor continuous improvement. It requires the organization to establish availability performance baselines (which should be negotiated in SLAs, OLAs and UCs), analyze and determine the root cause(s) of defects, and modify processes to reduce defects.
Risk Management
Threat-Risk Assessments of the relevant information risks to the achievement of the business objectives form a basis for determining how the risks should be managed to an acceptable level. These should provide for risk assessments at both the global level and system specific levels (for new projects as well as on a recurring basis) and should ensure regular updates of the risk assessment information with results of audits, inspections and identified incidents.
Change Management
Change Management procedures may be the subject of recommendations contained in Capacity Improvement initiatives. In addition, any changes to procedures, hardware and software identified as part of an internal availability or recovery review or any recommendations designed to improve overall availability must be subject to internal Change Management processes.
In addition, Change Management promotes enhancements to capacity by:
- Maintaining Change Schedule: An accessible record of all past and upcoming changes is maintained, and the combined impact of changes occurring within an established timeframe is considered for risks to overall availability,
- Authorized Maintenance: Ongoing maintenance within established maintenance periods is approved by Change Management. This minimizes overall system disruption by restricting it to non-essential or low transaction volume periods. Maintenance personnel have specific assignments, and their work is properly monitored against established deadlines for its completion within the time period. In the event that problems occur, there are established procedures for handling extensions to the window or backing out the changes to restore the previous environment while system unavailability still has controllable impacts.
Problem Management
The Problem Management process may elect to investigate infrastructure faults which impact User response times. Capacity Management may participate in Root Cause Analyses initiated and coordinated by Problem Management.
Integrated Supplier Management
Existing agreements and procedures are reviewed for their effectiveness and compliance with capacity goals and procedures. This will include:
- identifying third-party providers' services and documenting technical and organizational interfaces with them,
- maintaining quality relationships with suppliers,
- defining specific procedures (based on required processing levels, security, monitoring and contingency requirements, and other stipulations as appropriate) to ensure that a fair Underpinning Contract is in existence with the supplier,
- ensure Service Continuity Management considers business risk related to the service providers in terms of legal uncertainties and the going concern concept,
- ensure that security agreements (e.g. non-disclosure agreements) are identified and explicitly stated and agreed to, and conform to universal business standards in accordance with legal and regulatory requirements, including liabilities,
- monitor the service delivery of external Provider support to ensure the adherence to the contract agreements
Outputs
HW / SW Upgrades
A key physical output of Capacity Management is the introduction of new, more powerful, stable hardware and software products into the infrastructure. The following types of upgrades specifically address capacity 'tuning' issues.
- Balancing workloads - Transactions may arrive at the host or server at a particular gateway, depending where the transaction was initiated. Balancing the ratio of initiation points to gateways could provide tuning benefits.
- Balancing disk traffic - Storing data on disk efficiently and strategically, for example, striping data across many spindles may reduce data contention.
- Definition of an accepted locking strategy - Specifies when locks are necessary and the appropriate level, for example, database, page, file, record, and row. Delaying the lock until an update is necessary may provide benefits.
- Efficient use of memory - May include looking to utilize more or less memory depending upon the circumstances. A process may utilize resources more efficiently if data is read into memory and manipulated there rather than a sequential read through files. Alternatively, many processes may be contending for memory resource. The excessive demands may lead to increased CPU utilization and delays while pages are swapped in and out of memory.
Capacity Database/Plan Updated
The Capacity Plan will be updated annually to accommodate new financial, political and technological realities. The update should be coordinated with the Availability Plan.
Service Level Requirements (SLR) Updated
Capacity Management activities may result in an update to the Service Level Requirements of one or more business lines. These changes will be taken into account by Service Level Management (SLM) during negotiation or re-negotiation of SLAs and the constituent OLAs and UCs which support them.
Financial and Strategic Plans
Capacity requirements may provide recommendations which need to be accommodated in financial and/or strategic plans.
RfC Updated for Capacity Impacts
Review of the Change schedule and of specific RfCs may result in revisions to an RfC to accommodate the concerns of Capacity Management.
Process Activities

CapM1 - Business Capacity Management
A prime objective of the Business Capacity Management sub-process is to ensure that the future business requirements for IT Services are considered and understood, and that sufficient Capacity to support the services is planned and implemented in an appropriate timescale.
The Capacity Management process must be responsive to changing requirements for processing Capacity. New services will be required to underpin the changing business. Existing services will require modification to provide extra functionality. Old services will become obsolete, freeing up spare Capacity.
"Organizations are finally starting to say that if they are going to spend money, they want to have a good ROI and TCO, not 50% utilization."
IT people are wrapped up in trying to get the upper-hand of their reactive behavior and attempting to get into a proactive mode. As a result, they are not following the logical decision-making path when thinking about capacity.
The way to break through this vicious cycle is to:
- get a timely understanding of the business requirements by having IT management involved with or at least thoroughly informed of the business decisions;
- break the business requirements down into the needed IT services by establishing a Service Catalog, have Service Level Management in place to correctly identify service levels, and have the IT architecture designed from a "services provided" point of view;
- make sure good Configuration Management is in place with the proper linkages of the Configuration Items, so that the increase in service levels (e.g., availability and performance) are matched by a corresponding increase in capacity;
- have the Financial Management process tie in to Configuration Management to show the immediate budget adjustments.
August 16, 2004, Jan Vromant, The Missing Link; Capacity Management and Business Requirements
As a result, the ability to satisfy the Customers' SLRs will be affected. It is the responsibility of Capacity Management to predict these Changes and cater for them.
These new requirements may come to the attention of Capacity Management from many different sources and for many different reasons. They may be generated by the business or may originate from the Capacity Management process itself. Examples include a recommendation to upgrade to take advantage of new technology, or the implementation of a tuning activity to resolve a performance Problem.
CapM1.1 - IT Strategic Planning
The IT Strategic Plan (or the organization's equivalent document) sets the foundation for the Capacity Planning process. The plan reflects an understanding of how the Business utilizes I.T. Services to enable key business processes. With an understanding of the interrelationships among I.T. Services, Systems and business processes, CapM is in a good position to estimate capacity needs.
CapM translates these business requirements into capacity 'language' - that is, service and component needs. This is done for each line of business and then aggregated in ways which identify commonalities and areas where resources might be efficiently shared without presenting unacceptable risks to a business area from that sharing.
CapM1.2 - Review Resource Performance
CapM ensures the monitoring of the utilization of the underlying resources. All of the collected data is recorded, analyzed, and reported. As necessary, CapM ensures that the performance of the solutions meets the business requirements.
CapM1.3 - Review SLA Performance
The SLA should include details of the anticipated service throughputs and the performance requirements. Capacity Management provides SLM with targets that can be monitored and upon which the service design has been based. Modelling can be used to gain confidence that the service design will meet the SLRs and allow for future growth.
CapM1.4 - Develop Capacity Plan
The production and update of a Capacity Plan should occur at pre-defined intervals. It is, essentially, an investment plan and should therefore be published annually, in line with the business or budget lifecycle, and completed before the start of negotiations on future budgets.
The Capacity Plan needs to show what capacity is needed in the future and at what cost. It should also predict what hardware upgrades or additional equipment would be needed to meet future service level objectives. It needs to also include information on sizing of any new systems proposed. It needs to reflect cost constraints and availability or reliability requirements.
The plan should discuss current utilization rates and service performance. It should be based upon business strategies and explicitly recognize business strategic and operational plans and forecasts in its estimates of future requirements. Recommendations should include estimates of necessary resources, relevant impacts, associated costs and benefits, etc.
A Capacity Plan template is included in the Appendix.
CapM1.5 - Review Capacity Plan
A quarterly re-issue of the updated plan may be necessary to take into account changes in business plans, to report on the accuracy of forecasts and to make or refine recommendations.
CapM1.6 - Review SLA and Service Catalogue
Capacity Management should assist SLM in understanding the Customers' Capacity requirements, for example in terms of required response times, expected throughput, pattern of usage and terminal population. Capacity Management should help in the negotiation process by providing possible solutions to a number of scenarios. For example, if the terminal population is less than 20 then response times can be guaranteed to be less than two seconds. If more than 20 Users connect then extra network bandwidth is needed to guarantee the required response time. Modelling or Application Sizing may be employed here.
CapM2 - Service Capacity Management
CapM2.1 - Review Service Catalogue and OLAs
The definitive descriptive source of services should be the organization's Service Catalogue and associated processes in each service area. Service Capacity Management will, with the assistance of Service Level Management (who have overall responsibility for the integrity of the Service Catalogue and Operational Level Agreements) review service chains and the respective Operational Level Objectives which drive overall performance targets. CapM reviews the associated metrics for each of the key service chain participants who contribute to overall service level objectives (SLOs).
The service catalogue should be reviewed for service capacity business impact analysis (BIA), return on investment (ROI) analysis and capacity implications for IT service continuity planning, and as an initial baseline for workload-related issues and demand management feedback.
CapM2.2 - Demand Management
Long-term demand management may be required when it is difficult to cost-justify an expensive upgrade. For example, many applications have higher CPU usage for a few hours each day, typically in mid-morning and mid-afternoon. Within these periods, the processor may be over-loaded for only one or two hours. After normal business hours, the same system may have very low overall CPU utilization, so the resource is under-utilized. It is difficult to justify the cost of an upgrade if it provides additional resources that are needed for only a few hours of the day. IT can sometimes influence the demand and spread the requirement for resources throughout the day, thereby avoiding the need for the upgrade.
The influence on the services that are running could be exercised by:
- Physical constraints - it may be possible to stop some services from running at certain times, or to limit the number of customers who can use a particular service, for example by limiting the number of concurrent users. The constraint could be implemented on a specific resource or component, for example by limiting the number of physical connections to a network router or switch.
- Financial constraints - if charging for IT services is occurring, reduced rates could be offered for running work at certain times of the day, that is, the times when there is currently less demand for the resource.
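To illustrate why spreading demand can avoid an upgrade, the sketch below takes an hourly utilization profile, finds the hours that exceed capacity, and moves the excess into the least-loaded hours. It assumes that the excess work is deferrable (for example, batch work that responds to the constraints above), which is a simplification. The function names and the sample profile are hypothetical.

```python
def overloaded_hours(hourly_load, capacity):
    """Return the indices of the hours whose load exceeds capacity."""
    return [hour for hour, load in enumerate(hourly_load) if load > capacity]


def shift_excess(hourly_load, capacity):
    """Greedily move load above capacity into the least-loaded hours.

    Assumes the excess work can be deferred to any other hour of the day.
    """
    load = list(hourly_load)
    for hour in overloaded_hours(load, capacity):
        excess = load[hour] - capacity
        load[hour] = capacity
        while excess > 0:
            target = min(range(len(load)), key=load.__getitem__)
            room = capacity - load[target]
            if room <= 0:
                raise ValueError("total demand exceeds total capacity")
            moved = min(room, excess)
            load[target] += moved
            excess -= moved
    return load


# Assumed 24-hour CPU utilization profile (per cent), peaking mid-morning
# and mid-afternoon, with low utilization outside business hours.
profile = [20] * 9 + [70, 110, 120, 60, 60, 105, 95, 70] + [30] * 7
levelled = shift_excess(profile, capacity=100)
```

If `shift_excess` raises `ValueError`, total demand exceeds total capacity and demand management alone cannot remove the need for additional resources.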
CapM2.3 - Workload Management
To produce a set of forecasts that indicate estimated resource usage for the planning period.
Identifying trends is difficult without a large volume of statistics, so the data must be collected over a good length of time. Workload types include online, batch, and network; classifying demand by type makes it possible to translate both current and proposed customer demand into workloads effectively. This classification of workload types is generally called the "workload catalogue." The next step is to analyze and understand the trends of each workload, discovering when peaks occur and why they happen. This investigation should encompass short, medium and long-term trends. The workload catalogue, peak load analysis, and operating level requirements all contribute to the production of the forecast report(s).
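One simple way to turn collected workload statistics into a forecast is a straight-line trend fitted by least squares. The sketch below is only an illustration of that idea: the source does not prescribe a forecasting method, and the function names and sample transaction volumes are assumptions. Real workloads often show seasonality and peaks that a linear trend will not capture.

```python
def linear_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead):
    """Extrapolate the fitted trend for the next `periods_ahead` periods."""
    slope, intercept = linear_trend(values)
    last_x = len(values) - 1
    return [intercept + slope * (last_x + k) for k in range(1, periods_ahead + 1)]


# Assumed monthly transaction volumes (thousands) for one workload type.
history = [100, 110, 120, 130]
next_two_months = forecast(history, periods_ahead=2)
```

Running the fit separately for each entry in the workload catalogue keeps online, batch and network trends from masking one another.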
CapM3 - Operational Capacity Management
CapM3.1 - Maintenance - Monitoring
To ensure the optimum use of the hardware and software resources, that all agreed service levels can be achieved, and that business volumes are as expected.
Most monitoring tasks are near term in nature, and rely on underlying tools and principles for operations. The collected information must be recorded or sampled over a determined period. The amount of sampling, and the resources required to do it, must also be examined. The capacity management database (CDB) should contain information points to identify historical trends and patterns.
Data needs to be gathered at the total resource utilization level, but also at a more detailed level that profiles the workload each service places on each particular resource. This needs to be carried out across the whole infrastructure: host or server, the network, local server, application and client-side or workstation. Similarly, data needs to be collected for each service, for example, availability and user screen response time.
Part of the monitoring activity is to establish baselines or profiles of the normal operating levels. If thresholds set relative to these norms are exceeded, alarms are raised and exception reports produced. These thresholds and baselines are determined from the analysis of previously recorded data, and can be set on:
- Individual components, for example, monitor that the utilization of a CPU does not exceed 80% for a sustained period of one hour
- Specific services, for example, monitor that the presentation time of a web page does not exceed 3 seconds or the transaction rate does not exceed 1000 transactions per minute.
It is also important to remember that monitoring takes up system capacity, and thus can influence the performance of the system. Focus performance measurement and monitors on client service level agreements (SLAs). Operating level requirements and other necessary elements for monitoring often fall out of their overall contribution to meeting the SLA. Monitor at successive levels of control (for example, key IT layers: network, OS, hardware, application, and so on) to ascertain that OLOs are met.
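The component threshold example above (CPU utilization above 80% for a sustained hour) can be checked with a simple run-length test over evenly spaced samples. This is a minimal sketch: the function name, the five-minute sampling interval and the sample values are assumptions, and a real monitor would normally supply this check itself.

```python
def sustained_breach(samples, threshold, window):
    """Return True if `window` consecutive samples all exceed `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


# Assumed 5-minute CPU samples, so 12 consecutive samples cover one hour.
SAMPLES_PER_HOUR = 12
cpu = [85] * 11 + [70] + [90] * 12
alarm = sustained_breach(cpu, threshold=80, window=SAMPLES_PER_HOUR)
```

Requiring a sustained run rather than a single high sample avoids raising alarms on short spikes.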
The operating system, applications management, associated hardware agent, and systems management tools may dictate which monitors are most readily available. Business rules can correlate element data to service levels in many cases. Many monitors are included as part of the operating system, or free as part of a hardware and software vendor solution, while others form part of a larger systems management tool set and need to be evaluated and purchased separately. It is important that the monitors can collect all the data required by the capacity management process, for a specific component or service.
CapM3.2 - Maintenance - Analysis
Identification of areas for capacity improvement
Data monitored and collected is analyzed for identification and adjustment of thresholds and alarms. In reactive organizations, threshold breaches will trigger exception reports and/or alarms, which then need to be analyzed and reported upon, and corrective action taken. Ideally, all thresholds should be set below the level at which the resource is over-utilized or below the targets in the OLA or layered OLO. This enables capacity management to take corrective action before the targets in the OLAs have been breached, or the resource has become over-utilized and there has been a period of poor performance.
In proactive organizations, the data collected from the monitoring should be analyzed to identify trends from which the normal utilization and service level, or baseline, can be established. By regular monitoring and comparison with this baseline, exception conditions in the utilization of individual components or service thresholds can be defined, and breaches or near misses in the OLAs can be reported. In addition, the data can be used to predict future resource usage.
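A baseline and its exception conditions can be derived from previously recorded data in many ways. The sketch below uses one common convention, the historical mean plus a multiple of the standard deviation, purely as an assumption; the source does not specify how the baseline is calculated. The function names, the multiplier `k` and the sample data are illustrative.

```python
from statistics import mean, stdev


def baseline(history):
    """Return (centre, spread) of historical utilization samples."""
    return mean(history), stdev(history)


def exceptions(history, observations, k=3.0):
    """Return (index, value) pairs that exceed centre + k * spread."""
    centre, spread = baseline(history)
    upper = centre + k * spread
    return [(i, value) for i, value in enumerate(observations) if value > upper]


# Assumed utilization history (per cent) and a new batch of observations.
history = [50, 52, 48, 51, 49]
flagged = exceptions(history, [51, 55, 53, 70])
```

Lowering `k` reports near misses earlier at the cost of more exception reports to review.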
Analysis of the data may identify issues of:
- Contention (data, file, memory, processor)
- Inappropriate distribution of workload across available resource
- Inappropriate locking strategy
- Inefficiencies in the application design
- Unexpected increase in transaction rate
- Inefficient use of memory
The use of each resource and service needs to be considered over the short, medium, and long-term, and the minimum, maximum and average utilization for these periods recorded. Over time, the trend in the use of the resource by the various IT services becomes apparent.
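Recording the minimum, maximum and average utilization per period is a straightforward aggregation. The sketch below groups evenly spaced samples into fixed-length periods; the function names and sample values are assumptions for illustration.

```python
from statistics import mean


def summarize(samples):
    """Return the minimum, maximum and average of one period's samples."""
    return {"min": min(samples), "max": max(samples), "avg": mean(samples)}


def summarize_by_period(samples, period_length):
    """Split samples into consecutive periods and summarize each one."""
    return [
        summarize(samples[start:start + period_length])
        for start in range(0, len(samples), period_length)
    ]


# Assumed utilization samples (per cent), summarized three samples at a time.
utilization = [10, 20, 30, 40, 50, 60]
periods = summarize_by_period(utilization, period_length=3)
```

Calling `summarize_by_period` with different period lengths over the same data gives the short, medium and long-term views.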
One key to determining whether a solution is operating at an acceptable level is latency, or the length of time a user has to wait for a response once a request for information is complete. Heavy workload on a server might create unacceptable wait times even though the server may be capable of handling every request. As a rule, try to isolate components that make a repeatable, high percentage contribution to performance levels and report on them at varying workloads.
It is important to understand the utilization in each of these periods, so that changes in the use of any service can be related to predicted changes in the level of utilization of individual resources. The ability to identify the specific hardware or software resource on which a particular IT service depends is improved greatly by an accurate, up-to-date and comprehensive CMDB. Any relevant detailed performance information should be held and maintained in, or related to, the capacity database (CDB).
When the utilization of a particular resource is considered, it is important to understand both the total level of utilization and the utilization by individual services of the resource.
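As a small illustration of the two views, the sketch below converts the busy time each service spent on a resource during a measurement interval into per-service utilization, and sums these for the total. The function name, service names and figures are hypothetical.

```python
def utilization_by_service(busy_seconds, interval_seconds):
    """Return (total %, {service: %}) for one resource over one interval."""
    per_service = {
        name: 100.0 * busy / interval_seconds
        for name, busy in busy_seconds.items()
    }
    return sum(per_service.values()), per_service


# Assumed CPU busy time per service during a one-hour interval.
total_pct, by_service = utilization_by_service(
    {"payroll": 900, "email": 450}, interval_seconds=3600
)
```

Keeping the per-service figures alongside the total shows which service is driving a change in the overall utilization of the resource.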
The analysis and tuning activities may also benefit from general observations and guidelines in the Guidelines for Effective Capacity Management and Designing Information Technology Solutions for Scalability sections in this document.
CapM3.3 - Maintenance - Tuning
Better utilization of the system resource or improvement to the performance of the particular component.
The analysis of the monitored data may identify areas of the configuration that could be tuned to better utilize the system resource or improve the performance of the particular service.
Tuning techniques that are of assistance include:
- balancing workloads - transactions may arrive at the host or server at a particular gateway, depending where the transaction was initiated; balancing the ratio of initiation points to gateways can provide tuning benefits
- balancing disk traffic - storing data on disk efficiently and strategically, e.g. striping data across many spindles may reduce data contention
- definition of an accepted locking strategy - specifies when locks are necessary and the appropriate level, e.g. database, page, file, record, and row - delaying the lock until an update is necessary may provide benefits
- efficient use of memory - may include looking to utilize more or less memory depending upon the circumstances.
Regarding the efficient use of memory, note that a process may utilize resources more efficiently if data is read into memory and manipulated there rather than a sequential read through files. Alternatively, many processes may be contending for memory resource. The excessive demands may lead to increased CPU utilization and delays while pages are swapped in and out of memory.
Before implementing any of the recommendations arising from the tuning techniques, it may be appropriate to consider using one of the on-going or ad hoc activities to test the validity of the recommendation. For example, 'Can Demand Management be used to avoid the need to carry out any tuning?' or 'Can the proposed Change be modelled to show its effectiveness before it is implemented?'
CapM3.4 - Component Upgrade
Introduce to the live operational services any Changes that have been identified by the monitoring, analysis and tuning activities.
The implementation of any Changes arising from these activities must be undertaken through a strict, formal Change Management process. System tuning changes can have major implications for the Customers of the service. The impact and risk associated with these changes are likely to be greater than those of other types of Change. Implementing the tuning Changes under formal Change Management procedures results in:
- less adverse impact on the Users of the service
- increased User productivity
- increased productivity of IT personnel
- a reduction in the number of Changes that need to be backed-out, and the ability to do so more easily
- greater management and control of business critical application services.
It is important that further monitoring takes place, so that the effects of the Change can be assessed. It may be necessary to make further Changes or to regress some of the original Changes.
CapM3.5 - Resource Demand Management
Influence the demand for computing resource and the use of that resource.
CapM identifies and quantifies resource usage. A good Capacity forecast will anticipate and plan for peak demand. Modeling assists in peak load analysis to create workload resource usage forecasts. Comparing models of baseline data against actual data allows the data to be reduced in an informed way, and the resulting information is fed into the CDB to support the important outputs of capacity management.
To influence demand, the IT Provider may find it useful to implement charge back for IT services, so that different rates can be assessed to control demand and distribute resources more optimally. This activity can be carried out as a short-term requirement because there is insufficient current capacity to support the work being run, or, as a deliberate policy of IT management, to limit the required capacity in the long term.
Short-term demand management may occur when there has been a partial failure of a critical resource in the IT infrastructure. For example, if there has been a failure of part of the memory on a processor, it may not be possible to run the full range of services. However, a limited subset of the services could be run.
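Choosing which limited subset of services to run after a partial failure can be sketched as a priority-ordered allocation. This is only an illustration under assumed inputs: the service names, priorities and demand figures are hypothetical, and a real decision would follow the priorities agreed with the business.

```python
def select_services(services, available_capacity):
    """Pick the services to keep running when capacity is reduced.

    services -- list of (name, priority, demand) tuples; a lower priority
                number means a more important service (assumed convention)
    available_capacity -- capacity left after the partial failure
    Returns (running, deferred) lists of service names.
    """
    running, deferred = [], []
    remaining = available_capacity
    # Consider the most important services first.
    for name, _priority, demand in sorted(services, key=lambda s: s[1]):
        if demand <= remaining:
            running.append(name)
            remaining -= demand
        else:
            deferred.append(name)
    return running, deferred


# Hypothetical example: memory (in GB) drops from 128 to 80 after a failure.
running, deferred = select_services(
    [("payroll", 1, 40), ("reporting", 3, 50), ("email", 2, 30)],
    available_capacity=80,
)
# running is ["payroll", "email"]; deferred is ["reporting"]
```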
CapM3.6 - Application Sizing
Ensure new applications are configured in context with their expected growth and in the context of total infrastructure capacity needs
Application sizing exercises are initiated at the Project Initiation stage for new applications or when there is a major Change to an existing application, and are completed when the applications are accepted into the operational environment.
During the initial systems analysis and design the required service levels must be specified. This enables the application development to employ the pertinent technologies and products, in order to achieve a design that meets the desired levels of service. It is much easier and less expensive to achieve the required service levels if the application design considers the required service levels at the very beginning of the application lifecycle, rather than at some later stage.
Other considerations in application sizing are the resilience aspects that it may be necessary to build into the design of the new application. Capacity Management is able to provide advice and guidance to the Availability Management process about the resources required to provide the required level of resilience.
The sizing of the application should be refined as the development process progresses. Modelling can be used within the application sizing process.
The SLRs of the planned application developments should not be considered in isolation. The resources to be utilized by the application are likely to be shared with other services and potential threats to existing SLA targets must be recognized and managed.
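One way to avoid considering a planned application in isolation is to add its estimated demand to the current utilization of each shared resource and compare the result with a planning threshold. The sketch below assumes utilization is expressed as a fraction of capacity; the resource names and threshold values are hypothetical.

```python
def sizing_check(current, new_app, thresholds):
    """Project utilization of shared resources once a new application is added.

    current    -- dict of resource -> current utilization (fraction of capacity)
    new_app    -- dict of resource -> estimated extra utilization of the new application
    thresholds -- dict of resource -> highest utilization considered safe for existing SLAs
    Returns dict of resource -> (projected utilization, within_threshold flag).
    """
    report = {}
    for resource, threshold in thresholds.items():
        projected = current.get(resource, 0.0) + new_app.get(resource, 0.0)
        report[resource] = (projected, projected <= threshold)
    return report


# Hypothetical figures for a shared server.
report = sizing_check(
    current={"cpu": 0.55, "memory": 0.60},
    new_app={"cpu": 0.15, "memory": 0.25},
    thresholds={"cpu": 0.75, "memory": 0.80},
)
# CPU stays within its threshold; memory would exceed it and needs attention.
```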
When purchasing software packages from external suppliers it is just as important to understand the resource requirements needed to support the application. Often it can be difficult to obtain this information from the suppliers, and it may vary, depending on throughput. Therefore, it is beneficial to identify similar Customers of the product and to gain an understanding of the resource implications from them. It may be pertinent to benchmark or trial the product prior to purchase.
CapM3.7 - Maintaining Capacity Management Database
Easy availability of capacity information
The CDB is important because it is the central repository for capacity and performance related information. The database collects much of the data input listed earlier. It collects information about workload and performance (for example, how heavily a customer relationship management database is being used) and data that allows trends to be drawn for forecast growth in storage requirements. It provides information for producing reports and the capacity plan, monitoring performance, and managing resources and demand.
The introduction to capacity management illustrated the inputs, outputs and sub-processes of capacity management in the diagram above. Almost all inputs have associated detail that must be captured within the capacity management database in order to process and produce the required outputs.
Information retained in a CDB might include:
- Financial data – costs
- Hardware data
- Development data
- Service data - problem/change
- Contingency data - hardware, and so on
- Technical data - availability data, and so on
- Business data - future direction and strategy
Other detail input for the CDB includes:
- Capacity plan - current in place; DTP software
- Model generators - parameters for sizing and modeling
- Sizing software results
- Modeling software results
- Resource utilization monitors - threshold exceptions
- Service level management - threshold exceptions
- Other performance/surveillance software
- Workload performance monitors
- Charging software
- Business forecast detail
- Cost planning software
Outputs that should be enabled by CDB information are:
- Service level management guidelines
- Reports - resource utilization exceptions, SLM exceptions, and so on
- Forecasts
- Capacity plan
- Other recommendations for change - tuning
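As an illustration of how CDB data can feed one of these outputs, the sketch below derives a resource utilization exception report from stored samples. The record layout (`resource`, `timestamp`, `utilization`) and the threshold values are assumptions made for the example, not a prescribed CDB schema.

```python
def threshold_exceptions(samples, thresholds):
    """Return the samples whose utilization exceeds the threshold for their resource.

    samples    -- list of dicts with 'resource', 'timestamp' and 'utilization' keys
                  (hypothetical record layout)
    thresholds -- dict of resource -> maximum acceptable utilization (fraction)
    """
    exceptions = []
    for sample in samples:
        limit = thresholds.get(sample["resource"])
        # Resources without a defined threshold are not reported.
        if limit is not None and sample["utilization"] > limit:
            exceptions.append(sample)
    return exceptions


# Hypothetical samples as they might be read from the CDB.
samples = [
    {"resource": "disk01", "timestamp": "2024-01-01T10:00", "utilization": 0.91},
    {"resource": "disk01", "timestamp": "2024-01-01T11:00", "utilization": 0.78},
    {"resource": "cpu01", "timestamp": "2024-01-01T10:00", "utilization": 0.88},
]
report = threshold_exceptions(samples, {"disk01": 0.85, "cpu01": 0.90})
# report contains only the first disk01 sample
```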
Capacity management issues can dramatically affect the business if they cause unplanned downtime of a vital business function. This requires capacity and availability management considerations to be intertwined and solution designs to be consistent. Service continuity management weighs risk against cost for scenarios outside the normal availability design. Its contingency planning relies on capacity forecasts and recommendations when documenting a chosen contingency measure. It follows that the clear and distinct requirements of each process must have correlated capacity and performance data identified and properly recorded in the CDB. It is important that capacity detail data in the CDB relates to the relevant OLA, and that associated OLA and/or SLA information is tracked in the configuration management database (CMDB). Because availability management information implicitly depends on the proper integration of performance and capacity measurement data, capacity and availability staff often share common monitoring tools and management solutions.
Terms
Term | Definition
|
Availability | Ability of a component or service to perform its required function at a stated instant or over a stated period of time. It is usually expressed as the availability ratio, i.e. the proportion of time that the service is actually available for use by the Customers within the agreed service hours.
|
Category, Type and Item (CTI) | Method for Classification of a group
|
Capacity Planning and Management | The process by which measurements of current resource utilization are combined with projections of future resource requirements to allow management decisions to be made as to what computer and data communications resources will be required in the future, and how best to allocate existing resources so that they are used in the most efficient and effective manner.
|
Capacity Database | A data repository containing the capacity characteristics of the infrastructure. A Capacity database may include the Performance database.
|
Configuration Item (CI) | Component of an infrastructure - or an item, such as a Request for Change, associated with an infrastructure - that is (or is to be) under the control of Configuration Management. CIs may vary widely in complexity, size and type, from an entire system (including all hardware, software and documentation) to a single module or a minor hardware component.
|
Configuration Management Database (CMDB) | A database that contains all relevant details of each CI and details of the important relationships between CIs.
|
Core Business Process | A process that relies on the unique knowledge and skills of the owner and that contributes to the owner’s competitive advantage.
|
Critical Success Factor (CSF) | The most important issues or actions for management to achieve control over and within its IT processes.
|
Customer | Payer of a service; usually the Customer management has responsibility for the cost of the service, either directly through charging or indirectly in terms of demonstrable business need.
|
Demand Management | Influencing the use of IT capacity, perhaps by incentive or penalty, in circumstances where unmanaged demand is likely to exceed the ability to deliver. Demand Management is achieved by assigning resources according to priorities.
|
Environment | A collection of hardware, software, network and procedures that work together to provide a discrete type of computer service. There may be one or more environments on a physical platform, e.g. test, production. Each environment has unique features and characteristics that dictate how it is administered, so environments are managed in similar, yet diverse, ways.
|
Performance Database | A data repository with historical information on the performance of infrastructure components and their expected MTBF
|
Process | A connected series of actions, activities, Changes etc. performed by agents with the intent of satisfying a purpose or achieving a goal.
|
Process Control | The process of planning and regulating, with the objective of performing a process in an effective and efficient way.
|
Release | A collection of new and/or changed CIs which are tested and introduced into the live environment together.
|
Request for Change (RFC) | Form, or screen, used to record details of a request for a Change to any CI within an infrastructure or to procedures and items associated with the infrastructure.
|
Role | A set of responsibilities, activities and authorisations.
|
Service Level Agreement | A written agreement between a service provider and Customer(s) that documents agreed services and the levels at which they are provided at various costs.
|
Service Level Management | Disciplined, proactive methodology and procedures used to ensure that adequate levels of service are delivered to supported IT users in accordance with business priorities and at acceptable costs.
|
System | An integrated composite that consists of one or more of the processes, hardware, software, facilities and people, that provides a capability to satisfy a stated need or objective.
|
Utilization Statistics | Measures that record the amount of computer and data communications resources used to provide data processing support.
|
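The availability ratio defined above can be worked through as a short calculation. The sketch assumes that downtime is measured only within the agreed service hours; the function name and the figures are illustrative.

```python
def availability_ratio(agreed_service_hours, downtime_hours):
    """Proportion of the agreed service hours in which the service was available.

    Assumes downtime_hours counts only outages inside the agreed service hours.
    """
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie within the agreed service hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


# Hypothetical month: 200 agreed service hours, 5 hours of outage.
ratio = availability_ratio(200, 5)  # 0.975, i.e. 97.5% availability
```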
COBIT Performance and Capacity Maturity Variations
0 Non-existent | Management has not recognized that key
business processes may require high levels of
performance from IT or that the overall business need for
IT services may exceed capacity. There is no capacity
planning process in place.
|
1 (Initial/Ad Hoc) | Performance and capacity management
is reactive and sporadic. Users often have to devise
work-arounds for performance and capacity constraints.
There is very little appreciation of the IT service needs
by the owners of the business processes. IT management
is aware of the need for performance and capacity
management, but the action taken is usually reactive or
incomplete. The planning process is informal.
|
2 (Repeatable but Intuitive) | Business management is
aware of the impact of not managing performance and
capacity. For critical areas, performance needs are
generally catered for, based on assessment of individual
systems and the knowledge of support and project teams.
Some individual tools may be used to diagnose
performance and capacity problems, but the consistency
of results is dependent on the expertise of key
individuals. There is no overall assessment of the IT
infrastructure’s performance capability or consideration
of peak and worst-case loading situations. Availability
problems are likely to occur in an unexpected and
random fashion and take considerable time to diagnose
and correct.
|
3 (Defined Process) | Performance and capacity
requirements are defined as steps to be addressed at all
stages of the systems acquisition and deployment
methodology. There are defined service level
requirements and metrics that can be used to measure
operational performance. It is possible to model and
forecast future performance requirements. Reports can
be produced giving performance statistics. Problems are
still likely to occur and be time consuming to correct. Despite published service levels, end users will occasionally feel sceptical about the service capability.
|
4 (Managed and Measurable) | Processes and tools are
available to measure system usage and compare it to
defined service levels. Up-to-date information is
available, giving standardized performance statistics and
alerting incidents such as insufficient capacity or
throughput. Incidents caused by capacity and
performance failures are dealt with according to defined
and standardized procedures. Automated tools are used
to monitor specific resources such as disk storage,
network servers and network gateways. There is some
attempt to report performance statistics in business
process terms, so that end users can understand IT
service levels. Users feel generally satisfied with current
service capability and are demanding new and improved
availability levels.
|
5 Optimized | The performance and capacity plans are
fully synchronized with the business forecasts and the
operational plans and objectives. The IT infrastructure is
subject to regular reviews to ensure that optimum
capacity is achieved at the lowest possible cost.
Advances in technology are closely monitored to take
advantage of improved product performance. The
metrics for measuring IT performance have been fine-tuned
to focus on key areas and are translated into KGIs,
KPIs and CSFs for all critical business processes. Tools
for monitoring critical IT resources have been
standardized, wherever possible, across platforms and
linked to a single organization-wide incident
management system. Monitoring tools increasingly can
detect and automatically correct performance problems,
e.g., allocating increased storage space or re-routing
network traffic. Trends are detected showing imminent
performance problems caused by increased business
volumes, enabling planning and avoidance of unexpected
incidents. Users expect 24x7x365 availability.
|
Process Implementation Recommendations
A clear set of objectives
A suggestion for an overall objective statement was given earlier: 'The continuing provision of consistent, acceptable service levels at a known and controlled cost.' This objective will need to be extended to incorporate the scope of the Capacity Management team activities, in particular, the range of systems and business processes that they will be required to cover.
Senior management commitment
This is an obvious, but vital step on the road. Without senior management commitment, the required personnel and tool resources will not be provided, and the organizational changes (particularly in terms of business information flow) will not be made.
Process/flow definition
This is a definition of the way in which the Capacity Management team will interface with the rest of the organization. It must include a definition of:
- All the in-bound information flows that will be required
- The organizational impact and any required changes
- The ways in which the work done by the team will be directed by the organization's needs
- The ways in which the team's findings will be disseminated and used.
A realistic plan
The key word here is 'realistic'. While it would be nice to have a completely comprehensive, fully integrated Capacity Management function from day one, it just is not going to happen. The required organizational changes may well take a considerable amount of time to define and implement.
Recruit or retrain the right people
Despite the fact that mathematicians are amongst the most important and worthwhile people on the planet, capacity managers do not have to be mathematicians. Although a certain degree of numeric ability will be required, an ability to communicate with the business is equally important. Such abilities will be required not only to derive the required information from the organization, but also to communicate and present findings and recommendations effectively.
An effective capacity management team member really needs to have a foot in both the business and the technical communities. Remember that Capacity Management is a business discipline with technical implications - not the other way round!
Acquire the right toolset
The precise toolset will depend on the individual organization's circumstances, but a minimum starter set will include:
- Performance and resource consumption monitors for all system components which contribute to the delivered service
- A mechanism for storing that data in a central repository
- A mechanism for importing business data, and any other data that would provide further understanding of the relationship between the business workload (measured in Natural Forecast Units) and IS workloads
- A facility to generate regular reports automatically, across the complete range of business and system data in the repository, and to distribute those reports in an appropriate manner to the target audience (typically using HTML and the corporate intranet)
- A mechanism to produce ad-hoc reports and to 'drill down' into the data for problem diagnosis
- A trending facility for use on those aspects of the data for which linear extrapolation is a valid technique
- A performance prediction facility that will allow the non-linear behaviour of systems experiencing resource contention to be taken into account. This will usually be based on some form of analytical modelling.
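The difference between the last two facilities can be sketched in a few lines. The first two functions fit a straight line to equally spaced observations and extrapolate it, which suits data such as steadily growing disk usage. The third uses the single-server M/M/1 queueing formula as one simple, assumed example of analytical modelling; real prediction tools typically use richer models, and all names and figures here are illustrative.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced observations."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(values, limit):
    """Periods after the last observation until the trend line reaches limit.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


def mm1_response_time(service_time, utilization):
    """Mean response time of an M/M/1 queue: service_time / (1 - utilization).

    Assumed model: one server, random arrivals, exponential service times.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# Hypothetical monthly disk usage (%): grows by 2 points per month.
months_left = periods_until([50, 52, 54, 56], limit=80)  # 12.0 months

# Response time does not grow linearly with utilization.
at_half = mm1_response_time(0.1, 0.5)    # 0.2 seconds
at_ninety = mm1_response_time(0.1, 0.9)  # about 1.0 second
```

The queueing example shows why linear extrapolation is not valid for resources under contention: moving from 50% to 90% utilization multiplies the response time by about five rather than by 1.8.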
Walk before you start to run
Avoid the temptation to try and cover too many target systems or applications right from the outset. It will take some time for the organization to adapt to the additional disciplines required for Capacity Management and to start to benefit from its provision.
Iterative evolution
As I mentioned earlier, Capacity Management is an evolving process. Activities and achievements must continually be monitored and checked against the objectives and the plan. This is an ongoing iterative process of refinement and improvement.
Toolset Considerations
Monitoring Agents
There are many, many systems on the market today that claim to provide some level of "service level management." Most of these systems confine themselves to monitoring network infrastructure equipment. While this is a very valuable capability, it doesn't go far enough to be considered to be true SLM. It also doesn't lend itself to meaningful customer service level reporting.
Some types of problem tracking or call management systems market themselves as having service level management functionality. In some ways, this is correct: many businesses use standard help desk tools to report on network availability, overall system availability, and customer service levels based on the types of trouble tickets received. This kind of "service level management" is, by nature, purely reactive, relying as it does on after-the-fact information capture of service failures. These tools also provide no mechanism for tying performance to business goals.
Still other types of systems come a little closer to a mature implementation of SLM by using test transactions to simulate customer/end user activity. Essentially, these systems send "synthetic transactions" through standard customer systems, timing them and monitoring for difficulties. Obviously, these systems confine themselves to network and application service level management, although they can make some measure of business success by examining employee productivity (or lack thereof, due to system problems) or customer system experience.
Web monitoring systems are becoming more prevalent. These systems examine the service experienced by web site users, and may also use test transactions to measure web site access events. There are also service level management systems that focus on specific technologies, such as VOIP, or on particular industries (telecom, ISPs).
Management support tools
Effective capacity management requires the use of mature software to monitor and control the service solution and underlying platform. It is difficult to justify building this type of software internal to IT since there are software packages which leverage many years of vendor platform monitor and control elements. We call these IT management support tools. IT management support tools can be grouped by functionality based on three categories:
- Service management tools
- Application management tools
- Infrastructure management tools
Enterprise management frameworks typically focus on infrastructure management. However, they often include comprehensive product suites that integrate tools and information across these categories, and may also integrate with other vendor products or feed information into products that also provide application and service management. For more information see Service Monitoring and Control in the MOF Operations Guide.
Capacity and availability management requirements for information and support tools are similar. Typically application management tools directly address these needs, but the processes also leverage information supplied by service management and infrastructure management support tool categories.
Service Management Tools
One of the main objectives of service management processes is the administration of information to manage the quality of IT services. The information is used to monitor and report on services and customers. The service management processes ensure that services are defined and changed when necessary and provide support during exploitation of services. These are the IT management processes in which contact between the IT service provider and the customer is involved (incident management, change management, service level management). Incident, problem, and change management also define the workflow within the IT service management department. Service management tools are usually rated by their ability to support the process requirements for: incident management, problem management, change management, configuration management, software control, service level management, and cost management.
Application Management Tools
Application management focuses on the application from the end user's perspective. It ensures that critical applications and data are always highly available and perform optimally. A representative of the end users and the IT service provider agree on the service levels to be realized. These service levels are monitored and reported by the IT service provider. Application management processes provide end-to-end information on the agreed and actual availability and performance of the application and the infrastructure resources used by this application. Application management tools are typically rated based on their ability to satisfy the needs of availability management, capacity management, and service level management.
Infrastructure management tools
Infrastructure management consists of systems, network, database and desktop management and software distribution. It is defined as management of the components or elements of the IT infrastructure, for example, servers, routers, hubs, databases, PCs and terminals. The infrastructure management tools typically fall into these categories: system management, network management, desktop management, software distribution, and database management.
Capacity Plan Template
Introduction
Any background information should be introduced in this section. This could include the organization's current levels of capacity, problems being experienced or anticipated due to over or under capacity, the degree to which service levels are being achieved, and what has changed since the last update and issue of the plan.
Scope of the plan
Ideally, all IT services and resources need to be outlined in the plan. This section needs to be specific in detail and outline what and how the elements of the IT infrastructure are to be addressed.
Methods used
The capacity plan uses information gathered by the sub-processes. This section therefore should contain details of how and when this information was obtained, for example business forecasts obtained from business plans, workload forecasts obtained from users, service level forecasts obtained by the use of modeling tools.
Management summary
The capacity plan necessarily contains technical detail that may not be of interest to all readers of the plan. The management summary needs to highlight the main issues, options, recommendations, and costs. It may be helpful to produce a separate executive summary document that contains the main points from each of the sections of the more detailed plan.
Strategic Business Scenarios
It is necessary to put the plan into the context of the current and future business environment. For example, a new CRM solution may currently be utilizing 60% of current processor and memory capacity for its back-end database. Capacity management is involved in monitoring the current system and is able to forecast the recommended additional CPU, memory and disk capacity to accommodate growth for the year. It is important to explicitly mention all known business forecasts so that readers can determine what is inside and outside the scope of the plan.
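The CRM example can be turned into a rough forecast of when additional capacity is needed. The 60% starting point comes from the example above; the compound monthly growth rate and the 80% planning threshold are assumptions chosen purely for illustration.

```python
import math


def months_until_threshold(current_utilization, monthly_growth, threshold):
    """Whole months until utilization reaches the threshold under compound growth.

    current_utilization, threshold -- fractions of capacity
    monthly_growth -- assumed growth per month, e.g. 0.05 for 5%
    Returns 0 if already at or above the threshold, None if there is no growth.
    """
    if current_utilization <= 0:
        raise ValueError("current_utilization must be positive")
    if current_utilization >= threshold:
        return 0
    if monthly_growth <= 0:
        return None
    months = math.log(threshold / current_utilization) / math.log(1 + monthly_growth)
    return math.ceil(months)


# 60% utilized today, assumed 5% monthly growth, assumed 80% planning threshold.
months = months_until_threshold(0.60, 0.05, 0.80)  # 6
```

Under these assumptions the plan would need to schedule extra processor and memory capacity within about six months, well inside the year being forecast.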
Tactical summary
A service profile should be provided for each service delivered. This should include resource utilization for a given transaction response time or throughput rate. For example, this section presents usage levels for processor, memory, storage, and network, with short, medium, and long-term trends.
Forecasted service levels
The business plans should provide the capacity manager with details of the new services planned, and the growth or contraction of existing services. This section should report on new services and the demise of legacy systems.
Operational summary
This section concentrates on the resulting resource usage by the services. It reports, again, on the short, medium and long-term trends in resource usage, broken down by hardware platform. This information needs to be gathered and analyzed by the sub-processes of management of resources and service performance, and so should be readily available.
Resource forecasts
This section forecasts the likely resource usage resulting from the service forecasts. Each business scenario mentioned above should be addressed here. For example, a new Internet storefront project plan may have a corresponding forecast of specific network bandwidth requirements in anticipation of transaction levels and response time for a secured debit transaction.
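A bandwidth forecast of this kind can be sketched as simple arithmetic from the expected transaction level. The transaction rate, message size and headroom fraction below are hypothetical planning inputs, and the calculation ignores protocol overhead and burstiness.

```python
def required_bandwidth_mbps(tx_per_second, bytes_per_tx, headroom=0.3):
    """Bandwidth needed to carry the forecast load while keeping spare headroom.

    tx_per_second -- forecast peak transaction rate
    bytes_per_tx  -- average bytes transferred per transaction (request + response)
    headroom      -- fraction of the link to leave unused at peak (assumption)
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw_bits_per_second = tx_per_second * bytes_per_tx * 8
    # Size the link so that peak load uses only (1 - headroom) of it.
    return raw_bits_per_second / (1 - headroom) / 1_000_000


# Hypothetical storefront: 200 transactions/s of 5,000 bytes each, 20% headroom.
mbps = required_bandwidth_mbps(200, 5000, headroom=0.2)  # about 10 Mbit/s
```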
Options for service improvement
Building on the results of the previous section, this section outlines the possible options for improving the effectiveness and efficiency of service delivery. It could contain options for merging different services on a single processor, upgrading the network to take advantage of technological advances, tuning the use of resource or service performance, rewriting legacy systems, purchasing new hardware or software and so on.
Cost model
The costs associated with these options should be documented here. In addition, the current and forecast cost of providing IT services needs to be included. In practice, the capacity manager obtains much of this information from the financial management process.
Recommendations
The final section of the plan should contain a summary of the recommendations made in the previous plan and their status, for example rejected, planned, or implemented. Any new recommendations should be made here, for example, which of the options mentioned in the plan is preferred.
The recommendations should quantify business benefits to be expected, potential impact of carrying out the recommendations, risks involved, resources required, and both startup and ongoing costs.
Typical reports or capacity recommendations need to address the following areas:
- The number of users supported by the current hardware
- Scalability options if the number of users increases
- Scalability options if the solution complexity increases
- Recommended changes in monitoring, analysis, or tuning
- Potential bottlenecks
- Performance guidelines for design and development
- Prediction of future service performance