ITIL and ISO 20000 problem management
Remember this situation? You’re running Windows. The blue screen forces you to reboot your PC. And then it happens again.
Incident Management = rebooting your PC, so that your service (use of the PC) is available again.
Problem Management = figuring out what actually happened (why that PC gets the blue screen) and how to prevent such incidents from occurring.
Incidents are straightforward: you need to give your users the agreed functionality as soon as possible (so they can use their PC to send and receive e-mails, write documents, do calculations, etc.). And that's it. Complications arise when we start dealing with problems. Problem resolution methodologies are well known:
- Brainstorming
- Chronological analysis
- Pain value analysis
- Kepner and Tregoe
- 5-whys
- Fault isolation
- Affinity mapping
- Hypothesis testing
- Technical observation post
- Ishikawa diagrams
- Pareto analysis
Whichever methodology you choose, you need an organization to get the problems resolved. The more critical issue is how to set up the Problem Management organization.
What makes it complex?
Before declaring something complex, let me elaborate. Getting to a problem resolution is not a one-dimensional process. Well, if you have just a few (really few) services, it could be. But companies strive to extend their offerings and, there you go: many different parties, areas of know-how, technologies, etc. are the common ground that makes things complicated.
Problem Management has to find the root cause of one or more incidents. Finding the root cause means figuring out what caused an incident. Take a telecom as an example: many, many customers use a bunch of services. In the background, many departments inside the telecom organization support those services. A Problem Management organization faces the following challenges:
- Skills – Problem Management requires expertise, and experts in any one particular area are not common. In fact, they are rare.
- Diversity – Knowledge is spread across the company.
- External parties – In my experience, companies very often use external parties to gain know-how (in the form of a service or a product). That know-how is needed during the problem resolution procedure.
- Availability – Problems in a particular area of technology usually don't occur all the time, so there are no resources permanently dedicated to them. Take database experts, for example. They are expensive resources, kept busy on projects as much as possible (organizations tend to make maximum use of their scarce resources). When there is a problem to solve, it can be tough to get such experts out of their projects.
So much for the challenges. Let's see what can be done to overcome them.
The approach to Problem Management organization
First of all, there is no recipe for how to organize a Problem Management support organization. Every situation and organization is unique and requires its own approach. In my experience, there are several possibilities; let me describe the two most common:
Stand-alone organization – this is the situation where you have a permanent Problem Management organization with resources permanently dedicated to problem resolution. It is useful when you don't have a wide diversity of services and you have many problems to solve within the same area of expertise.
Shared resources – this is the usual situation, applicable when there are many different services and expertise and know-how are spread across the organization. In real life, this means that Problem Management uses resources when it needs them, so it has to be clear where those resources are. One of my projects had exactly that situation. It was a small telecom with many services. They could not afford a dedicated Problem Management group. Instead, they had a Problem Manager and experts across the organization, and depending on the topic, they drew on resources from various organizational units. In this way, their Problem Management did not accumulate underused resources, and they could use their best experts when they were needed.
Roles and responsibilities are another important parameter when organizing for Problem Management. According to the ITIL Service Operation book, there are two important roles in a Problem Management organization: Problem Manager and Problem Analyst.
Problem Manager – this person is responsible for organizing the processes, resources and tools needed to resolve problems. This is much easier in a stand-alone organization. When shared resources are used, the Problem Manager is responsible for organizing the needed resources and coordinating their activities. This can get even more complicated when, e.g., external resources are used. Operational Level Agreements (OLAs, for internal resources) and Underpinning Contracts (UCs, for external resources) are within the Problem Manager's responsibility. Read more about OLAs and UCs here: SLAs, OLAs and UCs in ITIL and ISO 20000.
Problem Analyst – this is the person who actively and productively contributes to problem resolution. Regardless of the organizational setup, the most important characteristic of a Problem Analyst is expertise. If a problem requires different kinds of expertise, several analysts will be involved as a team.
Hard, or fun game?
Problem resolution targets are defined in the Service Level Agreement (SLA), or at least they should be, though I have rarely found an explicit definition of Problem Management within the scope of an SLA. Still, the SLA should contain a time determinant that makes problem resolution a serious "game." Depending on the situation, there are many challenges when organizing for Problem Management. With customers and their SLAs as drivers, and our own know-how and experience as instruments, problem resolution should not be hard, but rather a fun game.
Author: Branimir Valentic, IT governance and ITIL and ISO 20000 expert.