Root Cause Analysis Tips for Plant Managers
As processes become more involved and systems become more complex, it is no longer enough to simply solve problems as they come up. Reactive allocation of resources for problem solving and maintenance often misses the underlying cause and leaves you chasing down small leaks as the entire dam threatens to burst.
Instead of chasing down small problems and small solutions, root cause analysis (RCA) pushes you to identify the underlying causes of a problem so you can approach it with solutions that resolve both the immediate problem and its source. Such solutions work to prevent future incidents and any other potential problems related to that weak link while opening up your resources for more intelligent allocation. RCA often finds multiple related problems and solutions, providing a better picture of your overall process.
Throughout the rest of this article, we will discuss the stages, methodologies, and benefits of RCA and how this can help you in the work you are doing every day.
Root Cause Analysis Steps
In order to properly perform an RCA, you must identify your problem in the context of your entire system. While there are various specific methodologies, the basic workflow can be broken down into the following four steps.
Identify the Issue
Your problem may start with something simple, like a leaky valve or a clogged pipe, but this is not the end of the story. You need to trace the systemic relationships that contribute to that problem and trace those causes upstream through your system. You should also determine the timeframe of the problem and if any similar issues have occurred before. Some methods and heuristics for this step can be found below.
Gather Data
Gathering data and identifying your problem will go hand-in-hand. In this stage, you can begin to contextualize your problem and understand how the root cause may be cascading through your system, affecting your entire process.
Accessible numerical data and process blueprints are important in this stage. Sometimes, though, the causes and effects can be hidden or difficult to correlate; make sure to loop in the members of your team who are most familiar with the sub-systems in question.
Analyze and Categorize Causes
With the problem and the data in hand, begin assessing possible causes. Multiple upstream factors can lead to a single problem, and a single cause can lead to multiple downstream problems. Follow the thread as far upstream as possible to ensure that you do, in fact, reach the root cause.
In general, these causes can be categorized across three major groups: concrete, individual, and systemic. Watch out for combinations and overlaps, as things may not always be as simple as they appear. Let’s think about this using an example of a late train arrival.
Concrete causes tend to tie back to a physical object which is somehow faulty, broken, or otherwise performing improperly. Maybe there is a faulty signal along the route that causes a train to wait in a station for an extra five minutes even though the road ahead is clear.
Individual causes can also be referred to as human or operator error. At some point, an individual or a team didn’t follow a procedure or otherwise did not respond to the situation properly. Maybe the driver mixed up the speed zones and went too slowly along a high-speed segment, losing time over the whole route.
Systemic causes point to flaws in the organizational plan or in how parts of the system are aligned. Resolving them requires either a correction to the standard operating procedures (SOPs) or an adjustment of the systemic connections. Maybe additional trains were scheduled along the route to fulfill a delivery and slowdowns were required to maintain proper spacing. No repair or individual action on the day could fix this, and any further individual or concrete mistake only builds on top of the systemic cause.
Solve the Problem
Once the RCA has been carried out through step three, you must identify solutions to resolve the problem and prevent future incidents. These solutions are highly specific to the situation and are meant to resolve both the direct problem and the underlying cause. This stage relies on the cause-and-effect relationships explored above and leverages them to ensure that you don’t have to solve the same underlying problem twice.
Root Cause Analysis Methodologies
The specific tactics for RCA vary widely across industries, problem types, and available data, but the following list represents a good overview of the current paradigm.
The Five Whys
More of a heuristic than an actual strategy, the “five whys” method starts with the basic problem and simply asks “Why did this happen?” Every time you get an answer, you can use that answer as the basis of the next question until you have, ideally, worked far enough upstream to resolve the question.
Five is just a rough estimate of the number of necessary layers — play the “why is the sky blue” game with a child and you’ll likely need more. This method is a great starting point for your investigation, but is unlikely to be rigorous enough for a final report.
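The questioning loop described above can be sketched in a few lines of Python. This is only an illustration: the cause map, its entries, and the `ask_why` helper are hypothetical names invented for the example, and in practice each answer comes from your team and your data rather than a lookup table.

```python
from typing import Dict, List

# Hypothetical answers to "Why did this happen?" for a leaky valve.
# Each key is an observation; each value is the answer your team recorded.
CAUSE_OF: Dict[str, str] = {
    "valve is leaking": "seal is worn",
    "seal is worn": "seal was not replaced on schedule",
    "seal was not replaced on schedule": "maintenance plan omits this valve",
}


def ask_why(problem: str, cause_of: Dict[str, str], max_whys: int = 10) -> List[str]:
    """Walk upstream from the problem until no further answer is recorded."""
    chain = [problem]
    while chain[-1] in cause_of and len(chain) <= max_whys:
        next_cause = cause_of[chain[-1]]
        if next_cause in chain:  # stop if the answers loop back on themselves
            break
        chain.append(next_cause)
    return chain


chain = ask_why("valve is leaking", CAUSE_OF)
root_cause = chain[-1]  # the furthest upstream answer found

for depth, step in enumerate(chain):
    print("  " * depth + step)
```

The last entry in the chain is only a candidate root cause. It still needs to be checked against the data before it goes into a final report.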
The Fishbone Diagram
Also known as an Ishikawa diagram, the Fishbone diagram is an excellent brainstorming format to use for the identification of major categories of possible causes within your system. The branching, diagonalized diagram with a centralized flow toward the stated problem resembles a skeletal fish, hence the name. This is a good way to get your team to discuss major issues.
Failure Mode and Effects Analysis (FMEA)
FMEA is a more holistic approach to the examination of a process, design, or system. This is common in overall evaluations of a workflow and tries to identify the potential failure points in that system, the downstream impacts of those failures, and their relative likelihoods. This is more important in the planning and evaluation stages of a system than a direct problem solving application and provides an excellent reference for future problem solving.
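One common way to rank failure points in an FMEA, which this article does not prescribe, is a risk priority number (RPN): the product of severity, occurrence, and detection scores, each typically rated from 1 to 10. The sketch below assumes that convention. The failure modes and their scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible impact) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Risk priority number under the assumed S x O x D convention."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes with illustrative scores.
modes = [
    FailureMode("pump seal leak", severity=6, occurrence=4, detection=3),
    FailureMode("sensor drift", severity=4, occurrence=6, detection=7),
    FailureMode("operator skips rinse step", severity=7, occurrence=3, detection=5),
]

# Highest RPN first: these are the failure points to address earliest.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)

for mode in ranked:
    print(f"{mode.name}: RPN {mode.rpn}")
```

Note that the top-ranked item here is not the most severe one. A moderate failure that is frequent and hard to detect can outrank a severe but rare one.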
Fault Tree Analysis (FTA)
Based on Boolean AND/OR/NOT relationships, FTA makes use of highly structured logical flow charts to determine the paths that lead to a possible failure event. FTA recognizes that a single upstream failure might not cause a visible problem unless other conditions are also present. It can be used to design fault-resistant systems and to identify the intersections of events that may lead to a downstream failure.
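A small fault tree can be written directly as Boolean logic. The sketch below is a hypothetical tank-overflow tree. The event names and the way the gates are wired are assumptions for the example, not a model of any real plant.

```python
from typing import Dict


def tank_overflow(events: Dict[str, bool]) -> bool:
    """Evaluate the top event of a hypothetical fault tree."""
    # OR gate: either basic event lets too much liquid into the tank.
    inflow_uncontrolled = (
        events["inlet_valve_stuck_open"] or events["operator_overfills"]
    )
    # AND gate with a NOT: the high level goes unnoticed only if the
    # sensor has failed and the backup alarm is not working either.
    alarm_missed = (
        events["level_sensor_failed"] and not events["backup_alarm_working"]
    )
    # AND gate: overflow needs both uncontrolled inflow and a missed alarm.
    return inflow_uncontrolled and alarm_missed


single_failure = {
    "inlet_valve_stuck_open": True,
    "operator_overfills": False,
    "level_sensor_failed": False,
    "backup_alarm_working": True,
}
combined_failure = {
    "inlet_valve_stuck_open": True,
    "operator_overfills": False,
    "level_sensor_failed": True,
    "backup_alarm_working": False,
}

print(tank_overflow(single_failure))    # stuck valve alone
print(tank_overflow(combined_failure))  # stuck valve plus both alarm failures
```

In this sketch the stuck valve alone does not trigger the top event, which mirrors the point that an upstream failure may stay hidden until other conditions line up.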
Pareto Charts
Pareto charts help to identify the overall performance of a system and its relationship to various causes of failure or inefficiencies. They are motivated by the Pareto principle, also known as the 80/20 rule, which states that roughly 80% of the consequences in a system come from about 20% of the causes. By grouping failures and comparing their frequencies in a Pareto chart, you can use this heuristic to roughly identify grouped cause-and-effect relationships and target your maintenance at the most important causes.
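The numbers behind a Pareto chart are simple to compute: count each cause, sort from most to least frequent, and track the cumulative share. The sketch below does this for a hypothetical downtime log. The incident labels and counts are invented, and `pareto_table` is an illustrative helper name.

```python
from collections import Counter
from typing import List, Tuple


def pareto_table(causes: List[str]) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical incident log: one entry per downtime event.
log = (
    ["cleaning delay"] * 12
    + ["seal leak"] * 4
    + ["sensor fault"] * 3
    + ["power dip"] * 1
)

table = pareto_table(log)
for cause, count, cumulative in table:
    print(f"{cause:15} {count:3} {cumulative:6.1f}%")
```

Plotting the counts as bars and the cumulative percentage as a line gives the familiar chart. The causes that appear before the line crosses roughly 80% are the ones to target first.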
“Is and Is Not” Analysis
“Is and is not” analysis is a method by which you can narrow the possible causes to those which are directly related to your problem. It forces you to limit the scope of your evaluation and allows you to determine problem boundaries very effectively. You can begin by asking who, what, where, when, and how for the problem, recording both where it is observed and where it is not, to begin zeroing in on the solution without assigning false causality.
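One way to apply this boundary-setting is to compare each candidate cause against the places where the problem is and is not seen. The sketch below is a simplified assumption of how that could work for the "where" question only. The candidate causes, line names, and the `consistent_causes` helper are hypothetical.

```python
from typing import Dict, Set


def consistent_causes(
    candidates: Dict[str, Set[str]],
    is_observed: Set[str],
    is_not_observed: Set[str],
) -> Set[str]:
    """Keep causes that cover every affected area and no unaffected area."""
    return {
        name
        for name, scope in candidates.items()
        if is_observed <= scope and not (scope & is_not_observed)
    }


# Hypothetical candidates, each mapped to the lines it would affect.
candidates = {
    "shared water supply": {"line 1", "line 2"},
    "line 2 pump": {"line 2"},
    "line 1 filter": {"line 1"},
}

# The problem IS seen on line 2 and IS NOT seen on line 1.
remaining = consistent_causes(candidates, {"line 2"}, {"line 1"})
print(remaining)
```

The shared water supply is ruled out because it would also affect line 1, where the problem does not appear. The same comparison can be repeated for the when, who, and what questions.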
Benefits of Root Cause Analysis (RCA)
The basic appeal of root cause analysis is that it will allow you to save time and money by avoiding repetitive or fruitless repair and maintenance jobs. Instead, it allows you to focus on resolving fundamental conflicts in your overall workflow. Finding the right solution will not only allow you to resolve the problem at hand, but also prevent future problems so you can focus on optimizations.
RCA encourages you to eliminate wasteful processes and technology to replace them with more reliable, cutting-edge resources. Ecorobotics offers solutions for many of the identifiable causes found from RCA. For instance, if you find that human error in industrial tank cleaning procedures is reducing your facility’s efficiency, you can leverage robotic tank cleaning to enhance your asset cleaning process. Or, if a Pareto chart shows that 80% of your facility downtime stems from cleaning delays, Ecorobotics’ higher-speed cleaning robots can facilitate higher output.
Fundamentally, RCA ensures that you are properly allocating your resources to find realistic, long-term solutions to problems in any process. It turns your operation from reactive to proactive and can ensure that you are taking advantage of all available efficiencies.
Optimize Your Plant Today
The shift from solving an immediate problem to understanding and resolving the underlying cause of a problem in order to prevent future incidents is the heart of a good RCA approach. There are dozens of resources and tutorials for each individual method across the web, and each has its place in your problem-solving workflow and operations. Something like the “five whys” is perfect for initial brainstorming of an immediate problem, while a framework like FMEA could be integrated into the planning stages of a new process or an evaluation of overall performance.
While RCA cannot clean your tanks for you, it can go a long way toward helping you understand what you need to do to optimize operations in your particular plant.
To learn about more ways to optimize operations, here are ways to improve workplace safety at your plant.