I Violated the First Tenet of Troubleshooting: I Failed to Isolate the Problem
The first step in learning from mistakes is recognizing that you made one. I learned a valuable lesson early in my career: you cannot figure out How to fix a problem until you know the 5-Ws: Who, What, When, Where, and Why.
One of my first offshore assignments as a Field Service Engineer was to troubleshoot and remediate a few issues our customer was experiencing with the industrial control system for one of their critical safety systems. After a couple of successful deployments, I was sent by myself to identify the problems and resolve any issues prior to the next operational phase. As is usually the case offshore, time was of the essence, and deadlines were critical for operational success.
Shortly after arriving, I was able to identify and rectify every issue except one: loss of positioning sensor information for the equipment deployed subsea. Thinking a software error was unlikely, I focused my attention on ensuring the control system was communicating with the sensors. Sure enough, the indicator lights on the communication cards were not blinking as they should, so I knew there was a communication issue. I verified that the network was configured correctly and that no cables had been disconnected or come loose. I swapped cards to check for a hardware failure, but still no communication. I even tried swapping the transmit and receive wires, thinking they may have been inadvertently miswired: nothing.
Growing frustrated, I took a step back and started to trace the circuit to figure out what the potential issues could be (in hindsight, this should have been my first step instead of "shooting blindly" hoping to hit a target). While inspecting the large umbilical cable for damage, I discovered a short section of cable that appeared to have been caught on a bind and bent, penetrating the protective sheath and exposing some of the cables inside. I was certain this was the source of the problem, and after a bit of hesitation, the customer agreed with the findings and mobilized additional support to cut and re-terminate the cable. Once that was completed, the cable was reinstalled: NO CHANGE.
With 48 hours to go until the critical deadline, I was beginning to lose hope. I thought the source of the problem must be located inside the pressure-balanced assembly, which would require extensive time to open up, inspect, close, and retest before the equipment could be returned to operation. This meant missing the deadline. Rightfully disappointed, the customer gave me the green light to prepare for disassembly.
As we prepared to open up the pressure-balanced chamber, the umbilical cable was removed. And then it hit me: what if the problem is at the cable connection? Once I peeked into the connector plate, the answer became clear: one connector pin was broken, and another was missing. When I checked the pin numbers, they were the ones dedicated to the positioning sensor. So, the cause of the problem had been at the most logical and easiest place to check the whole time: the one place I did not think to look. One straight 33-hour shift later, I had fixed the problem and restored the equipment with two hours to spare before the deadline.
After I had time to reflect on this experience, I realized I had all the answers I needed except "Where". I also understood that sometimes the answer to your problem is right in front of you the whole time. You simply need the right perspective.