When Failure is Not an Option
11/06/2020
Apollo 13 was due to be the third crewed mission to land on the moon. Famously, the mission was aborted when an oxygen tank ruptured just three days into the flight, resulting in loss of oxygen supply capacity and putting the lives of the crew in jeopardy. Fortunately, complete disaster was avoided by the crew improvising a workaround for their life-support system, using materials on board the ship to effect a fix. The crew landed safely back on Earth six days after launch. This is probably one of the most famous examples of how just a small, overlooked fault can have a catastrophic effect on the overall function of a system. In the wake of Apollo 13, engineers redesigned the oxygen system to prevent similar accidents, and a third oxygen tank was added as an additional backup. Eight more Apollo spacecraft flew and none of them experienced the same trouble again.
Certainly, when it comes to UPS systems, as with spacecraft, if you value your critical load then ensuring the highest availability is paramount. Achieving 100% uptime must be the primary goal of any UPS solution. But, good old yin and yang, there is always a trade-off. The highest levels of resilience can be achieved within any electrical system by removing as many single points of failure as possible, and by adding redundancy. However, the more redundancy that is introduced, the more it naturally costs. But if the critical load really is that critical, compromising on correct configuration and quality to reduce costs is a false economy.
UPS technology has seen huge developments over the years. Systems have advanced technically, become more efficient and take up significantly less space than before. The good news is that they cost less to purchase and run too. Efficiency-wise, the most modern UPS have become about as close to perfect as possible, keeping in mind there will always be some losses due to the very nature of electrical switching.
The latest “true” modular UPS systems are designed so each module contains all the power elements of a UPS including rectifier, inverter, static switch, display and, importantly, all control and monitoring circuitry. This configuration places it above other designs that can have a separate, single static switch assembly, and the technology behind the intelligence modules means there is no single point of failure. Repair is easy as whole modules can be hot-swapped, giving a Mean Time to Repair (MTTR) of less than three minutes.
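To see why a short MTTR matters, here is a minimal sketch using the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The MTBF figure and the six-hour call-out repair time are hypothetical values chosen purely for illustration. They are not CENTIEL specifications. Only the three-minute hot swap comes from the text above.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# Hypothetical MTBF, assumed for illustration only.
MTBF_HOURS = 100_000.0

# Three-minute hot swap (figure quoted in the article).
hot_swap = availability(MTBF_HOURS, 3 / 60)

# Assumed six-hour repair for a design that needs an engineer call-out.
call_out = availability(MTBF_HOURS, 6.0)

for label, avail in (("hot swap", hot_swap), ("call-out", call_out)):
    print(f"{label}: availability {avail:.7f}, "
          f"about {downtime_minutes_per_year(avail):.2f} min/year downtime")
```

For the same MTBF, shrinking the repair time is what moves the availability figure, which is why hot-swappable modules are worth having.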
However, installing a truly modular UPS which ensures maximum resilience and availability is only helpful if the cables connecting all the components themselves do not create a single point of failure. The simplest communications bus is a single cable. If this breaks, or becomes disconnected, the entire system could be compromised.
For this reason, the ring circuit has been introduced within the majority of modern UPS systems. If the ring breaks at one point, signals can simply travel the other way around to bypass the fault. However, the introduction of the Triple Mode communications bus is the safest option of all.
As its name suggests, the Triple Mode communication bus has three paths of communication between UPS modules and parallel frames, made up of three separate ring circuits. Three brains communicate with three other brains in each module: it’s the belt, braces and buttons approach.
We liken Triple Mode to the difference between a tightrope walker and someone walking across a bridge. If a tightrope breaks, the consequences will be dramatic and far-reaching. In the same way, a single communications bus is far more precarious than a Triple Mode ring connection, which is more like a bridge with multiple supports. Here the single points of failure are completely removed. Even if one or several bridge struts fail, the others will support the load.
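The difference between a single cable, a ring and three rings can be checked with a small connectivity sketch. This is a simplified topology model only: modules are treated as nodes, cables as links, and the function names are hypothetical. It does not represent the actual Triple Mode protocol. The code counts how many cable breaks each layout survives in the worst case while every module can still reach every other.

```python
from itertools import combinations


def chain_links(n):
    """Single daisy-chained cable run: module i connects to i + 1."""
    return [(i, i + 1) for i in range(n - 1)]


def ring_links(n):
    """Ring circuit: as the chain, plus a link from the last module to the first."""
    return [(i, (i + 1) % n) for i in range(n)]


def all_connected(n, links):
    """True if every module can reach every other over the given links."""
    neighbours = {i: set() for i in range(n)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n


def max_breaks_tolerated(n, buses):
    """Largest number of cable breaks that can never split the modules.

    `buses` is a list of link lists, one per independent bus.
    Every combination of broken cables is tried (worst case).
    """
    cables = [(bus_id, link) for bus_id, links in enumerate(buses) for link in links]
    tolerated = 0
    for breaks in range(1, len(cables) + 1):
        for cut in combinations(cables, breaks):
            surviving = [cable[1] for cable in cables if cable not in cut]
            if not all_connected(n, surviving):
                return tolerated
        tolerated = breaks
    return tolerated


MODULES = 4  # assumed system size, for illustration
print(max_breaks_tolerated(MODULES, [chain_links(MODULES)]))     # 0
print(max_breaks_tolerated(MODULES, [ring_links(MODULES)]))      # 1
print(max_breaks_tolerated(MODULES, [ring_links(MODULES)] * 3))  # 5
```

In this model a single cable run tolerates no breaks, a single ring tolerates one, and three independent rings tolerate any five, because at least one ring is always left with no more than a single break.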
It's a similar story with batteries. Modular systems can either have individual batteries associated with each module or an option where a common battery string is used for all the modules. But what if the busbar connecting all the batteries in the common battery string fails? The whole UPS system becomes unavailable. Each time we remove a point of failure we increase the level of availability, but this does also increase the cost. It is important that customers are aware of this trade-off from the outset and are comfortable with the level of availability their system provides.
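Basic reliability arithmetic gives a rough feel for what a shared busbar costs in availability. The sketch below assumes independent failures and an N+1 arrangement needing three of four modules. Both availability figures are hypothetical, and using the same module figure in both layouts is a simplification. A shared element sits in series with the whole system, so its availability multiplies the result directly.

```python
from math import comb


def k_of_n(avail: float, k: int, n: int) -> float:
    """Probability that at least k of n identical, independent units are up."""
    return sum(
        comb(n, i) * avail**i * (1 - avail) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical availabilities, assumed for illustration only.
MODULE_AVAIL = 0.9999   # one UPS module together with its battery supply
BUSBAR_AVAIL = 0.99999  # shared busbar of a common battery string

# N+1 redundancy: the load needs 3 of the 4 installed modules.
separate_batteries = k_of_n(MODULE_AVAIL, 3, 4)

# Common battery string: the shared busbar is in series with everything.
common_string = separate_batteries * BUSBAR_AVAIL

print(f"separate batteries: unavailability {1 - separate_batteries:.2e}")
print(f"common string:      unavailability {1 - common_string:.2e}")
```

Under these assumptions the redundant modules contribute very little unavailability, and the single shared busbar dominates the total. That is the single point of failure described above, expressed as a number.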
There are other areas too which can introduce issues that reduce the availability of a UPS. I’ve talked about battery calculations before. If two manufacturers quote for a similar system and one has a significantly cheaper battery calculation, there will be a good reason for this when you look at the small print! It may be that the calculation was made for batteries operating at a higher temperature than normal, that a low Battery End of Discharge Voltage was used, or that a higher efficiency was assumed by ignoring the front-end rectifier conversion. Always read the fine print!
Maintenance is another area of concern. The best, most available UPS configuration will only continue to do its job if it is properly checked and maintained over time.
There is sometimes a perception that a UPS is a bit like an item of mechanical plant, such as a boiler or a pump, so there is a tendency to overlook its importance. But it’s no good spending £800K on two super-duper new servers to sit in a rack if they’re supported by an ageing UPS system that has sat in a plantroom without maintenance for several years! For medical applications needing power for equipment to save lives, the UPS is even more important.
Therefore, to decide on the level of resilience required, an assessment of the value of the critical load needs to be made. Any manufacturer can quote for a cheaper system; however, customers need to be aware that this will result in less resilient protection. It may be a cheaper part made of lower quality materials which is more likely to fail, or a lower battery calculation resulting in a shorter run time. Similarly, availability may be reduced by introducing the small risk of using a common battery string or purchasing a modular system without Triple Mode. However, if that critical load is worth protecting, the increased availability is worth the price, so it’s important for customers to be aware of the cost versus limitations of their UPS system.
I hope I’m forgiven if my memory is not perfectly accurate, but I recall the words of John Glenn, the first American to orbit the Earth, when asked: “What’s the last thing you think about just before lift-off?” His answer was poignant: “I’m sitting on a huge bomb made of the million cheapest parts…”
The team at CENTIEL has been at the forefront of UPS development over several decades. We are trusted and always happy to advise on where and how systems can become vulnerable. Our goal is clear: to achieve the ultimate availability of power possible for our client base. Our leading-edge technology, backed up by comprehensive maintenance contracts carried out by our experienced engineering teams, ensures our clients’ power has the very best protection at all times.
I’m sure the crew of Apollo 13, and the team who worked so hard to get them home safely, would have preferred to know there were no elements of the spacecraft where something was overlooked, resulting in the whole mission being compromised in the way it was. So, when failure is not an option, the simple message is: don’t cut corners and compromise on availability.
By Mike Elms, Managing Director, CENTIEL UK