Levi, Ray & Shoup, Inc.

Maintenance fireworks on a holiday weekend

7/7/2022 by Patrick Schmidt

This past Monday we in the United States of America commemorated the foundational event of our history, Independence Day. We celebrate Independence Day, also known as the Fourth of July, with parades, picnics, and fireworks. In addition, many businesses and most government offices are closed so that employees can spend the holiday with friends and family, and maybe even take a much-needed vacation to enjoy the warm summer weather.

But not everyone is off for the day, and we rely on those who work for essential services during holidays. Medical professionals are tending to their patients in hospitals, and first responders are on duty. What other “essential” services do we rely on? Information technology professionals and the infrastructure they maintain.

We may not think of it very often, but there are hundreds, if not thousands, of moving parts that make our modern life possible.

Are you hungry after the parade and want to pay for those street tacos at the food truck with your credit card? The transaction runs through IT infrastructure for approval before you can dig into your lunch.

Looking to avoid the traffic on your way to the lake for the fireworks show? IT infrastructure helps you plan your route. The map on your smartphone relies not only on GPS satellites but also on analysis of real-time traffic data that is being processed in a datacenter.

Heck, even parking meters are now wired to the internet.

All the services we employ appear to operate seamlessly most of the time. That is because IT infrastructure is up and running and being monitored by IT professionals. But what happens when something goes down? And what happens when it goes down on a holiday weekend?

I have a true story from the July Fourth weekend in 2016 when a critical piece of infrastructure failed for a managed services provider (MSP) in my hometown.

I was on my way to the nearest gas station on Saturday, July 2, to pick up bags of ice for our trip to the lake. My mobile phone rang, and the caller was a bit of a surprise. It was a friend named John, a services executive at a major OEM, and he needed help. He explained that the MSP (which was not my customer) had one of its primary storage arrays go down and could not service its customers.

I asked John why he was calling me, and he bluntly said, “The array is not on support, and you know we don’t send anyone out on weekends or holidays without a contract.” He thought that because my company was local, and the MSP’s reseller was not, we might have an engineer who could make a service call.

That was a big ask, but I did try to find someone. I had no success and called John back with the bad news that no one was available, at least until Tuesday the fifth. He was disappointed but understood and said he would try something else.

So, what happened? The OEM bent its policy and dispatched engineers to the customer over the holiday weekend. They had the array up and running in the early morning hours of Tuesday, July 5. Apparently, they had to fly multiple parts into the datacenter from across the country and spend the overnight hours from the fourth to the fifth installing them.

So, they were back up and running and all was good, right? Perhaps… until the bill for the services arrived: more than $225,000. You read that right. That is enough to buy a mid-sized home in some areas, or a supercar such as a Lamborghini.

In addition, the services bill doesn’t account for contractual penalties. Most MSP contracts have clauses that guarantee a certain level of uptime. An outage like this would almost certainly have triggered a multitude of penalties and credits owed to the MSP’s customers.

The MSP had made a bet, wagering that running without hardware maintenance would be no big deal. Their thinking went: this gear is reliable, so why pay $47,000 for a year of maintenance on this array? Besides, it’s only disk drives that fail, right? We have spares, and redundancy is built in.

But it wasn’t bad drives that brought the array down. It was the simultaneous failure of both controllers. They gambled and lost. Would you make that bet?

How would a critical system being down for more than three days impact your business? Now more than ever, it is essential to make sure your coverage fits your needs. If it has been a while since you reviewed your IT asset lifecycles, contact us to see how we can help keep you up and running. After all, you want to keep the fireworks in the air, not in your datacenter.

Patrick Schmidt is a Technology Lifecycle Management Specialist with LRS IT Solutions. For more than 23 years, he has been helping customers get a firm grasp on their asset and contract management with a combination of comprehensive service level analysis and lifecycle management best practices.