Here's what the CrowdStrike outage exposed about our connected world. It's not good.

Nearly a week after a massive IT outage shut down computer systems around the world, cybersecurity company CrowdStrike (CRWD) issued a statement Thursday revealing that a single software update was responsible for grounding planes, curtailing hospital procedures, and closing businesses for days.

The announcement came as the majority of companies returned to business as usual. But it points to the vulnerability of our modern internet infrastructure and how taking out even a relatively small number of devices — Microsoft (MSFT) estimates 8.5 million systems were affected — can impact our lives.

“What we see here is the cascading effect that a minor software update, or in the future, maybe a cyberattack or malicious code, can have a huge impact,” David Bader, director of the Institute for Data Science at the New Jersey Institute of Technology, told Yahoo Finance.

And without some kind of broader plan to address the matter, another widespread outage is all but guaranteed to happen.

“What we’re seeing today is these types of cascading failures occurring more and more frequently,” Bader said. “These will continue as we see AI, and as we move toward [artificial general intelligence], that these types of failures, whether they’re accidental, some bad programming, such as CrowdStrike, or whether they’re malicious attacks, will continue showing the vulnerability of our technological world.”

A lack of coherent rules

According to CrowdStrike's statement, the company issued a software update on July 19 that included a flaw that went undetected in validation checks. The error immediately crashed certain Windows systems connected to the web, causing them to display a crash message known as the blue screen of death.

CrowdStrike says it’s responding to the matter by reworking how it prepares its software updates, including more stringent testing and staggering deployment to prevent a global systems collapse in the future.

Screens show a blue error message at a departure floor of LaGuardia Airport in New York on Friday, July 19, 2024, after a faulty CrowdStrike update caused a major internet outage for computers running Microsoft Windows. (AP Photo/Yuki Iwamura)


It’s important to note that software is developed by people. And while they’re usually incredibly capable people, they’re still human, and humans make mistakes. That’s generally how flaws enter software ecosystems, whether it’s CrowdStrike’s programs or some other company’s platform.

“Even the best testing processes fail,” explained Gartner analyst Jon Amato. “You can do a certain amount of automated testing, but those automated tests are themselves designed by human beings and human beings are fallible.”

And while CrowdStrike is certainly looking to improve its own internal processes for ensuring the stability of its software updates, that doesn’t mean every other software company will do the same.

“We really don’t have any organization in the US that is looking holistically at our technological resilience,” Bader said.

He added, “We don’t have a body that can generate the best practices needed for private industry to both protect against the delivery of the software updates and what a customer should do, for instance, the banks, the hospitals, the airlines, how they should protect themselves to ensure that these problems don’t impact them in the future.”

And while the Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency offers tips, there’s no major enforcement mechanism in place to force companies to follow specific strategies when issuing software updates or addressing program failures and malicious attacks.

Without such best practices and enforcement, Bader said, a larger outage and prolonged recovery are bound to happen.

A bigger problem?

Outside of a need for a regimented approach to IT failures, the CrowdStrike outage also points to a broader problem within the backbone of the world’s tech infrastructure: A small number of companies have an outsized impact on how the web operates.

“We definitely know that these are very fragile systems, and the fact that they work as well as they do is, frankly, a miracle, given all of the different players, the lack of heterogeneity of the stack,” Gregory Falco, assistant professor of mechanical and aerospace engineering and systems engineering at Cornell University’s Sibley School, told Yahoo Finance.

But expanding the number of companies that plug directly into our internet infrastructure isn’t exactly an easy fix either. That’s because the more companies there are, the more opportunities there are for failures.

Ultimately, the solution to these kinds of world-scale problems might just come down to forcing companies to be better prepared for catastrophe and, if software does fail, to understand how to contain the fallout.

By Daniel Howley

Email Daniel Howley at dhowley@yahoofinance.com. Follow him on X at @DanielHowley.

https://www.aol.com/finance/crowdstrike-outage-exposed-connected-world-150321849.html

https://headtopics.com/ca/what-the-crowdstrike-outage-exposed-about-our-connected-56460861

David A. Bader
Distinguished Professor and Director of the Institute for Data Science

David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology.