Update: Worldwide IT outage due to buggy CrowdStrike update

The world is 16+ hours into what looks like the biggest IT outage in history, triggered by a defective update to CrowdStrike's endpoint security software for Windows machines.

Shares of both CrowdStrike and Microsoft have fallen as a result, and the companies are offering (and updating) advice on how organizations can recover affected workstations and endpoints.

Restoration may not be much of a problem for organizations in the IT sector or those with ample IT staff, but it will likely be a long process for companies that have outsourced their IT department or that have a large number of affected Windows systems scattered across locations, which cannot be quickly serviced en masse (e.g., information kiosks, display systems, PoS systems).

In the meantime, users of the subreddit where sysadmins congregate and talk shop are sharing the methods and procedures they have devised or used to get large numbers of machines working again quickly.

Threat actors taking advantage of the chaos

“[The incident is] going to cost companies billions, it will lead to legal action, and it will affect businesses and users in a way we’ve never seen before,” Guy Golan, CEO and Executive Chairman of Performanta, told Help Net Security.

“Attackers may have more awareness of who is using CrowdStrike as a result of watching this unfold which could cause further cyber security complications down the road.”

It is also likely that some threat actors will take advantage of the chaos, which is disrupting IT and security teams' regular work, including their monitoring for intrusions.

CrowdStrike has warned organizations to communicate with the company's representatives only through official channels. Dr. Johannes Ullrich, Dean of Research at the SANS Technology Institute, has received reports of phishing emails claiming to come from "Crowdstrike Support" or "Crowdstrike Security."

“I do not have any samples at this point, but attackers are likely leveraging the heavy media attention. Please be careful with any ‘patches’ that may be delivered this way,” he added.

Organizations must plan for cyber resiliency

“What today demonstrates is that in today’s modern business world we have become heavily reliant on the Internet and IT systems. Which is why organizations need to look at cyber-risks as business risks and not simply IT risks and plan to manage them accordingly,” Brian Honan, CEO of BH Consulting, told Help Net Security.

“In particular, organizations need to design, implement, and regularly test robust cyber resilience and business continuity plans not only for their own systems but also for those services and systems they rely on within their supply chain. The events of today highlight the importance of regulations such as the EU NIS2 Directive and EU DORA in ensuring organizations are taking the appropriate steps to manage cyber risk within their own organizations and just as importantly within their supply chain.”

Because recovery requires manual intervention, it could take a long time, he noted. He advised organizations to identify the systems most critical to their business and recover them in order of priority.

“Another aspect of this incident relates to ‘diversity’ in the use of large-scale IT infrastructure,” says Tony Anscombe, Chief Cybersecurity Evangelist at ESET.

“This applies to critical systems like operating systems (OSes), cybersecurity products and other globally deployed (scaled) applications. Where diversity is low, a single technical incident, not to mention a security issue, can lead to global-scale outages with subsequent knock-on effects.”

It is quite possible, and in fact very likely, that some of the disruptions seen worldwide today were such knock-on effects.

Solving the larger problems

This is not the first such incident: CrowdStrike recently pushed out an update for Falcon Sensors on Windows with a bug that also incapacitated some systems, although it was not nearly as disruptive as this latest issue.

“Questions will need to be asked of CrowdStrike as to what went wrong with their testing and quality assurance processes to ensure there was no impact on their customers and what they are going to do to ensure there is no repeat of today’s issue,” Honan added.

Tom Lysemose Hansen, CTO of Promon, says that the nightmare-inducing problems that come with pushing a faulty update or patch like this are the very reason most firms wait around a month before applying updates.

Unfortunately, the Falcon agent asks for, and is usually granted, permission to install updates automatically.

Jake Williams, a former NSA hacker and VP of R&D at Hunter Strategy, pointed out that this incident highlights the risks of SaaS-based services taking update cycles out of the hands of systems administrators.

“Many security teams don’t realize that their endpoint protection platforms’ signature updates often themselves contain code, further exacerbating the issue. We should expect to see changes in this operating model. For better or worse, CrowdStrike has just shown why this operating model of pushing updates without IT intervention is unsustainable,” he opined.
