Tech stack uniformity has become a systemic vulnerability
Crashes due to faulty updates are nothing new; in fact, one reason IT teams often delay updates is their unreliability and tendency to disrupt the organization’s day-to-day operations. Zero-days are also an old phenomenon. In the past, due to a lack of cybersecurity awareness among even the largest vendors and their users, zero-days were not only more common but also often publicly available, enabling script kiddies to exploit them.
Technology stacks were also far more diverse in the past. For instance, a bug affecting a Sun Solaris server would not impact an OpenBSD system. Today, we see a much smaller number of operating systems in widespread use, and even “different” Linux distributions often share common codebases, which means that, for example, a bug or vulnerability affecting Ubuntu would probably also affect Linux Mint.
Tech stack diversity used to limit the impact of a single faulty update or exploit, but we are now experiencing an era of dwindling vendor and product diversity in critical systems (though, in certain industries, this has been a long-standing issue). Coupled with increased connectivity (most computer systems, including large parts of critical infrastructure, now regularly depend on other computer systems and online services), this creates a macro-scale “single point of critical failure.”
In other words, the colossal failures we are witnessing are not simply technical issues that can be resolved with better coding practices; they are manifestations of a classic population ecology problem, akin to those seen in biology or sociology.
Are we doomed?
Consider the agricultural history of bananas: Until the mid-20th century, Gros Michel was the world’s most popular banana strain, favored for its superior taste. However, in the 1950s, Panama disease, caused by the Fusarium oxysporum fungus, wiped out the Gros Michel strain worldwide. Humanity’s response was to plant the less tasty but more resilient Cavendish strain everywhere. However, even the most resilient systems have vulnerabilities. So long as we keep planting the same strain everywhere, a pathogen capable of killing the whole population is likely to emerge once again. The recent strain of Panama disease called Foc Tropical Race 4 (Foc-TR4) is a strong contender for this role.
While we may soon face a banana extinction, the same is unlikely with apples. Why? Because there are many popular strains of apples (vendor and product diversity), one vulnerability exploited by a single germ cannot wipe out the global apple population.
The IT industry faces a similar dilemma. We are increasingly reliant on a small number of technology stacks across the board. In many cases, systems that deviate even slightly from the standard stack (say, by running the same software on Linux or Mac instead of Windows) remain unaffected by certain vulnerabilities. However, the lesson here isn’t to simply use Linux or Mac as a panacea; no single system is immune to the uniformity problem. Instead, the real issue is that any widespread uniformity in technology makes the entire ecosystem vulnerable.
Unlike the random emergence of agricultural diseases, today’s multi-billion dollar zero-day exploit market leverages the expertise of top cybersecurity researchers to intentionally target vulnerabilities. Given enough time, vulnerabilities will always be found, and so long as the system is built upon a very small number of products by a handful of vendors, such vulnerabilities as well as bugs introduced by the vendor will retain their ability to cripple entire infrastructures.
In light of this, we must recognize that recent outages (e.g., CrowdStrike) and near-misses of global hacks (e.g., the XZ Utils backdoor) are not just isolated incidents but clear warnings of the systemic vulnerability we are facing. These events highlight the destructive potential of future attacks.
Consider a scenario where millions of computers are targeted in a coordinated cyberattack. If each of those systems were infected with ransomware that encrypts all data, exfiltrates critical information, and perhaps even installs firmware-based malware that cannot be removed even if the hard drives are formatted, the consequences would be catastrophic. Such an attack could paralyze economies, compromise national and international security, and wreak havoc on critical infrastructure sectors such as healthcare, finance, and energy. Entire economies could be maimed for extended periods.
Recognize the risk and do something about it
Will the entire critical IT infrastructure eventually and periodically get blown to bits?
Not necessarily. The key point here is to realize that the lack of diversity is a security risk. Acknowledging this problem–by embedding it in our security assessment protocols, such as ISO checklists, and incident response plans–will compel us to find better solutions.
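One way to make this risk visible in an assessment is to measure how much of the estate depends on a single component. The following is a minimal sketch, not an established standard: the inventory records, the layer names ("os", "edr") and the function name largest_shared_fraction are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def largest_shared_fraction(inventory, layer):
    """Return the most common component in a layer and the share of hosts running it.

    A share close to 1.0 means a single bug or faulty update in that
    component could reach almost every host at once.
    """
    counts = Counter(host[layer] for host in inventory)
    component, count = counts.most_common(1)[0]
    return component, count / len(inventory)


# Hypothetical inventory; real data would come from an asset database.
inventory = [
    {"host": "web-1", "os": "windows", "edr": "vendor-a"},
    {"host": "web-2", "os": "windows", "edr": "vendor-a"},
    {"host": "db-1", "os": "linux", "edr": "vendor-a"},
    {"host": "backup-1", "os": "freebsd", "edr": "vendor-b"},
]

os_result = largest_shared_fraction(inventory, "os")
edr_result = largest_shared_fraction(inventory, "edr")
print(os_result)   # ('windows', 0.5)
print(edr_result)  # ('vendor-a', 0.75)
```

In this made-up example the operating systems look reasonably mixed, yet three out of four hosts still share one endpoint security vendor, which is the kind of hidden uniformity an assessment could flag.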
These solutions can vary depending on the organizations’ specific needs, resources and capabilities. For instance, companies can mitigate this risk by implementing heterogeneous redundancy, where secondary systems, such as backup servers, are activated only in emergencies and are built on a different technology stack. This alternative stack could involve products from smaller suppliers or even different product lines from the primary supplier, as long as they are based on distinct technological foundations.
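A backup only counts as heterogeneous if it does not share the components that could take down the primary. As a rough sketch under stated assumptions, the check below compares two stack descriptions layer by layer; the layer names, the component labels and the shared_components helper are hypothetical, not part of any real tool.

```python
def shared_components(primary, backup):
    """Return the layers where the primary and backup stacks use the same component."""
    return {
        layer: component
        for layer, component in primary.items()
        if backup.get(layer) == component
    }


# Hypothetical stack descriptions for a primary site and its emergency backup.
primary = {"os": "windows", "hypervisor": "hv-a", "edr": "vendor-a", "database": "db-x"}
backup = {"os": "linux", "hypervisor": "hv-b", "edr": "vendor-a", "database": "db-y"}

overlap = shared_components(primary, backup)
print(overlap)  # {'edr': 'vendor-a'}
```

Here the backup differs in operating system, hypervisor and database, but both sites run the same endpoint security product, so a faulty update to that one product could still disable both at once.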
Another effective strategy is micro-segmentation, where the system is divided into isolated segments, some of which are built on different technologies. Furthermore, organizations can adopt hybrid systems, where redundancy is achieved between the company’s own network and the cloud, or benefit from multi-cloud systems, where different cloud providers are utilized simultaneously.
Ultimately, the most crucial step is for organizations to recognize that relying on a single, standard technology stack is a risk. By reflecting this risk in their security assessment protocols and incident response plans, companies will be better equipped to develop tailored solutions.
By continuously prioritizing vendor and product diversity and making it a core component of our security strategies, we can enhance the long-term resilience of the global technology landscape. In other words, diversity, one of the best defense mechanisms in nature, is what we need to protect not only the global IT infrastructure but also our own organizations.
Contributing author: Fulya Acikgoz, Lecturer (Assistant Professor) in Marketing, University of Sussex