A Catastrophic System Update and a Huge Failure of QA

Millions of computers around the world simultaneously became unusable, all thanks to a bug that led to the dreaded “Blue Screen of Death.” It was a shocking display of incompetence.

CrowdStrike, a US cybersecurity company based in Texas, offers ransomware protection, malware protection, and internet security products, primarily to businesses and large organizations. On Friday, July 19, 2024, it released an automatic sensor configuration update for its Falcon program on Windows systems. This reckless update wreaked havoc globally.

Falcon sensor, a cybersecurity program providing automated malware protection, antivirus support, incident response, and other security features, is cloud-based. This means it operates alongside CrowdStrike’s servers, without requiring customers to manage extra equipment or software. Yet, the company’s gross negligence in quality assurance and testing allowed a disastrous bug to slip through.

CrowdStrike stated these types of updates happen multiple times a day. However, this routine update triggered a catastrophic “logic error” that caused Windows systems to crash. The update was meant to target malicious system communication tools but instead plunged millions into chaos.

Millions of Windows PC users reported seeing a “Blue Screen of Death” on their devices, with many systems trapped in a relentless reboot loop. Thousands of flights were grounded, causing chaos for travelers, while banks reported disruptions to critical online transactions. TV broadcasters and telecom operators also faced significant issues, adding to the widespread confusion. To make matters worse, several 911 operators across the US were unable to respond to emergencies for several hours on Friday morning, putting countless lives at risk. This is an outrageous failure of responsibility and competence.

While it may be possible to escape the reboot loop by manually booting into Safe Mode, most users have no clue how, and almost all enterprise users lack the admin rights to do so. Millions of kiosks and POS terminals have no traditional mouse or keyboard with which to reach that mode, leaving them dead until an IT professional can be called in to fix them. One by one.

This entire incident highlights a glaring lack of proper testing and quality assurance within the company, raising serious concerns about their operational practices and commitment to their customers’ security.

Root Cause

The cause of this catastrophe is clear. The company moved to a DevOps execution mode some years ago in order to push out updates multiple times a day. As the updates became more frequent, the amount of testing continued to fall. And therein lies the trap. Testing less is NEVER acceptable, even when software tools tell you that a patch or update needs limited testing. Because all software today is immensely complex and has many interdependencies, it is almost impossible to be absolutely sure that even a small patch will not cause problems somewhere, at some time, on some systems (or browsers or phones). Testing less is a symptom of a broken system: one that recognizes it cannot test everything in 2 hours and so abandons that safety net. And this is the result.

Did CrowdStrike test this release on any Windows systems at all? While I don’t have inside knowledge, the answer seems clear: it was untested. They had become so complacent and sure of their processes, after having thousands of updates go off without a hitch, that testing effectively ceased. Since the update crashed Windows machines running Falcon sensor version 7.11 and above, there is simply no other explanation than complacency leading to blue screens of death.

Lessons

This is a huge mess that could have been avoided.

The worldwide cost of IT intervention and lost productivity? Many billions.

Cost to CrowdStrike’s market cap? Billions.

Money saved by not testing that update? $1000. Max.

Nice work.

As AI becomes able to create massive end-to-end scripts and tests in minutes, there is simply no reason not to test your releases to the fullest extent every time. Testing less breeds the belief that we are getting away with it, so we test less, and less, and less. Until it blows up in your face. Many here will argue, but the only reason to test less was ever time and cost. As time and cost head to zero with AI, we must leave that reasoning behind and test everything fully.
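To make "test everything fully" concrete, here is a minimal sketch of a release gate in Python. It is not CrowdStrike's pipeline or any real product's API; `Target`, `Result`, `release_gate`, and the target names are all hypothetical, and `fake_suite` stands in for whatever actually boots a machine with the update applied. The point is only that the gate has no option to skip or sample: every target runs the full suite, and one failure blocks the release.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Target:
    """One environment the update must be validated on (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Result:
    target: Target
    passed: bool
    detail: str = ""


def release_gate(
    targets: Iterable[Target],
    run_suite: Callable[[Target], Result],
) -> Tuple[bool, List[Result]]:
    """Run the full suite on every target; approve only if all pass."""
    results: List[Result] = []
    for target in targets:
        try:
            results.append(run_suite(target))
        except Exception as exc:
            # A suite that blows up counts as a failed target, not a skip.
            results.append(Result(target, False, f"{type(exc).__name__}: {exc}"))
    # An empty matrix is never approved: "nothing failed" is not "tested".
    approved = bool(results) and all(r.passed for r in results)
    return approved, results


def fake_suite(target: Target) -> Result:
    """Stand-in for a real boot-and-verify run (assumption for the demo)."""
    booted = target.name != "windows-kiosk"
    return Result(target, booted, "" if booted else "reboot loop")


matrix = [Target("windows-desktop"), Target("windows-kiosk"), Target("windows-server")]
approved, results = release_gate(matrix, fake_suite)
print(approved)  # prints False: one failing target blocks the release
```

A gate like this costs machine time rather than engineer time, which is exactly the cost the paragraph above argues is heading toward zero.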

Don’t have egg on your face. Test more. And leverage AI to generate, update, maintain and run those tests on your fully integrated systems before release.

Appvance IQ (AIQ) covers all your software quality needs with the most comprehensive autonomous software testing platform available today.
