AI’s Test Plan for CrowdStrike

jason arbon
6 min read · Jul 26, 2024

We’ve heard a lot about the CrowdStrike incident from human engineers and testers, but, surprisingly, we haven’t heard what AI thinks.

Many people, especially engineers and software testers, were quick to criticize CrowdStrike for insufficient testing. With little information or context, these folks used the incident to push agendas like “large corporations should employ more testers”, self-promotion like “I would have caught the issue”, or the least helpful comment of all: “there should have been more tests” (in hindsight). Some recommended more staging and incremental rollouts, while others advocated for manual, human evaluation of updates, which would delay the real-time fixes needed to counter hackers.

CrowdStrike has shared some details about their testing procedures before the incident, including:

1. Template Type stress and fuzz testing in staging environments
2. Content Validator checks (though a bug allowed problematic content to pass)
3. Knowledge of previous successful, similar deployments between April 8 and April 24, 2024

Their old test plan is more comprehensive than the ones used by most of their critics. Ironically, the most vocal individuals in the testing community don’t leverage techniques like fuzz testing, or perform even this much testing, on their own products. But, testers be testers.

CrowdStrike has identified a bug in the Content Validator that should have caught this issue. So, arguably, they had enough testing; there was just a bug in it. And that test, in turn, should have been better tested. Bugs in testing code are common, as test code usually isn’t written with the same level of scrutiny as the product code. There is a “turtles all the way down” problem of who, or what, tests the tests.
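
To make that concrete, here is a minimal sketch of what “testing the tests” can look like: feed the validator known-bad inputs and assert it actually rejects them. The validator and its 21-field record format below are hypothetical stand-ins, not CrowdStrike’s actual Content Validator logic.

```python
# A minimal sketch of "testing the tests": verify that a content
# validator actually rejects known-bad input. The validator and its
# 21-field record format are hypothetical, purely for illustration.

def validate_channel_file(data: bytes) -> bool:
    """Hypothetical validator: a record must be exactly 21 non-empty fields."""
    fields = data.split(b",")
    return len(fields) == 21 and all(f != b"" for f in fields)

def test_validator_rejects_known_bad_content():
    # Known-good record: 21 populated fields.
    good = b",".join(b"x" for _ in range(21))
    assert validate_channel_file(good)

    # Deliberately corrupted variants the validator MUST reject.
    bad_cases = [
        b",".join(b"x" for _ in range(20)),         # missing field
        b",".join(b"x" for _ in range(20)) + b",",  # trailing empty field
        b"",                                        # empty file
    ]
    for bad in bad_cases:
        assert not validate_channel_file(bad), f"validator passed bad input: {bad!r}"

test_validator_rejects_known_bad_content()
```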

CrowdStrike is now implementing a wide array of additional test coverage and processes, including:

1. Improved Rapid Response Content testing
2. Enhanced Content Validator
3. Improved error handling
4. Staggered deployment strategy
5. Enhanced monitoring
6. Third-party validation
7. Increased customer control
8. Improved transparency

We live in the age of useful AI. So, out of curiosity, I did what it seems not many others have done and asked an off-the-shelf AI what it thought. While CrowdStrike’s updated testing plan sounds comprehensive, I asked Claude.ai for its testing advice. Perhaps not all that surprisingly, the AI-generated plan was more sophisticated than most human suggestions, and potentially better than CrowdStrike’s updated public testing plan. It was definitely better, and more constructive, than any suggestion I’ve seen put forward by a human in the public domain. Even better than my own.

I’d share that full test plan, but it is a long document, and test plans can be boring to read. The last time I generated and shared a test plan to be helpful, I got stuck on V1 of the ChromeOS project. It is also probably best for the world to take a few minutes to try using AI to generate a test plan on their own anyway.

Quick Prompt: How would you test to avoid such an issue? CrowdStrike delivers security content configuration updates to our sensors in two ways: *Sensor Content* that is shipped with our sensor directly, and *Rapid Response Content* that is designed to respond to the changing threat landscape at operational speed. The issue on Friday involved a Rapid Response Content update with an undetected error.

Follow-on Prompt: anything more? what additional testing could you do to avoid any issues with these data file updates?

Summary of AI’s Initial Test Plan

Differential testing:

- Compare the behavior of the system with the new update against the previous version
- Automatically flag any discrepancies for manual review
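
For instance, a minimal differential-testing harness might look like the sketch below; the two `parse_content_*` functions are hypothetical stand-ins for the old and new content interpreters, not CrowdStrike’s actual code:

```python
# Minimal differential-testing harness: run identical inputs through the
# previous and updated versions and flag any behavioral difference for
# manual review. Both parsers are hypothetical stand-ins.

def parse_content_v1(data: bytes):
    return data.split(b",")

def parse_content_v2(data: bytes):
    # "Updated" parser with a subtle behavior change (strips whitespace).
    return [field.strip() for field in data.split(b",")]

def run(parser, sample):
    try:
        return ("ok", parser(sample))
    except Exception as exc:
        return ("error", type(exc).__name__)

corpus = [b"a,b,c", b"a, b,c", b"", b"x" * 1024]
for sample in corpus:
    old, new = run(parse_content_v1, sample), run(parse_content_v2, sample)
    if old != new:
        print(f"DIFF on {sample!r}:\n  v1={old}\n  v2={new}")  # flag for review
```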

Fuzzing:

- Use automated fuzzing techniques to generate a wide range of inputs
- Test how the system handles unexpected or malformed data
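
A toy mutation fuzzer illustrates the idea; in practice a coverage-guided fuzzer (libFuzzer, AFL++, or Atheris for Python) would be far more effective, and `parse_content` is again a hypothetical stand-in:

```python
# Minimal mutation-fuzzing sketch: mutate a known-good seed and verify
# the parser never crashes -- it may accept or cleanly reject any input.
import random

def parse_content(data: bytes) -> bool:
    # Hypothetical parser for a 21-field record format.
    fields = data.split(b",")
    return len(fields) == 21 and all(f != b"" for f in fields)

def mutate(seed: bytes) -> bytes:
    data = bytearray(seed)
    for _ in range(random.randint(1, 8)):
        op = random.choice(("flip", "insert", "delete"))
        if op == "flip" and data:
            i = random.randrange(len(data))
            data[i] ^= 1 << random.randrange(8)      # flip a random bit
        elif op == "insert":
            data.insert(random.randrange(len(data) + 1), random.randrange(256))
        elif op == "delete" and data:
            del data[random.randrange(len(data))]
    return bytes(data)

seed = b",".join(b"x" for _ in range(21))
for _ in range(10_000):
    fuzzed = mutate(seed)
    try:
        parse_content(fuzzed)                        # accept or reject is fine...
    except Exception as exc:
        print(f"CRASH on input {fuzzed!r}: {exc}")   # ...crashing is a bug
        break
```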

A/B testing in controlled environments:

- Run parallel systems, one with the update and one without
- Compare performance, accuracy, and resource usage over an extended period
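
A rough sketch of the comparison step; `run_sensor()`, the metrics, and the 10% latency budget are all illustrative assumptions:

```python
# A/B comparison sketch: run the current and updated builds side by side
# on the same event stream and compare aggregate metrics.
import statistics

def run_sensor(version: str, events):
    # Stand-in: would actually drive a sandboxed sensor build on the events.
    latencies = [0.90 if version == "A" else 0.95 for _ in events]
    detections = sum(1 for _ in events)
    return {"p50_latency_ms": statistics.median(latencies),
            "detections": detections}

events = list(range(10_000))
a = run_sensor("A", events)   # current build
b = run_sensor("B", events)   # build with the new content update

assert b["detections"] >= a["detections"], "update lost detections"
assert b["p50_latency_ms"] <= a["p50_latency_ms"] * 1.10, \
    "update regressed latency by more than 10%"
print("A/B comparison passed:", a, b)
```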

Machine learning-based anomaly detection:

- Train models on normal system behavior
- Use these models to detect unusual patterns post-update
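
As an illustration, scikit-learn’s IsolationForest is one common off-the-shelf choice for this; the telemetry features below are invented for the example:

```python
# Anomaly-detection sketch: train on pre-update telemetry, then score
# post-update telemetry for unusual behavior.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Baseline telemetry: [cpu_percent, memory_mb, events_per_sec]
baseline = rng.normal(loc=[20, 300, 50], scale=[3, 20, 5], size=(5000, 3))
model = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# Post-update telemetry, including one pathological host (e.g., a crash loop).
post_update = np.vstack([
    rng.normal(loc=[20, 300, 50], scale=[3, 20, 5], size=(100, 3)),
    [[95, 2400, 0]],   # CPU pegged, memory ballooned, no events processed
])
flags = model.predict(post_update)   # -1 = anomaly, 1 = normal
print("anomalous samples:", post_update[flags == -1])
```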

Formal verification:

- Apply formal methods to mathematically prove the correctness of critical components
- This is especially useful for security-critical parts of the system
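
Full formal verification of a sensor is a major undertaking, but even lightweight SMT checks help. A toy example with the Z3 solver (`pip install z3-solver`), using a made-up field-count invariant:

```python
# Lightweight formal verification sketch: ask Z3 to prove that an index
# computed from untrusted input can never exceed the array bounds.
# The 21-field layout is a hypothetical example.
from z3 import Int, Solver, And, Not, unsat

field_count = Int("field_count")   # value parsed from the content file
index = Int("index")               # index the sensor will dereference

s = Solver()
# Model the validator's guarantees and the code's index computation.
s.add(field_count == 21)                      # validator enforces 21 fields
s.add(And(index >= 0, index < field_count))   # loop bound in sensor code

# Ask Z3 for a counterexample where the index escapes the range [0, 20].
s.add(Not(And(index >= 0, index <= 20)))

if s.check() == unsat:
    print("proved: no out-of-bounds access is possible")
else:
    print("counterexample:", s.model())
```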

Chaos engineering:

- Intentionally introduce failures and edge cases to test system resilience
- Ensure the update doesn’t introduce new vulnerabilities under stress
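
A minimal fault-injection loop might look like this; `load_update()` is a hypothetical stand-in for the sensor’s content loader:

```python
# Chaos-engineering sketch: randomly inject faults (truncation, corruption,
# empty payloads) and assert the loader fails safely instead of crashing.
import random

def load_update(data: bytes) -> str:
    # Hypothetical content loader: reject malformed updates, keep old config.
    if len(data.split(b",")) != 21:
        return "rejected"
    return "applied"

def chaos_wrap(data: bytes) -> bytes:
    # Randomly inject a fault into the update payload.
    fault = random.choice(("none", "truncate", "corrupt", "empty"))
    if fault == "truncate":
        return data[: len(data) // 2]
    if fault == "corrupt":
        i = random.randrange(len(data))
        return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]
    if fault == "empty":
        return b""
    return data

good_update = b",".join(b"x" for _ in range(21))
for _ in range(10_000):
    outcome = load_update(chaos_wrap(good_update))
    assert outcome in ("applied", "rejected"), "loader crashed instead of failing safely"
print("survived 10,000 fault injections")
```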

Threat modeling:

- Conduct thorough threat modeling sessions for each update
- Identify potential new attack vectors or weaknesses introduced

Red team exercises:

- Have a dedicated red team attempt to exploit the system post-update
- This can reveal unforeseen vulnerabilities

Historical data replay:

- Test the update against a large corpus of historical threat data
- Ensure it correctly handles known past threats without regression
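
A sketch, with an invented corpus and `detect()` standing in for the real detection pipeline:

```python
# Historical-replay sketch: run the updated detection logic over a corpus
# of past threat samples and verify nothing regresses.

def detect(sample: dict) -> bool:
    # Stand-in for the sensor's detection pipeline with the new content.
    return "malicious" in sample["tags"]

historical_corpus = [
    {"id": "threat-001", "tags": ["malicious"], "expected": True},
    {"id": "threat-002", "tags": ["malicious"], "expected": True},
    {"id": "benign-001", "tags": ["clean"],     "expected": False},
]

regressions = [s["id"] for s in historical_corpus if detect(s) != s["expected"]]
assert not regressions, f"update regressed on: {regressions}"
print(f"replayed {len(historical_corpus)} historical samples, no regressions")
```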

Canary deployments:

- Deploy to a small subset of real-world users who opt in for early updates
- Monitor closely for any issues before wider rollout
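
A simplified rollout controller shows the shape of it; the stage sizes, crash-rate limit, and `fleet_crash_rate()` telemetry hook are all assumptions for illustration:

```python
# Canary-rollout sketch: release to progressively larger cohorts and halt
# automatically if the crash rate exceeds a threshold.
import time

ROLLOUT_STAGES = [0.001, 0.01, 0.10, 0.50, 1.0]   # fraction of fleet
CRASH_RATE_LIMIT = 0.0001                          # 0.01% of hosts

def fleet_crash_rate(fraction: float) -> float:
    # Stand-in: would query real telemetry from the deployed cohort.
    return 0.00002

for fraction in ROLLOUT_STAGES:
    print(f"deploying to {fraction:.1%} of fleet...")
    time.sleep(1)                     # stand-in for a soak period
    rate = fleet_crash_rate(fraction)
    if rate > CRASH_RATE_LIMIT:
        print(f"HALT: crash rate {rate:.4%} exceeds limit; rolling back")
        break
else:
    print("update fully deployed")
```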

Compliance and regulatory checks:

- Ensure each update meets all relevant industry standards and regulations
- Conduct audits to verify compliance

Cross-platform testing:

- Test on all supported operating systems, hardware configurations, and software versions
- Ensure compatibility across the entire ecosystem

Load testing and scalability analysis:

- Subject the update to extreme conditions to ensure it performs under heavy load
- Test how it scales across large networks
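
A bare-bones concurrent load generator, with `process_update()` as a stand-in for the real update path and a made-up 10 ms latency budget:

```python
# Load-testing sketch: hammer the update path concurrently and check
# tail latency under stress.
import time
from concurrent.futures import ThreadPoolExecutor

def process_update(payload: bytes) -> float:
    start = time.perf_counter()
    payload.split(b",")               # stand-in for real parsing work
    return time.perf_counter() - start

payload = b",".join(b"x" * 64 for _ in range(21))
with ThreadPoolExecutor(max_workers=64) as pool:
    latencies = sorted(pool.map(process_update, [payload] * 50_000))

p99 = latencies[int(len(latencies) * 0.99)]
print(f"p99 latency: {p99 * 1000:.3f} ms")
assert p99 < 0.010, "p99 latency exceeds 10ms under load"
```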

Long-running tests:

- Conduct extended testing over days or weeks to catch time-dependent issues
- Look for memory leaks, resource exhaustion, or degradation over time
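
Python’s stdlib `tracemalloc` can catch the memory-leak half of this; the loop below is shortened to seconds for illustration, where a real soak test would run for days:

```python
# Long-running-test sketch: repeatedly exercise the update path and use
# tracemalloc to detect net memory growth over time.
import tracemalloc

def process_update(payload: bytes):
    return payload.split(b",")        # stand-in for real work

payload = b",".join(b"x" * 64 for _ in range(21))
tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for _ in range(100_000):
    process_update(payload)

growth = tracemalloc.take_snapshot().compare_to(baseline, "lineno")
leaked = sum(stat.size_diff for stat in growth if stat.size_diff > 0)
print(f"net allocation growth: {leaked / 1024:.1f} KiB")
assert leaked < 1_000_000, "possible memory leak: >1 MB growth"
```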

Dependency analysis:

- Thoroughly analyze all dependencies and their potential impact
- Ensure all third-party components are up-to-date and secure
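
A small audit script can gate releases on pinned, reviewed dependency versions; the pins below are invented for the example:

```python
# Dependency-audit sketch: verify installed package versions match a
# pinned, reviewed manifest before building a release.
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "requests": "2.32.3",
    "cryptography": "42.0.8",
}

problems = []
for package, expected in PINNED.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        problems.append(f"{package}: not installed")
        continue
    if installed != expected:
        problems.append(f"{package}: pinned {expected}, found {installed}")

if problems:
    raise SystemExit("dependency audit failed:\n  " + "\n  ".join(problems))
print("all dependencies match the reviewed pins")
```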

Code signing and integrity checks:

- Implement rigorous code-signing processes
- Verify the integrity of updates at every stage from development to deployment
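
For example, signing updates with Ed25519 (here via the pyca/cryptography library) lets the sensor refuse anything that doesn’t verify; key generation is inlined purely for the sketch, where a real deployment would ship only the public key with the sensor:

```python
# Integrity-check sketch: verify an update's signature before applying it.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

update = b"...channel file bytes..."

# Build side: sign the update (normally done in a protected CI pipeline).
signing_key = Ed25519PrivateKey.generate()
signature = signing_key.sign(update)
public_key = signing_key.public_key()

# Sensor side: refuse any update whose signature does not verify.
def verify_before_apply(data: bytes, sig: bytes) -> bool:
    try:
        public_key.verify(sig, data)
    except InvalidSignature:
        return False                 # reject tampered or truncated updates
    return True

assert verify_before_apply(update, signature)
assert not verify_before_apply(update + b"tampered", signature)
print("integrity checks passed")
```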

This AI-generated plan was created with just two prompts, demonstrating the potential of AI in test planning. I just copied the basic description of the problem from CrowdStrike’s incident report and then nudged it to be even more detailed with a second follow-up prompt. Some of the AI’s suggestions were reasonable and expected, and covered most of what CrowdStrike and the crowd of humans on X and YouTube suggested, but it also included ideas that CrowdStrike (and other teams working on critical infrastructure) didn’t mention and should probably consider.

Yes, I read and analyzed the AI’s test plan myself. But I’m also lazy and error-prone, so I asked a second AI (ChatGPT 4o) to compare the Claude.ai-generated plan with CrowdStrike’s plan. The AI highlighted several key differences between the test plans:

1. Greater depth and breadth in the AI plan
2. More structured staged rollout
3. Emphasis on sandbox and environment testing
4. Introduction of advanced testing techniques
5. Increased focus on security-focused testing
6. More attention to performance and scalability
7. Greater user / customer involvement
8. Inclusion of compliance and documentation processes
9. Emphasis on dependency management
10. Focus on code integrity

Summary

In summary, while CrowdStrike’s original testing and newly planned additional testing deliver pretty good coverage, the AI-generated plan is far more comprehensive, structured, and detailed. It introduces many advanced testing methodologies and focuses more on security, performance, and user/customer involvement.

Interestingly, it seems that neither the critics on social media nor $100B “AI” companies like CrowdStrike and Microsoft are leveraging AI deeply in their test planning. Even a quick prompt delivers a far more comprehensive and convincing test plan than those put forward by internal or external humans.

Using AI not just to improve the test planning, but also to communicate the new test plan, might have put customers and the world more at ease than the human-crafted incident reports and updates so far. Given what we see here, I’d place more trust in the test planning of an AI collaborating with an experienced tester, and that’s after only 15 minutes of experimentation. I chatted more with Claude.ai, and we started developing a pretty extensive and detailed plan together :)

AI was the best test planning collaborator I’ve engaged with, or read, in the past week. The AI also apologized for not having more context to produce a more detailed test plan, and it was humble enough to note that even with all this additional testing, bugs will still slip through. The AI didn’t spam LinkedIn or X.

Most interestingly, this is arguably a multi-billion-dollar issue, yet a simple $30/month AI chatbot and about 15 minutes of an experienced person’s time would go a long way toward protecting the world from future incidents like this, and at least toward making the world feel safer.

The biggest lesson of this CrowdStrike incident is probably that humans should talk to AI first.

- Jason Arbon, CEO @ Checkie.AI
