AI Changes Exploratory Testing

jason arbon
Sep 14


Let’s explore how AI is changing exploratory software testing. Yes, most exploratory software testers think that only humans can do this work. Strangely, many of them haven’t actually tested the AI, or explored it all that thoroughly. Let’s explore some examples of how generative AI will influence and impact this form of testing. I’m writing this as a former professional exploratory tester with experience applying real AI to testing, and with the goal of making sure great testers and open-minded folks aren’t caught off guard.

GenAI is not “Classic AI” or “Automation”

It is important to note that this wave of generative AI will hit differently. This isn’t just another phase of “test automation” like we’ve seen over the past two decades, with mixed results. This is also not the “AI” of heuristics from the past few years, claiming to fix subtle things like page selectors, or painfully identifying search boxes and shopping cart icons from screenshots with a bunch of training examples. This is not the “fake” AI described by overly enthusiastic marketing teams and heads of testing companies scrambling to seem relevant to customers and investors. Today’s generative AI will actually change everything because it has two new magic properties:

  1. It knows almost everything. It has read the entire internet and most testing books, and will soon understand every blog post, tutorial, and educational video on the topic of testing, and everything else.

  2. It is generative. This means it can create just about anything that can be expressed in text: test cases, plans, strategies, risk analysis, bug reports. Unlike old-school Google search, it doesn’t find things already written by humans; it generates new things.

AI is a Pair-Testing Buddy

High-confidence, great exploratory testers should be super excited about generative AI, and they should see it as their new best friend. Rather than wax poetic in vague terms about what “AI” might do some day, let’s do a little self-testing and self-exploration. For each aspect of exploratory testing, let’s see how the AI performs. We are testers, right? Let facts and experience, not fear and avoidance, drive your evaluation of generative AI. In the interest of generality and simplicity, let’s use the Google home page as our ‘system under test’. There is a list of the AI responses I elicited from ChatGPT at the end of this post, for the laziest of testers reading this. But if you are a great exploratory tester, you should:

  1. Take a few minutes and consider what your expert mind would come up with for each point below.
  2. Take another couple of minutes and explore/have a conversation with ChatGPT by yourself about the question/topic. Ask difficult questions, ask follow-up questions, and ask it to explain itself.
  3. Compare your answers with those of the AI. Be honest in your self-evaluation. Maybe write both responses down and ask a human friend to analyze and compare the results independently, if you are really a ‘scientist’ or ‘philosopher’ of testing :)

Best Attributes Explored

Here are some of the top attributes of exploratory testing, compared to scripted or classically automated testing.

  1. Creativity: Just like exploratory testers, generative AI can create new ideas and evaluate other ideas, not simply repeat or execute canned ‘checks’.

Exercise: “Think of the most creative, out of the box test cases for the Google home page.”

  2. Contextual, Human-Like Communication: AI can now create human-like reporting and advocacy for bugs instead of the simple pass/fail test results of scripted testing. It can even consider the ‘context’ of the application, business, team, and customer, implicitly and explicitly.

Exercise: “You are an exploratory tester and you found a bug on the Google home page where, if you don’t type any text into the search box and hit the search button, you are presented with seemingly random search results. Generate a great bug report explaining why this is important to fix.”

  3. Critical Thinking: AI can evaluate and even critically question all aspects of software’s design and implementation, even in a business context. Better than most testers, it can even suggest improvements or fixes.

Exercise: “Consider the Google home page from a ‘critical thinking’ perspective”.

  4. Empathy: AI can emulate different points of view, such as the customer’s or the business’s. AI can exhibit empathy like an exploratory tester and, ironically, probably with less bias.

Exercise: “Empathize with the different types of end-users who might try to use the home page.”

  5. Learning: Some go so far as to describe exploratory testing as a process of ‘learning’: learning about the product behavior, customer, competitors, business, etc. Whatever you want to learn about, there is probably no better tutor in the world than a long session with ChatGPT, on any topic, including exploratory testing itself. Yes, it sometimes hallucinates, but so do people, and there is misinformation, and even plain disagreement, in books and in Google search results.

Exercise: “What pieces of information should you gather to learn the context for great exploratory testing of the home page?”

  6. Variety: Most importantly, like humans, the AI doesn’t always agree with itself; every interaction can produce slightly different results. AI, like humans, can confidently make mistakes or guess incorrectly.

Exercise: “How could you add variety to your exploratory software testing sessions for the home page?”

If you are a great exploratory tester, you will be able to ‘explore’ generative AI and discover these capabilities. The very capabilities you pride yourself on having, and on identifying in other humans, now exist in the machines. You won’t hold preconceptions or biases, even if they are personal and related to job security as a tester, teacher, vendor, or manager.

The Great Testing Filter

Generative AI is a ‘great filter’ for the world of testers. Those who are open to exploring AI deeply will become super smart exploratory testers. Those who don’t explore and leverage generative AI will hinder their own work. Exploratory testers cannot hide anymore, because AI has been democratized and is easily accessible. In seconds, your managers, peers, and even customers can leverage AI to evaluate your work. They can simply copy/paste your reports, bugs, etc. into ChatGPT and ask it to evaluate the quality of your work: What is great about the work? What is missing? What could be better? What text should I put into their performance review? Even more interesting, if you believe AI is just a ‘stochastic parrot’, ‘garbage’, or ‘not useful’, all these other folks will still use it. You will ultimately be judged by AI, directly or indirectly. Exploratory testers who don’t embrace AI will be outperformed by those who do.

“You can’t hide because it is trivial for your managers, peers, and even customers to use AI to evaluate your work.”

Near Future

We can see these interactions between exploratory testers and the AI today, but creative engineers will, yes, help automate that human->computer->human workflow, because the humans are the slow and expensive part of all this.

Human exploratory testers will be aided by tools that bring together the data and context and make this AI context available in real time. Folks have already begun building Chrome extensions that auto-fetch the relevant information and the results of auto-generated prompts similar to those above. Exploratory testing humans will soon get a lot smarter and faster thanks to generative AI.
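To make "auto-generated prompts" a little more concrete, here is a minimal sketch of how a tool might turn the six exercises above into prompts for any system under test. The template wording is paraphrased from this post, and the names `PROMPT_TEMPLATES` and `build_prompts` are hypothetical. It only builds the prompt text and does not call any AI service.

```python
# Hypothetical prompt templates, paraphrased from the six exercises in this post.
PROMPT_TEMPLATES = {
    "creativity": (
        "Think of the most creative, out of the box test cases for {target}."
    ),
    "communication": (
        "You are an exploratory tester and you found this bug in {target}: "
        "{bug} Generate a great bug report explaining why this is important to fix."
    ),
    "critical_thinking": (
        "Consider {target} from a 'critical thinking' perspective."
    ),
    "empathy": (
        "Empathize with the different types of end-users who might try to use {target}."
    ),
    "learning": (
        "What pieces of information should you gather to learn the context "
        "for great exploratory testing of {target}?"
    ),
    "variety": (
        "How could you add variety to your exploratory software testing "
        "sessions for {target}?"
    ),
}


def build_prompts(target, bug="(no bug described yet)"):
    """Return one filled-in prompt per exploratory-testing attribute."""
    return {
        name: template.format(target=target, bug=bug)
        for name, template in PROMPT_TEMPLATES.items()
    }


if __name__ == "__main__":
    prompts = build_prompts(
        "the Google home page",
        bug="An empty search returns seemingly random results.",
    )
    for name, prompt in prompts.items():
        print(f"[{name}] {prompt}")
```

A real tool would also paste in page content, bug history, and team context before sending anything to a model. That part is left out here because it depends entirely on the product being tested.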

Many exploratory testers won’t heed the advice; some will even argue with the AI itself. Historically, that’s not been the smart decision when it comes to technological progress.

Mid-Term Future

Some of the exploratory testing above will go ‘Full Auto’, as I like to call it, removing the human from the loop for most basic exploratory testing.

When basic exploratory testing is fully automated, the mediocre human exploratory testers, and the ones that avoided AI, will be retraining for other jobs. To be frank, I think this will be most of them.

Those that are really great exploratory testers, and early adopters of AI in their work, will convert to a role where most of their time is spent in three areas:

  1. Reviewing the results of the AI, checking for false negatives and false positives, with that feedback going back into the AI mind for future training and testing.
  2. “Nudging” the AI: influencing its behavior and providing context the AI doesn’t have about the product, team, or priorities, but not adding specific, scripted test cases.
  3. Doing the corner-case exploratory testing that the AI cannot perform, though this will be an ever-shrinking piece of the testing work.
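The first of those three areas, reviewing AI results for false positives and false negatives, is easy to picture as a simple tally. The sketch below is only an illustration: `ReviewedFinding`, `summarize`, and the sample findings are all made up for this example, and real feedback loops into a model would need far more than counts.

```python
from dataclasses import dataclass


@dataclass
class ReviewedFinding:
    """One potential bug, after a human reviewer has looked at it (hypothetical shape)."""
    title: str
    ai_flagged: bool       # did the AI report this as a bug?
    human_confirmed: bool  # did the reviewer decide it is a real bug?


def summarize(findings):
    """Count false positives/negatives and derive precision and recall."""
    tp = sum(f.ai_flagged and f.human_confirmed for f in findings)
    fp = sum(f.ai_flagged and not f.human_confirmed for f in findings)
    fn = sum((not f.ai_flagged) and f.human_confirmed for f in findings)
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        # None when there is nothing to divide by.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


if __name__ == "__main__":
    # Made-up review session, purely for illustration.
    session = [
        ReviewedFinding("Empty search shows random results", True, True),
        ReviewedFinding("Logo is 'too colorful'", True, False),
        ReviewedFinding("Keyboard focus skips the search button", False, True),
        ReviewedFinding("Slow response on long queries", True, True),
    ]
    print(summarize(session))
```

Tracking these numbers per session gives the reviewer something concrete to feed back, and shows whether the "nudging" in the second area is actually helping over time.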

For those who question the possibility of this near future, I suggest they consider the ability of the AI in the scenarios above, combined with the rapid pace of recent AI progress, and two other critical data points:

Price: A call to GPT-3.5 costs as little as $0.003 per call/prompt, and its price was recently cut by 10X. How much did it cost your company for you to simply read this blog post? Probably more than 10X the combined cost of all the AI calls above. Did they cut your salary by 10X?

Speed: Each response above returned in under 30 seconds. How long did it take you to think about and document your responses? 10X increases in speed have a tendency to disrupt work patterns and industries.
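If you want to check the price and speed claims against your own numbers, here is a back-of-the-envelope sketch. Only the $0.003 per call and the 30 seconds per response come from this post. The number of calls, the reading time, the hourly rate, and the time a human would need are placeholder assumptions you should replace with your own.

```python
PRICE_PER_CALL_USD = 0.003   # figure quoted in this post
SECONDS_PER_RESPONSE = 30    # upper bound quoted in this post


def ai_cost_usd(num_calls, price_per_call=PRICE_PER_CALL_USD):
    """Total cost of a batch of AI calls."""
    return num_calls * price_per_call


def human_cost_usd(minutes, hourly_rate_usd):
    """Cost of a person's time at a given fully loaded hourly rate."""
    return minutes / 60 * hourly_rate_usd


def ai_time_minutes(num_calls, seconds_per_response=SECONDS_PER_RESPONSE):
    """Time to get all responses if the calls are made one after another."""
    return num_calls * seconds_per_response / 60


if __name__ == "__main__":
    # Everything below is an assumption for illustration, not data from the post.
    num_calls = 15            # assumed number of prompts in a session
    reading_minutes = 11      # assumed time to read a post like this one
    hourly_rate = 60.0        # assumed hourly cost of a tester, in USD
    human_minutes = 120       # assumed time to answer the same prompts by hand

    ai_cost = ai_cost_usd(num_calls)
    read_cost = human_cost_usd(reading_minutes, hourly_rate)
    print(f"AI calls: ${ai_cost:.3f}, reading the post: ${read_cost:.2f}")
    print(f"Cost ratio: {read_cost / ai_cost:.0f}x")
    print(f"Speed ratio: {human_minutes / ai_time_minutes(num_calls):.0f}x")
```

With almost any realistic salary plugged in, the cost ratio lands well above 10X, which is the point of the comparison. The speed ratio depends heavily on how long you assume the human takes.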

People are rarely rational. Many ‘manual’ and exploratory testers will claim, until the end, that the AI isn’t useful or reliable enough, not realizing that it is often difficult to compete with ‘good enough’ when the price is so low and the speed is so high.

Given my experiences in testing, I’ll just say that generative AI appears better than 90% of all testers I’ve had the privilege to work with, and definitely better than me.

AI-Example Answers

You really should ask the questions above of yourself first, then of the AI, and only after that read below. You will only believe it, or understand it well, if you do it yourself. For all those who don’t think it is worth 5 minutes talking to a machine to ensure their job security, read on :)

  1. Creativity:

“Think of the most creative, out of the box test cases for the Google home page.”

Example follow-on question

Regarding the exploratory testing in the area of cosmic radiation, in the context of testing the Google home page: please explain more about how the test could be conducted, what results to look for, and how to communicate the findings, positive or negative.

“Why would performance degradation be a possible impact of issues caused by cosmic radiation?”

I have degrees in computer and electrical engineering and I just learned some cool geeky things. I wouldn’t have been this creative. Something tells me the average, or even the best, exploratory tester wouldn’t have been creative enough to come up with this test design and reasoning. Yes, this example is ‘out there’, but it is fun and demonstrates just how creative generative AI can be. Maybe not all of it is fully true, but it seems directionally correct. Nerd cool. If you thought of this yourself, cool, but I wouldn’t want to go to a party with you.

  2. Contextual, Human-Like Communication

You are an exploratory tester and you found a bug on the Google home page where, if you don’t type any text into the search box and hit the search button, you are presented with seemingly random search results. Generate a great bug report explaining why this is important to fix.

Example follow-on question

How much might the operational costs induced by the empty search result bug cost Google, the company?

And a more specific follow-on question:

Estimate the specific cost related to server-load only.

Know any exploratory tester who could even attempt that, let alone be correct?

And a deeper follow-up, with recommendation and analysis:

Based solely on this cost of server load, should Google probably fix this bug? If so, with what priority or urgency?

  3. Critical Thinking

Considering the google home page, as an expert exploratory tester, describe how you would apply “critical thinking”

  4. Empathy

“Empathize with the different types of end-users who might try to use the home page.”

Let’s ask a follow-on question:

“for the tech novice, what should an exploratory tester look for when testing the google home page?”

And, even more details about what to specifically look for on the page.

And let’s see if the AI can empathize with humans who cannot accept that AI might be as good as, or better than, they are:

Now let’s ask the AI to compare the advantages of humans vs. AI in exploratory testing.

And what about AI empathizing with AI?

  5. Learning

“You are an expert exploratory software tester. Assuming you had access to a human helper that could create prompts, or copy/paste information into a prompt from a webpage, how would you learn about the context needed to test the home page?”

Follow-up specific to historical context

And let’s see if the AI can help us apply the lessons learned in James Whittaker’s “Exploratory Software Testing” book.

  6. Variety

“In the context of the Google home page, how could the concept of ‘variety’ be applied to exploratory software testing?”


If you are an exploratory software tester, you should be actively exploring how generative AI can enhance your value, productivity, and job security. If you don’t, others will. If you manage exploratory software testers, try using GPT to quickly assess how well your team is performing. If you want to pretend this revolution isn’t happening, just remember next year that I tried to help :)

Maybe we’ll soon explore how AI will impact other folks in testing: Test Managers, Test Automation Engineers, Engineering Managers, Product Managers, CxOs… or maybe the AI will just be doing it all by the time we get around to it.

— Jason Arbon



jason arbon

blending humans and machines. co-founder @testdotai eater of #tunamelts