The Future Tester

jason arbon
Dec 26, 2019
Credit: Microsoft’s Concept — Future vision 2020

“I wonder if the baristas deliberately slow things down to make it feel special…”

“You are always looking for trouble Angela!”

“Once a tester, always a tester. Just a healthy bit of skepticism Dave. Thanks for the latte, it is a great way to start the day.”

AI Assistants

Both Angela’s and Dave’s watches blinked with the same simple message: “New build, low risk, verifying. - the bots.” Angela smiled. She knew Dave was thinking the same thing. Angela used to be the first into the office and the last out, always a little frazzled, always a bit stressed, worrying about everything that could go wrong with their team’s new social app. There just never was enough time to think clearly.

“I remember feeling guilty learning Java at the office when I could be testing something.”

They both laughed.

“So, Dave, I saw the design sketch of your latest idea. I tried, but couldn’t figure out who that feature is built for, and it doesn’t feel smoothly integrated with the rest of the app flow — it looks like a jarring interrupt to the UX. Maybe if you only showed it when the user needed it…”

Both their watches flashed again: “New feature. Similar to ‘InstaChat likes’ but slower.” The message came with a picture of the new ‘like’ button, which looked more like a link than a button.

“Sneaky Dave, you got caught!”

“Wasn’t me, but I’ll dig into it. It is one thing for my team to make a blatant feature copy, but it is painful to be called slower than InstaChat!” Dave tapped his watch to accept an auto-generated quick design review meeting for this afternoon with his team and Angela. “The bots catch everything, no peace for developers anymore!”

Dave said, “The mice will play when the cat’s away, I better get back to the mothership.” They’d named the office ‘the mothership’, back when they seemed to spend every good hour of the day coding and testing, fixing and verifying, with every handoff tracked by a database. Angela could have stayed sipping coffee, but she knew Dave would appreciate the company, and she hated to waste one of the few tables on just herself.

Back at base, Angela grabbed her machine off the counter where she’d left it and headed to one of the few remaining old-school desks near the window. The mothership was ‘post-open-workspace’: the hip thing today was to have couches, beanbags, and all variations of standing and sitting desks deliberately grouped among alcoves. Lots of monitors hung on the walls, scrolling through different metrics and ready for sharing screens. First come, first served. People never knew where they would sit that day. The alcoves helped keep the noise level down. There was a lot more discussion and a lot less JIRA these days.

Angela’s screen fired up as soon as she sat down. Her monitor glowed with what looked like a dossier. She was a secret agent hunting down the truth. There were images and videos of the app screens before and after the change to the like button, and videos of clicking it were playing in a loop. Below that was the blame zone: the headshot of Dave’s new intern, of course. She’d gotten used to seeing his face lately. A code diff was on the right, but that’s not what caught her attention.

“COMPUTER, show me more like buttons,” she half-asked, half-commanded quietly. Angela had named her bot ‘COMPUTER’, just like the one in old Star Trek shows; it gave her extra geek cred and made her feel like she was on the bridge. Small videos of the like buttons in similar social apps appeared in a grid on her screen, with the most popular apps in the upper left corner and the ones nobody had ever heard of on the bottom right. One hundred little videos cycling in step, showing clicks on like buttons in the various apps, were dizzying. The grid confirmed what she thought: the new like button looked more like a link than a button. It didn’t look like the like buttons from the best apps and, if anything, looked more like a few of the less-beloved apps in the bottom right.

Angela circled a grid of top apps, tapped her own app’s like button, then said, “COMPUTER, file case for like button. Priority 2, blocking.” The bot replied, “Yes Captain, filed and assigned to Dave.” Angela didn’t realize it, but the bug report had already been created, just not committed. The bot never created new work without human approval.

“Now, why are we so slow…” Angela mumbled to herself. “COMPUTER, compare our like button network traffic with InstaChat’s.” The machine quickly pivoted to a timeline view showing the sequence of API calls under the hood, both within the apps and out to their servers. The COMPUTER had a note that Dave’s team was using an older version of a networking library, and InstaChat was using a different one altogether. “Interesting,” she whispered.

Why was their like button so slow to update after clicking? Angela noted that the speed of their app’s server calls was surprisingly similar to InstaChat’s. “Ruled that out…” she thought. There! Almost embarrassingly, there was a ‘sleep()’ call in the application code, with a comment ‘//Quick hack for testing’. She tapped the sleep() call and called out in frustration, “COMPUTER, file case for ‘Intern!’” “Yes Captain,” it responded with little emotion.

“OK, we need to add additional checks for all this,” Angela said to herself. The bot interface had a recommended list of things to do. Angela noticed the bot had already prepared to add 23 new test cases for general ‘like button’ functionality and ‘user interface checks’. The bot had scanned the open test case repository for test cases referencing ‘like buttons’. The tests coded by other testers were stored in an application-independent format. The only wiring needed to reuse and run these tests on Angela’s app had already been generated. The bots had created neural net classifiers for the various like button states and had pre-trained a reinforcement-based model for execution. Even better, the bots had already spun up this version of the application and showed Angela a quick list of the test results. Scanning the list, Angela saw two failures that wouldn’t be applicable to her app, as those tests also loaded the article outside of the stream to verify the counts; her app didn’t have a social stream like those old-school ones. She quickly tapped to remove those tests from her runs and said, “Thanks, COMPUTER, add those tests to the regression suite.” “By your command,” the bot said in a voice harking back to Battlestar Galactica.

Not only was this fresh code from an intern, it had already proven buggy and was user-visible, so she knew she’d need to add a few more tests and stress it a bit. “COMPUTER, show me all app flows that contain a like button.” The screen instantly showed a graph-like view of multiple steps through the app. Angela saw one path where the test bot clicked the like button, wandered off somewhere else in the application, then came back and clicked the like button again. This turned ‘off’ the like button visually but did not decrement the like count.

“COMPUTER, I need a reproduction instance of flow #4 at step #8.” Almost instantly the screen displayed a live, interactive view of the application just as the bot had seen it the first time. Step 8 was the step before the like button was clicked a second time. She tried several different scenarios, clicking a third, a fourth, even a 15th time.
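The check Angela is exploring by hand can be sketched as a small invariant test. This is only an illustration under stated assumptions: the `LikeButton` class and `check_toggle_invariant` helper are hypothetical names, not anything from Angela's app or the bots' tooling. The idea is that after any number of clicks, the like count should always agree with the visual on/off state.

```python
from dataclasses import dataclass, field


@dataclass
class LikeButton:
    """Hypothetical model of a like button: per-user state plus a count."""
    count: int = 0
    liked_by: set = field(default_factory=set)

    def click(self, user: str) -> None:
        # A repeat click by the same user must undo the previous one,
        # both visually (set membership) and in the count.
        if user in self.liked_by:
            self.liked_by.remove(user)
            self.count -= 1
        else:
            self.liked_by.add(user)
            self.count += 1

    def is_on(self, user: str) -> bool:
        return user in self.liked_by


def check_toggle_invariant(button: LikeButton, user: str, clicks: int) -> bool:
    """Click `clicks` times; after each click the count must match the state."""
    for _ in range(clicks):
        button.click(user)
        if button.count != len(button.liked_by):
            return False
    return True
```

A button with the bug from flow #4 would skip the decrement on the second click, so the count would drift away from the visual state and the check would fail on that click.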

Angela then made a quick request to ask Dave’s designer a question over video chat, sharing the new test she’d just created. “What were you expecting to happen in these scenarios?”

“Um, yeah. Sorry, we hadn’t thought of those scenarios. Thanks, I’ll get back to you and Dave when I have a bit more design on these interactions.”

If only he took more pride in his job, Angela thought. “Maybe I should be a designer and avoid all this pain in the first place,” she grumbled inside. But she shelved that thought quickly, because then no one would be catching her mistakes, and she was the highest-paid person on the team. She continued with a bit more exploratory testing and didn’t find any new issues, but with a quick tap all of her interactions were saved for future regression runs on these like buttons.

Angela knew this wasn’t going to be enough to make sure the feature was tested fully, so she clicked the ‘Reinforcements’ button, which automatically synthesized the data in front of her into a simple report, then said, “COMPUTER, please send this off to the crowd testers, localized to just the U.S. for now, exploratory testing, use best judgment.”

“COMPUTER, deploy automated exploratory bots as well.”

“By your command,” the bot again confirmed, as requests for testing help were sent out to highly rated U.S. testers who had previous experience testing ‘like’ buttons, with links to the same app flows that Angela had been interactively exploring a few minutes ago. Hundreds of simple exploratory bots spun up in the cloud. Some were trained to behave like normal users. Some were trained to break things with corner cases, repetition, and speed. With a quick command, Angela was able to spin up the equivalent of a hundred testers on a single feature. With a quick command, she could just as easily spin them all back down.

New Communication

Now, the part of her job she loved more than coffee on Wednesdays with Dave. Angela tapped on a button she had whimsically personalized to read “Holodeck”. InstaChat’s lead tester, Priya, who used to test like buttons on her app, was nudged away from cat videos by her bot and accepted the invite. A new window appeared on Angela’s screen with a list of past recordings tagged ‘like button’ on the right, pictures of like buttons on the top, and videos of interactions along the bottom. Priya’s face materialized onto the screen.

They’d talked once, perhaps 6 months ago. The two first shared common tester angst of ‘why so many bugs?’, then laughed that bugs are what kept them employed. Priya said she too caught many bugs related to application state around like buttons when InstaChat first added them to their app.

Priya shared, “The biggest issue, which actually became a business issue too, was that likes could sometimes be lost when they were clicked by many people at the same time. Not only was the count incorrect, but later, when InstaChat started paying people for likes, it turned into a class action lawsuit.”
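The failure Priya describes is the classic lost update: two clicks read the same count, each adds one, and one of the writes disappears. A minimal sketch of a stress check follows, assuming a hypothetical in-memory `LikeCounter` guarded by a lock; the names are illustrative only and say nothing about how InstaChat actually stores likes.

```python
import threading


class LikeCounter:
    """Hypothetical like counter; the lock makes read-modify-write atomic."""

    def __init__(self) -> None:
        self._count = 0
        self._lock = threading.Lock()

    def like(self) -> None:
        # Without the lock, two threads could read the same value
        # and one of the increments would be lost.
        with self._lock:
            current = self._count
            self._count = current + 1

    @property
    def count(self) -> int:
        with self._lock:
            return self._count


def hammer(counter: LikeCounter, users: int, likes_per_user: int) -> int:
    """Simulate many users liking at once and return the final count."""
    def worker() -> None:
        for _ in range(likes_per_user):
            counter.like()

    threads = [threading.Thread(target=worker) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.count
```

A test in this style would assert that `hammer(LikeCounter(), users, likes_per_user)` equals `users * likes_per_user`; any shortfall means likes were dropped under load.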

Angela reflected that it was awesome the testing community had finally realized, just like open source, that testing methods and cases and even folklore were far better shared than kept close to the vest. With network effects, when a company like FedEx shares all the interesting addresses it has discovered for testing, it receives many more test cases from other testing experts, such as interesting money test inputs from Bank of America testers, or concurrency tips and utilities from the Google Ads test team. FedEx would even get a few cases it hadn’t thought of from its friends at UPS and the Post Office.

Angela then sent a quick, low-priority message to Marla, the VP of Product, asking, “Hi Marla, are we ever planning to monetize likes like InstaChat?” Marla got back right away because Angela’s questions were always important, and she always learned something herself.

“Hmmm, hadn’t thought of that Angela, but, yes, we should assume that likes will be monetized in the future from an engineering perspective. Please make sure likes are robust!”

“Thanks Marla, will do.”

After a moment, Marla added, “Sometimes I think you are running this whole company!”

Continual Learning

“Hi, all!” Angela said as she joined the virtual call with 12 other folks who had joined before her — they didn’t have a snack. It was great to see her old team still working together in spite of working one or two buildings or states away. The time she invested in herself and her team to learn these new tools had paid off well.

“Anyone like ‘like’ buttons?!?” she quipped. The next 30 minutes were spent exchanging ideas on testing buttons, concurrency, scale, and monetization. They were all so much better off than 2 years ago in their 3 clusters of 4 cubes off in the corner. Ironically, Angela was still using that same desk off in the corner, but almost nothing else remained the same.

Also learning and growing were the bots. Silently, they evolved and updated models to track and predict bugs, code, and test cases. The bots tirelessly made these predictions and shared their suggestions and findings while the humans were drinking coffee. They learned from every interaction. They even continually shared and merged learnings about ‘like buttons’ automatically between different app teams and companies. More and more, the only difference between the humans and the machines was that Angela’s meetings took 30 minutes, while the machine conversations lasted only about 300 milliseconds.

Angela wrapped up the meeting with “Sometimes I’m not sure who’s getting smarter faster: me or the machines!” They all smiled, knowing they all had a lot more time and money for coffee.
