Something big happened in a baseball game last night that is causing a buzz in the sports world today. I think it’s related to a buzz in the world of software testing.
Armando Galarraga, a pitcher for the Detroit Tigers, was on the verge of pitching a “perfect game” — a game not only in which no batter of the opposing team gets a hit (a “no-hitter”), but in which no batter even makes it to first base. That means pitcher Galarraga would have had to outlast 27 batters trying to smack the ball into play. That’s some great pitching on his part along with some exceptional defensive support from his teammates.
Perfect games are rare. In the 134-year history of Major League Baseball, there have only been 20 perfect games. Two of them, amazingly, happened last month, which has never happened in one season.
And last night at 6 pm Pacific Standard Time, Armando Galarraga was set to be the 21st.
In the 9th and last inning, Galarraga faced what should have been the last batter: Jason Donald. Galarraga delivered a pitch and Donald connected. Tiger first baseman Miguel Cabrera ranged well off the base to field the ball, so pitcher Galarraga ran to cover the base that Donald was running for. Cabrera threw to Galarraga, who caught the ball and touched first base in mid-stride, beating Donald by a full step.
But to everyone’s astonishment, first base umpire Jim Joyce called Donald safe! Being safe means Donald had made it to first base before the ball reached Galarraga’s glove, spoiling his perfect game.
As the crowd booed, Tiger manager Jim Leyland came out and argued with Joyce, but the call stood. The crowd then watched the instant replay which showed the Indians batter Donald out by a full step. Donald had not beaten the throw. He should have been out. Jim Joyce got the call wrong and everybody saw it.
But in baseball, even though umpire judgment calls can be argued, those calls rarely get reversed unless by another umpire who saw the play. It was hopeless. Furthermore, it was time to move on to the next batter, which Galarraga did — and subsequently got him out to end the game.
It didn’t matter that the Tigers won the game. The “perfect game”, a game in which Galarraga technically allowed no batters to reach first base, was spoiled, even though the objective truth (according to the camera footage) showed that Galarraga did not allow Donald to safely reach first base.
Unlike other sports, the camera has no say in how baseball games are decided. In baseball, it’s the umpires that decide. It’s purely human judgment in the moment. Other sports allow appeals to officials if the camera shows a different story than what their ruling indicated. Not baseball. At least, not *yet*. After last night, that might change because this particular game had a bearing on some historical statistics that make baseball much more interesting for a lot of people to follow.
That judgment call by umpire Jim Joyce is now the topic of sports radio call-in shows, newspaper sports sections, and online blogs and articles all across the country today: how he got the call wrong, what the camera showed, whether baseball should allow instant replay to influence the game, even how the call was handled by the pitcher, the umpire, the manager, and soon, the Commissioner of Baseball, who oversees everything in the sport.
How is this important to software testing?
There is a balance in baseball between what the camera sees and what the umpire sees. In testing, there is a balance between what the tester can test and what the computer can test.
In software, testers use their judgment. Machines have no judgment other than what they are programmed to do. They are programmed to execute and record, to render and calculate.
As it happened, about an hour before that game, I was talking with Michael Bolton and Ben Simo online about the term “exploratory test automation.” I had retweeted Elisabeth Hendrickson’s post about a class she was hosting at Agilistry (called “Exploring Automated Exploratory Testing”).
Bolton, Simo, and I were discussing that title, trying to see if we could come up with something more accurate, because Elisabeth’s title seemed to be a contradiction-in-terms. How do you automate exploration when exploration is inherently human judgment and skill as we react to what we learn in the moment and automation is not? We were pretty sure we knew what she meant by the class, but how best to describe the interaction between machine and human?
It’s important to know that Michael, Ben, my brother, and I are people who believe in the power of language to convey ideas and meaning. We argue over precision and semantics because they communicate more than just words. We believe it is important to debate these kinds of things openly and publicly, because it propels and provokes conversation about meaningful ideas that are meant to help all testers everywhere win more credibility and respect, much in the same way arguing baseball calls can evolve the sport.
So we traded ideas of how to describe the computer’s role in exploration. Since it was a public discussion on Twitter, people following that thread could chime in:
Michael Bolton’s idea was to call it “Tool-Supported Exploratory Testing” (proving to be a humorous, dyslexic TSET)
James wanted to flip the words and call it “Exploratory Automated Testing”
Oliver Erlewein liked “ETX” (and so did I) but doesn’t yet know what the X could be — it’s just cool.
Zeger van Hese suggested “Hybrid Exploratory Testing”
I offered the playful “Bionic Testing” after the Six-Million-Dollar Man.
Alan Page said it could simply be called “exploratory testing” and left at that, because whether or not your exploration is computer-assisted, it’s still exploration. James liked that and so did I.
But isn’t there a term or a phrase or a word that can more accurately and precisely describe the computer’s role in assisting testing?
Is it automation when you use a tool to help reveal a bug?
Is it automation when a machine executes a programmed test procedure?
Is it automation when you use Task Manager to see the list of processes in memory?
Is it automation when you execute Cucumber or Fitnesse (keyword-driven) tests?
What do you call it when you click a button on a test harness and it clicks on the objects on the screen for you and delivers a report at the end of the script?
If it’s all “automation”, doesn’t that imply that it needs no human intervention?
I think we can find a better term.
Everyone can agree that computers help exploration. Call them “probe droids” or “bots” or “tools” — they inform a human about things that are notoriously hard for humans to know on our own. They do things that are hard or slow or tedious or expensive or impossible for a human to do.
But we also know that it’s impossible for software to test itself in all the ways we can test it, just like it’s impossible for a camera to replace umpires at baseball games. Computers and humans enhance each other.
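Here is a rough sketch of what that division of labor might look like, in Python. Everything in it is made up for illustration: `parse_price` stands in for some function under test, and `probe` and `random_inputs` are hypothetical helpers, not part of any real tool. The machine does the tedious part (throwing lots of inputs at the code and recording what happened) and makes no call about what is right or wrong. That part is left to the tester.

```python
import random
import string


def parse_price(text):
    """Hypothetical function under test: turn a string like '$1,234.50' into cents."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    return int(dollars) * 100 + int(cents or 0)


def probe(func, inputs):
    """Run func over each input and record what happened, without judging it."""
    observations = []
    for value in inputs:
        try:
            observations.append((value, "returned", func(value)))
        except Exception as exc:  # record the surprise; a human decides if it matters
            observations.append((value, "raised", type(exc).__name__))
    return observations


def random_inputs(count, seed=0):
    """Generate short strings of price-like characters (seeded so runs repeat)."""
    rng = random.Random(seed)
    alphabet = string.digits + "$,.- "
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        for _ in range(count)
    ]


if __name__ == "__main__":
    # The tool reports; the tester reads the rows and decides what to chase next.
    for row in probe(parse_price, ["$1,234.50", "1.5"] + random_inputs(10)):
        print(row)
```

The tool will happily report that `"1.5"` returned `105`. It takes a person to look at that row and ask whether a dollar fifty should really come out as a dollar and five cents.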
Today in baseball, there’s a lot of energy and debate because of that game last night. Galarraga’s near-perfect game may lead to a major change in using replay in baseball games. The Commissioner of Baseball may even overturn Joyce’s ruling, meaning that the official record books would reflect a perfect game last night in Detroit.
Today in software testing, there’s energy and debate around the word “automation”, especially as more classes like Elisabeth’s appear and as we talk more about Test-Driven Design and tools on projects.
While baseball debates whether to use instant replay in helping to decide close plays, I’ll bet you that if they decide to use it, they will not call it “automated baseball.” We testers *know* we use technology to help us with testing. I just think we can do better than “automated testing”.