When I first studied testing, I learned that a test involved comparing the test result to an expected result. The expected result was the oracle: the thing that would tell you whether the program passed or failed the test. Especially in automated testing, we would look to a reference program as our oracle, that is, a program that generates the expected results rather than the results themselves.
In 1980, Elaine Weyuker’s “On Testing Nontestable Programs” shattered that view. Weyuker argued that “it is unusual for … an oracle to be pragmatically attainable or even to exist” (p. 3). Instead, testers relied on partial oracles. For example:
- A tester might recognize a result of a calculation as impossibly large even though she doesn’t know what the exact result should be. (You might not know offhand what 1.465732 × 2.74312 is, but if a program said 7,000,000 you could reject that as obviously wrong without doing any calculations.)
- A tester might recognize behavior as inappropriate, even if she doesn’t know exactly how the program should behave.
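The first of these examples can be sketched in code. The following is only an illustration, not something from Weyuker's paper: the function name `plausible_product` and the choice of bounds (rounding each factor down and up to whole numbers) are my own assumptions. The point is that the check never computes the exact answer, so it can reject absurd results but cannot confirm correct ones.

```python
import math


def plausible_product(a: float, b: float, result: float) -> bool:
    """Hypothetical partial oracle for a multiplication routine.

    It brackets |a * b| between the product of the factors rounded
    down and the product of the factors rounded up, without ever
    computing the exact product. A result outside the bracket is
    certainly wrong; a result inside it may or may not be right.
    """
    lower = math.floor(abs(a)) * math.floor(abs(b))
    upper = math.ceil(abs(a)) * math.ceil(abs(b))
    return lower <= abs(result) <= upper


# The bracket for 1.465732 x 2.74312 is [1 * 2, 2 * 3] = [2, 6].
print(plausible_product(1.465732, 2.74312, 7_000_000))  # False: obviously wrong
print(plausible_product(1.465732, 2.74312, 5.0))        # True: plausible, yet still incorrect
```

Note the second call: 5.0 is not the right answer, but this oracle has no way to know that. That gap is what makes it partial.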
Weyuker’s paper wasn’t widely noticed in the practitioner community. At the Quality Week conference in 1998, I saw testers’ jaws drop open when Doug Hoffman explained this problem and its implications (A Taxonomy for Test Oracles). Doug explained the problem this way:
Suppose that we specify a test by describing:
- the starting state of the system under test
- the test inputs (the data and operations you use to carry out the test)
- the expected test outputs
We can still make mistakes in interpreting the test results.