Automation: your check mate
According to Keith Braithwaite, principal consultant with Zuhlke Engineering, much of the routine checking that makes up the bulk of testing is ripe for automation, freeing up testers for more important, high-value work.
To obtain the highest confidence in the correctness and robustness of a software system at reasonable cost, that system must be tested. Other ways of gaining high confidence do exist but, given the state of the art, are rarely economic. That may change as the capability and ease of use of formal verification techniques continue to improve.
Right now, though, and for, say, at least the next five years, testing will remain a keystone activity for systems builders. A shame, then, that so much testing effort is wasted, and that so many opportunities to gain greater value from testing are missed.
How is testing effort wasted? It is wasted by devoting so much human attention to checking, rather than testing. Checking is a valuable activity, so it’s good that so much of it is done. But it is also exactly the kind of activity that people are bad at, and so it’s a shame that so many of them spend their working day engaged in it. Checking involves sustained close attention to many repetitions of almost identical actions. That’s hard for people to do well, but easy for machines.
Go and look at your testing department: you’ll find that a lot of its activity is really checking. Showing that the system as built conforms to a specification is checking. All verification that a system produces outputs in agreement with the specification, when given inputs that the specification states are valid, is checking. All regression testing is checking: checking that the behaviour of the new system agrees with that of the old. Again, this is good and valuable activity, but people shouldn’t be doing it.
Checking can be, and should be, highly automated. A very few characteristics of a system, particularly those surrounding the user experience, are hard to automate. Others are not. All ‘functional’ requirements can be checked automatically; that’s pretty much what ‘functional’ means: the output can be described as a mathematical function of the input. It might not be trivial to check that the functions which the system actually implements are the ones which the specification requires, but it is possible, and desirable.
Go and look at your test organisation again. Now that you recognise checking, how much of the checking of functional correctness is being done through the external interface of the system (the user interface, if there is one)? This is a very common, and very inefficient, way to determine that the computations done by the system produce the expected output for acceptable input.
Current good engineering practice recommends that those components of a system which sit at its periphery, and whose main responsibility is interaction with other systems (the user is a system), should not also be responsible for functional processing. The UI should have no business logic in it. Servlets exposing a web service should have no business logic in them. Instead, the computation on which functional correctness depends should be held in interior components.
This is not to say that the ‘domain model’ beloved of object-oriented programmers is always the right answer. Those interior components could very well be manifested as database tables and procedural code (although hopefully not in stored procedures), or a declarative rule set, or a collection of transformations to a data stream. The technology and architectural stance can vary widely, but the overarching principle is to observe that a system has an inside and an outside with a surface between them (please, not a ‘stack’ with a ‘top’ and a ‘bottom’). That surface carries transducers which mediate communication across it, and functional correctness with respect to the problem domain lives inside the surface; perhaps far inside.
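A minimal sketch of that inside/outside split. The names `price_trade` and `handle_deal_form`, and the pricing rule itself (quantity times unit price, less a flat 0.1% commission), are hypothetical, chosen only for illustration. The transducer translates between the outside world's representation (strings from a form) and the interior's types; the functional rule lives in one place and can be checked without going through the transducer at all.

```python
from decimal import Decimal

# Hypothetical commission rate, for illustration only.
COMMISSION_RATE = Decimal("0.001")

# Interior component: the functional rule. It knows nothing of forms, HTTP or UIs.
def price_trade(quantity: int, unit_price: Decimal) -> Decimal:
    gross = quantity * unit_price
    return gross - gross * COMMISSION_RATE

# Transducer on the surface: converts form strings into interior types and the
# result back into a string. It contains no business logic of its own.
def handle_deal_form(form: dict[str, str]) -> str:
    net = price_trade(int(form["quantity"]), Decimal(form["unit_price"]))
    return f"{net:.2f}"

# A check of functional correctness goes straight to the interior.
assert price_trade(1000, Decimal("2.50")) == Decimal("2497.50")
```

Only a few checks are needed on `handle_deal_form` itself, to confirm that it passes values through faithfully; everything that matters about the price can be checked at `price_trade`.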
A checker (are you getting used to that word by now?) who can only determine functional correctness through the external interfaces of the system, via those transducers, is like an eighteenth-century doctor trying to diagnose a patient through purely external observations.
Of course, those transducers, such as UI components, must be checked themselves. But this is really the job of a UI toolkit developer or vendor. “Does the list box correctly display the entries in its backing data structure? Does it correctly pass on events representing user gestures?” These questions and others like them should have been answered long before that UI toolkit was accepted for use in building a system. The better focus for the checker’s attention is: does the logic correctly update the backing data structure when a certain kind of event takes place? This requires access to the internals of the system in the checking environment.
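A sketch of what that access can look like, assuming a hypothetical to-do list whose list box is backed by a plain Python list. The event types, `apply_event`, and its rules (ignore blank entries, no duplicates, ignore out-of-range removals) are invented for illustration. Because the update logic is a function of (current items, event), a checker can drive it with events directly, with no list box and no UI toolkit involved.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical events a UI toolkit might deliver after a user gesture.
@dataclass(frozen=True)
class ItemAdded:
    text: str

@dataclass(frozen=True)
class ItemRemoved:
    index: int

Event = Union[ItemAdded, ItemRemoved]

# Interior logic: how the backing data structure changes in response to an event.
# It returns a new list rather than mutating, so each check starts from a known state.
def apply_event(items: list[str], event: Event) -> list[str]:
    if isinstance(event, ItemAdded):
        text = event.text.strip()
        # Hypothetical rules: ignore blank entries and keep the list free of duplicates.
        if not text or text in items:
            return list(items)
        return items + [text]
    if isinstance(event, ItemRemoved):
        # Hypothetical rule: a removal outside the list's bounds changes nothing.
        if 0 <= event.index < len(items):
            return items[:event.index] + items[event.index + 1:]
        return list(items)
    raise TypeError(f"unknown event: {event!r}")
```

A sequence of events can be checked the same way by folding it over an initial list, for instance with `functools.reduce(apply_event, events, [])`.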
Again, how many of your checkers are running through scripts? How many of them are doing this by hand? I hope very few. The majority of commercial ‘test automation’ tools really provide a facility for automated checking at the system boundary by stepping through scripts. That’s much better than stepping through scripts manually, but stepping through scripts is not especially valuable in the general case. Sometimes it is required. Sometimes correctness of a system relative to a specification is all about sequences of steps, sequences of question/answer interactions at the system boundary. But not always. Stepping through the process of entering a deal into a trading system, and then stepping through the end-of-day batch job, is a very inefficient way of finding out whether the pricing engine gives the right answer for that trade.
If scripts are not always the answer, what to do in the other cases? Recall that what we check is functional correctness and the word “functional” gives us a clue. A mathematical function can be expressed as a certain kind of mapping between two sets of values, the domain (usually corresponding to inputs) and the range (usually corresponding to outputs). The function itself is then a subset of the Cartesian product of these two sets.
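To make the set view concrete, here is a sketch over a deliberately tiny domain: unsigned 2-bit values, with a hypothetical `mul_mod4` (multiplication that wraps modulo 4) as the function. It builds the function as a set of (input, output) pairs, confirms that this set is a subset of the Cartesian product of domain and range, and confirms that each input appears in exactly one pair, which is what makes the relation a function.

```python
from itertools import product

VALUES = range(4)  # unsigned 2-bit values: 0, 1, 2, 3

def mul_mod4(a: int, b: int) -> int:
    """Multiplication that wraps around, as fixed-width arithmetic does."""
    return (a * b) % 4

domain = set(product(VALUES, VALUES))   # every input pair (a, b)
rng = set(VALUES)                       # every possible output
cartesian = set(product(domain, rng))   # every conceivable (input, output) pairing

# The function itself: one (input, output) pair per input.
function = {((a, b), mul_mod4(a, b)) for (a, b) in domain}

assert function <= cartesian                               # a subset of D x R
assert len(function) == len(domain)                        # one pair per input...
assert {inp for inp, _ in function} == domain              # ...covering every input
print(len(domain), len(cartesian), len(function))          # 16 64 16
```

Exhaustively checking this toy function takes 16 checks; the same construction over realistic integer types is where the numbers stop being friendly.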
In all but the most trivial cases this set is intractably vast. This observation is the origin of the oft-repeated claim that no amount of testing (including checking) can prove the absence of defects. Of course, that is literally false: one can calculate the exact amount of testing which would prove the absence of defects. Doing that immediately shows that there is no possible way of doing that amount of testing. Try this for yourself: imagine you are to check the multiplication function in your favourite language. Take the square of the number of distinct numerical values that a variable in your language can hold; that is the number of elements in the multiplication function considered as a set, one for every possible pair of inputs. Could you possibly do that many checks, even for so simple a function?
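The arithmetic is worth doing once. A sketch, assuming 32-bit integer operands and an optimistic, assumed (not measured) rate of one billion checks per second:

```python
# Number of distinct values a 32-bit integer variable can hold.
values_per_operand = 2 ** 32

# Exhaustively checking multiplication needs one check per pair of inputs.
checks_needed = values_per_operand ** 2        # 2**64

checks_per_second = 1_000_000_000              # assumed, and optimistic
seconds_per_year = 365 * 24 * 60 * 60

years = checks_needed / checks_per_second / seconds_per_year
print(f"{checks_needed:,} checks, about {years:,.0f} years")
# 18,446,744,073,709,551,616 checks, about 585 years
```

With 64-bit operands the count becomes 2**128, and the same calculation gives roughly 10**22 years.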
In fact, using large multiprocessor machines, such exhaustive checking has been tried in very simple cases, and it did find some interesting defects that might have been hard to find any other way. But you can’t afford to do it for your system, I confidently assert.
However, those who work as checkers have a lot of expertise in finding interesting cases to check: cases that are strongly representative of a large equivalence class, and cases that sit near the awkward corners of both the problem domain and the proposed solution embodied by the system. It is possible to check those illustrative examples of desired system behaviour intensively. And with smart automation which, like modern diagnostic equipment, can look inside the ‘patient’, the checking of those examples can be very quick.
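A sketch of what checking well-chosen examples can look like. `mul32` stands in for a hypothetical system function implementing 32-bit two's-complement multiplication; the table of examples is the interesting part. Each row represents a large equivalence class or sits at an awkward corner (zero, identity, sign changes, overflow), and the expected outputs are written down explicitly rather than computed by a second copy of the same logic.

```python
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1

def mul32(a: int, b: int) -> int:
    """Hypothetical system under check: 32-bit two's-complement multiplication."""
    p = (a * b) & 0xFFFFFFFF                        # keep the low 32 bits
    return p - (1 << 32) if p >= (1 << 31) else p   # reinterpret as signed

# (a, b, expected): each row chosen to represent a class or probe a corner.
EXAMPLES = [
    (6, 7, 42),                # ordinary positive values
    (-6, 7, -42),              # sign change
    (0, INT_MAX, 0),           # zero absorbs
    (1, INT_MIN, INT_MIN),     # identity at the extreme
    (-1, -1, 1),               # two negatives
    (INT_MAX, 2, -2),          # overflow wraps past INT_MAX
    (INT_MIN, -1, INT_MIN),    # the negation that cannot be represented
    (65536, 65536, 0),         # 2**16 * 2**16 wraps to exactly zero
]

failures = [(a, b, want, mul32(a, b)) for a, b, want in EXAMPLES if mul32(a, b) != want]
print(f"{len(EXAMPLES) - len(failures)}/{len(EXAMPLES)} examples pass")
for a, b, want, got in failures:
    print(f"mul32({a}, {b}) expected {want}, got {got}")
```

Eight rows run in microseconds, so the whole table can be re-checked on every change to the system.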
What opportunities are being missed? ‘Checking’ is by definition checking against a specification or requirement. Let’s say that we are going to gain confidence in the correctness of a system by intensive automated checking. We know from Royce’s original waterfall process paper of forty years ago that putting what he calls testing at the end of the development lifecycle “is risky and invites failure”. Royce recommends that testing starts as early as possible, certainly before development is complete. And, as it happens, he recommends that coding starts before design is complete, that design starts before analysis is complete and that analysis starts before requirements gathering is complete, and he recommends doing these activities more than once so that we can learn through experience.
It’s a shame that so many of the authors of software development standards that cite the waterfall process seem not to have read Royce’s paper, in which he tells us that a waterfall process is largely bogus and what to do instead.
Anyway, in the limit of Royce’s recommendation we might very well want to start testing (checking) very early. Very, very early. There is a great opportunity here. Since checking is done against a requirement, and smart checking is done against illuminating examples of required behaviour, producing examples to check can become part of the requirements activity.
If we do this, then several advantageous side effects arise. It becomes hard for requirements to be vague, since an example must be explicit. It becomes hard for requirements to be dropped into a document without thought, because producing examples requires careful thought. It becomes hard for requirements to be hard to check, because we can even make it a project rule that a requirement without examples to check is not accepted.
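One way to encode that project rule, sketched under loud assumptions: the `Requirement` record, its fields, the idea of storing examples as (input, expected output) pairs, and the shipping-fee requirement are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """Hypothetical requirement record: a statement plus examples that make it checkable."""
    ident: str
    statement: str
    # Each example pairs an input with the output the business expects for it.
    examples: list[tuple[object, object]] = field(default_factory=list)

def is_acceptable(req: Requirement) -> bool:
    """Project rule: a requirement must come with at least one example to check."""
    return len(req.examples) > 0

def accept(req: Requirement) -> Requirement:
    if not is_acceptable(req):
        raise ValueError(f"{req.ident}: no examples given, requirement not accepted")
    return req

# Hypothetical shipping-fee requirement, made explicit by its examples,
# including the awkward corner exactly at the threshold.
shipping = accept(Requirement(
    ident="REQ-12",
    statement="Orders of 50.00 or more ship free; smaller orders pay 4.95.",
    examples=[(49.99, 4.95), (50.00, 0.00), (120.00, 0.00)],
))
```

A vague statement such as “the system shall be fast” cannot get past `accept` until someone writes down what fast means for a concrete case.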
The checkers are now up at the front, at the business end (literally, the business-facing end) of the development process, bringing their skill and expertise at finding illuminating examples to bear at exactly the point where survey after survey shows that project after project goes off the rails: during requirements capture. Most projects fail, and most projects that fail do so not because the developers did a bad job of building the system (although that does happen) but because they turn out to have built the wrong system.
If requirements come with examples to check (in advanced teams, the requirements simply are examples to check), and if smart tooling allows those examples to be checked quickly, reliably and intensively, then a much smarter way of running projects becomes possible. This can extend as far as reporting project status.
Count the examples created: this measures something like scope to be delivered. Count the examples that check against the system as currently built: this measures something like scope delivered. The first time I had a project report to its board this way, as it happens, the news was not good, but a senior project manager on that board said afterwards that this was the first status report from a project he had ever seen that he believed. And because the board believed the report, they were able to respond and help that team to improve. Which the team did, and it had the evidence to show it.
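Once examples are executable, the two counts are cheap to produce. A sketch, assuming each example is a hypothetical (requirement id, input, expected output) triple and that `system` is whatever entry point into the interior of the system the checks use. Here that entry point is a stand-in shipping-fee function in which the free-shipping rule has not been built yet.

```python
from typing import Callable

# (requirement id, input, expected output): hypothetical examples agreed with the business.
EXAMPLES = [
    ("REQ-12", 49.99, 4.95),
    ("REQ-12", 50.00, 0.00),
    ("REQ-12", 120.00, 0.00),
    ("REQ-12", 10.00, 4.95),
]

def status(examples, system: Callable[[float], float]) -> tuple[int, int]:
    """Return (examples created, examples that check against the system as built)."""
    created = len(examples)
    passing = sum(1 for _, given, expected in examples if system(given) == expected)
    return created, passing

# Stand-in for the system as currently built: free shipping is not implemented yet.
def shipping_fee_so_far(order_total: float) -> float:
    return 4.95

created, passing = status(EXAMPLES, shipping_fee_so_far)
print(f"scope: {created} examples; delivered: {passing} passing ({passing / created:.0%})")
# scope: 4 examples; delivered: 2 passing (50%)
```

Tracked over time, the gap between the two counts shows whether delivery is catching up with scope, and it is hard to argue with because every number traces back to a runnable example.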
That’s a lot about checking, so what is left for testing? Michael Bolton has written extensively about this distinction, and he characterises testing as what happens when “we’re trying to find out about the extents and limitations of the product and its design, and when we’re largely driven by questions that haven’t been answered or even asked before”.
Note that part about questions that haven’t even been asked before. Testing isn’t about the requirement; it’s about things that aren’t in the requirement. Checking an example gives a binary result: the output was as expected or it wasn’t. Doing a test can give any kind of result at all, but mostly doing a test results in learning. It might result in learning a more interesting question to ask next time.
This is not a kind of activity that lends itself well to automation. Machines don’t (yet) have the imagination required to figure out new and interesting ways to stress a system, nor to usefully interpret the very wide range of results that might ensue. Bolton cites James Bach, who characterises testing as the action of “questioning a product in order to evaluate it” and notes that this is not a quality assurance process (checking is) but a process which informs quality assurance.
Testing is a very high-value activity that requires very high-value people to do it. And they will be free to do it if all the checking is automated away from them.