Sunday, 29 May 2011

Show your working

I recently came across this post by John D. Cook favouring crude software models over complex ones. I completely agree - there comes a point where bells and whistles on models add only false confidence. However, I have a slightly different take on it.

Consider weather forecasting: behind the scenes we know complex models are necessarily at work, computing probability distributions, correlating historical trends with the current data, and running CFD simulations. Yet what is the end product of this complex statistical analysis? Clear-cut maps, yes/no answers. It will rain tomorrow between 1 and 3; Friday will be perfect for a barbecue; there isn't a hurricane on the way. Apparently, if complex calculations produce complex answers, we have to paper over them and produce artificially simple answers.

What use is a weather report when I don't know which conclusions are sure-as-dammit, and which merely more-likely-than-not? I'm not the only one who finds these false clear-cut answers unhelpful (and disingenuous). My wife is a medical student and has told me a few stories about the various technological aids they work with in the hospital. One is an electronic ECG analyser that diagnoses whether a patient's readout shows any of a variety of abnormalities. I was immediately enthused - it sounds like a fascinating problem - and I asked how this incredible opus of statistical inference, inspired by the breath of the Gods themselves, had revolutionised their pitiful Medieval healthcare practices: "The seniors recommend we don't trust it".

I was stunned. How could this be? A computer should be ideally placed to perform such a task. However, the perception of the end users is that a human is a more reliable interpreter of the ECG. This struck a chord with systems I've helped write in the past - specifically invoicing systems. In one case we spent three months tracking down a variety of bugs where the client believed the total of the invoice was out by a single penny. The net result of these errors was that the client started manually checking the total of every invoice - defeating the point of the system.
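As an aside, penny discrepancies like this very often come down to where in the calculation the rounding happens. The sketch below is my own illustration, not the actual system we wrote: it shows how rounding each line item and rounding only the final total can legitimately disagree by a penny, even though neither policy contains a "bug":

```python
from decimal import Decimal, ROUND_HALF_UP

def round_pennies(amount):
    """Round an exact amount to the nearest penny, halves up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Three line items at a price that doesn't divide evenly into pennies.
line_items = [Decimal("10.015")] * 3

# Policy A: round each line item, then sum the rounded values.
total_per_line = sum(round_pennies(item) for item in line_items)

# Policy B: sum the exact values, then round once at the end.
total_at_end = round_pennies(sum(line_items))

print(total_per_line)  # 30.06
print(total_at_end)    # 30.05 - out by a single penny
```

Neither total is "wrong"; they simply follow different rounding policies. Which is exactly why the policy needs to be part of the working the user can see.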

So what's the solution?

I'm reminded of a recurrent conversation between myself and pretty much every maths teacher I ever had:
Teacher: You haven't shown any of your workings for any of the answers on this test! If your answer's not correct you could still get partial credit for having the right working.
Me: But were the answers correct?
Teacher: Yes...
Me: So what's the problem?
I was too young (and arrogant) to see it at the time, but the problem was that mistakes are easy to make, and showing your working is a safety net to reduce the consequences when you make them. This is especially true with complex algorithms where probability and statistics are involved.

When our algorithms show their working, and we dispute the answers they produce, we have the option of following the steps through ourselves. It may be that the algorithm made a duff decision, in which case the user can disregard its conclusions - or it may be that the users would have missed something that the computer nailed, in which case this has now turned into a learning exercise.

I was very much persuaded by the arguments in About Face that users relate to programs much as they do to colleagues. To show your working is to explain your point of view - and be perceived as a source who sometimes has useful things to contribute, but sometimes makes errors. Crucially, however, in either case you give the user the ability to decide whether to trust your judgement. Without this, you are a stranger, and the user has no means of deciding whether to trust you.

There are of course usability issues to consider here. The "working" must be intelligible - this is not an excuse to return to splurging out pages of output on every button-click. However, intelligent solutions can be found. For the ECG, why not produce an output that superimposes onto the received signal a caricature representing how the program has interpreted it? For the invoicing system, why not provide (at the click of a button) itemised calculations with sub-totals, exchange rates and rounding policies explicitly described? And I'm still holding out for weather reports with error bars.
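To make the invoicing suggestion concrete, here is a minimal sketch of "working on demand" - the function name and output format are my own invention, not any real system's API. The idea is simply that the routine returns its itemised calculation alongside the total, with the rounding policy stated in plain words:

```python
from decimal import Decimal, ROUND_HALF_UP

def explained_total(items, fx_rate):
    """Return the invoice total together with a line-by-line working."""
    working = []
    subtotal = Decimal("0")
    for name, amount in items:
        converted = (amount * fx_rate).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP)
        working.append(
            f"{name}: {amount} x {fx_rate} = {converted} "
            f"(rounded half-up to 2 dp)")
        subtotal += converted
    working.append(f"subtotal: {subtotal}")
    return subtotal, "\n".join(working)

total, working = explained_total(
    [("widgets", Decimal("19.99")), ("gadgets", Decimal("5.50"))],
    Decimal("1.2345"))
print(working)
# widgets: 19.99 x 1.2345 = 24.68 (rounded half-up to 2 dp)
# gadgets: 5.50 x 1.2345 = 6.79 (rounded half-up to 2 dp)
# subtotal: 31.47
```

The user who disputes a penny can now follow each step themselves - the exchange rate, the per-line conversion, and where the rounding happened are all in front of them, rather than buried inside a single opaque number.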

Whatever the mechanism, when we go beyond spitting out a single number and provide a way for our users to verify the algorithm's calculations, we humble our programs, make them more transparent, and this is the root of trust.

Complex algorithms produce complex answers. 
Any attempt to simplify complex answers will involve losing something, often accuracy.
Your algorithms will not always produce the correct results. 
Show your working.

Update: John D. Cook provided some thoughts via Twitter:
... I emphasize to students that their work must be PERSUASIVE and not just correct if they want it to be used
If the real goal is to persuade the user, showing your working is just one route to this - and if implemented poorly (e.g. splurging out pages of text) then it may fail to make a good enough case. We are putting forward an argument to the user, and the considerations for prose arguments (brevity, clarity etc) still apply.