How to learn from failure

Abstract

A majority of production failures (77%) can be reproduced by a unit test. They come from gaps and from things we assumed. It’s the little things (0, an off-by-one error, NaN) that trigger larger problems.
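As an illustration of how small such a reproducing unit test can be, here is a minimal sketch in Python. The helpers `last_n` and `average` are hypothetical, invented for this example; they only show the three "little things" named above (zero, off by one, NaN) being pinned down by plain assertions.

```python
import math


def last_n(items, n):
    """Return the last n items of a list (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # items[-n:] looks right, but items[-0:] is the WHOLE list, not an empty one.
    # Computing the start index explicitly handles n == 0 and n > len(items).
    start = max(len(items) - n, 0)
    return items[start:]


def average(values):
    """Arithmetic mean that rejects inputs that usually slip through (hypothetical helper)."""
    if len(values) == 0:
        raise ValueError("average() of an empty sequence")
    if any(math.isnan(v) for v in values):
        raise ValueError("average() received NaN")
    return sum(values) / len(values)


def test_little_things():
    data = [1, 2, 3]
    assert last_n(data, 0) == []          # zero
    assert last_n(data, 1) == [3]         # smallest non-trivial case
    assert last_n(data, 3) == [1, 2, 3]   # exactly the length
    assert last_n(data, 4) == [1, 2, 3]   # one past the length
    assert average([1.0, 2.0, 3.0]) == 2.0
    for bad in ([], [1.0, float("nan")]):
        try:
            average(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad!r}")


if __name__ == "__main__":
    test_little_things()
```

The tests deliberately sit on the boundaries (0, the exact length, one past it) rather than in the comfortable middle of the input range.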

One of the major benefits of program verification is that it gives programmers a language in which they can express their understanding of a program. These techniques are only a small part of writing correct programs; keeping the code simple is usually the key to correctness. Usually, programmers spend most of their energy on the hard part and miss the “easy” part (an exception, an off-by-one error, an unchecked return value, etc.). This is called foreshadowing.

Muphry’s law: if you write something correcting or criticising the quality of someone else’s editing, proofing, spelling, grammar, etc., there will be some kind of editorial error in what you have written. Muphry’s law is a specific application of Murphy’s law, which states that anything that can go wrong will go wrong.

A good technique to spot errors: look at your code, look again, then present it to your colleagues on a big screen, as errors are often spotted on big screens.

Many systems that are each correct on their own can still make things go wrong when they operate together. Context determines correctness: there is intrinsic correctness and there is extrinsic correctness. A component may be right in its own context, but is it right in a larger one?

The connections between modules are the assumptions which the modules make about each other.
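One way to read this in code: an assumption that lives only in a caller's head can be stated and checked at the boundary between modules. The sketch below is illustrative only; `find_index` is a hypothetical function whose binary search silently returns wrong answers if its caller does not supply sorted input, so the assumption is checked explicitly.

```python
import bisect


def find_index(sorted_ids, target):
    """Return the index of target in sorted_ids, or -1 if absent.

    ASSUMPTION made about the calling module: sorted_ids is sorted ascending.
    """
    # Check the assumption instead of trusting it. Without this check an
    # unsorted list would not crash; it would just produce a wrong answer.
    if any(a > b for a, b in zip(sorted_ids, sorted_ids[1:])):
        raise ValueError("find_index() requires ids sorted ascending")

    i = bisect.bisect_left(sorted_ids, target)
    if i < len(sorted_ids) and sorted_ids[i] == target:
        return i
    return -1
```

The check costs a linear scan, which defeats the speed advantage of the binary search; in real code one might run it only in tests or debug builds. The point of the sketch is that the connection between the two modules is now written down where both sides can see it.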

Detecting configuration errors early reduces the damage a failure can do. What, then, is configuration? It is a formal structure for specifying how some aspect of software should run. Configuration is code!
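If configuration is code, it can be checked like code, ideally at startup rather than when a bad value is first used. The following is a minimal sketch under assumed names: the keys `port` and `timeout_seconds` and the function `validate_config` are hypothetical, not taken from any particular system.

```python
import math


def validate_config(raw):
    """Validate a raw config dict at startup (hypothetical keys).

    Collects every problem before failing, so one run reports all errors.
    """
    errors = []

    port = raw.get("port")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        errors.append("port must be an integer")
    elif not 1 <= port <= 65535:
        errors.append(f"port out of range: {port}")

    timeout = raw.get("timeout_seconds")
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        errors.append("timeout_seconds must be a number")
    elif math.isnan(timeout) or timeout <= 0:
        errors.append(f"timeout_seconds must be positive: {timeout}")

    if errors:
        raise ValueError("invalid configuration: " + "; ".join(errors))
    return {"port": port, "timeout_seconds": float(timeout)}
```

Calling `validate_config` once, before the program starts serving anything, turns a configuration mistake into an immediate and readable startup failure instead of a production incident later on.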