Monday, November 2, 2009

Armstrong Thesis - Chap. 5

What I like most about Armstrong's approach to fault-tolerant systems is the clear separation between code that performs a task (worker processes) and code that handles errors, exceptions, and failures. Combining the two makes the code more complex and creates more opportunities for bugs. I see this at my job all the time: try/catch statements within try/catch statements within loops within conditional blocks, and so on. It leads to very ugly code!

I'm a firm believer that simplicity is key to writing fault-tolerant systems. For this reason, I think the supervision hierarchies explained in this chapter are a good and intuitive strategy for keeping the system running when an error occurs.
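
To make the separation concrete, here is a minimal sketch in Python. It is not Erlang and not Armstrong's code, only an analogy: the names `supervise` and `make_flaky_worker` and the `max_restarts` parameter are my own assumptions for illustration. The worker contains no error handling at all, and the restart policy lives entirely in the supervisor.

```python
from typing import Callable


def supervise(worker: Callable[[], int], max_restarts: int = 3) -> int:
    """Run `worker`, restarting it when it crashes (hypothetical helper).

    If the worker crashes more than `max_restarts` times, the supervisor
    itself fails, so a supervisor one level up can decide what to do.
    """
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            if restarts >= max_restarts:
                raise RuntimeError("restart limit reached") from exc
            restarts += 1


def make_flaky_worker(failures: int) -> Callable[[], int]:
    """Build a worker that crashes `failures` times, then returns 42.

    Note that the worker has no try/except: it only does its task.
    """
    state = {"calls": 0}

    def worker() -> int:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ValueError("simulated crash")
        return 42

    return worker


# A single supervisor absorbs two crashes.
single_level = supervise(make_flaky_worker(2))

# A two-level hierarchy: the inner supervisor gives up after one restart,
# and the outer supervisor restarts the inner one.
flaky = make_flaky_worker(3)
two_level = supervise(lambda: supervise(flaky, max_restarts=1), max_restarts=2)
```

Nesting one `supervise` call inside another is only a rough stand-in for a supervision tree, but it shows the part I like: when a supervisor gives up, that is just another failure for the level above it to handle.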

Also, Armstrong does a nice job explaining the differences between an error, an exception, and a failure.
Error: "we will define an error as a deviation between the observed behavior of a system and the desired behavior of a system."
Exception: "Exceptions are generated automatically by the run-time system when the run-time system cannot decide what to do."
Failure: "If there is no 'catch handler' for an exception then the process itself will fail."
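
The three definitions can be loosely illustrated in Python. This is a sketch under assumptions: a Python thread stands in for an Erlang process, which it is not, and the names `average`, `worker`, `outcome`, and `record_failure` are hypothetical. The run-time raises an exception it cannot resolve, the worker has no catch handler, and so the worker fails while the rest of the program carries on.

```python
import threading


def average(values):
    # Exception: for an empty list the run-time cannot decide what the
    # result should be, so it raises ZeroDivisionError automatically.
    return sum(values) / len(values)


outcome = {}
failures = []


def record_failure(args):
    # Called by the threading module when a thread dies from an
    # uncaught exception; we just record the exception type.
    failures.append(args.exc_type)


threading.excepthook = record_failure


def worker():
    outcome["started"] = True
    # Failure: there is no catch handler here, so the thread terminates
    # at this line and the assignment below never runs.
    outcome["result"] = average([])
    outcome["finished"] = True


t = threading.Thread(target=worker)
t.start()
t.join()

# Error: the desired behavior was a finished computation, but the
# observed behavior is a worker that died partway through.
worker_failed = "finished" not in outcome
```

The main program is still running after the worker dies, and that is the point where something like a supervisor would decide what to do next.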