Monday, November 2, 2009

Armstrong Thesis - Chap. 5

What I like most about Armstrong's approach to a fault-tolerant system is the clear separation between code that computes a task (worker processes) and code that handles error, exceptions, and/or failures. When these 2 things are combined, it makes the code more complex and more opportunity for bugs. I see this at my job all the time. Try/catch statements within try/catch statements within loops within conditional blocks etc ……. leads to very ugly code!

I'm a firm believer that simplicity is key to writing fault-tolerant systems. For this reason, I think the supervision hierarchies explained in this chapter is a good and intuitive strategy for having the system operate when there's an error in the system.

Also, Armstrong does a nice job explaining the differences between an error, exception, and a failure.
Error: "we will define an error as a deviation between the observed behavior of a system and the desired behavior of a system."
Exception: "Exceptions are generated automatically by the run-time system when the run-time system cannot decide what to do."
Failure: "If there is no 'catch handler' for an exception then the process itself will fail."

Thursday, October 22, 2009

Armstrong Thesis - Chap. 4

The author states that it is impossible to write concurrent code in a side effect free manner. Do you agree?

The author didn't exactly say it's "impossible" but he did say that it's not possible. I believe you can create side-effect free concurrent code, it's just much more difficult to verify that it's side-effect free.

Erlang provides primitives for performing concurrent action (modules/ports). What are the advantages/disadvantages of using this vs. libraries or OS capabilities? Have you found any particularly useful libraries?

Well, in an interview with Joe Armstrong http://www.ddj.com/linux-open-source/201001928?cid=RSSfeed_DDJ_OpenSource he says that Erlang processes are lighter weight than threads so this is an advantage over OS capabilities. I could see how having these primitives can help establish consistency in their usage and avoid multiple people from developing their own version of concurrent libraries. If you don't like what the language provides, you can always create a library that extends the built-in primitives.

Is it possible to totally abstract out concurrency? The author lists several advantages, are there any disadvantages?

There are different levels of abstraction. I think the examples Armstrong gave did a good job illustrating the separation of functional, sequential code from the non-function, concurrent code. If having the concurrency implementation in a different module & file from the functional implementation is sufficient to say "total abstraction", then Erlang clearly makes this possible. Loosely coupled systems almost always have the upper-hand so if there are any disadvantages, I would just say that it may make it a bit more difficult for new programmers to understand.

Fault tolerance is demonstrated via catching crashes and timing out replies. The author also mentions using supervisors to handle this case. Do you have any experience with either of these patterns for fault tolerance?

No unfortunately I've haven't had any experience with developing a supervisor that deals with the errors/crashes of worker processes/threads.

The author lists several of the abstractions held entirely in the server module and aided by Erlang. Are any of these abstractions ONLY possible in Erlang?

Hot code loading that can be used for dynamic upgrades comes to mind as shown in the "swap_code" function of Figure 4.7.

Would there be any difficulties of following the error handling recommendations (let another process fix it, supervisors, let it crash) in another language than Erlang?

You would have to create & run an additional server process to do the error handling. For e.g., you would have to create and compile ErrorHandlingServer.java or ErrorHandling.cpp into different jars/executables and then run them. More coding would be required for communication than Erlang.

How important is it to use intentional programming when it comes to maintenance? (The author gave the example of history of the Dictionary API)

Very important but at the same time, it's difficult to assess whether certain names imply different implementation. In many cases, only time will tell whether you need to break up a method into several methods with more specific names as was the case with the lookup --> fetch, search, is_key example.

Wednesday, October 21, 2009

Dense Linear Algebra, Graph Algorithms, Monte Carlo

The Dense Linear Algebra computational pattern reminds me of a compiler optimization to reduce the memory access miss rate taught in computer architecture courses. They teach the same strategy of organizing the computation in a blocked manner as show in figure 4. Their argument is that it will improve spatial and temporal locality, which will reduce the number of cache misses. What the article brings to the table is that you can auto-tune the size of the block to obtain the right balance and you can use SIMD instructions to execute the same operation on multiple variables.

The Graph Algorithms pattern takes me back to the Algorithms course I took last year. For sure, one can't understand this if they haven't taken the Algorithms course. What's new to me is the partitioning operations. In other words, I haven't heard of Kernighan-Lin, Fiduccia-Matheyses, nor ParMetis algorithms. I can see how partitioning the graph structures can allow multiple, independent threads process them in parallel. If each thread can produce a subresult, I can also see how this could be used in dynamic programming algorithms.

The Monte Carlo pattern was difficult to understand at first but the Estimating Pi example along with some help from my wife (who's an actuary) made this more clear. Apparently the Monte Carlo method is used extensively by quantitative analysts to simulate the Black-Scholes model for predicting prices of derivatives.

Sunday, October 18, 2009

Armstrong thesis - Chap. 2

Chapter 2 of Armstrong's thesis is describing how to build fault-tolerant systems with very demanding requirements. The solution is to implement modules as isolated processes with no sharing of resources, message passing communication to achieve concurrency and error detection, and a fail-fast policy when a process crashes; similar to how processes in operating systems work. But it seems that in order to achieve all of this, you need the right hardware, the right OS, and a programming language that allows developers to implement fault-tolerant systems that support all of these requirements. They're asking for a lot from a system so I'm curious to understand the details to their approach. They made several references to Tandem computers so I wonder if their approach is similar to providing redundancy at every level.

The chapters mentions that programming languages can't be used for building robust systems because they're not able to "isolate software components from each other". Well, couldn't I just simply create new processes in C++ or Java to create this isolation? Or maybe a language such as Erlang provides the ability to develop an application where the underlying system in which it runs will separate different modules into processes (haven't read Chap. 3 of the thesis yet). If this is case, then it's true C++ nor Java have frameworks that separate their modules into separate processes. At least none that I know about.

I didn't understand how adding more processes could not affect the original system. I mean, everyone knows that adding more processes can consume a lot of CPU and memory resources. Unless they mean that sufficient hardware should exist for supporting 100,000 processes! Also, how are object-oriented languages not true asynchronous messages. Many of them provide the ability to specify a call-back so that programs don't block on an RPC call. What does a COPL have that an OO language doesn't with respect to this?

Erlang's support for hot code loading is one I've never heard about but I can see its enormous potential if you can combine it with an AOM system. Not only would you be able to keep your system running during upgrades but this could realize an extremely flexible system.

Saturday, October 17, 2009

Event Based

I think the article could have given a formal definition of Implicit Invocation within the context of Event-based systems. Here is one I found from "An Introduction to Software Architecture" by David Garlan and Mary Shaw of Carnegie Mellon University:

"The idea behind implicit invocation is that instead of invoking a procedure directly, a component can announce (or broadcast) one or more events. Other components in the system can register an interest in an event by associating a procedure with the event. When the event is announced the system itself invokes all of the procedures that have been registered for the event. Thus an event 'implicitly' causes the invocation of procedures in other modules."

This paper http://www.mach-ii.com/resources/intro_to_implicit_invocation.pdf does a better job explaining the Event-based pattern although it doesn't explain how it could be used for parallelism. Also it says that Event-based is an Architectural Style, implying that it's not a pattern. It says that design pattern such as MVC implement architectural styles. So how do we draw the line between what's a pattern and what's an architectural style? I always thought architectural styles was a subset of patterns.

Friday, October 16, 2009

Pipes & Filters, Layered Systems, Iterative Refinement

So far the pattern descriptions from the Berkeley wiki seem to provide a reasonably good summary. By no means is it comprehensive but if one is familiar with the Pipe-n-Filter & Layered patterns, then it's straight forward. The Iterative Refinement pattern, though, was a hard read. By rereading this article twice, one can get a high-level understanding. VHDL was the first thing that came to mind for applying iterative refinement but I was a bit disappointment that not enough known use examples were given.

It seems to me that it would be easier to implement reentrant applications in Pipe-n-Filter and Layered architectures as opposed to some others, simply because they promote more local state dependencies and less shared state dependencies between the filters/layers (because they promote the modular ability to interchange filters/layers with ease). And this reentrancy would allow additional parallelism on top of the parallelism that could be achieved through pipelining. As the article states, pipeline parallelism can be achieved along a linear branch while task parallelism can be done for tasks along parallel branch.

On a side note, the description of how a filter cannot invoke/control another filter up-stream or down-stream is a great example of showing how it's data abstraction but not necessarily object-oriented programming. Most students, including myself, have a hard time thinking about data abstraction without thinking about objects encapsulating data.

Finally, the Layered systems article says "Increasing abstraction increases code readability and project organization" and I'm not sure I agree with this. Usually when you increase abstraction, it makes it harder for one to read the code because you're constantly going through all these redirections in order to search for functions that do actual work. Smalltalk and the Adaptive Object Model are examples that come to mind. Although one can argue that Smalltalk makes up for all these redirections by promoting methods with a few lines of readable code.

Wednesday, October 14, 2009

CHESS

A deterministic solution for finding Heisenbugs? Is this too good to be true? Looking around in the Internet for 15 minutes and I could not find another debugging solution that claims to provide exhaustive test coverage of all possible interleavings of a multi-threaded program and still be able to reproduce a discovered Heisenbug. Given that it was produced my Microsoft researchers, it only works for the Win32, .NET, and Singularity frameworks because CHESS relies on understanding the semantics of the program in order to produce preemption points for possible interleavings. So a Java implementation of CHESS would need to understand Java's semantic so that it can put preemption points whenever it make calls to the thread libraries or makes system calls. Though they wouldn't release the source code for CHESS, I don't see why someone can't reproduce this in Java using the same ideas. Would it be considered "stealing" the ideas from Microsoft? I wonder if Microsoft will market this to lure more developers to use .NET.

At first, I was skeptical about how they could possibly cover all scenarios but they relied on several heuristics to minimize the # of possible interleavings. They limited the areas where preemption can occur to synchronization points and at instructions that participate in data races.