Arod's Blog: October 2009

Thursday, October 22, 2009

Armstrong Thesis - Chap. 4

The author states that it is impossible to write concurrent code in a side effect free manner. Do you agree?

The author didn't exactly say it's "impossible" but he did say that it's not possible. I believe you can create side-effect free concurrent code, it's just much more difficult to verify that it's side-effect free.

Erlang provides primitives for performing concurrent action (modules/ports). What are the advantages/disadvantages of using this vs. libraries or OS capabilities? Have you found any particularly useful libraries?

Well, in an interview with Joe Armstrong http://www.ddj.com/linux-open-source/201001928?cid=RSSfeed_DDJ_OpenSource he says that Erlang processes are lighter weight than threads so this is an advantage over OS capabilities. I could see how having these primitives can help establish consistency in their usage and avoid multiple people from developing their own version of concurrent libraries. If you don't like what the language provides, you can always create a library that extends the built-in primitives.

Is it possible to totally abstract out concurrency? The author lists several advantages, are there any disadvantages?

There are different levels of abstraction. I think the examples Armstrong gave did a good job illustrating the separation of functional, sequential code from the non-function, concurrent code. If having the concurrency implementation in a different module & file from the functional implementation is sufficient to say "total abstraction", then Erlang clearly makes this possible. Loosely coupled systems almost always have the upper-hand so if there are any disadvantages, I would just say that it may make it a bit more difficult for new programmers to understand.

Fault tolerance is demonstrated via catching crashes and timing out replies. The author also mentions using supervisors to handle this case. Do you have any experience with either of these patterns for fault tolerance?

No unfortunately I've haven't had any experience with developing a supervisor that deals with the errors/crashes of worker processes/threads.

The author lists several of the abstractions held entirely in the server module and aided by Erlang. Are any of these abstractions ONLY possible in Erlang?

Hot code loading that can be used for dynamic upgrades comes to mind as shown in the "swap_code" function of Figure 4.7.

Would there be any difficulties of following the error handling recommendations (let another process fix it, supervisors, let it crash) in another language than Erlang?

You would have to create & run an additional server process to do the error handling. For e.g., you would have to create and compile ErrorHandlingServer.java or ErrorHandling.cpp into different jars/executables and then run them. More coding would be required for communication than Erlang.

How important is it to use intentional programming when it comes to maintenance? (The author gave the example of history of the Dictionary API)

Very important but at the same time, it's difficult to assess whether certain names imply different implementation. In many cases, only time will tell whether you need to break up a method into several methods with more specific names as was the case with the lookup --> fetch, search, is_key example.

Wednesday, October 21, 2009

Dense Linear Algebra, Graph Algorithms, Monte Carlo

The Dense Linear Algebra computational pattern reminds me of a compiler optimization to reduce the memory access miss rate taught in computer architecture courses. They teach the same strategy of organizing the computation in a blocked manner as show in figure 4. Their argument is that it will improve spatial and temporal locality, which will reduce the number of cache misses. What the article brings to the table is that you can auto-tune the size of the block to obtain the right balance and you can use SIMD instructions to execute the same operation on multiple variables.

The Graph Algorithms pattern takes me back to the Algorithms course I took last year. For sure, one can't understand this if they haven't taken the Algorithms course. What's new to me is the partitioning operations. In other words, I haven't heard of Kernighan-Lin, Fiduccia-Matheyses, nor ParMetis algorithms. I can see how partitioning the graph structures can allow multiple, independent threads process them in parallel. If each thread can produce a subresult, I can also see how this could be used in dynamic programming algorithms.

The Monte Carlo pattern was difficult to understand at first but the Estimating Pi example along with some help from my wife (who's an actuary) made this more clear. Apparently the Monte Carlo method is used extensively by quantitative analysts to simulate the Black-Scholes model for predicting prices of derivatives.

Sunday, October 18, 2009

Armstrong thesis - Chap. 2

Chapter 2 of Armstrong's thesis is describing how to build fault-tolerant systems with very demanding requirements. The solution is to implement modules as isolated processes with no sharing of resources, message passing communication to achieve concurrency and error detection, and a fail-fast policy when a process crashes; similar to how processes in operating systems work. But it seems that in order to achieve all of this, you need the right hardware, the right OS, and a programming language that allows developers to implement fault-tolerant systems that support all of these requirements. They're asking for a lot from a system so I'm curious to understand the details to their approach. They made several references to Tandem computers so I wonder if their approach is similar to providing redundancy at every level.

The chapters mentions that programming languages can't be used for building robust systems because they're not able to "isolate software components from each other". Well, couldn't I just simply create new processes in C++ or Java to create this isolation? Or maybe a language such as Erlang provides the ability to develop an application where the underlying system in which it runs will separate different modules into processes (haven't read Chap. 3 of the thesis yet). If this is case, then it's true C++ nor Java have frameworks that separate their modules into separate processes. At least none that I know about.

I didn't understand how adding more processes could not affect the original system. I mean, everyone knows that adding more processes can consume a lot of CPU and memory resources. Unless they mean that sufficient hardware should exist for supporting 100,000 processes! Also, how are object-oriented languages not true asynchronous messages. Many of them provide the ability to specify a call-back so that programs don't block on an RPC call. What does a COPL have that an OO language doesn't with respect to this?

Erlang's support for hot code loading is one I've never heard about but I can see its enormous potential if you can combine it with an AOM system. Not only would you be able to keep your system running during upgrades but this could realize an extremely flexible system.

Saturday, October 17, 2009

Event Based

I think the article could have given a formal definition of Implicit Invocation within the context of Event-based systems. Here is one I found from "An Introduction to Software Architecture" by David Garlan and Mary Shaw of Carnegie Mellon University:

"The idea behind implicit invocation is that instead of invoking a procedure directly, a component can announce (or broadcast) one or more events. Other components in the system can register an interest in an event by associating a procedure with the event. When the event is announced the system itself invokes all of the procedures that have been registered for the event. Thus an event 'implicitly' causes the invocation of procedures in other modules."

This paper http://www.mach-ii.com/resources/intro_to_implicit_invocation.pdf does a better job explaining the Event-based pattern although it doesn't explain how it could be used for parallelism. Also it says that Event-based is an Architectural Style, implying that it's not a pattern. It says that design pattern such as MVC implement architectural styles. So how do we draw the line between what's a pattern and what's an architectural style? I always thought architectural styles was a subset of patterns.

Friday, October 16, 2009

Pipes & Filters, Layered Systems, Iterative Refinement

So far the pattern descriptions from the Berkeley wiki seem to provide a reasonably good summary. By no means is it comprehensive but if one is familiar with the Pipe-n-Filter & Layered patterns, then it's straight forward. The Iterative Refinement pattern, though, was a hard read. By rereading this article twice, one can get a high-level understanding. VHDL was the first thing that came to mind for applying iterative refinement but I was a bit disappointment that not enough known use examples were given.

It seems to me that it would be easier to implement reentrant applications in Pipe-n-Filter and Layered architectures as opposed to some others, simply because they promote more local state dependencies and less shared state dependencies between the filters/layers (because they promote the modular ability to interchange filters/layers with ease). And this reentrancy would allow additional parallelism on top of the parallelism that could be achieved through pipelining. As the article states, pipeline parallelism can be achieved along a linear branch while task parallelism can be done for tasks along parallel branch.

On a side note, the description of how a filter cannot invoke/control another filter up-stream or down-stream is a great example of showing how it's data abstraction but not necessarily object-oriented programming. Most students, including myself, have a hard time thinking about data abstraction without thinking about objects encapsulating data.

Finally, the Layered systems article says "Increasing abstraction increases code readability and project organization" and I'm not sure I agree with this. Usually when you increase abstraction, it makes it harder for one to read the code because you're constantly going through all these redirections in order to search for functions that do actual work. Smalltalk and the Adaptive Object Model are examples that come to mind. Although one can argue that Smalltalk makes up for all these redirections by promoting methods with a few lines of readable code.

Wednesday, October 14, 2009

CHESS

A deterministic solution for finding Heisenbugs? Is this too good to be true? Looking around in the Internet for 15 minutes and I could not find another debugging solution that claims to provide exhaustive test coverage of all possible interleavings of a multi-threaded program and still be able to reproduce a discovered Heisenbug. Given that it was produced my Microsoft researchers, it only works for the Win32, .NET, and Singularity frameworks because CHESS relies on understanding the semantics of the program in order to produce preemption points for possible interleavings. So a Java implementation of CHESS would need to understand Java's semantic so that it can put preemption points whenever it make calls to the thread libraries or makes system calls. Though they wouldn't release the source code for CHESS, I don't see why someone can't reproduce this in Java using the same ideas. Would it be considered "stealing" the ideas from Microsoft? I wonder if Microsoft will market this to lure more developers to use .NET.

At first, I was skeptical about how they could possibly cover all scenarios but they relied on several heuristics to minimize the # of possible interleavings. They limited the areas where preemption can occur to synchronization points and at instructions that participate in data races.

Tuesday, October 13, 2009

Refactoring for Reentrancy

I'm glad this article used test cases as examples where reentrancy can be used because earlier this year, I ran into issues running JUnit tests on a multi-threaded application that depended on each execution to be independent from the other. I had to manually re-initialize persistent data after each run. Now I at least know one can wrap static variables with ThreadLocal and use TestRunner to spawn threads that utilize their own copy :-). Aside from testing, other applications for reentrancy are in interrupt handlers where interrupts can be dispatched to multiple cores.

Given that I have limited experience with using reentrancy for creating multithreading applications, it appears to me that the solution does provide a solution that makes a program reentrant. Ideally, all variables would be local so as to avoid the whole transformation of static variables but it's understandable that this is used for legacy code. However, the transformation makes the code look really ugly in this simple example so I'm having a hard time figuring out how this would scale in an enterprise application.

On a side note, I don't think object fields need to be made thread-local if a new instance is created for each execution. Also, I don't think shared resources such as standard output or a printer can be thread-local. In the paper, they circumvented this by outputting to a logger as opposed to standard output. In cases where it's necessary to output to a printer or standard output, one may have to introduce a mechanism for serializing the output in a predictive order. And finally, reentrancy does imply thread-safety because multiple threads can execute the same code without interfering with each other. But thread-safety does NOT imply reentrancy because thread-safety cannot guarantee a method would return the same output for a given input each time its executed.

Saturday, October 10, 2009

BA Chap. 14 - Rereading the Classics

One of the reasons I decided to pursue my career into Software Engineering is that developing software gives one the opportunity to create something of practical use in a relatively short time, even if it's programming something as simple as a time clock. But I admit that I've fallen in the trap of spending too much time trying to make UML diagrams as beautiful as possible, and resulted in delivering incomplete code. This article is sort of a wake up call to those who are more concern about decorations and ornaments as opposed to features that are more pragmatic.

Having said that, I found the article's comparison of languages such as Smalltalk, C++, Java, & Python to be very informative because it gives a better explanation of the advantages & disadvantages of each of these languages from an object-oriented perspective. For example, every newly defined class in Smalltalk is inherited from another class, which is in clear violation of the "Favor Composition Over Inheritance" principle. Interfaces in C++ is limited to defining abstract classes whereas in Java, one can explicitly define interfaces. Despite Smalltalk's limitations, though, having a purely object-oriented language "allows us to limit the number of syntactic constructs in the language". Lisp is another example of a purely object-oriented language where program and data are "lists".

I do remember half-a-decade ago that I was annoyed by the fact that Java did not support templates. Then they came out with Generics in SE 5.0 but this didn't calm my frustration because I still had to define a type through an interface as the article pointed out. Because of this, one could argue that Java is slightly limited in its support for polymorphism.

The biggest take away was that the argument of "statically typed languages being more protective of coding errors" is weak because the most difficult errors are actually run-time bugs. Syntax errors are straight-forward to figure out, especially with the use of IDEs such as Eclipse and Visual Studio. Appropriate run-time error messages for syntax errors and a call-stack should be sufficient for most debugging purposes.

Tuesday, October 6, 2009

Refactoring Sequential Java Code for Concurrency via Concurrent Libraries

Q1. Sometimes it is possible to retrofit parallelism by refactoring an existing sequential application. Other times the whole system needs to be re-architected for parallelism. What are the advantages/disadvantages of these approaches? When would you do one over the other?

The advantage of retrofitting parallelism in an existing sequential application is that the amount of work can be very minimal relative to re-architecting the whole application. On the flip-side, re-architecting the whole application can give the opportunity to provide a solid foundation for supporting parallelism, therefore, reducing the amount of work and errors associated with retrofitting parallelism. I would apply retrofitting when the time-to-market of an application is crucial and when only a small component of the application needs parallelism. However, if several components of the application are starting to smell do to the ugly code produced by retrofitting, then the team should consider re-architecting the application. Especially in the case when the team has some time to invest, re-architecting the application can save time in the long run.

Q2. There are many libraries that target concurrency and parallelism (e.g., Java's ForkJoinTask, Microsoft's TPL, Intel's TBB, OpenMP, MPI, etc.). What are the advantages and disadvantages of using parallel libraries?

In the case of the ForkJoin Task, it's included in Java 7 so no need to install/download additional packages. Also, due to the programming ease of Java, it seems straight-forward using Java's concurrency library from semantics point-of-view.

Q3: The approach presented in the paper puts the programmer in the driving seat: the programmer selects a snapshot of code, and a refactoring. The process is semi-automated, but not fully automatic.
Compare and contrast the refactoring approach with a fully automatic approach, like the one used in automatic parallelizing compilers.

The semi-automated approach forces the programmer to think more about what parts of the code should be parallelized and which parts should not. Along the same lines, it can provide more granularity, will simplify the implementation of an automated system, and will prevent an automated system from bloating the refactored code with unnecessary code, in which case the programmer would have to go back and remove the excess.

Q4: Some of the transformations presented in the paper make the code harder to understand (e.g., Converting recursion to ForkJoinTask).
What are some approaches to unclutter the parallel code?

In the ForkJoinTask conversion, I would create methods getLeftTask & getRightTask that return a Left task & Right task, respectively, of type SortImpl. This would be used to replace lines 30-36 of Figure 5 with:
SortImpl task1 = getLeftTask();
SortImpl task2 = getRightTask();

Q5: The paper presents three automated refactorings to make a program more parallel. By no means is this a comprehensive list of refactorings. What are some other refactorings that you have applied, or have seen in other projects?

At my work, I developed an application for deploying software to multiple computers from a server. Initially, it was a sequential deployment but because remote transfer & installation has unpredictable delay time for each remote machine, I had to refactor the code deploy the software in parallel. Basically, I just ran a loop that spawn a thread to deploy the software and blocked until they all came back.

Q6: The paper presents some empirical evaluation to support the claim that these automated refactorings are useful. What are some other factors that you would have liked to see evaluated?

I would've liked to have seen this strategy being applied to 3D games.

Monday, October 5, 2009

BA Chap. 12 - When the bazaar sets out to build cathedrals

Since I've never worked with a community on an open-source project, I always wondered how developers from around the world are able to effectively work together to develop great software without a direct manager to report to, and without compensation. Sure there are some professionals that get paid by their employers for contributing code to these open-source communities. But even at my current job where my company invests vast amounts of resources do I have trouble getting things done when I work with foreign teams. The article said it best with, "Discussions that can be finished in 15 minutes in a stand-up office meeting may take days of arguing". It seems that they get their best work done when they meet in their quarterly/bi-annual/annual meetings and have these week-long coding sprints. But even then, what do they do when a contributor doesn't provide what he/she commits to do or when he/she quits? I find it astonishing that work actually gets done due to personal motivation, and that their pride alone can provide better quality code because developers are not rushed to meet a deadline.

Overall, the article's chronological description of their communities thought process throughout the evolution of Akonadi & ThreadWeaver was intellectually stimulating. I initially did not understand why they didn't Unit Test the PIM infrastructure. Automatically testing the core functionality to see if anything broke by a change in PIM could have saved them time. Then again if the code-base was very large, the effort to do unit-testing may not have been worth it. I originally thought that applying the adapter or façade pattern to abstract the support of multiple backend implementations would've solved the assumptions that made about limited data and application usage. But then I realized that it wouldn't have solved their concurrent access to data requirement. Also, I didn't understand why they didn't use a DB indexes for their data when they discussed their need to load the whole address book into memory. Then, they later explained that their data had to support multiple types so it had to be stored in the DB as string. Indexes doesn't help in this situation so Akonadi was implemented to support serializer plug-ins for components.

Threadweaver, similarly, had some clever architectural features in their job queue for concurrency. The queue made sure jobs were executed in sequence through the notion of Dependencies without the nasty use of a mutex!

Saturday, October 3, 2009

Fork/Join Parallelism

The first question that comes to mind is why do we need a framework for parallel computing? Doesn't the operating system in conjunction with hardware already provide this support for us? In the case of Java Fork/Join, doesn't the JVM provide some parallelization oomph ? Well, the article explicitly states that this framework is intended for programs that execute subtasks in parallel and, therefore, can not rely on the shortcomings of pthreads and the java.lang.Thread class from the perspective of task parallelism computing. It's worth restating their deficiencies as follows:

1. Thread synchronization and management is generalized to accommodate the different types of parallel computing so it unnecessarily blocks threads in task parallel programs.
2. Tasks are too coarse-granular that it "limits opportunities for exploiting parallelism".

But aren't there automated tools with configurable parameters that one can use for parallelization? This investigation led me to following results from Wikipedia:

"Automatic parallelization by compilers or tools is very difficult due to the following reasons:

* dependence analysis is hard for code using indirect addressing, pointers, recursion, and indirect function calls;
* loops have an unknown number of iterations;
* accesses to global resources are difficult to coordinate in terms of memory allocation, I/O, and shared variables.
"

For these reasons, we have the POSIX Thread library, Cilk, OpenCL, and CUDA among other programming models. And why Java? It provides more portability to fork/join programs since they can run on any machine that has a JVM. And one will not have to worry about porting the fork/join framework since it'll be included in Java SE 7 as listed in http://www.javaworld.com/community/node/3458. According to http://www.javaworld.com/javaworld/jw-12-2008/jw-12-year-in-review-2.html?page=4, there is a ParalleArray class that represents an array where you can perform operations such as filter, map, and apply on the array's data items in parallel. The ForkJoinPool class will provide the threads to perform these concurrent operations and "can automatically take advantage of increasing processor-core counts in the future without modifying the functions". This will be particularly useful for 16-32 core machines since SE 6 only functioned well for 4-8 core machines.

BA Chap. 11 - GNU Emacs

When I first began exploring the various applications that came with Fedora Core 3 in 2004, I originally thought Emacs was an over-bloated text editing application, with meaningless features. But reading this article made me realized how much I missed out!

One of the coolest features is its ability to pipeline commands for text to be navigated, searched, organized, trimmed, and sorted all at once. As the article explained, this is possible due to their Model Design of text buffers. Also, it provides support for text terminals while simultaneously managing graphical frames. These are features you can't do with Visual Studio nor Eclipse.

Using the MVC pattern is a natural choice for text editing applications. It proved to be particularly useful in the case of Emacs since their Controller code, in isolation from the Model & View, was almost done entirely in Emacs Lisp. I haven't written code in any language related to Lisp but, apparently, Emacs Lisp provides the following advantages:

1. Making small changes/additions requires little programming effort compared to other strongly-typed languages. "You can write a useful command in under a dozen lines"
2. Can test expressions in the Emacs buffer for immediate evaulation. This is similar to how you can test PHP & Python regular expressions at websites such as http://re.dabase.com/
3. As a unique programming language, it's powerful enough to make sense of thousand-line programs.
4. It doesn't crash when run-time bugs are encountered but it does report the errors. Since I don't know the technical details, I wonder how they handle null pointers and the avoidance of data corruption. I would think one would prefer to crash the application then to continue running and corrupting data.

My only concern about the architecture is that every function in a package is visible to other packages. What happened to Encapsulation? Wouldn't these lead to complex dependencies if functions just started referencing functions from other packages at will? Also, I was expecting the article to explain the Model's complexity in its section, "How complex is the Model?". But all it really does is explain how it doesn't support stylessheets, among other features, as in the case of Microsoft Word documents.

Lastly, I did enjoy reading about how it compares & contrasts with Eclipse & Firefox. Although it bashes Eclipse for all the boilderplate code necessary to implement a simple plug-in, from an Eclipse user's standpoint (not from a developer perspective), this is a straight-forward and convenient architecture. And the author took the opportunity to credit Emacs for Firefox's successful architecture since the Emacs project may not be around for too long.

Arod's Blog