Arod's Blog: Refactoring Sequential Java Code for Concurrency via Concurrent Libraries

Q1. Sometimes it is possible to retrofit parallelism by refactoring an existing sequential application. Other times the whole system needs to be re-architected for parallelism. What are the advantages/disadvantages of these approaches? When would you do one over the other?

The advantage of retrofitting parallelism onto an existing sequential application is that it can take far less work than re-architecting the whole application. On the flip side, re-architecting gives the team the opportunity to build a solid foundation for parallelism, which reduces the work and the errors that piecemeal retrofitting tends to bring. I would retrofit when time-to-market is crucial and when only a small component of the application needs parallelism. However, if several components are starting to smell due to the ugly code produced by retrofitting, the team should consider re-architecting. Especially when the team has some time to invest, re-architecting can save time in the long run.

Q2. There are many libraries that target concurrency and parallelism (e.g., Java's ForkJoinTask, Microsoft's TPL, Intel's TBB, OpenMP, MPI, etc.). What are the advantages and disadvantages of using parallel libraries?

In the case of ForkJoinTask, it is included in Java 7, so there is no need to download or install additional packages. Also, because Java is easy to program in, using its concurrency library seems straightforward from a semantics point of view.

Q3: The approach presented in the paper puts the programmer in the driving seat: the programmer selects a snapshot of code, and a refactoring. The process is semi-automated, but not fully automatic.
Compare and contrast the refactoring approach with a fully automatic approach, like the one used in automatic parallelizing compilers.

The semi-automated approach forces the programmer to think about which parts of the code should be parallelized and which should not. Along the same lines, it gives finer-grained control over where each refactoring is applied, makes the tool itself simpler to implement, and keeps a fully automatic system from bloating the refactored code with unnecessary changes that the programmer would then have to go back and remove.

Q4: Some of the transformations presented in the paper make the code harder to understand (e.g., Converting recursion to ForkJoinTask).
What are some approaches to unclutter the parallel code?

In the ForkJoinTask conversion, I would create two methods, getLeftTask and getRightTask, that return the left and right subtasks, respectively, each of type SortImpl. These would replace lines 30-36 of Figure 5 with:
SortImpl task1 = getLeftTask();
SortImpl task2 = getRightTask();
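
To make the idea concrete, here is a rough sketch of what such a class could look like. This is not the code from Figure 5: the fields, the threshold value, and the merge step are my own assumptions, and only the names SortImpl, getLeftTask, and getRightTask come from the answer above.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical stand-in for the paper's SortImpl; the real class may differ.
class SortImpl extends RecursiveAction {
    // Assumed cutoff below which sorting sequentially is cheaper than forking.
    private static final int SEQUENTIAL_THRESHOLD = 4;

    private final int[] data;
    private final int lo; // inclusive
    private final int hi; // exclusive

    SortImpl(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= SEQUENTIAL_THRESHOLD) {
            Arrays.sort(data, lo, hi);
            return;
        }
        // The task-construction clutter is hidden behind two small helpers.
        SortImpl task1 = getLeftTask();
        SortImpl task2 = getRightTask();
        invokeAll(task1, task2); // runs both halves and waits for them
        merge(middle());
    }

    private int middle() {
        return lo + (hi - lo) / 2;
    }

    private SortImpl getLeftTask() {
        return new SortImpl(data, lo, middle());
    }

    private SortImpl getRightTask() {
        return new SortImpl(data, middle(), hi);
    }

    // Merges the sorted halves [lo, mid) and [mid, hi) back into data.
    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(data, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) {
            data[k++] = left[i++];
        }
        // Any remaining right-half elements are already in their final place.
    }

    // Convenience entry point (an assumption, not part of the paper).
    static void sort(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SortImpl(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] sample = {9, 4, 7, 1, 8, 2};
        sort(sample);
        System.out.println(Arrays.toString(sample));
    }
}
```

With the helpers in place, compute() reads as "split, run both halves, merge", and the index arithmetic lives in one spot.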

Q5: The paper presents three automated refactorings to make a program more parallel. By no means is this a comprehensive list of refactorings. What are some other refactorings that you have applied, or have seen in other projects?

At my work, I developed an application for deploying software from a server to multiple computers. Initially the deployment was sequential, but because remote transfer and installation take an unpredictable amount of time on each remote machine, I had to refactor the code to deploy the software in parallel. Basically, I ran a loop that spawned one thread per machine to deploy the software, then blocked until they had all finished.
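
Roughly, the pattern looked like the sketch below. The names here (ParallelDeployer, Deployment, deployTo, deployAll) are made up for illustration and are not from the real application. The actual transfer and install logic is replaced by a placeholder, and the lambdas assume Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ParallelDeployer {
    // Hypothetical callback standing in for the real transfer-and-install step.
    interface Deployment {
        void deployTo(String host) throws Exception;
    }

    // Starts one thread per host, waits for all of them, and returns the
    // hosts whose deployment threw an exception.
    static List<String> deployAll(List<String> hosts, Deployment deployment) {
        List<String> failed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();

        for (String host : hosts) {
            Thread worker = new Thread(() -> {
                try {
                    deployment.deployTo(host);
                } catch (Exception e) {
                    failed.add(host);
                }
            });
            workers.add(worker);
            worker.start();
        }

        // Block until every deployment thread has come back.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for deployments", e);
            }
        }
        return new ArrayList<>(failed);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host-a", "host-b", "host-c");
        // Placeholder deployment: sleeps briefly to imitate a slow remote install.
        List<String> failed = deployAll(hosts, host -> Thread.sleep(100));
        System.out.println("Failed hosts: " + failed);
    }
}
```

One thread per machine is fine for a handful of targets. For a large number of machines, a fixed-size thread pool would probably be a safer choice, but that goes beyond what I did at the time.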

Q6: The paper presents some empirical evaluation to support the claim that these automated refactorings are useful. What are some other factors that you would have liked to see evaluated?

I would've liked to have seen this strategy being applied to 3D games.


Tuesday, October 6, 2009
