The Dense Linear Algebra computational pattern reminds me of a compiler optimization to reduce the memory access miss rate taught in computer architecture courses. They teach the same strategy of organizing the computation in a blocked manner as show in figure 4. Their argument is that it will improve spatial and temporal locality, which will reduce the number of cache misses. What the article brings to the table is that you can auto-tune the size of the block to obtain the right balance and you can use SIMD instructions to execute the same operation on multiple variables.
The Graph Algorithms pattern takes me back to the Algorithms course I took last year. For sure, one can't understand this if they haven't taken the Algorithms course. What's new to me is the partitioning operations. In other words, I haven't heard of Kernighan-Lin, Fiduccia-Matheyses, nor ParMetis algorithms. I can see how partitioning the graph structures can allow multiple, independent threads process them in parallel. If each thread can produce a subresult, I can also see how this could be used in dynamic programming algorithms.
The Monte Carlo pattern was difficult to understand at first but the Estimating Pi example along with some help from my wife (who's an actuary) made this more clear. Apparently the Monte Carlo method is used extensively by quantitative analysts to simulate the Black-Scholes model for predicting prices of derivatives.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment