Arod's Blog: September 2009

Wednesday, September 30, 2009

Our Pattern Language (OPL)

This is just an overview of OPL, a language for describing patterns in parallel programming. As the article lists, OPL has the benefit of guiding framework design, providing a consistent method for communicating concepts, and helping newcomers become familiar with parallel programming. The structure and organization of its layers should provide context when we start diving into the more specific parallel programming articles. But for now, the article just lists the patterns within the OPL framework without describing them.

Tuesday, September 29, 2009

BA Chap.10 - Jikes RVM

Unless one is familiar with how Java Virtual Machines work, this is a difficult read. Even so, the benefits of having a self-hosting JVM don't seem apparent. It could be because I didn't fully comprehend the details. But from what I can understand, here are the benefits of Jikes RVM:

1. Simplifies the development model compared to using C/C++.
2. Allows developers to use language features in order to make better optimizations.
3. Helps detect & fix bugs in the JVM.
4. Provides the ability to use better libraries and abstractions.
5. Facilitates communication between the runtime and the application.

It's hard to see these benefits when the article doesn't explicitly mention success stories in industry. Sure, they mention that hundreds of researchers in over 100 institutions are part of the development community, but that's too vague.

What I wasn't able to understand is the difference between self-hosting and metacircularity. According to the article, they mean the same thing. Wikipedia describes a meta-circular evaluator this way:

"A meta-circular evaluator is a special case of a self-interpreter in which the existing facilities of the parent interpreter are directly applied to the source code being interpreted, without any need for additional implementation"

Can anyone provide a better explanation?

Also, I was confused by the discussion of how they bootstrapped Jikes RVM. Are all JVMs written in C considered to be bootstrap JVMs within the context of creating self-hosting JVMs?

Finally, the most surprising thing I found about the article was that metacircular runtime systems have been around since Lisp and Smalltalk were developed. In addition, just-in-time compilation and adaptive optimization have appeared in Smalltalk since the 1980s. Now they're becoming mainstream with their inclusion in Java, Python, and the .NET framework. It's interesting to observe that many of us view these as "new" concepts.
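
To make the metacircularity question a bit more concrete, here's a minimal sketch in Python. It's my own illustration, not anything from the article or from Jikes RVM, and the names `evaluate` and `run` are hypothetical. It interprets a tiny subset of Python expressions, and the "circular" part is that guest addition is simply host addition and guest function calls are simply host function calls, with no additional implementation.

```python
import ast
import operator

# Guest-language operators are borrowed directly from the host language.
# (Only a tiny, illustrative subset is supported.)
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(node, env):
    """Evaluate a parsed expression node against a dict of names (hypothetical helper)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value                      # host values are guest values
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        op = _BIN_OPS[type(node.op)]           # host '+' implements guest '+'
        return op(evaluate(node.left, env), evaluate(node.right, env))
    if isinstance(node, ast.Call):
        func = evaluate(node.func, env)
        args = [evaluate(arg, env) for arg in node.args]
        return func(*args)                     # host call implements guest call
    raise NotImplementedError(type(node).__name__)


def run(source, env=None):
    """Parse and evaluate one expression (hypothetical entry point)."""
    tree = ast.parse(source, mode="eval")
    return evaluate(tree, env or {})
```

For example, `run("1 + 2 * 3")` evaluates the expression using the host's own arithmetic. A self-hosting system like Jikes RVM is a much bigger thing than this, but I think the toy at least shows what "the existing facilities of the parent interpreter are directly applied" means.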

Saturday, September 26, 2009

Adaptive Object-Model

At first this was a difficult architectural pattern to understand because its class definitions have a higher level of abstraction (e.g. Entity & EntityType). But after reading about the TypeObject pattern at http://www.comlab.ox.ac.uk/people/jeremy.gibbons/dpa/typeobject.pdf, the Property pattern at http://www.codeproject.com/KB/architecture/AOM_Property.aspx, and Entity Relationships at http://www.codeproject.com/KB/architecture/EntityRelationships.aspx, this made a lot more sense. My suggestion is for developers to get a good grasp on these patterns before they dive into trying to understand AOM.

The first thought that came to my mind was that AOM is pretty much database design. It's like the Entity class is a table that has a foreign key to the EntityType table. But this foreign key is actually a pointer to an instance (or typeObject) of the EntityType class. Data in databases is used by applications at run-time, which is analogous to meta-data being interpreted by AOM architectures. The only distinction the article gives is "The key problem with databases is attaching method to these objects". This leads me to conclude that the TypeObject pattern is more like database design, but AOM includes interpreting and executing Behaviors and Business Rules.

I recently worked on a sub-system of an application that had the AOM architecture. All an administrator had to do was specify required fields in a form but when I looked at the code, I thought it was unnecessarily complicated. From the UI, it navigated through 3 classes where the implementation would read from the database to figure out what fields to display (e.g. Birth Time), how to display a required field (e.g. should it have an asterisk or a pound sign), and how to behave (Strategy Pattern. Should it throw an error message? If so, what message). The traditional way would've been to implement this statically as a class, but I soon realized how incredibly easy it was to update the form. All I had to do was add a new DB entry! We only provided the System Administrator limited ability to change the meta-data; more specifically, they had the power to change the required fields in a form. But we didn't want them to know that they could, theoretically, include new fields and change their behavior at run-time through simple DB modifications. Providing too much power to users brings the risk of them inadvertently putting the system in an unstable state. If they want a change in behavior, we can simply provide a DB-update script that would reflect the changes without having to recompile the code!
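
Here's a rough sketch of the required-field idea described above, using the TypeObject and Property patterns. This is not the code from the system I worked on; the class names, the `marker` and `error_message` attributes, and the in-memory setup standing in for database rows are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyType:
    """Metadata for one form field (would be a DB row in a real AOM system)."""
    name: str
    required: bool = False
    marker: str = "*"                              # how a required field is flagged
    error_message: str = "This field is required."


@dataclass
class EntityType:
    """The TypeObject: describes which properties an entity has."""
    name: str
    property_types: dict = field(default_factory=dict)

    def add_property_type(self, prop):
        self.property_types[prop.name] = prop


@dataclass
class Entity:
    """An instance that points to its EntityType instead of being a fixed class."""
    entity_type: EntityType
    values: dict = field(default_factory=dict)

    def label(self, name):
        prop = self.entity_type.property_types[name]
        return f"{name} {prop.marker}" if prop.required else name

    def validate(self):
        errors = []
        for prop in self.entity_type.property_types.values():
            if prop.required and not self.values.get(prop.name):
                errors.append(f"{prop.name}: {prop.error_message}")
        return errors
```

Making "Birth Time" required is then a data change, such as `form.add_property_type(PropertyType("Birth Time", required=True, marker="#"))`, rather than a code change, which is the same effect we got from adding a DB entry.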

Thursday, September 24, 2009

Beautiful Architecture Chap. 9 - JPC: An x86 PC Emulator in Pure Java

I find the discussion of emulators vs. virtual machines to be somewhat confusing because the terms "emulation", "virtualization", and "simulation" all seem to be used interchangeably. Given how rapidly technology evolves to produce a solution that has the best of both worlds, we probably won't ever find a hard distinction between an emulator, a simulator, and a virtual machine. Nevertheless, it's important to understand their key differences at the moment and to dive into the architectural details of existing solutions. Here's a list of pros & cons of JPC and Virtual Machines in general.

JPC Pros
1. No dependency on the underlying hardware
- Can virtualize an x86 machine on any host that has a JVM
- Completely isolated from HW & SW platform changes; there's no need to change the OS or to have special hardware
2. JVMs are ubiquitous and considered to be one of the most secure virtual machines
- JVMs guard against programming errors
- The 3 layers (JPC, JVM, Hardware) are completely independent from each other since they're made by different companies. Given that each has other general uses, they must have gone through some rigorous testing. Therefore, it's unlikely for a security threat to permeate through all these layers.

JPC Cons
1. Highly dependent on Java
- In the process of trying to achieve optimal performance, the development of JPC led to workarounds specific to a Java environment.
- With talk of Sun being bought out in the near future, Java's destiny is uncertain.
2. Still slower than a VM
- Currently executes code at 10% native speed.
- Article never talked about how it compared in performance with other VMs

VM Pros
1. Paravirtualization eliminates the extra level of indirection by making calls to the Hypervisor (as implied by Dan Orchard's blog).

VM Cons
1. As the article states, "You need hardware that is the same as that being 'virtualized'"
2. More dependency on HW/SW
- Paravirtualization requires changing OS or having the HW provide these capabilities
3. Security holes have been claimed
- Blue Pill & System Management Mode (SMM) memory attacks via Intel CPU cache poisoning
- "HW supported x86 CPU virtualization has security vulnerabilities due to the shared L1/L2 cache of multicore chips"

Neither Pro nor Con
1. Virtualization products are typically designed for speed, not security.

In comparison to other emulators, Bochs (written in C++) needs to have built-in support for different operating systems while JPC only needs support from a JVM. However, "JPC has to deal with the extra design restrictions and performance considerations for running under a JVM".

Awana81 pointed out that they didn't mention that it only supports a limited number of OSs as of today. Throughout the article, they celebrate the successful emulation of an x86 computer, which is everywhere, yet they haven't gotten it to boot up a WinXP desktop (which is probably the most common OS nowadays). They were only able to boot up to the command line prompt of DOS, many flavors of Linux, and legacy Windows OSs.

Overall, I can see several reasons why emulators like JPC can be useful. You can test software for mobile devices, embedded systems, and video game consoles before flashing the code into these devices. Also, as described at http://www-jpc.physics.ox.ac.uk/applications_cloud.html, this could provide opportunities for cloud computing on idle desktops as opposed to large datacenters, which could save financial & environmental costs. JPC has partnered with NereusV to provide a way for people to donate the idle CPU time of their computers by simply going to a webpage; nothing needs to be installed. Then, developers push their x86 PC software to these NereusV clients without any action by the host user. This is all done within the confines of JPC, which adds a security level on top of the Java Applet Sandbox.

In the end, the usage of emulators and VMs boils down to this: virtual machines are mostly used for running different OS environments while emulators are mostly used to emulate embedded and mobile devices.

Wednesday, September 23, 2009

Big Ball of Mud

What's considered a big ball of mud is relative to some degree. Some may consider the code to be spaghetti-like, while others may consider it to be a great piece of art. Nevertheless, there are several software architectures that most would agree are a big ball of mess. Why is this so popular? I think the main reason is that it's the easiest, fastest way to get code out the door. Managers don't usually like to invest time & money in making a product architecturally sound. A less common situation is when a company buys several small companies and stitches together their respective software products in hopes of producing the ultimate product. Unfortunately, this is extremely vulnerable to code duplication. More often than not, businesses should at least consider taking the hit and starting over.

What I particularly liked about this article is that it highlighted the forces that cause a big ball of mud. Sure we all know and complain that architecture takes a back seat to time-to-market demands. But on the flip-side of the equation, premature architecture can be risky since it might consume unnecessary resources and it can "discourage evolution and experimentation". Given today's economy, businesses are in great need of fast Return-On-Investments (ROIs). As the article mentions, what's the point of making your product architecturally beautiful if it's going to miss the market deadline and kill your business? Whether or not it was good style would be a moot point.

Finally, I'm a big believer in giving developers time to prototype before committing to a new project. This is because developers first need to get their hands dirty in order to obtain the domain experience they need to make good architectural decisions. Notice that the same reasoning applies as to why it's important to first prototype a product of the Layers architectural style.

Monday, September 21, 2009

Beautiful Architecture – Guardian: A Fault-Tolerant Operating System Environment (Chapter 8)

By far, the saddest part was that their Tandem Beer Bust was destroyed! In all seriousness, it seems that the Tandem computers with the Guardian OS, collectively, provided a fault-tolerant environment. Although the chapter highlighted Guardian's process pairs (a primary and a backup in "hot standby" state), the hardware also contributed much of this fault tolerance with its multiple processors, multiple disk controllers, and multiple busses as shown on page 177. This provided their biggest distinction from conventional computers: no single part of the system can fail and bring down the whole system. In short, they really just provided redundancy in several areas of the hardware and OS. Redundancy, as we all know, leads to more costs, but I was surprised that they didn’t mention the enormous power consumption that it required. If you throw in the fans that were located below the I/O controllers, then that's a lot of juice needed to keep the 6-foot processor cabinets cool. Also, redundancy doesn't ensure that data won't get corrupted if both the primary and redundant components fail.
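
To picture the process-pair idea, here's a toy sketch. It assumes the primary copies (checkpoints) its state to the backup after every request so the backup can take over where the primary left off. The `ProcessPair` class and its methods are hypothetical and have nothing to do with Guardian's actual interfaces.

```python
class ProcessPair:
    """Toy model of a primary process with a hot-standby backup (hypothetical)."""

    def __init__(self):
        self.primary_state = {"count": 0}
        self.backup_state = {"count": 0}
        self.primary_alive = True

    def handle_request(self):
        # Whichever process is currently active does the work.
        state = self.primary_state if self.primary_alive else self.backup_state
        state["count"] += 1
        if self.primary_alive:
            # Checkpoint: copy the primary's state to the backup (assumed behavior).
            self.backup_state = dict(state)
        return state["count"]

    def fail_primary(self):
        # Simulate a failure of the primary; the backup takes over from here.
        self.primary_alive = False
```

After two requests and a call to `fail_primary()`, the next request is served by the backup and continues the count instead of starting over. It also shows the limitation mentioned above: if the backup is lost too, so is the state.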

I didn't really think there were any advantages to their naming conventions. I mean, when you have different formats for unpaired system processes, unpaired user processes, named user processes, and network-visible processes, it just becomes a real burden on the programmer. This lack of consistency leads to more bugs and security holes (as shown by the ability to steal the system's root password) that could ultimately lead to the demise of a system.

Thursday, September 17, 2009

Layers

When we develop most software systems, I think we subconsciously try to implement them using some variant of the layered structure because it's usually the most straightforward way to think about the problem. And I think we all know that having many layers could introduce inefficiencies in some cases. But I don't think we can anticipate the other liabilities that the article describes until we get stuck during coding. For example, the liability of Cascades of Changing Behavior is something I missed several times in the past, mainly because it's difficult to predict what the implementation of the lower layer will entail. It's not until you're actually coding that you realize you need to make changes in every layer to accommodate errors that need to be propagated to the top (sometimes you do this because the service team wants to view lower layer error information at the GUI level). The "you don't know until you're actually doing it" argument can also be applied to the Unnecessary Work and Difficulty in Establishing Granularity Levels liabilities. I think applying a layers architecture is a good way to start designing, assuming that you can't think of a better architecture up-front. But once these liabilities begin to creep up, one may want to consider refactoring the code to conform to an alternative architecture. Or maybe it can be switched over to a Relaxed Structure like TCP/IP if it's an infrastructure system. As the paper says, "The main reason for this is that infrastructure systems are modified less often than application systems, and their performance is usually more important than their maintainability." One of TCP/IP's biggest philosophical complaints about layering is that it constrains data manipulation functions because optimizations of each layer have to occur separately (http://tools.ietf.org/html/rfc3439#page-7).
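
The error-propagation cascade mentioned above can be sketched with three hypothetical layers. None of this is from the paper; the function and exception names are made up. The point is that surfacing a storage-level detail at the GUI means touching every layer in between.

```python
class StorageError(Exception):
    """Raised by the lowest (storage) layer."""


class ServiceError(Exception):
    """Raised by the middle (service) layer."""


def storage_read(key, store):
    # Lowest layer: knows the real reason for the failure.
    if key not in store:
        raise StorageError(f"key {key!r} not found")
    return store[key]


def service_get_patient(patient_id, store):
    # Middle layer: translates the error but keeps the cause attached.
    try:
        return storage_read(patient_id, store)
    except StorageError as exc:
        raise ServiceError(f"could not load patient {patient_id}") from exc


def ui_show_patient(patient_id, store):
    # Top layer: had to change too, just to display the lower-layer detail.
    try:
        return f"Patient: {service_get_patient(patient_id, store)}"
    except ServiceError as exc:
        return f"Error: {exc} (cause: {exc.__cause__})"
```

With only three layers this is manageable, but each extra layer needs the same translate-and-forward treatment, which is the kind of cascade that's hard to see coming at design time.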

On a last note, this paper introduces us to some other architectural patterns such as the Blackboard Pattern, the Refactor Pattern, and the Microkernel Pattern. If you would like to take a closer look at these, check out this link: http://www.vico.org/pages/PatronsDisseny.html

Wednesday, September 16, 2009

Beautiful Architecture – Xen and the Beauty of Virtualization (Chapter 7)

Interesting that this article mentions Bochs since I actually worked on a research project that used this emulator a few years ago. Overall, this article clarified several misconceptions I had about virtual machines and did a great job distinguishing between virtualization and paravirtualization. I should point out, though, that Virtual Memory was created to make programming applications easier and to make more efficient use of memory, not necessarily to "ensure that processes cannot interfere with the data or code of other processes". A system would still need to ensure the integrity of data access in a non-Virtual Memory implementation. I also want to briefly correct Chad's statement that "All the operating systems hosted by the hypervisor share the same virtual and physical memory". Though they share the same physical memory, the guest OSes have their own virtual address spaces and page tables. Each OS asks the hypervisor whether it can make an update to its own page table, and the hypervisor then makes sure the update doesn't conflict with the physical address mapping of another OS's page table. As the article puts it, "Xen must validate all updates to the page tables, and the kernel must inform the hypervisor when it wants to change any page table".
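
Here's a heavily simplified model of that validation step. It is not Xen's interface; the `Hypervisor` class, its methods, and the frame-ownership bookkeeping are assumptions I made to illustrate the idea that a guest can only map physical frames it owns.

```python
class Hypervisor:
    """Toy model of page-table update validation (hypothetical, not Xen's API)."""

    def __init__(self):
        self.frame_owner = {}   # physical frame number -> guest name
        self.page_tables = {}   # guest name -> {virtual page: physical frame}

    def assign_frame(self, guest, frame):
        # Give a physical frame to exactly one guest.
        if frame in self.frame_owner:
            raise ValueError(f"frame {frame} is already assigned")
        self.frame_owner[frame] = guest
        self.page_tables.setdefault(guest, {})

    def update_page_table(self, guest, virtual_page, frame):
        # The guest asks; the hypervisor validates before applying the update.
        if self.frame_owner.get(frame) != guest:
            return False        # would map memory the guest doesn't own
        self.page_tables.setdefault(guest, {})[virtual_page] = frame
        return True
```

Two guests can both use virtual page 0, since each has its own page table, but a request from guest A to map a frame owned by guest B is refused.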

Tuesday, September 15, 2009

Pipes and Filters

Most of us are familiar with the Pipes & Filters architecture from previous experience with Unix and compiler programming. But it's great to read an article that explicitly states the advantages and disadvantages of using such an architecture, with several variations that could be used in different contexts. The pipeline in computer architecture came to my mind quite frequently when I was reading this article, but it made me realize how different it actually is. For one, the pipeline stages in computer architecture are not intended to be exchanged or recombined for future enhancements (at least I don't think so). Non-adjacent processing steps DO share information through feedback lines for branch prediction and other features. It doesn't allow different sources of input data (imagine a hacker being able to change the source of data at this level), and it always stores the final results in a single format. What are the similarities? The obvious one is that you're moving data from one stage (filter) to another via intermediate hardware logic (pipes). But in both cases, you can also multi-process the steps in parallel. Pipelines in computer architecture are not just trying to process data streams from one step to another; they're also controlling the entire execution flow of a program, which probably explains why they're more complex. Each step serves the sole purpose of processing its data as fast as possible (at the nanosecond level) for a very specific context. Filters, on the other hand, are designed to be used in different contexts.

Unfortunately, this is as close as I've been to developing an application that might make use of the Pipes & Filters architecture. But it wasn't even an application; it was a MIPS emulator that I implemented in Verilog.
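
For contrast with the hardware pipeline, here's a minimal software sketch of Pipes & Filters using Python generators. The filter names and the `pipeline` helper are my own, not from the article. Each filter only knows about the stream it receives, so filters can be exchanged and recombined freely.

```python
def strip_blank(lines):
    """Filter: drop lines that are empty or only whitespace."""
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines):
    """Filter: convert each line to upper case."""
    for line in lines:
        yield line.upper()


def pipeline(data, *filters):
    """Connect filters with 'pipes' by feeding each one the previous stream."""
    stream = iter(data)
    for apply_filter in filters:
        stream = apply_filter(stream)
    return stream
```

Something like `list(pipeline(["a", "", "b"], strip_blank, to_upper))` pulls the data through both filters lazily, and swapping the order of the filters or the data source requires no change to the filters themselves.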

Monday, September 14, 2009

Beautiful Architecture Chap. 6 - Data Grows Up

Mainly because of its popularity on the Internet, this article is the highlight of this book (it's the first bullet on the book's back cover). But even so, it's a great read (40 pages!) that shows readers step-by-step the reasoning behind the development of FQL, FBML, & FBJS, which was mainly to control the execution of external applications within the Facebook platform. It goes to great lengths describing how its architecture was molded to provide the data integrity that users expect. Yet, Facebook has received a lot of criticism over the years regarding its general approach to the use of people's info.

Although the platform's API, FQL, FBML, & FBJS allow Facebook to restrict the usage of user data by third-party apps, the inverse doesn't seem to apply, as demonstrated by their launching of Beacon in late 2007. Facebook can obtain user activity from external applications and post it in the news feed. Users can deny confirmations for publishing information provided by Beacon, but there's "no option to prevent Facebook from storing and using information sent by Beacon". If they do decide to give users this option, it should be a fairly straightforward implementation; all they would need are the $user and $app_id parameters from the user. Considering the numerous security holes that have been discovered, users have a right to prevent Facebook from storing such info.

Here are some examples taken from http://en.wikipedia.org/wiki/Criticism_of_Facebook:

"On February 24, 2006, a pair of users exploited a cross-site scripting (XSS) hole on the profile page and created a fast-spreading worm, loading a custom CSS file on infected profiles that made them look like MySpace profiles."

"On April 19, 2006, a user was able to embed an iframe into his profile and load a custom off-site page featuring a streaming video and a flash game from Drawball."

"In July 2007, Adrienne Felt, an undergraduate student at the University of Virginia, discovered a cross-site scripting (XSS) hole in the Facebook Platform that could inject JavaScript into profiles, which was used to import custom CSS and demonstrate how the platform could be used to violate privacy rules or create a worm."

"On March 26, 2006, a user was able to embed JavaScript in the "Hometown" field of his profile which imported his custom CSS."

Notice how most of these occurred between 2 and 3.5 years ago, so I think we can safely assume that Facebook has corrected most of these. Nevertheless, Facebook needs to think beyond its own architecture for preventing other apps from compromising user data and be humble enough to realize that they can also, indirectly, be the source of data misuse. FBML & FBJS are great solutions that give them the flexibility to patch security holes in other parts of the platform while giving external apps the ability to run dynamic content. But we all know that no system is ever 100% secure.

Friday, September 11, 2009

Excerpts From Christopher Alexander

Since these excerpts were intended to describe patterns in the architecture of physical structures, here are my two-cents on how they would or would not apply in the software domain.

"On no account place buildings in the places which are most beautiful. In fact, do the opposite. Consider the site and its buildings as a single living eco-system. Leave those areas that are the most precious, beautiful, comfortable, and healthy as they are, and build new structures in those parts of the site which are least pleasant now."

Does this mean we shouldn't put a piece of functionality on the best part of an existing platform because it might cover up its attractiveness? Is this principle suggesting the motto "If it's not broken, don't fix it"? I don't agree that this principle applies to software because if we're trying to cover up some classes, for example through the use of the façade pattern, then we're trying to simplify the overall functionality of the system. And this doesn't mean you can no longer use the covered-up classes because we promote extensions in object-oriented design. In the case of buildings, once you cover up the best part of the land with a building, you pretty much can't take advantage of it. Now this principle does have a good reason for building on top of the worst part of a platform; it may force developers to refactor crappy code while adding functionality. The fact of the matter is that once a developer hacks some code together, we shouldn't expect it to get fixed anytime soon unless you have a disciplined team like the guys from Design Town who mark their "fudges". Many managers won't invest resources to refactor code that "already works".

"Unless the spaces in a building are arranged in a sequence which corresponds to their degrees of privateness, the visits made by strangers, friends, guests, clients, family will always be a little awkward"

Does this mean we should develop an application with the right degree of privacy levels or else it will be awkward to use or extend? Well, encapsulation is supposed to provide us with this ability so that we don't change the state of a system through some unsafe method. In terms of actual usage, a user will certainly be annoyed if an application has unnecessary levels of privacy restrictions while a federal agent would feel insecure using an application that does not provide encryption.

"When they have a choice, people will always gravitate to those rooms which have light on two sides, and leave the rooms which are lit only from one side unused and empty". Software modules will be used if there are multiple channels for communicating with them, but this may not actually be ideal.

Does this mean we should develop software modules that provide more than one way to communicate with them? If so, this definitely does not apply to software because you would rather provide a single, consistent interface that enforces communication integrity as described in ArchJava. Users would also prefer a single, consistent method for using a system. In software, simplification is king. The principle of adding more "windows" can make a system unnecessarily complicated.

Thursday, September 10, 2009

Beautiful Architecture (Chap. 5) - Resource-Oriented Architectures

It's great to know that there already exist several ROA implementations such as Ruby on Rails, NetKernel, Django, etc. because this article has convinced me that RESTful architectures are the future. Although my understanding of Web architecture is limited and I had never heard of ROA before, the approach of using logical named interfaces would allow us to not worry about concrete implementations and give us the reassurance that it will scale as new technologies come and go. This is because we can use the same name (that's easy to remember) to retrieve the same data in different structural forms such as text, images, or in applications as shown in Figure 5-5 on page 101. In addition, restricting REST to 4 verbs makes it easy to design any operation imaginable, which leads to a more scalable, flexible, and extensible architecture.
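
As a rough illustration of "same name, different structural forms", here's a small in-memory sketch. It is not taken from the chapter or from any of the frameworks above; `ResourceSpace`, `Resource`, the example name `/customers/42`, and the two media types are assumptions chosen for the example.

```python
import json


class Resource:
    """Data behind a logical name, renderable in more than one form (hypothetical)."""

    def __init__(self, data):
        self.data = data

    def represent(self, media_type):
        if media_type == "application/json":
            return json.dumps(self.data, sort_keys=True)
        if media_type == "text/plain":
            return ", ".join(f"{k}={v}" for k, v in sorted(self.data.items()))
        raise ValueError(f"unsupported media type: {media_type}")


class ResourceSpace:
    """Maps logical names to resources through a small, uniform set of operations."""

    def __init__(self):
        self._resources = {}

    def put(self, name, data):
        self._resources[name] = Resource(data)

    def get(self, name, media_type="application/json"):
        return self._resources[name].represent(media_type)

    def delete(self, name):
        del self._resources[name]
```

The caller only ever deals with the name and the form it wants back. How the data is stored behind `/customers/42` can change without the caller noticing, which is the decoupling that appealed to me in the article.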

I remember going through the steep learning curve of implementing a Java application that had SOAP messages with MTOM attachments and having to figure out how to write up a WSDL; it was unnecessarily brittle. Sure, there are several online examples of using web services to send SOAP messages, but when you run into complex issues, you may have to understand the nuts & bolts of it.

Tuesday, September 8, 2009

Although ArchJava may be too stringent to be applicable in many scenarios, it's definitely a great extension to use when starting on a new project with multiple developers. First, it encourages team members to think about designing to interfaces via connections as opposed to concrete classes. Second, it pushes programmers to think about the problem in a hierarchical manner, which often leads to a simpler design. Just look at how they first broke up Aphyds into a simple Model and View architecture. They later broke up the View into subcomponents such as CircuitViewer, PlaceRouteViewer, ChannelRouteViewer, etc., and then broke up the AphydsModel. They didn't worry about the details until they needed to. And finally, it prevents programmers from furtively making references to classes that clearly violate the Law of Demeter. In a previous game project that I worked on with a team, it often took me a few hours to figure out the root cause of a bug because there were references to model data in at least 10 different classes.

Interesting that the article discussed how they decided to persist components for the entire execution of the program instead of having a dynamically changing architecture due to screen refresh issues. We also encountered a similar issue when the game was restarted because the code would create new instances of the model data while the View continued to hold references to the old model data. We simply decided to re-initialize the model data.

Overall, the idea behind it is great if you want to make a software architecture explicit and want to avoid unnecessary references that can lead to bugs. However, I don't think many would use it for a couple of reasons. First, it's restricted to Java. Sure, you can create a version of this in .NET. But what would you call it? ArchNet? And who would take the time to do this? From looking at the ArchJava home page, it seems that they haven't worked on this since June 2005, so I wouldn't count on the original team. More importantly, I don't think architects would take the time to learn the ArchJava syntax since they would pretty much have to write a good chunk of code, which can be cumbersome. Maybe if they built a GUI on top of this that would automatically generate the method signatures, connections, etc., then I think there's a better chance architects would use it.

Monday, September 7, 2009

Beautiful Architecture Ch 4: Making Memories

This is the first article that shows how a team architects a software solution for a real customer. I particularly enjoyed how it starts off by thoroughly describing the problem domain and the workflow for producing high-quality pictures. With this in mind, the team was able to focus on only those architectural facets that were relevant to their problem-space as opposed to making assumptions about what the product solution will need, which would be wasteful. Figure 4-1 on page 64 pretty much sums it up.

A benefit of reading an article like this is that it introduces me to existing concepts, technologies, and tools that could be useful when architecting a future project. For example, the mention of Spring and OSGi frameworks (can be used for encapsulating & declaring module dependencies) sparked my curiosity to learn more about them. And I can consider using Graphviz and Java NIO for a current project at my job.

One of the things that stood out was that the author's team from Advanced Technologies Integration (ATI) used Lean Software Development. According to Wikipedia, it is "a translation of lean manufacturing principles and practices to the software development domain." Manufacturing principles can be applied in the software domain? Six months ago, I attended a Lean Six-Sigma course but the instructor was unable to provide concrete examples of how this is applied in the software world. It turns out that Lean Software Development is a form of Agile that makes use of these principles but just gives them different names. For those who are interested, here are some examples from Wikipedia:

Waste in Software Development
* unnecessary code and functionality
* delay in the software development process
* unclear requirements
* bureaucracy
* slow internal communication

Lean Software Development Tools
* Seeing waste
* Value stream mapping
* Set-based development
* Pull systems
* Queuing theory
* Motivation
* Measurements

Wednesday, September 2, 2009

Beautiful Architecture Chapter 2

This paper seems to argue that one SHOULD focus on functionality initially when designing the architecture of a system as given in their Design Town example,

"Early in the design process, we established the main areas of functionality (these included the core audio path, content management, and user control/interface)"

but that you should "defer design decisions until you have to make them". Since there's only so much you can know upfront, I think a general understanding of the functionality would be sufficient to commence a simple architecture. Because of its simplicity, you can then focus on adding the quality attributes that were discussed in Chapter 1 of Beautiful Architecture. Assuming these first two steps were done correctly, you've now created a superstar architecture that could tack on new functionality with ease. Sounds like a cookbook recipe, doesn't it? But aren't we all trying to come up with a recipe for architectural design that works? At the very least, this provides a reference to work off of when you don't know where to begin or which paper's recipe to follow.

An interesting aspect of the Metropolis that I can relate to is when it said,

"the Metropolis started out as a series of separate prototypes that got tacked together when they should have been thrown away. The Metropolis was actually an accidental conurbation. When stitched together, the code components had never really fit together properly. Over time, the careless stitches began to tear, so the components pulled against one another and caused friction in the codebase, rather than working in harmony."

I'm at a company that has the constant habit of buying other small companies. Business leaders think you can just combine their software solutions and they'll somehow "magically" work together. The problem is that interoperability in some industries, like Healthcare, is harder than others (which explains why we don't yet have an integrated, National Healthcare System). Then I'm put in a situation where I'm adding new functionality in 5 different places from 3 different products using C++, C#, Java, and some other weird language. What a mess! So how can a business invest in the future through software reconstruction and, at the same time, deliver short-term products to keep itself financially healthy?

In Design Town, they mentioned that developers would mark technical debts as "fudges" that would be scheduled for correction in a later revision, but how likely is that to happen in a business where everything is a priority that needs to get done now? I would expect most teams to ignore these fudges.
