Sometimes it helps me think about my work if I write it out on virtual paper. So, be warned that this post will probably not be very entertaining or educational. But, I do promise that there is an obscure reference to the movie "Kung-Fu Panda" hidden within.
Here's an interesting project I'm working on. The goal of the project is to effectively visualize the differences between two Bills of Materials (BOMs) from our ERP system. For those unfamiliar with BOMs, they are just a hierarchical structure of the components that make up a product being manufactured. A real nerd might call it an acyclic, directed, weighted (weights are the quantities) graph. But, it is easier to just think of it as a tree.
Turns out that it is pretty useful to be able to easily see what the differences are between two different bills of materials, especially for people who like to scratch their heads and figure out how to make the product cheaper, or what price to sell it for, or assign costs to products, etc.
The first thought I hatched when thinking about how to implement this was to see how close my needs were to the Java Structure Comparison tool in Eclipse. I have put many hours into driving this particular IDE, and it seems similar to what I wanted.
For those unfamiliar with it, you can take a class that is similar to this:
package com.iecokc.cooking;
public class CakeRecipe {
public String flour;
public String eggs;
public String frosting;
public void beatEggs() {}
public void stir() {}
}
And another class like this:
package com.iecokc.baking;
public class CakeRecipe {
public String flour;
public String eggs;
public void beatEggs() {}
public void stirBriskly() {}
}
and you end up with a navigable tree that shows only the differences, like this:
The problem with the comparison implemented in Eclipse is that it uses a strict definition of node identity in the trees that is compares. For example, if I had renamed the one of the classes from "CakeRecipe" to "BundtCakeRecipe", Eclipse would have refused to even consider that these might be similar in structure and therefore should be compared. Nope, different class name or method signature means that it is a completely different thing.
Instead, I needed to have my comparison consider the concept of something being "mostly equal", or "mequal" (trademark pending). If I can pin down this slippery concept of "mequals", I can then make a comparison structure that shows more than just adds and removes, but also edits of a particular node. It might look like the mock-up below (courtesy of Brent).
In the tree above, two BOMs (say, "A" and "B") are compared. The green nodes with a "+" next to them represent nodes that were added (in "B" but not in "A") and the red nodes with a "-" represent nodes that were removed (in "A" but not in "B"). The blue nodes with the pencil represent nodes that are in both BOMs, but are different in some way (a.k.a. - mequal).
Now, there are many nodes that are exactly equals in both BOMs. It might sometimes be useful to show them, and it would look like the gray nodes below:
So, how do I go about identifying the nodes that are mequal to each other between the two BOMs? Well, it is not just a function of the two nodes in consideration, but also a function of the structure that they contain, and the structure that they are contained in. For example, if there is a panel (however these are identified) in the top-level of each BOM, then these would probably be considered to be mequal. But, if there are two panels in BOM "A" and only one in BOM "B", then there is a choice to be made: which panel in "A" is more mequal to the one in "B"? Answering this question may need to consider the structure (child nodes) of the panels being analyzed.
Whew! This seems like it will be hard work. So, I'll search for libraries that already implement it. Since a BOM is a tree structure that could be represented in XML, maybe an XMLDiff library exists that I can leverage. Let me search... yup! There are about a thousand and one XML diff libraries, but I couldn't find a maintained one that worked on unsorted nodes and didn't rely on strict node identity.
So, I search the research publications. I particularly like some of the ideas from this paper. I'll charge blindly forward and try to implement it. The next few blog posts will be about the implementation (if successful).