2010-10-10 19:45:56Z
So, a mathematician will go on about:
A Graph.
That is connected.
That has no cycles.
now we have a tree, though not hierarchical yet
That is directed.
That defines a single unique root node.
That enforces a unidirectional flow of directed edges always away from the root node.
bla bla bla
And a DB coder will go on about:
It’s a tough dataset to get to Normalized-7
Don’t put node info in the tree structure
Trigger the hell out of it to enforce node deletion database integrity
Atomic! Atomic! Atomic!
So what are the real properties of a Hierarchical Tree? I was baffled by the reading I did, so here’s a try:
What is it for?
What do you do with it?
How do you look at its data?
What changes the most?
What it is for
Bills of Materials (for the complete product including sub-assemblies,) Corporate Documents of Authority, Mind Maps (all the rage,) Corporate Organizational Charts (Org Charts) Directories, Genealogy, Taxonomy …
What you do with it
Different from what it is for, trust me, I’m a budding meta-mathematician. You store things in a database of record. You organize your thoughts on the subject. You get everyone on the same page. You make training easier. You design things more easily. You go “AHA!” You say “Oh … I always wondered what they meant.”
How you look at its data
The real problem is that it is a O(kN) problem. Not even as polite as O(N2) on N as depth … to me, depth is what is useful, not how many nodes.
Tree View
There are tree view widgets in GTKmm, in Windoze.NET, Qt and all them things. These all stem from that old old Xerox/UPARC GUI experimental stuff. The first one I ever saw was the on the old Macintosh … the first one … in about 1986. The whole damned thing was so easy that I didn’t know that I should have fallen off of my chair because of the gravity of it all.
Icy Things Diagrams
A very tedious, incredibly non-scaling thing that makes smaller and smaller boxes as the detail fills in to the right. I think that governmental places like this one.
Sun Things Diagrams
A slightly less tedious, also non-scaling (though not as bad) thing that makes smaller and smaller sectors on circles as you go from the root (at the center) to the leaves (at the perimeter) of a big smiling sun circle. Gotta love those run-on sentences.
A powerful tool for some data types. Basically if you call a sub-tree a complete subset of its parent then a Venn Diagram is the most intuitive description of this. It cleverly reduces the order-of problem since the circle gives us an O(N2) capacity as you proceed radially from the center. That still leaves us with O(kN/2) though, which is still fundamentally no better.
Rectangular Tree Map
A wonderful diagram to look into the depths of hell. This makes it sound bad, but I think that when shading and smooth zooming is used it is really powerful.
Hyperbolic Tree Map
Go to the ocsigen website … it’s too cool.
This is God’s own answer to the problem of knowledge navigation. There is a Stallmanesque war on about this one since, apparently, Xerox holds patents on the thing. Go give it a try, it’s simply wonderful to do the mind-map thing.
What changes the most
Well, now I have to get really specific. First there are nodes that actually hold data. Mr. Celko is right, of course (in fact I call him a real breath of fresh air) in saying that there should be no data other than the tree’s minimum-sufficient structure in the tree table. The node is a record set that is merely foreign-keyed from the tree on another table (or a pointer to a class if you’re a coder.) A homogeneous tree can have one table with node data. A heterogeneous tree needs … duh … as many tables as there are node types, plus the node-to-table-type crosser.
Sometimes, maybe, when you change structure of the tree, you change node contents. So the high-structure/low-node change combo is moot. I also think that the Bible model (neither the content nor its relationships change) is all wrong since the person who writes that Bible wants to do it efficiently in those first seven days. It is conceivable, I suppose, that the low-structure/high-node combo is somewhat plausible, but according to my Bible reasoning, that first day of setting up the structure needs to be nimble too.
So, I say, always make both node changes and structure changes blazingly fast. No compromises or Anti-Knuthian early optimization. My point is …
Everything Changes, count on it.
Storage Methods
Linked List
Multiply linked list, in fact. The direction matters, parent/child and sibling. Does a parent keep the pointer to the first child or the array of children, or is it available immediately to the children? Moot if you keep a pointer to the parent where the parent has the single list of siblings.
Array
The array is totally appropriate for siblings. One hard fast rule of the htree is that the root must be the one and only root. Roots don’t have siblings. so the pointer to the parent is null. likewise it’s sibling array would be a length-1 array.
The not-obvious point is when you want to go through the whole dang thing fast. Another array comes to mind. A brute-force tree traversal winds up being, generally, an easy-peasey O(N) on node count. The practical reality is that a linked list tree walk will be twice as long as just a straight line search through a linear array.