[N.B. I've moved this discussion to the OHS development list...]
In message <3A525655.FF59E087@eng.sun.com>, Eric Armstrong writes:
>In conversation with Eugene, he's been asking
>the question: Are multiple parents for a node
>really necessary? Basically, if that concept
>adds complexity, maybe we should plan on not
>implementing it.
>
>I have felt that it was vital, but so far have
>not presented a compelling enough use case to
>settle the argument. This message is an attempt
>to do so. (And it led to another interesting
>insight that prompted me to jot down my thoughts.)
I'm sort of on Eric's side on this one. I do believe that content
re-use is so fundamental as to require some kind of means of managing
multiple parenthood. Of course as has been pointed out, with
node-level versioning this can get complicated.
Now, there is a simple way of handling this without causing many
problems (I believe) and that is to conceptually separate the nodes
from the paths used to access them (this is a strategy used in Scene
Graphs such as Inventor etc. where reuse of scene components is
essential for efficiency).
So, now I'd finally like to insert myself into the design discussions
in a serious way. Let's assume that we want to build a system that
allows (at minimum) the integration of arbitrary XML documents. What
I'll outline below gets us nearly there (and could easily be used as a
framework upon which to build DOM or SAX interfaces).
Let's walk through some simple IDL:
----
interface Node {
readonly attribute ID id;
};
interface ElementNode : Node {
readonly attribute string tag;
readonly attribute unsigned long numChildren;
Node child (in unsigned long index);
Attr getAttrFromName (in string name);
readonly attribute unsigned long numAttrs;
Node attr (in unsigned long index);
};
interface TextNode : Node {
readonly attribute string text;
};
interface Path {
readonly attribute Node node;
readonly attribute Document document;
readonly attribute Path parent;
Path firstChild ();
Path lastChild ();
Path previousSibling ();
Path nextSibling ();
};
interface Document {
readonly attribute Path root;
};
----
So, what we've done above is to separate the structure of our tree
into a set of *placeless* Nodes (some of which have children) and a
Document tree defined in terms of Paths which define where Nodes are
*placed* in a Document.
Given this basic structure, we can navigate easily and nodes need have
*no parent whatsoever*. All of the tree-based relationships are
defined wrt. the Path used to access a Node and not by the Node
itself.
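To make the Node/Path split concrete, here is a minimal Python sketch (Python rather than IDL so that it can be run). It is an illustration under assumptions, not part of the proposal: the `index` field and the snake_case method names are hypothetical, and only `first_child` and `next_sibling` are shown. One node is placed under two parents, and its "parent" differs depending on the Path used to reach it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A placeless node: it knows its children but has no parent field."""
    id: str
    tag: str
    children: List["Node"] = field(default_factory=list)


@dataclass(eq=False)
class Path:
    """One placement of a node: the node plus the path used to reach it."""
    node: Node
    parent: Optional["Path"] = None
    index: int = 0  # hypothetical: position among the parent's children

    def first_child(self) -> Optional["Path"]:
        if not self.node.children:
            return None
        return Path(self.node.children[0], self, 0)

    def next_sibling(self) -> Optional["Path"]:
        if self.parent is None:
            return None
        siblings = self.parent.node.children
        nxt = self.index + 1
        if nxt >= len(siblings):
            return None
        return Path(siblings[nxt], self.parent, nxt)


# One shared node placed under two different parents.
shared = Node("n3", "para")
chapter_a = Node("n1", "chapter", [shared])
chapter_b = Node("n2", "chapter", [shared])
root = Node("n0", "book", [chapter_a, chapter_b])

root_path = Path(root)
path_a = root_path.first_child()   # placement of the first chapter
path_b = path_a.next_sibling()     # placement of the second chapter
via_a = path_a.first_child()       # shared node reached through chapter_a
via_b = path_b.first_child()       # same node reached through chapter_b
```

Both `via_a` and `via_b` refer to the same Node object, yet each reports a different parent, because parenthood lives in the Path and not in the Node.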
Now this division of labour does introduce some issues
(and usually simple ways to resolve them).
1) Node IDs: Since we give each node a unique ID, searching for a node
is most easily done by providing a mapping from Document->Node via
the ID. We can extend Document as follows:
interface Document {
...
Path getNodeFromID (in ID id);
};
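A rough Python sketch of such a lookup follows. The IDL returns a single Path, but a shared node can have several placements, so this sketch assumes the first placement in document order is returned; that choice, the `walk` helper, and the `paths_for_id` extension are all hypothetical. It also assumes the node graph is acyclic.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    id: str
    children: List["Node"] = field(default_factory=list)


@dataclass(eq=False)
class Path:
    node: Node
    parent: Optional["Path"] = None


def walk(path: Path) -> Iterator[Path]:
    """Yield every placement at or below `path` in document order."""
    yield path
    for child in path.node.children:
        yield from walk(Path(child, path))


class Document:
    def __init__(self, root: Node) -> None:
        self.root = Path(root)

    def get_node_from_id(self, node_id: str) -> Optional[Path]:
        """Return the first placement of the node (an assumed policy)."""
        for path in walk(self.root):
            if path.node.id == node_id:
                return path
        return None

    def paths_for_id(self, node_id: str) -> List[Path]:
        """Hypothetical extension: every placement of a shared node."""
        return [p for p in walk(self.root) if p.node.id == node_id]


shared = Node("n3")
a = Node("n1", [shared])
b = Node("n2", [shared])
doc = Document(Node("n0", [a, b]))

found = doc.get_node_from_id("n3")
all_placements = doc.paths_for_id("n3")
```

A real implementation would presumably keep an index rather than walking the tree on every call; the linear walk is used here only to keep the sketch short.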
At a level higher than documents, we will also need to provide a map
from
2) Attribution: Clearly it is the nodes themselves which should be
tagged with creator/contributor tags. Thus:
interface User;
interface Timestamp;
interface Node {
...
readonly attribute User creator;
readonly attribute UserList contributors;
readonly attribute Timestamp creationTime;
readonly attribute Timestamp modificationTime;
};
3) Versioning: Again, it is the nodes which are the units for
maintaining version information. Thus:
interface Node {
...
readonly attribute Node olderVersion;
readonly attribute Node newerVersion;
Node versionDated (in Timestamp time);
};
In order to view a document as it was at some previous time, we may
need to access a dated root, so we'll need to modify the Document as
well:
interface Document {
...
Path rootDated (in Timestamp time);
};
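The version chain can be sketched in Python as a doubly linked list of node versions. This is only one possible reading: the `revise` helper is hypothetical, integers stand in for Timestamp, and it is an assumption that all versions of a node share one ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    id: str
    text: str
    modification_time: int                  # stand-in for Timestamp
    older_version: Optional["Node"] = None
    newer_version: Optional["Node"] = None

    def revise(self, text: str, time: int) -> "Node":
        """Hypothetical helper: link a newer version; old content is kept."""
        newer = Node(self.id, text, time, older_version=self)
        self.newer_version = newer
        return newer

    def version_dated(self, time: int) -> Optional["Node"]:
        """Return the version current at `time`, or None if none existed."""
        node: Optional[Node] = self
        # Move forward while the next version already existed at `time`.
        while (node.newer_version is not None
               and node.newer_version.modification_time <= time):
            node = node.newer_version
        # Move backward while this version is newer than `time`.
        while node is not None and node.modification_time > time:
            node = node.older_version
        return node


v1 = Node("n1", "draft", 10)
v2 = v1.revise("second pass", 20)
v3 = v2.revise("final", 30)
```

Starting from any version, `version_dated` walks the chain in whichever direction is needed, so a caller does not have to hold the newest version to ask about an earlier time.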
4) Editing: For a number of reasons (more on this later) it is useful
to separate the interfaces for access from editing (and maybe even
to granularize editing interfaces more finely). But since editing
is *very* contextual, the editing interfaces need to be derived
from Paths (not Nodes).
interface Path {
...
EditablePath edit (in User user);
};
interface EditableNode : Node {
};
interface EditableElementNode : EditableNode {
void setTag (in string name);
void setAttr (in string name, in string value);
void removeAttr (in string name);
};
interface EditableTextNode : EditableNode {
void setText (in string text);
void splitText (in unsigned long index);
};
interface EditablePath : Path {
EditableNode editNode ();
Path insertBefore (in Node node);
Path insertAfter (in Node node);
void remove ();
};
Note: All attribution and version updates are managed automatically
by these methods. Since an EditablePath is created with a User
specifier we have ensured that all necessary information is
available at that time.
Note: Since all insertions are performed wrt. a Path, it is easy to
verify both permission and validity of the operation.
5) Creating: All Nodes are created via the EditablePath interface.
(N.B. A newly-created node has not actually been inserted into
the graph and must be passed as an argument to insertBefore or
insertAfter).
interface EditablePath : Path {
...
ElementNode createElement (in string tag);
TextNode createText (in string text);
};
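The create-then-insert flow, together with the automatic attribution noted under Editing, might look like the following Python sketch. It makes several assumptions: users are plain strings, versioning and permission checks are omitted, and insertion is counted as a contribution to the parent node. It also locates the current node among its siblings by identity, which would be ambiguous if the same node appeared twice under one parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    tag: str
    creator: str
    children: List["Node"] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)


@dataclass(eq=False)
class Path:
    node: Node
    parent: Optional["Path"] = None

    def edit(self, user: str) -> "EditablePath":
        return EditablePath(self.node, self.parent, user)


@dataclass(eq=False)
class EditablePath(Path):
    user: str = ""

    def create_element(self, tag: str) -> Node:
        """Make a detached node; the creator is the editing user."""
        return Node(tag, creator=self.user)

    def insert_after(self, node: Node) -> Path:
        """Place `node` as the next sibling of this path's node."""
        if self.parent is None:
            raise ValueError("cannot insert a sibling of the root")
        siblings = self.parent.node.children
        position = next(i for i, n in enumerate(siblings) if n is self.node)
        siblings.insert(position + 1, node)
        # Assumption: changing a child list is a contribution to the parent.
        if self.user not in self.parent.node.contributors:
            self.parent.node.contributors.append(self.user)
        return Path(node, self.parent)


root = Node("doc", creator="eric")
intro = Node("intro", creator="eric")
root.children.append(intro)
intro_path = Path(intro, Path(root))

editor = intro_path.edit("lee")
body = editor.create_element("body")
detached = body not in root.children   # created but not yet placed
body_path = editor.insert_after(body)
```

Because the user is supplied once, when the EditablePath is obtained, neither `create_element` nor `insert_after` needs a user argument.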
General notes:
* As far as possible, this is a minimal interface description. It
provides (usually) exactly one way to ask that a particular action
be performed (unlike the DOM for example). It should be easy to see
how some of the alternate DOM interface methods could be implemented
with this framework.
* The public interfaces expose only readonly attributes. Without the
ability to create an EditablePath this is entirely readonly. All
modifications are done via methods of EditablePath and its subclasses.
* One subtle issue here is how remote node references change when a
new version of a Node is created by modification of an existing
Node. A naive implementation would leave old references to Nodes in
place in the document tree, so there is a little bit of subtle
implementation involved here. But since we had the foresight to
establish a locus for representing the mapping from document tree to
Nodes, this can all be hidden from the user in the Path
implementation...
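One way the Path implementation could hide version changes is sketched below in Python. This is a guess at a workable design, not something specified above: the `NodeStore` class, the `child_ids` field, and the `set_text` function are all hypothetical. Paths and parent nodes hold node IDs rather than direct references, and every access resolves the ID to the current version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    id: str
    text: str
    version: int = 1
    child_ids: Tuple[str, ...] = ()   # children referenced by ID, not object


class NodeStore:
    """Hypothetical locus mapping a node ID to its current version."""

    def __init__(self) -> None:
        self._current: Dict[str, Node] = {}

    def put(self, node: Node) -> None:
        self._current[node.id] = node

    def current(self, node_id: str) -> Node:
        return self._current[node_id]


@dataclass(frozen=True)
class Path:
    store: NodeStore
    node_id: str
    parent: Optional["Path"] = None

    @property
    def node(self) -> Node:
        # Resolved on every access, so a path never pins a stale version.
        return self.store.current(self.node_id)

    def child(self, index: int) -> "Path":
        return Path(self.store, self.node.child_ids[index], self)


def set_text(path: Path, text: str) -> Node:
    """Record an edit as a new version instead of mutating the old one."""
    old = path.node
    new = Node(old.id, text, old.version + 1, old.child_ids)
    path.store.put(new)
    return new


store = NodeStore()
for n in (Node("p", "old text"),
          Node("a", "", child_ids=("p",)),
          Node("b", "", child_ids=("p",))):
    store.put(n)

via_a = Path(store, "a").child(0)
via_b = Path(store, "b").child(0)
before = via_b.node
set_text(via_a, "new text")
```

After the edit through `via_a`, the placement under the other parent (`via_b`) sees the new version as well, while the object captured in `before` still holds the old text.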
-------------------------------------------------------------------------------
Lee Iverson SRI International
leei@ai.sri.com 333 Ravenswood Ave., Menlo Park CA 94025
http://www.ai.sri.com/~leei/ (650) 859-3307