Re: [unrev-II] Multiple Parents, Slash and Backslash

From: Lee Iverson (leei@ai.sri.com)
Date: Wed Jan 03 2001 - 13:43:56 PST

Next message: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Previous message: Grant Bowman: "Re: [unrev-II] Attribution"
Next in thread: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Maybe reply: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Reply: Eric Armstrong: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[N.B. I've moved this discussion to the OHS development list...]

In message <3A525655.FF59E087@eng.sun.com>, Eric Armstrong writes:
>In conversation with Eugene, he's been asking
>the question: Are multiple parents for a node
>really necessary? Basically, if that concept
>adds complexity, maybe we should plan on not
>implementing it.
>
>I have felt that it was vital, but so far have
>not presented a compelling enough use case to
>settle the argument. This message is an attempt
>to do so. (And it led to another interesting
>insight that prompted me to jot down my thoughts.)

I'm sort of on Eric's side on this one. I do believe that content
re-use is so fundamental as to require some kind of means of managing
multiple parenthood. Of course as has been pointed out, with
node-level versioning this can get complicated.

Now, there is a simple way of handling this without causing many
problems (I believe) and that is to conceptually separate the nodes
from the paths used to access them (this is a strategy used in Scene
Graphs such as Inventor etc. where reuse of scene components for
essential for efficiency).

So, now I'd finally like to insert myself into the design discussions
in a serious way. Let's assume that we want to build a system that
allows (at minimum) the integration of arbitrary XML documents. What
I'll outline below gets us nearly there (and could easily be used as a
framework upon which to build DOM or SAX interfaces).

Let's walk through some simple IDL:

----
interface Node {
  readonly attribute ID id;
};
interface ElementNode : Node {
  readonly attribute string tag;
  readonly attribute unsigned int numChildren;
  Node child (in unsigned long index);
  Attr getAttrFromName (in string name);
  readonly attribute unsigned int numAttrs;
  Node attr (in unsigned long index);
};
interface TextNode : Node {
  readonly attribute string text;
};
interface Path {
  readonly attribute Node node;
  readonly attribute Document document;
  readonly attribute Path parent;
  Path firstChild ();
  Path lastChild ();
  Path previousSibling ();
  Path nextSibling ();
};
interface Document {
  readonly attribute Path root;
};
----
So, what we've done above is to separate the structure of our tree
into a set of *placeless* Nodes (some of which have children) and a
Document tree defined in terms of Paths which define where Nodes are
*placed* in a Document.
Given this basic structure, we can navigate easily and nodes need have
*no parent whatsoever*.  All of the tree-based relationships are
defined wrt. the Path used to access a Node and not by the Node
itself.
Now this division of labour does introduce some issues
(and usually simple ways to resolve them).
1) Node IDs: Since we give each node a unique ID, searching for a node
   is most easily done be providing a mapping from Document->Node via
   the ID.  We can extend Document as follows:
   interface Document {
     ...
     Path getNodeFromID (in ID id);
   };
   At a level higher than documents, we will also need to provide a map
   from 
   
2) Attribution: Clearly it is the nodes themselves which should be
   tagged with creator/contributor tags.  Thus:
   interface User;
   interface Timestamp;
   interface Node {
     ...
     readonly attribute User creator;
     readonly attribute UserList contributors;
     readonly attribute Timestamp creationTime;
     readonly attribute Timestamp modificationTime;
   };
3) Versioning: Again, it is the nodes which are the units for
   maintaining version information. Thus:
   interface Node {
     ...
     readonly attribute Node olderVersion;
     readonly attribute Node newerVersion;
     Node versionDated (in Timestamp time);
   };
   In order to view a document as it was at some previous time, we may
   need to access dated root, so we'll need to modify the Document as
   well:
   interface Document {
     ...
     Path rootDated (in Timestamp time);
   };
4) Editing:  For a number of reasons (more on this later) it is useful
   to separate the interfaces for access from editing (and maybe even
   to granularize editing interfaces more finely).  But since editing
   is *very* contextual, the editing interfaces need to be derived
   from Paths (not Nodes).
   interface Path {
     ...
     EditablePath edit (in User user);
   };
   interface EditableNode : Node {
   };
   interface EditableElementNode : EditableNode {
     void setTag (in string name);
     void setAttr (in string name, in string value);
     void removeAttr (in string name);
   };
   interface EditableTextNode : EditableNode {
     void setText (in string text);
     void splitText (in unsigned long index);
   };
   interface EditablePath : Path {
     EditableNode editNode ();
     Path insertBefore (in Node node);
     Path insertAfter (in Node node);
     void remove ();
   };
   Note: All attribution and version updates are managed automatically
   by these methods.  Since an EditablePath is created with a User
   specifier we have all ensured that all necessary information is
   available at that time.
   Note: Since all insertions are performed wrt. a Path, it is easy to
   verify both permission and validity of the operation.
5) Creating: All Nodes are created via the EditablePath interface.
   (N.B. A newly-created node has not actually been inserted into
   the graph and must be passed as an argument to insertBefore or
   insertAfter).
   interface EditablePath : Path {
     ...
     ElementNode createElement (in string tag);
     TextNode createText (in string text);
   };
General notes:
* As far as possible, this is a minimal interface description.  It
  provides (usually) exactly one way to ask that a particular action
  be performed (unlike the DOM for example).  It should be easy to see
  how some of the alternate DOM interface methods could be implemented
  with this framework.
* The public interfaces expose only readonly attributes.  Without the
  ability to create an EditablePath this is entirely readonly. All
  modifications are done via methods of EditablePath and its subclasses.
* One subtle issue here is how remote node references change when a
  new version of a Node is created by modification of an existing
  Node.  A naive implementation would leave old references to Nodes in
  place in the document tree, so there is a little bit of subtle
  implementation involved here.  But since we had the foresight to
  establish a locus for representing the mapping from document tree to
  Nodes, this can all be hidden from the user in the Path
  implementation...
-------------------------------------------------------------------------------
Lee Iverson     		SRI International
leei@ai.sri.com			333 Ravenswood Ave., Menlo Park CA 94025
http://www.ai.sri.com/~leei/	(650) 859-3307

Next message: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Previous message: Grant Bowman: "Re: [unrev-II] Attribution"
Next in thread: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Maybe reply: Jack Park: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Reply: Eric Armstrong: "Re: [unrev-II] Multiple Parents, Slash and Backslash"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2.0.0 : Tue Aug 21 2001 - 17:57:58 PDT