[unrev-II] Towards an Atomic Data Structure

From: Eric Armstrong (eric.armstrong@eng.sun.com)
Date: Sun Apr 23 2000 - 21:50:50 PDT

  • Next message: Eric Armstrong: "[unrev-II] New Schedule"

    Background: Object-Oriented (O-O) Terminology
    ---------------------------------------------
      A "class" is a template you use to construct an "object".
      The class defines the behaviors that members of the class
      are capable of, and defines the data items that distinguish
      one member of the class from another. Individual objects
      use the behaviors defined by the class, and have specific
      data values that make them (usually unique) "individuals".
      The Car class, for example, might define behaviors for
      start and stop. It also might include data types for
      model, color, and acceleration characteristics. So a Red
      Ferrari would have one set of data items, while a Green
      Edsel would have another set. But both would have
      start and stop behaviors.

    Text Nodes
    ----------
      The fundamental unit of a DKR is an item of information.
      Since the ideal in writing is to have "one idea per
      paragraph", an "information node" can be thought of as
      a paragraph of text. Headings stand apart from other
      text, as well, so a heading is a special (short)
      paragraph, or information node.

      Node behaviors are defined in a class (object template).
      Every text node must contain an attribution -- a pointer
      to the author, or an identifying string. A copy of that
      node may be edited, which suggests the need for a split
      operation, for example. After node is split into one or
      more fragments, and edit operation could replace some
      fragments or insert new ones that have a different author.
      Some of the operations appropriate to a node might therefore
      include split, delete, replace, and insert.

      Note that when the node is split, two objects exist where
      one did before. Every node must therefore be capable of
      being the root of a subtree. Although it may start out life
      as a simple node that contains or points to an item of text,
      it must also be capable of pointing to a list of text
      elements. (That list might also include markup elements,
      like HTML bold tags: <b>.) Since each item in that list may
      itself point to a list of subitems, the resulting structure
      is a tree.

    Categories
    ----------
      Since every node will contain some kind of content (text
      initially, but eventually media objects), it is relatively
      clear that the fundamental class should provide operations
      for split, delete, replace, and insert, and that it should
      allow for a tree-shaped substructure.

      However, nodes may also acquire other subelements, such as
      comments. So a node needs the ability to serve as the root
      of multiple subtrees.

      In addition, some categories require special behaviors and
      data structures. For example, if a node is "rated" then it
      needs the ability to acquire a subtree consisting of
      evaluations (ratings). It also needs the ability to "average"
      the evaluations it has received for presentations.

    Dynamic Behavior
    ----------------
      Following classic O-O principles, it is tempting to construct
      separate classes for evaluations, comments, and information
      nodes. But that process moves us further away from identifying
      an "atomic" structure. More importantly, it runs into the
      problem that node-categorization (classification) is a dynamic
      process.

      If node is not "rated", then it should not be possible to add
      evaluations to it. If it is "rated", it should be possible to
      add evaluations, and to have the average of those evaluations
      summarized as the node's rating. Once a rating has been added,
      the node can no longer be unrated. But until then, the node
      could be switched at will back and forth from "unrated" to
      "rated".

      That dynamic classification poses problems for a static object
      oriented class structure. Unless some language system exists
      that allows behaviors to be added on and taken away in a
      dynamic classification process, the only alternative is to build
      all of the behaviors into the fundamental class -- making the
      identification of an "atomic" data structure all the more
      important.
      
    Extensibility
    -------------
      If the behaviors are defined in a single class, then adds to
      the system by extending that class. To put that class into
      effect, the system must be designed to create nodes using a
      "factory method". To create a new information object, then,
      you don't merely use an existing class to make one. Instead,
      you ask the factory method to create one and hand it back to
      you. Runtime parameters can then configure the "factory" to
      tell it which class to use. So, if the BasicNode class is
      the standard class for an information node, and if you create
      an ExtendedNode class, then the factory would be instructed
      (via a command line switch or configuration file) to use the
      ExtendedNode class when constructing an information object.

    Versioning
    ----------
      The basic structure for a node, then, is that it contains a
      pointer to a previous version of itself. For versioning to
      be useful, however, it must be possible for old links to
      acquire the newest version. That requires an indirect link
      -- a "virtual node". At a minimum, then, the data structure
      must allow for two atomic types: The virtual node that points
      to the latest version, and the node itself.

      Some actions like edits, rearranging sublist items, or deleting
      those items would produce a new version. It should be possible
      to perform multiple operations of that kind without having a
      separate (persistent) version number. When "published", the
      node would have the next sequential version number, regardless
      of the number of changes. (For "undo", however, multiple
      non-persistent "revisions" would be kept, so that changes can
      be backed out. When published, all but the last revision would
      be removed.)

    Data and Sublists
    -----------------
      The node must contain, at a minimum, sublists (or subtrees)
      for content (text nodes), for comments (text nodes), and
      evaluations (text nodes with a rating). It may need to keep
      an author-list (people who are authorized to perform direct
      edits). It also needs a list of the categories to which it
      belongs. (Implementing categories as lists provides fast
      searching, as demonstrated by the Traction system). And it
      needs a substructure list. (For a heading, for example,
      the "content" would be the text of the heading while the
      "substructure" would be subheadings and paragraphs.)
      
      A node therefore contains a variety of sublists, and at least
      some data items. The data items include a rating slot (for
      rated nodes), a version identifier, and a pointer to the
      previous version. (Alternatively, one of the sublists could
      be a version list.) A reference count would also be a good
      idea, in case nodes wind up with no links at all, so they
      can be removed. A pointer to the previous revision (during
      editing) would also be needed, until the node is published.

      To make the structures extensible, the data items may well
      be kept in a tuple, where the nature of the tuple depends
      on the type of the node. (A text node, for example, would
      have a text string and an author identifier.)

    Types
    -----
      To make an "atomic" data structure, it would ideally be
      possible to construct a node that contains a list of
      of subtrees. Each list would be identified as, for
      example: content, structure, comment, evaluation, categories,
      and authors. Every such node would be capable of having
      its own list of sublists. Even if only one sublist was
      present, the result would be a tree.

      It should also be possible to add lists dynamically. That
      allows a "question" node to have a sublist of "alternatives",
      for example, and for each alternative to have a sublist of
      "arguments" and/or "evaluations".

      The question is: What is the best way to represent those
      types? They could be kept as a value in the node. (Then
      the list of sublists would contain pairs: type, list.)
      Another possibility is to keep them in a list header for
      each list. A third possibility is to link to them, the
      same way that a node's category sublist entries link to
      individual categories.

      Another question: Do nodes need types, as well? For example,
      the list of arguments could have arguments for and arguments
      against (pro/con, plus/minus). Or possibly the arguments
      should be kept in separate lists? But that would make
      reconstruction of the original chronological sequence
      difficult, although it would expedite plus/minus summaries.
      [Overall, it seems desirable to add "type" as a data
      item in a node.]

      If a node contains a type, and a node contains a list of
      sublists, then any node can be a list header. It only needs
      a type value that identifies it as a content list, or a
      structure list, etc.

    Atomic Structure
    ----------------
      The basic atomic structures, then, might look like this:
       
         VirtualNode
            pointer to Node (most recent version)

         Node
            type
            prevVersion
            data (tuple)
            sublists

      A node of type "Info" would have multiple sublists,
      including content, structure, etc. A node with one of
      those types, on the other hand, would have only one sublist.
      So a "Content" node would have a single list containing
      "Text" (zero sublists) or "Markup" nodes (one sublist with
      zero or more entries of type Text or Markup).

    ------------------------------------------------------------------------
    Your high school sweetheart-where is he now? With 4.4 million alumni
    already registered at Classmates.com, there's a good chance you'll
    find her here. Visit your online high school class reunion at:
    http://click.egroups.com/1/3139/3/_/444287/_/956551848/
    ------------------------------------------------------------------------

    Community email addresses:
      Post message: unrev-II@onelist.com
      Subscribe: unrev-II-subscribe@onelist.com
      Unsubscribe: unrev-II-unsubscribe@onelist.com
      List owner: unrev-II-owner@onelist.com

    Shortcut URL to this page:
      http://www.onelist.com/community/unrev-II



    This archive was generated by hypermail 2b29 : Sun Apr 23 2000 - 21:58:23 PDT