
Re: Adding <meta> tags to XML documents



Eric Armstrong wrote:
> 
> "The ability to create fine-grained links to a document is known as
> granular addressability, which is an important characteristic of
> advanced knowledge management systems. Unfortunately, HTML's support
> of this feature is limited.     (01)

Let's first of all assume we're starting with the XML version of HTML,
XHTML. That doesn't *solve* any problems, but it gets us into at least
processable documents.    (02)

> In order to create granular links to 
> subsets of an HTML document, the author of that document must 
> explicitly create named anchors within that document. If the author
> does not do this, then you will be unable to link to anything more 
> granular than the entire document."    (03)

No, the author need only provide IDs on components at the level they
want to be addressable. Absent that, it's still possible to use an
XPointer to locate a component. This gets enormously more complicated
if we need to deal with ranges, but if we make another assumption,
that components exist as at least elements (such as <div> or <sect>),
then we can manipulate them with impunity using commonly available
XML-based tools.     (04)
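To make the two addressing routes concrete, here is a minimal Python sketch using only the standard library's ElementTree. It looks a component up by its ID, or resolves an XPointer element() child sequence such as /1/2/1/2, where the first step selects the document element and each later step selects the n-th child element. It supports only the /1/... form of that scheme and ignores ranges entirely. The function names are illustrative, not taken from any existing tool.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def find_by_id(root, ident):
    """Return the first element whose id attribute equals ident, or None."""
    for el in root.iter():
        if el.get("id") == ident:
            return el
    return None


def resolve_child_sequence(root, seq):
    """Resolve an XPointer element() child sequence such as '/1/2/1/2'.

    Only the '/1/...' form is handled: step 1 is the document element,
    and each later step picks the n-th child *element* (1-based).
    """
    steps = [int(s) for s in seq.strip("/").split("/")]
    if steps[0] != 1:
        return None
    node = root
    for n in steps[1:]:
        children = list(node)  # element children only; text is skipped
        if n < 1 or n > len(children):
            return None
        node = children[n - 1]
    return node
```

The ID lookup keeps working when siblings are inserted or reordered; the child sequence does not, which is why it is the fallback rather than the default.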

The modifications I'm making to plink will favour an ID over a plink-
based anchor when it exists, but use plink in its absence. Absent
a plink-based anchor, it'll create an XPointer expression. Obviously,
the latter is the most difficult, time-consuming, and prone to 
breakage during document maintenance cycles. The only change we'd
need to make to XHTML to resolve *all* of these issues (within the
system we design) would be to make IDs on those components we feel
need to be directly addressed #REQUIRED. I can make that change in
five minutes and distribute an "XHTML Authoring DTD."    (05)
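The preference order described above (ID first, then a plink anchor, then an XPointer) might look roughly like the following Python sketch. It is not plink's actual code: the plink anchor is assumed, hypothetically, to be a child <a name="..."> element of the component, and the last-resort address is an XPointer element() child sequence used as the fragment identifier.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def child_sequence(root, target):
    """Return an XPointer element() child sequence from root to target."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    steps = []
    node = target
    while node is not root:
        p = parent[node]
        steps.append(list(p).index(node) + 1)  # 1-based element position
        node = p
    steps.append(1)  # the document element is always step 1
    return "/" + "/".join(str(s) for s in reversed(steps))


def address_for(root, component, doc_uri):
    """Pick the most robust fragment identifier available for component."""
    # 1. An explicit id survives most editing.
    ident = component.get("id")
    if ident:
        return f"{doc_uri}#{ident}"
    # 2. Hypothetical plink-style anchor: a child <a name="..."> element.
    for child in component:
        if child.tag in ("a", f"{XHTML}a") and child.get("name"):
            return f"{doc_uri}#{child.get('name')}"
    # 3. Last resort: a positional XPointer, which breaks if siblings move.
    return f"{doc_uri}#element({child_sequence(root, component)})"
```

Making IDs #REQUIRED in an authoring DTD simply guarantees that step 1 always succeeds for the element types that matter.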

>   From Eugene's paper at
>   http://www.eekim.com/cgi-bin/dkr?fn=/ohs/purplecasestudy.html
> 
> Now, doesn't the ability to add <meta> tags to an XML document produce
> the same kinds of problem? The only <meta> data you can access is data
> the original author chose to provide.    (06)

You take a dim view of this. Open up your mind, man. Think of the possibilities
here, not the possible abuses. If an editing environment automatically
added an ID to every paragraph element, inserted some [invisible] metadata
about creation date, node identifier, etc. then it wouldn't make any
difference if the document existed in a serialization form (i.e., as an
XHTML document) or in a document component database. The metadata and
the component it is associated with are isomorphic expression-wise.     (07)
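A sketch of what such an editing environment might do on save, assuming XHTML parsed with Python's ElementTree: every <p> without an id gets a fresh one, and the function returns an id-to-metadata table. The pNNNN id format and the table layout are hypothetical. Whether that table is written back into the XHTML file or stored in a component database is deliberately left open.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def stamp_paragraphs(root, doc_uri, created, prefix="p"):
    """Assign an id to every XHTML <p> lacking one and collect metadata.

    Returns {id: {"node": ..., "created": ...}} for newly stamped
    paragraphs only. The id format (prefix + 4 digits) is hypothetical.
    """
    used = {el.get("id") for el in root.iter() if el.get("id")}
    stamped = {}
    n = 0
    for p in root.iter(f"{XHTML}p"):
        if p.get("id"):
            continue  # never rename an existing id: links may depend on it
        n += 1
        while f"{prefix}{n:04d}" in used:
            n += 1  # skip ids already taken anywhere in the document
        ident = f"{prefix}{n:04d}"
        used.add(ident)
        p.set("id", ident)
        stamped[ident] = {"node": f"{doc_uri}#{ident}", "created": created}
    return stamped
```

Because the ids are ordinary attributes, serializing the tree with ET.tostring keeps them, so the same identifiers appear whether the paragraph lives in a file or in a database row.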

> But if you have semi-solid granular addresses (which allow a document
> to be modified and yet retain a high probability of link-accuracy),
> then meta-data can be specified outside the document, with links into
> it.
> 
> That allows someone other than the original author to add ontological
> information at a later date, without disturbing the author's
> interaction with the document.    (08)

If you read through my current draft you'll find you can have *both*.
Either-or, if you will. The metadata contained in <meta> elements can
be inline, top-of-document, or in another document entirely.     (09)

> Now, file systems introduce a problem. The system breaks down when
> material is moved from one document to another. The addresses in the
> old document are now gone, and the material gets new addresses in
> the document it is added to.    (010)

Only if the system is poorly designed. The binding between metadata
and document component is only as fragile as the system handling it.
For a relatively unsophisticated Web author with no "document component
database" and only a copy of vi, the solution I've proffered works fine.
It also scales up to a full document component database. I'm hoping to
be able to demonstrate this. Put another way, I see no conflict between
Lee's vision of NODAL and what I've suggested.    (011)

> Here again, the idea of a "node soup" solves the problem. If the nodes
> exist in a primordial soup, and they are strung together to make views
> known as "documents", then they continue to exist at their original
> address, however they are combined.    (012)

It doesn't solve the problem; it only provides what is essentially
part of any component document system. This is not a new concept, of
course. Sun's documentation system (built upon a highly customized
version of Adept Editor) uses a fairly complicated object store.    (013)
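For readers unfamiliar with the "node soup" model, a toy Python version shows the essential property: each node is stored once under a stable identifier, and a "document" is only an ordered list of node identifiers. This is a hypothetical illustration, not Sun's object store or Lee's NODAL design.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentStore:
    """Toy component store: nodes live once; documents are views over them."""
    nodes: dict = field(default_factory=dict)      # node id -> content
    documents: dict = field(default_factory=dict)  # doc name -> [node ids]

    def add_node(self, node_id, content):
        self.nodes[node_id] = content

    def compose(self, doc, node_ids):
        """Define (or redefine) a document as an ordered view of nodes."""
        self.documents[doc] = list(node_ids)

    def move(self, node_id, src, dst, position=None):
        """Move a node between documents; its id, and links to it, are unchanged."""
        self.documents[src].remove(node_id)
        target = self.documents.setdefault(dst, [])
        if position is None:
            target.append(node_id)
        else:
            target.insert(position, node_id)

    def render(self, doc):
        """Return the document's content in order."""
        return [self.nodes[i] for i in self.documents[doc]]
```

Any metadata keyed by node id rather than by document position needs no update when `move` is called; that is the property any component document system provides.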

> So we can see that considering a "document" as a "view" of nodes taken
> from a node soup has important advantages for maintaining link
> integrity.
[...]
> But if the XML document contains <meta> tags, the same problem
> is highly likely to exist! The author could easily make the
> same changes, leaving behind the original (and now inaccurate)
> <meta> tags.    (014)

Nothing in what I'm suggesting precludes taking this view of documents.
Your example points out the problems that exist in any system, and
these require management.     (015)

The author can also delete the document from his hard disk. We're
trying to design an intelligent system; if someone doesn't RTFM
about how the system functions, it will break no matter what we do.
One has to assume that editing system-managed documents by hand
will break them.    (016)

> However, if the meta data is stored *outside* the documents,
> then a librarian/ontologist is free to correct the problem
> without modifying the documents. With <meta> tags embedded
> in the documents:
>   a) The documents themselves must be modified.
>   b) The permissions structure is likely to require
>      the original author(s) make the modifications.    (017)

For both (a) and (b), I've already pointed out that this is not the
case. When access permissions or system integrity prohibit direct
modification of a document, there are ways to associate metadata
with a component.    (018)
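One such way is standoff metadata: records kept outside the document and keyed by a link (document URI plus fragment identifier). The hypothetical Python class below sketches it, including the single-link update described in (c) when a component moves. Note that the store has to be told about every move; nothing here keeps it in sync with the documents automatically.

```python
from collections import defaultdict


class StandoffMetadata:
    """Hypothetical metadata store kept outside the documents.

    Keys are link targets such as "a.xhtml#p0003"; values are
    dictionaries of metadata properties for that component.
    """

    def __init__(self):
        self._by_target = defaultdict(dict)

    def annotate(self, target, key, value):
        """Attach a metadata property to a component without editing it."""
        self._by_target[target][key] = value

    def lookup(self, target):
        """Return a copy of the metadata for target (empty if none)."""
        return dict(self._by_target.get(target, {}))

    def retarget(self, old, new):
        """The component moved: re-point its metadata to the new address."""
        if old in self._by_target:
            self._by_target[new].update(self._by_target.pop(old))
```

Usage: after `annotate("a.xhtml#p0003", "subject", "granular addressing")`, moving the paragraph into b.xhtml needs only `retarget("a.xhtml#p0003", "b.xhtml#p0003")`; neither document is touched.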

>   c) Two sets of changes have to be made. The <meta> tag(s)
>      must be removed from one document, and then inserted
>      into the other. (With external meta data, on the other
>      hand, only a single link needs to be changed. Such is
>      the value of indirect addressing.)    (019)

You only introduce a *different* set of problems. Indirect
addressing has the added maintenance issue, and can just as
easily become out of date as any other methodology. You must
decide where the "canonical" metadata resides in any given
system, and I expect that to differ system-by-system.    (020)

> In summary, now that the thoughts have gelled in my mind, I
> have to say that I am not in favor of embedding <meta> tags
> in XML documents. I suspect that using granular addressing
> and maintaining meta data externally makes a lot more sense.    (021)

Your solution has the same problems as any closed design. We've
not seen them succeed in the past thirty years or so. The world
is not closed, and there will be mistakes, errors, etc.    (022)

But I guess we'll just have to disagree. I am already working on 
designs that use this functionality, and I expect it to work. 
The biggest failure in your approach is that Web documents are 
*Web documents*. They exist primarily on the Web, and their
contents (and their embedded metadata) can be harvested by simple
tools.    (023)

If that metadata is in some other document, how in the heck can
a Web spider locate it? If it's in the document, Web search 
engines can do this easily. But this does suggest a feature I'm
lacking in my current design: the ability for a document to point
to an external document for its metadata. My design meets a primary
requirement of descriptions of the "Semantic Web", i.e., that the
Web itself be usable as a container for metadata. If one considers a
Web *file* as merely a serialization of a metadata-augmented 
document existing in a system that correctly manages such things, 
then there are no maintenance issues at all. Given that many existing
Web document management systems (such as xml.apache.org's Cocoon) 
already treat Web documents this way, the suggestions I'm making 
would be fairly simple modifications to existing tool sets.    (024)
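A minimal Python sketch of the "simple tools" argument: harvest embedded <meta name="..." content="..."> pairs from an XHTML page, and also collect links pointing to external metadata, the feature noted above as missing. Using rel="meta" for that pointer is an assumption here; XHTML 1.0 does not define such a link type.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def harvest(xhtml_text, meta_rel="meta"):
    """Collect embedded metadata and pointers to external metadata.

    Returns (embedded, external): embedded maps each <meta> name to the
    list of its content values; external lists href values of <link>
    elements whose rel includes meta_rel (an assumed convention).
    """
    root = ET.fromstring(xhtml_text)
    embedded = {}
    for m in root.iter(f"{XHTML}meta"):
        name, content = m.get("name"), m.get("content")
        if name and content is not None:
            embedded.setdefault(name, []).append(content)
    external = [
        link.get("href")
        for link in root.iter(f"{XHTML}link")
        if meta_rel in (link.get("rel") or "").split() and link.get("href")
    ]
    return embedded, external
```

A spider would record `embedded` directly and queue each URL in `external` for fetching, so metadata can live in the page, in another document, or both.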

Murray    (025)

...........................................................................
Murray Altheim, SGML/XML Grease Monkey     <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, 1601 Willow Rd., MS UMPK17-102, Menlo Park, CA 94025    (026)

         america was once a paradise 
         of timberland and stream
         but it is dying because of the greed
         and money lust of a thousand little kings -- archy (1927)    (027)