BEYOND HTML: XML AND AUTOMATED WEB PROCESSING (by Tim Bray, co-author of
Do any of the following three examples help anyone? (Tim Bray can be
contacted at firstname.lastname@example.org)
"Suppose I wanted to add some intelligence to the vast amount of e-mail
stored on my computer. With XML I could mark it as shown in Example 1.
<from> <name>Tim Bray</name> <address>email@example.com</address> </from>
<to> <name>Paul Dreyfus</name> <address>firstname.lastname@example.org</address> </to>
<subject> First draft of XML intro </subject>
<p>Here's a draft of that XML article. I'll be on the road but connected
to e-mail. Let me know if it hits the right level (i.e., are
major revisions in order?). If it's fine, proceed with editorial
<attach encoding="mime" name="xml-draft.html"/>
This example should be pretty obvious. The <attach> tag looks a little
weird, but we'll cover that in a moment. Some of the advantages should
also be obvious. To start with, a Web robot could do a smart job of
indexing this, and a Java applet could do all sorts of intelligent
formatting (such as build a table-of-contents summary of a bunch of
The basic idea here is called descriptive markup: the tags around a
chunk of text don't say how to format it, or what to do when people
click on it; they just say what it is. This is in dramatic contrast to
HTML, where the tags do all these things at once.
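As a rough illustration of the indexing idea, here is a minimal Python
sketch using only the standard library. The MESSAGE string is a
hypothetical, completed version of the e-mail markup (the addresses are
placeholders), and index_entry is a name made up for this sketch, not
part of any real robot or tool. The point is only that the code finds
the sender, recipients, and subject by tag name, with no guessing about
layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete message using the tags from the e-mail example.
# The addresses are placeholders.
MESSAGE = """\
<email>
  <head>
    <from> <name>Tim Bray</name> <address>sender@example.com</address> </from>
    <to> <name>Paul Dreyfus</name> <address>recipient@example.com</address> </to>
    <subject> First draft of XML intro </subject>
  </head>
  <body>
    <p>Here's a draft of that XML article.</p>
    <attach encoding="mime" name="xml-draft.html"/>
  </body>
</email>
"""

def index_entry(xml_text):
    """Collect the fields an indexer might want, located by tag name alone."""
    root = ET.fromstring(xml_text)
    head = root.find("head")
    return {
        "from": head.findtext("from/name").strip(),
        "to": [to.findtext("name").strip() for to in head.findall("to")],
        "subject": head.findtext("subject").strip(),
        "attachments": [a.get("name") for a in root.iter("attach")],
    }

entry = index_entry(MESSAGE)
print(entry)
```

Because the tags say what each piece is, the same function would work
on any message that follows the same markup, however it happens to be
displayed.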
The big win with descriptive markup is a bit subtle. Suppose you're
processing some e-mail and you want to be able to display it both with
Navigator on a big monitor and on the teeny screen of a cell phone. If
the e-mail were marked up in XML, you could write one set of rules for
the monitor, another for the cell phone, another to produce a
professional-quality paper printout, and still another to drive a fax
The idea is that you've decoupled the document from its presentation.
This doesn't make designing good documents or good presentations easy,
but it does mean that you can attack the problems separately, which is a
big step forward.
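To make the decoupling concrete, here is a small Python sketch in which
one marked-up message feeds two unrelated sets of display rules. The
function names render_monitor and render_phone, the 20-character phone
width, and the sample message are all assumptions made for this sketch;
a real system would more likely use a stylesheet language than
hand-written functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical message; the addresses are placeholders.
MESSAGE = """\
<email>
  <head>
    <from> <name>Tim Bray</name> <address>sender@example.com</address> </from>
    <to> <name>Paul Dreyfus</name> <address>recipient@example.com</address> </to>
    <subject> First draft of XML intro </subject>
  </head>
  <body>
    <p>Here's a draft of that XML article.</p>
    <attach encoding="mime" name="xml-draft.html"/>
  </body>
</email>
"""

def render_monitor(root):
    """Full display: headers, body paragraphs, and attachment names."""
    head = root.find("head")
    lines = [
        "From:    " + head.findtext("from/name").strip(),
        "To:      " + ", ".join(t.findtext("name").strip()
                                for t in head.findall("to")),
        "Subject: " + head.findtext("subject").strip(),
        "",
    ]
    for child in root.find("body"):
        if child.tag == "p":
            lines.append(" ".join(child.text.split()))
        elif child.tag == "attach":
            lines.append("[attachment: %s]" % child.get("name"))
    return "\n".join(lines)

def render_phone(root, width=20):
    """Tiny display: sender and subject on one line, cut to fit the width."""
    line = "%s: %s" % (root.findtext("head/from/name").strip(),
                       root.findtext("head/subject").strip())
    return line if len(line) <= width else line[: width - 3] + "..."

root = ET.fromstring(MESSAGE)
print(render_monitor(root))
print(render_phone(root))
```

Neither function changes the document, and a third one for paper or fax
could be added without touching the first two.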
Publish and Constrain Your Tags
Obviously, you don't want to make up a new set of tags every time you
write a document. Furthermore, since this is the Web, you'd probably
like to share your work with others.
XML has something called a document type definition (usually called a
DTD) that allows you to define the tags you've created, for future use
by yourself or others. Example 2 is the DTD for the e-mail shown in the
<!ELEMENT email (head, body)>
<!ELEMENT head (from, to+, cc*, subject)>
<!ELEMENT from (name?, address)>
<!ELEMENT to (name?, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (p | attach)*>
<!ELEMENT p (#PCDATA)>
<!ELEMENT attach EMPTY>
<!ATTLIST attach encoding (mime|binhex) "mime"
name CDATA #REQUIRED>
This should be easy to read, too. In English, it says:
An EMAIL has to have a HEAD and a BODY.
The HEAD has to have a FROM, one or more TOs, zero or more CCs, and a
SUBJECT.
The FROM and the TO can both include a NAME, and they have to include an
ADDRESS.
The NAME, ADDRESS, and SUBJECT are all just text.
The BODY is a mixture of Ps and ATTACHes.
A P contains just text.
An ATTACH doesn't contain anything, but it has an ENCODING attribute
whose value can be either mime or binhex; if it's not there, the default
is mime. An ATTACH also has a NAME attribute whose value can be any
text, but has to be there.
I'm not going to explain all the details of the DTD syntax, but the
ideas are pretty obvious. Clearly, you'd normally have one DTD that
describes a lot of different documents; think of it as an SQL database
schema for documents.
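The English reading of the DTD can be turned into a rough structural
check. The Python sketch below is not a DTD parser or a real validator.
It assumes the content models have been copied by hand into regular
expressions over the sequence of child tag names, and the names
CONTENT_MODELS, LEAF_ELEMENTS, and violations are invented for this
sketch. It ignores text content, and it reports CC as undeclared
because the DTD as shown mentions CC without declaring it.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten by hand as regular expressions.
# Each child tag name is followed by one space, so "from to subject "
# stands for a HEAD containing FROM, TO, SUBJECT in that order.
CONTENT_MODELS = {
    "email": r"head body ",
    "head": r"from (to )+(cc )*subject ",
    "from": r"(name )?address ",
    "to": r"(name )?address ",
    "body": r"((p|attach) )*",
}
# Declared as (#PCDATA) or EMPTY: no child elements allowed.
LEAF_ELEMENTS = {"name", "address", "subject", "p", "attach"}

def violations(elem):
    """Return the tags of elements that break the rules, in document order."""
    children = "".join(child.tag + " " for child in elem)
    if elem.tag in LEAF_ELEMENTS:
        ok = children == ""
    elif elem.tag in CONTENT_MODELS:
        ok = re.fullmatch(CONTENT_MODELS[elem.tag], children) is not None
    else:
        ok = False  # tag is not declared in the DTD as shown
    if elem.tag == "attach":
        # encoding defaults to "mime" when absent; name is #REQUIRED
        ok = ok and elem.get("encoding", "mime") in ("mime", "binhex")
        ok = ok and elem.get("name") is not None
    problems = [] if ok else [elem.tag]
    for child in elem:
        problems.extend(violations(child))
    return problems

GOOD = ("<email><head><from><name>A</name><address>a@example.com</address>"
        "</from><to><address>b@example.com</address></to>"
        "<subject>Hi</subject></head>"
        "<body><p>Text</p><attach name='x.html'/></body></email>")
# Same message with the required SUBJECT left out of the HEAD.
BAD = ("<email><head><from><address>a@example.com</address></from>"
       "<to><address>b@example.com</address></to></head><body/></email>")

print(violations(ET.fromstring(GOOD)))
print(violations(ET.fromstring(BAD)))
```

An editing program of the kind described here would apply the same
rules while the author types, instead of after the fact.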
If this DTD were stored at some location -- say,
http://home.netscape.com/DTDs/email.dtd -- then to associate the DTD
with the e-mail message you'd insert a first line like this:
<!DOCTYPE email SYSTEM "http://home.netscape.com/DTDs/email.dtd">
<from> <name>Tim Bray</name> <address>email@example.com</address> </from>
<to> <name>Paul Dreyfus</name> <address>firstname.lastname@example.org</address> </to>
The DTD might be useful to a program that received one of these e-mail
messages and wanted to find out in advance what tags would be in it and
how they fit together. But its most important use is to support smart
editing programs, which could read the DTD and simply not let the author
create a document that didn't match the DTD. (This isn't imaginary; such
authoring tools already exist.)
An XML document for which there is a DTD, and which conforms to that
DTD, is called valid. But a document doesn't have to be valid to be
useful, as we'll see in a moment.
Extensible Hyperlinks, Too
Adding your own tags is nice, but that's only part of what makes the Web
useful and XML interesting. Hyperlinks make the Web go; the <A
HREF="whatever"> idiom has become universal. XML extends Web hyperlinks
in a couple of useful directions. Example 3 is taken from a description
of a tournament game of Go (which is an old, complex, popular Asian
board game, Sakata being one of the most famous players of this
<P>Faced with a tight situation, Sakata found a
<X><L ROLE="EG" TITLE="English translation"
SHOW="NEW" HREF="/cgi-bin/xlate?term=tesuji" />
<L ROLE="ToMove" TITLE="Jump to move in game record"
SHOW="REPLACE" HREF="game.html#Move127" />
<L ROLE="PIC" TITLE="Illustration"
<L ROLE="CourseNotes" TITLE="Course Notes"
Once again, we'll skip the syntactic details, which are explained in the
Linking part of the XML Specification. In a browser, this would look
Faced with a tight situation, Sakata found a tesuji.
When you clicked on "tesuji," though, instead of the usual Web behavior
of charging off after that link, you'd get a menu with four entries:
English translation, Jump to move in game record, Illustration, and
Course Notes.
Choosing English translation would run an ordinary CGI script. The
attribute SHOW="NEW" means that rather than replacing the current page,
the results of the script would show up in a new window (as if you'd
said TARGET="_blank" in an HTML page). By the way, the translation would
reveal that "tesuji" is a Go term meaning a clever tactical maneuver.
Jump to move in game record, a link into an HTML page, would behave
exactly as the Web does today. The Illustration option is more
interesting. First of all, it's a link into an XML file. The text after
the # in the URL says that the link is to the first FIG element that has
the attribute CAPTION="TESUJI". Also, because of the attribute
SHOW="EMBED", rather than replacing the current page with the target of
the link, that target material would be inserted in the display right
here at the location of the link. The Course Notes option links to a
"span" of text in an XML file -- specifically, the first three
paragraphs following a tag that has the attribute ID="def-Tesuji".
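The menu behavior can be sketched in Python as follows. This is only an
illustration of how a program might read the link markup, not how any
browser implements it. It uses just the two locators that appear
complete in the example; the placement of the word "tesuji" inside the
X element, and the names link_menu and ACTIONS, are assumptions made
for this sketch.

```python
import xml.etree.ElementTree as ET

# Two of the four locators from the Go example. Where the link text
# "tesuji" sits inside <X> is an assumption of this sketch.
LINK = """\
<P>Faced with a tight situation, Sakata found a
<X><L ROLE="EG" TITLE="English translation"
      SHOW="NEW" HREF="/cgi-bin/xlate?term=tesuji" />
   <L ROLE="ToMove" TITLE="Jump to move in game record"
      SHOW="REPLACE" HREF="game.html#Move127" />
tesuji</X>.</P>
"""

# What each SHOW value means, as described in the text.
ACTIONS = {
    "NEW": "show the result in a new window",
    "REPLACE": "replace the current page",
    "EMBED": "insert the target at the location of the link",
}

def link_menu(x_elem):
    """Return one (title, show, href) entry per locator in the link."""
    return [(loc.get("TITLE"), loc.get("SHOW"), loc.get("HREF"))
            for loc in x_elem.findall("L")]

root = ET.fromstring(LINK)
menu = link_menu(root.find("X"))
for title, show, href in menu:
    print("%s -> %s (%s)" % (title, href, ACTIONS[show]))
```

Each menu entry carries its own target and its own display behavior,
which is what lets a single word lead to several destinations.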
These straightforward extensions of the Web's current linking
facilities, in my opinion, add a lot of richness and cost very little.
(But then, I helped write the spec.)