On Tue, 5 Jun 2001 cdent@burningchrome.com wrote:
> I don't think I'm understanding you here. Which embedded structure are
> you saying raw text contains? Do you mean punctuation, paragraph
> breaks, what?
Yes. We use carriage returns, spaces, tabs, and punctuation in raw text
in the same way that XML uses tags, entities, and attributes. When we're
communicating with humans, we can get away with having an arbitrary
syntax, because humans are pretty good at figuring out intended semantics.
For instance, in this e-mail, I distinguish paragraphs by separating them
with blank lines. If I chose instead to distinguish them by indenting the
first line of each paragraph, you, the human reader, would have no trouble
recognizing that the two representations are semantically equivalent.
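To make the point concrete, here is a rough sketch in Python of what a program would need to do to see those two layouts as the same thing. The function names and the sample text are made up for illustration, and the indentation rule (any line starting with a space or tab opens a new paragraph) is an assumption, not a standard.

```python
import re

def paragraphs_blank_line(text):
    """Split text whose paragraphs are separated by blank lines."""
    chunks = re.split(r"\n\s*\n", text.strip())
    # Collapse internal line breaks and runs of spaces within each paragraph.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

def paragraphs_indented(text):
    """Split text whose paragraphs start with an indented first line."""
    paragraphs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore stray blank lines
        if line[0] in " \t" or not paragraphs:
            paragraphs.append(line.strip())        # indented line opens a paragraph
        else:
            paragraphs[-1] += " " + line.strip()   # continuation line
    return [" ".join(p.split()) for p in paragraphs]

# Hypothetical sample: the same two paragraphs in each convention.
blank_style = "First paragraph,\nwrapped.\n\nSecond paragraph."
indent_style = "    First paragraph,\nwrapped.\n    Second paragraph."

same = paragraphs_blank_line(blank_style) == paragraphs_indented(indent_style)
print(same)
```

A human does this normalization without thinking; the program has to be told which convention is in play before it can recover the same sequence of paragraphs.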
So what's the best way of representing content in a document, where a
document consists of a sequence of paragraphs? Xanadu represents all
documents as a sequence of bytes at the content layer, but it seems to me
advantageous to represent it as a sequence of paragraphs.
Here's a real-world example. CVS diffs documents line by line.
So if I have the source code:
    if (x > y) {
        doSomething();
    }
and I change it to:
    if (x > y)
    {
        doSomething();
    }
CVS tells me that these two documents are different. Well, that's true;
not all of the lines in the two versions are the same. But semantically, these two
excerpts of code are exactly the same. So do you really want your version
control system saying that these chunks of code are different?
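As a rough illustration of the gap, the Python sketch below compares the two excerpts twice: once line by line with the standard difflib module, roughly the way a line-oriented tool would, and once as a stream of tokens with whitespace thrown away. The tokenizer is a deliberately crude assumption of mine; it knows nothing about string literals or comments, where whitespace does matter, so a real tool would need the language's actual lexer.

```python
import difflib
import re

before = "if (x > y) {\n    doSomething();\n}\n"
after = "if (x > y)\n{\n    doSomething();\n}\n"

def line_diff(a, b):
    """Line-oriented comparison; returns unified-diff lines (empty if identical)."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))

def tokens(source):
    """Crude tokenizer: words/numbers, or any single non-space symbol."""
    return re.findall(r"\w+|[^\w\s]", source)

lines_differ = bool(line_diff(before, after))       # layout changed
tokens_differ = tokens(before) != tokens(after)     # token stream did not

print(lines_differ, tokens_differ)
```

The line-level comparison reports a change while the token-level one does not, which is exactly the tension: the first is faithful to the bytes, the second to the meaning.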
I'm not sure what the answer is. Intuitively, I think that I'd like my
version control systems to be smarter, so that if I run some code through
lint, and I want to do a diff between a pre-lint version of a file and a
post-lint version, I get something actually useful in return. However, at
the same time, I don't want to ignore style completely, even if it is
semantically redundant.
> I do think that Nelson is on to something when he suggests that
> structure must be a layer above the content. It's very hard, though,
> to express exactly why.
I also think this is valid. But it's clearly futile to completely
separate content from structure. So the challenge is, how granularly do
we separate these layers?
-Eugene
-- 
+=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
|       "Writer's block is a fancy term made up by whiners so they        |
+=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+