
[ba-ohs-talk] backlink database data

I've added a new feature to the list-archiving code used by the Bootstrap
Alliance mailing lists (ba-unrev-talk, ba-ohs-talk): backlink
extraction.  Whenever an e-mail is converted into HTML, the archiver
also extracts the URLs in the e-mail body and appends them to a text file.
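A minimal sketch of this extraction step, in Python.  The archiver's actual code and file format aren't described here, so the URL pattern, the function names, and the one-line-per-link `source<TAB>target` layout below are all assumptions for illustration.

```python
import re

# Hypothetical URL pattern: a scheme followed by any run of non-space,
# non-bracket, non-quote characters.  Real-world URL detection is messier.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")

# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?)"


def extract_urls(body: str) -> list[str]:
    """Return the distinct URLs in an e-mail body, in order of first appearance."""
    urls = []
    for match in URL_RE.finditer(body):
        url = match.group(0).rstrip(TRAILING_PUNCT)
        if url not in urls:
            urls.append(url)
    return urls


def append_links(archive_url: str, body: str, path: str) -> int:
    """Append one 'archive_url<TAB>target_url' line per URL; return the count written."""
    urls = extract_urls(body)
    with open(path, "a", encoding="utf-8") as out:
        for url in urls:
            out.write(f"{archive_url}\t{url}\n")
    return len(urls)
```

The archiver would call something like `append_links(message_html_url, body, "backlinks.txt")` once per converted message; `message_html_url` stands for wherever that message's HTML page ends up, which this sketch does not try to guess.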

You can see live results of the links from this list and from
ba-unrev-talk at:

    http://www.bootstrap.org/lists/backlinks.txt

This is an extremely crude, early-stage experiment.  The hypothesis is
that this backlink data, combined with a good front-end, can serve as an
automatic way of integrating data in a repository.

Restated, an e-mail with a link is an annotation.  For example, this very
e-mail could be considered an annotation to the document located in the
URL above.  In fact, you should find the above URL in the file.

Unfortunately, this annotation is usually not visible when viewing the
URL, because the Web has no notion of back-links.  However, you can create
this notion by extracting the links from local documents on a Web server
and recording those links in a database.  (This is essentially what Google
does.)  Extracting back-links from e-mail archives has an added bonus:
archived e-mails are static documents, so at least one end of each
link (the e-mail end) will rarely break.  I say "rarely" because you could
move the archives to a different location on the Web site, or delete them
altogether.
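Recording links is only half the job: to show the annotations for a given page, the forward links (e-mail to URL) have to be inverted into backlinks (URL to e-mails).  A minimal sketch, assuming a hypothetical file with one `source<TAB>target` line per link; the function name is made up for illustration.

```python
from typing import Iterable


def invert_links(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each target URL to the sources (archived e-mails) that link to it.

    Each input line is assumed to be 'source<TAB>target'; blank or
    malformed lines are skipped.  Sources keep their order in the file.
    """
    backlinks: dict[str, list[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        source, target = fields
        sources = backlinks.setdefault(target, [])
        if source not in sources:
            sources.append(source)
    return backlinks
```

Because an open text file iterates line by line, `invert_links(open("backlinks.txt", encoding="utf-8"))` would build the whole index in one pass; a front-end could then look up the current page's URL in the result.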

What does this mean for all of you?  Well, I'm not a front-end kind of
guy, but I know there are people on this list who are.  So, this is an
open challenge to create useful front-ends to this data.

One early observation: There is a lot of "useless" data in the file.  For
instance, in my .sig below, I have a URL to my home page.  So every e-mail
I send to this list creates a back-link to my home page, even though my
home page isn't really relevant to the content of these e-mails.  The same
goes for quoted text -- you have a lot of redundant text in e-mail
threads, and hence, a lot of redundant links in the back-link database.
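One possible way to cut down this noise, sketched in Python: skip quoted lines (those starting with `>`), stop at the conventional `-- ` signature separator line (described in RFC 3676), and drop any URL listed in an ignore set.  These heuristics and names are assumptions, not what the archiver does.  Note that a boxed signature like the one below does not use `-- `, so its home-page URL would have to be caught by the ignore set instead.

```python
import re

URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")
SIG_SEPARATOR = "-- "  # conventional signature delimiter line (RFC 3676)


def original_text(body: str) -> str:
    """Return the body without quoted lines or anything after a '-- ' separator."""
    kept = []
    for line in body.splitlines():
        if line == SIG_SEPARATOR:
            break  # everything from here down is the signature
        if line.lstrip().startswith(">"):
            continue  # quoted from an earlier message
        kept.append(line)
    return "\n".join(kept)


def relevant_urls(body: str, ignore: frozenset[str] = frozenset()) -> list[str]:
    """URLs in the author's own text, minus any listed in `ignore` (e.g. home pages)."""
    urls = []
    for match in URL_RE.finditer(original_text(body)):
        url = match.group(0).rstrip(".,;:!?)")
        if url not in ignore and url not in urls:
            urls.append(url)
    return urls
```

Quoted-text filtering is only a heuristic: some mail clients quote without `>` prefixes, and a reply may legitimately repeat a link it is discussing.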

Here is my roadmap for further developing this feature:

1. People on ba-ohs-talk build useful front-ends to this data.  In the
process, they suggest what other metadata should be stored in the
file.

2. Replace the text file with a real backlink database.  Hopefully, this
backlink database can be used by other projects as well, including a2h
(my Augment-to-(X)(HT)ML translator) and the Hyperscope.

3. Eventually bind the backlink database to a peer-to-peer infrastructure,
so that multiple databases can be installed all over the Net, with all of
them sharing data.
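For step 2, here is one minimal shape such a backlink database could take, sketched with Python's built-in `sqlite3` module.  The table layout and function names are hypothetical; the idea is that a primary key on (source, target) removes duplicate links, and an index on `target` makes the backlink question, "which e-mails point at this URL?", a cheap lookup.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    source TEXT NOT NULL,  -- archived e-mail: the stable end of the link
    target TEXT NOT NULL,  -- URL mentioned in that e-mail
    PRIMARY KEY (source, target)
);
CREATE INDEX IF NOT EXISTS links_by_target ON links (target);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a backlink database at `path`."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_link(conn: sqlite3.Connection, source: str, target: str) -> None:
    """Record that `source` links to `target`; repeated links are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO links (source, target) VALUES (?, ?)",
            (source, target),
        )


def backlinks(conn: sqlite3.Connection, target: str) -> list[str]:
    """Return the sources that link to `target`, sorted."""
    rows = conn.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (target,)
    )
    return [source for (source,) in rows]
```

Extra metadata suggested under step 1 (author, date, whether the link came from quoted text) could become additional columns without changing the lookup.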

-Eugene

+=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
|       "Writer's block is a fancy term made up by whiners so they        |
+=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+