
Re: [ba-ohs-talk] backlink database data


Hey Eugene,    (01)

What about a filter to exclude what is below the sig line?  That data is
usually redundant and unrelated to the body text.    (02)
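
To make that concrete, here is a rough Python sketch of the kind of filter
I mean.  It is only an illustration, not the archiver's actual code: the
names strip_signature and extract_links, the URL regex, and the assumption
that a sig starts at a line containing only "--" (optionally behind ">"
quote markers) are all mine.

```python
import re

# Assumption: a signature begins at the conventional delimiter line, i.e.
# two dashes with an optional trailing space, possibly behind "> " markers.
SIG_DELIMITER = re.compile(r"^(?:>\s?)*--\s?$")

# Assumption: a deliberately simple URL pattern; the real archiver may differ.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def strip_signature(body: str) -> str:
    """Return the body up to (not including) the first sig delimiter line."""
    kept = []
    for line in body.splitlines():
        if SIG_DELIMITER.match(line):
            break
        kept.append(line)
    return "\n".join(kept)


def extract_links(body: str) -> list[str]:
    """Extract URLs only from the part of the body above the sig line."""
    return URL_PATTERN.findall(strip_signature(body))
```

With something like this, a message whose only other URL sits in the .sig
would contribute just the links from its body.  One caveat: stopping at the
first delimiter also drops anything that follows a quoted sig, so replies
written below quoted text would need more care.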

Thoughts?    (03)

---Sheldon    (04)


Eugene Eric Kim wrote:    (05)

> I've added a new feature to the list archiving code that the Bootstrap
> Alliance mailing lists (ba-unrev-talk, ba-ohs-talk) use: backlink
> extraction.  Whenever an e-mail gets converted into HTML, the archiver
> also extracts URLs from the e-mail body and appends them to a text file.
>
> You can see live results of the links from this list and from
> ba-unrev-talk at:
>
>     http://www.bootstrap.org/lists/backlinks.txt
>
> This is an extremely crude, early stage experiment.  The hypothesis is
> that this backlink data, combined with a useful front-end, can serve as a
> useful and automatic way of integrating data in a repository.
>
> Restated, an e-mail with a link is an annotation.  For example, this very
> e-mail could be considered an annotation to the document located at the
> URL above.  In fact, you should find the above URL in the file.
>
> Unfortunately, this annotation is usually not visible when viewing the
> URL, because the Web has no notion of back-links.  However, you can create
> this notion by extracting the links from local documents on a Web server
> and recording those links in a database.  (This is essentially what Google
> does.)  Extracting back-links from e-mail archives has the added bonus
> that e-mail is a static document, meaning that at least one end of the
> link (the e-mail end) will rarely break.  I say rarely, because you could
> change the location of the archives on the web site, or delete them
> altogether.
>
> What does this mean for all of you?  Well, I'm not a front-end kind of
> guy, but I know there are people on this list who are.  So, this is an
> open challenge to create useful front-ends to this data.
>
> One early observation: There is a lot of "useless" data in the file.  For
> instance, in my .sig below, I have a URL to my home page.  So every e-mail
> I send to this list creates a back-link to my home page, even though my
> home page isn't really relevant to the content of these e-mails.  The same
> goes for quoted text -- you have a lot of redundant text in e-mail
> threads, and hence, a lot of redundant links in the back-link database.
>
> Here is my roadmap for further developing this feature:
>
> 1. People on ba-ohs-talk build useful front-ends to this data.  In the
> process of doing this, people make useful suggestions as to what other
> metadata need be stored in the file.
>
> 2. Replace the text file with a real backlink database.  Hopefully, this
> backlink database can be used by other projects as well, including a2h
> (my Augment-to-(X)(HT)ML translator) and the Hyperscope.
>
> 3. Eventually bind the backlink database to a peer-to-peer infrastructure,
> so that multiple databases can be installed all over the Net, with all of
> them sharing data.
>
> -Eugene
>
> --
> +=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
> |       "Writer's block is a fancy term made up by whiners so they        |
> +=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+    (06)