Re: [ba-ohs-talk] backlink database data

Hi Sheldon,

I think whether the links below the sig line are redundant enough to
filter out depends on the mechanisms we use to process the collected
data.  As far as I am aware, there presently aren't any tools that use
the collected backlinks in an automated way, so this is a bit of a
future-oriented discussion, but that's fine.

Without the signatures and without any other mechanism (right now), a
reasonable way to find out about someone is to "google" their email
address.  That will usually get you to their web page and a lot of
other things.

If including .sig URLs would help an automated tool associate an email
address with a web page, that would provide some value.  I could see it
going either way, but my first reaction would be to not exclude any
information until there's a reason to do so.  Tools can exclude data
easily enough.
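
To make "tools can exclude data easily enough" a bit more concrete, here
is a rough Python sketch of what I have in mind.  It is not the
archiver's code; the function name tag_urls, the URL pattern and the
example.org addresses are all my own assumptions.  It keeps every URL
but tags each one with whether it appeared below a sig delimiter (a
line containing only "--", with or without the trailing space), so a
front-end can decide later what to drop.

```python
import re

# Loose URL pattern (an assumption; the archiver's real rules are unknown).
URL_RE = re.compile(r"https?://[^\s<>]+")
# Leading quote markers such as "> > ".
QUOTE_RE = re.compile(r"^(?:>\s?)+")


def tag_urls(body):
    """Return a list of (url, in_sig) pairs for a plain-text e-mail body.

    in_sig is True when the URL appears after a sig delimiter line at the
    same quote depth.  A sig written on the delimiter line itself
    (e.g. "-- Name <addr>") is not detected by this sketch.
    """
    results = []
    sig_depth = None  # quote depth at which a sig delimiter was last seen
    for line in body.splitlines():
        match = QUOTE_RE.match(line)
        prefix = match.group(0) if match else ""
        depth = prefix.count(">")
        text = line[len(prefix):]
        # A change in quote depth means we have left that message's sig.
        if sig_depth is not None and depth != sig_depth:
            sig_depth = None
        if text.rstrip() == "--":
            sig_depth = depth
            continue
        for url in URL_RE.findall(text):
            # Trim punctuation that commonly trails a URL in prose.
            results.append((url.rstrip(".,;)"), sig_depth is not None))
    return results


# Hypothetical message: one quoted body link, one quoted sig link,
# and one link in the reply itself.
sample = "\n".join([
    "> See http://example.org/doc for details.",
    "> -- ",
    "> Pat  http://example.org/~pat/",
    "I replied at http://example.org/reply.",
])

tagged = tag_urls(sample)
body_links = [url for url, in_sig in tagged if not in_sig]
sig_links = [url for url, in_sig in tagged if in_sig]
```

A front-end that only wants body links would use body_links, while a
tool trying to associate a sender with a home page would look at
sig_links instead.  Either way nothing is thrown away at collection
time.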

-- Grant Bowman                                   <grantbow@svpal.org>

* Sheldon Brahms <nighthawk476@attbi.com> [011208 13:52]:
> Hey Eugene,
> What about a filter to exclude what is below the sig line?  That data should
> usually be reasonably redundant and unrelated to the body text.
> Thoughts?
> ---Sheldon
> Eugene Eric Kim wrote:
> > I've added a new feature to the list archiving code that the Bootstrap
> > Alliance mailing lists (ba-unrev-talk, ba-ohs-talk) use: backlink
> > extraction.  Whenever an e-mail gets converted into HTML, the archiver
> > also extracts URLs from the e-mail body and appends them to a text file.
> >
> > You can see live results of the links from this list and from
> > ba-unrev-talk at:
> >
> >     http://www.bootstrap.org/lists/backlinks.txt
> >
> > This is an extremely crude, early stage experiment.  The hypothesis is
> > that this backlink data, combined with a useful front-end, can serve as a
> > useful and automatic way of integrating data in a repository.
> >
> > Restated, an e-mail with a link is an annotation.  For example, this very
> > e-mail could be considered an annotation to the document located in the
> > URL above.  In fact, you should find the above URL in the file.
> >
> > Unfortunately, this annotation is usually not visible when viewing the
> > URL, because the Web has no notion of back-links.  However, you can create
> > this notion by extracting the links from local documents on a Web server
> > and recording those links in a database.  (This is essentially what Google
> > does.)  Extracting back-links from e-mail archives has the added bonus
> > that e-mail is a static document, meaning that at least one end of the
> > link (the e-mail end) will rarely break.  I say rarely, because you could
> > change the location of the archives on the web site, or delete them
> > altogether.
> >
> > What does this mean for all of you?  Well, I'm not a front-end kind of
> > guy, but I know there are people on this list who are.  So, this is an
> > open challenge to create useful front-ends to this data.
> >
> > One early observation: There is a lot of "useless" data in the file.  For
> > instance, in my .sig below, I have a URL to my home page.  So every e-mail
> > I send to this list creates a back-link to my home page, even though my
> > home page isn't really relevant to the content of these e-mails.  The same
> > goes for quoted text -- you have a lot of redundant text in e-mail
> > threads, and hence, a lot of redundant links in the back-link database.
> >
> > Here is my roadmap for further developing this feature:
> >
> > 1. People on ba-ohs-talk build useful front-ends to this data.  In the
> > process of doing this, people make useful suggestions as to what other
> > metadata need be stored in the file.
> >
> > 2. Replace the text file with a real backlink database.  Hopefully, this
> > backlink database can be used by other projects as well, including a2h
> > (my Augment-to-(X)(HT)ML translator) and the Hyperscope.
> >
> > 3. Eventually bind the backlink database to a peer-to-peer infrastructure,
> > so that multiple databases can be installed all over the Net, with all of
> > them sharing data.
> >
> > -Eugene
> >
> > --
> > +=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
> > |       "Writer's block is a fancy term made up by whiners so they        |
> > +=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+