Re: [ba-ohs-talk] backlink database data
Hey Grant, (01)
Good to hear from you. I'm glad you wrote this, as it's giving me an opening to
state and clarify a couple of things. (02)
First, I am amazed and enthused about the amount of traffic over this change in
code that Eugene made, and the comments that he announced it with. (03)
I want to revisit the very beginning of this for a second. I certainly don't
believe that either Eugene or I advocated the removal of any data that would
prove useful to a user. (04)
What started this exchange was a simple comment by Eugene that his hack would add
extraneous data to the simulated backlink file. He used the specific example of
the information below the sig line as "useless." I simply responded to that with
the idea of a filter to exclude below the sig line. Easy enough, and just an
obvious thought. It was certainly no evaluation of the value of what was below
the sig line, just a solution to a specific situation posed in his post. (05)
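To make the filter idea concrete, here is a minimal sketch in Python. It is not Eugene's archiver code. It assumes the sig line is the conventional "-- " delimiter on a line by itself (a bare "--" is also accepted, since some mailers strip the trailing space), and the URL pattern is deliberately crude. The names strip_signature and extract_urls are made up for illustration. The skip_signature flag keeps the filtering optional, so nothing has to be thrown away at collection time.

```python
import re

# Hypothetical, deliberately simple URL pattern; a real archiver would
# likely need something more careful.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def strip_signature(body: str) -> str:
    """Return the body without anything at or below the last sig line.

    Assumption: the sig line is "--" optionally followed by spaces,
    alone on its own line. Quoted sig lines ("> --") are not matched.
    """
    lines = body.splitlines()
    # Search from the bottom so only the final signature is removed.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].rstrip(" ") == "--":
            return "\n".join(lines[:i])
    return body  # no sig line found; leave the body untouched


def extract_urls(body: str, skip_signature: bool = True) -> list:
    """Collect URLs from an e-mail body, optionally ignoring the sig."""
    text = strip_signature(body) if skip_signature else body
    return URL_PATTERN.findall(text)
```

Calling extract_urls(body, skip_signature=False) returns the sig URLs as well, so a front-end could decide for itself whether to use them.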
I don't advocate excluding data from below, nor do I advocate including data from
below. In fact, I pretty much don't really care one way or the other. I was
simply offering the simplest solution I could think of coming from a front-end
top-down perspective, since that's what was asked for by Eugene. Obviously, this
is a hot area, so some other mechanisms need to be figured out. Fine by me. (06)
The truth, as someone noted somewhere in the middle of all this, is that most of
the talk was completely unrelated to the original post. It got sidetracked into
a discussion about sigs. (07)
As far as my usage of a sig, I had been using one for a number of months, but due
to recent e-mail and URL changes, have dropped it temporarily. I couldn't even
begin to tell you what convention it used. (08)
Thanks for giving me the chance to say this. (09)
Grant Bowman wrote: (011)
> Hi Sheldon,
> I think whether they are redundant enough depends on the mechanisms we
> use to process the data collected. There presently aren't any tools to
> use the backlinks that are being collected in an automated way that I
> am aware of, so this is a bit of a future oriented discussion, but
> that's fine.
> Without the signatures and without any other mechanism (right now), a
> reasonable process to find out information about someone is to use their
> email address and "google" for it. That will usually get you to their
> web page and a lot of other things.
> If including .sig URLs would help an automated tool to associate an
> email address with a web page, that would provide some value. I could
> see it go either way, but my first reaction would be to not exclude any
> information until there's a reason to do so. Tools can exclude data
> easily enough.
> -- Grant Bowman <email@example.com>
> * Sheldon Brahms <firstname.lastname@example.org> [011208 13:52]:
> > Hey Eugene,
> > What about a filter to exclude what is below the sig line? That data should
> > usually be reasonably redundant and unrelated to the body text.
> > Thoughts?
> > ---Sheldon
> > Eugene Eric Kim wrote:
> > > I've added a new feature to the list archiving code that the Bootstrap
> > > Alliance mailing lists (ba-unrev-talk, ba-ohs-talk) use: backlink
> > > extraction. Whenever an e-mail gets converted into HTML, the archiver
> > > also extracts URLs from the e-mail body and appends them to a text file.
> > >
> > > You can see live results of the links from this list and from
> > > ba-unrev-talk at:
> > >
> > > http://www.bootstrap.org/lists/backlinks.txt
> > >
> > > This is an extremely crude, early stage experiment. The hypothesis is
> > > that this backlink data, combined with a useful front-end, can serve as a
> > > useful and automatic way of integrating data in a repository.
> > >
> > > Restated, an e-mail with a link is an annotation. For example, this very
> > > e-mail could be considered an annotation to the document located in the
> > > URL above. In fact, you should find the above URL in the file.
> > >
> > > Unfortunately, this annotation is usually not visible when viewing the
> > > URL, because the Web has no notion of back-links. However, you can create
> > > this notion by extracting the links from local documents on a Web server
> > > and recording those links in a database. (This is essentially what Google
> > > does.) Extracting back-links from e-mail archives has the added bonus
> > > that e-mail is a static document, meaning that at least one end of the
> > > link (the e-mail end) will rarely break. I say rarely, because you could
> > > change the location of the archives on the web site, or delete them
> > > altogether.
> > >
> > > What does this mean for all of you? Well, I'm not a front-end kind of
> > > guy, but I know there are people on this list who are. So, this is an
> > > open challenge to create useful front-ends to this data.
> > >
> > > One early observation: There is a lot of "useless" data in the file. For
> > > instance, in my .sig below, I have a URL to my home page. So every e-mail
> > > I send to this list creates a back-link to my home page, even though my
> > > home page isn't really relevant to the content of these e-mails. The same
> > > goes for quoted text -- you have a lot of redundant text in e-mail
> > > threads, and hence, a lot of redundant links in the back-link database.
> > >
> > > Here is my roadmap for further developing this feature:
> > >
> > > 1. People on ba-ohs-talk build useful front-ends to this data. In the
> > > process of doing this, people make useful suggestions as to what other
> > > metadata need be stored in the file.
> > >
> > > 2. Replace the text file with a real backlink database. Hopefully, this
> > > backlink database can be used by other projects as well, including a2h
> > > (my Augment-to-(X)(HT)ML translator) and the Hyperscope.
> > >
> > > 3. Eventually bind the backlink database to a peer-to-peer infrastructure,
> > > so that multiple databases can be installed all over the Net, with all of
> > > them sharing data.
> > >
> > > -Eugene
> > >
> > > --
> > > +=== Eugene Eric Kim ===== email@example.com ===== http://www.eekim.com/ ===+
> > > | "Writer's block is a fancy term made up by whiners so they |
> > > +===== can have an excuse to drink alcohol." --Steve Martin ===========+ (012)
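
As a rough illustration of the back-link notion described in the quoted post, the sketch below turns a list of (source, target) link records into a lookup from each target URL to the e-mails that point at it. The record format is an assumption: the actual layout of backlinks.txt is not described in this thread, so the sketch supposes one "source-URL target-URL" pair per line, separated by whitespace. The function name and the sample URLs in the usage note are hypothetical. Using a set per target means that links repeated through quoted text collapse into a single entry.

```python
from collections import defaultdict


def build_backlink_index(lines):
    """Map each target URL to the set of source URLs that link to it.

    Assumed (hypothetical) record format: "<source-url> <target-url>",
    one record per line. Lines that do not fit are skipped.
    """
    index = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or malformed record
        source, target = parts
        # A set ignores duplicates, e.g. the same link seen again
        # in quoted text from the same message.
        index[target].add(source)
    return index
```

A front-end could then answer "which e-mails annotate this document?" with something like sorted(index["http://example.org/doc.html"]), where the URL is only a placeholder.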