
Re: [ba-ohs-talk] Keyword Indexing


Alex Shapiro wrote:    (01)

> At 02:28 PM 4/29/02 +0100, you wrote:
> 
>> Eric Armstrong wrote:
>>
>>> Murray Altheim wrote:
>>>
>>>> I suggest this:
>>>>
>>>>    [KEYS: word1, word2, word3 ]
>>>
>>> Makes sense to me. Especially if not case sensitive.
>>
>> The big thing is being able to (case insensitively) grep on "[keys:]".
>> One can't simply use square brackets because they show up all the
>> time in both program code and prose, e.g., [Humbert, 1999].
> 
> Ok, Murray, come on now.  I read the email spec, and I see that 
> separating headers from the email body might not be so trivial as to be 
> accomplishable by using grep, (you can grep for the first occurrence of 
> a double new line, but then you have to look at the next line), but why 
> should we let that stop us.
> 
> So the parsing program is going to have to be a bit longer, but so 
> what.  I think that the small bit of effort that it's going to take to 
> write a longer parser far outweighs the collective nuisance of having to 
> type "keys:" before every keyword section.
> 
> If the first line of a post starts with a "[" then I think that we can just 
> assume that it's the start of a keyword section.  (A check for a closing 
> bracket would make this even more certain)  I do not recall any emails 
> that have started with an open square bracket.  Maybe there will be a 
> slight bit of noise from some unusual posts, but I think that we can manually 
> delete any keywords generated in this way.    (02)


If we're trying to develop technologies that are useful outside of
this small group, they need to be robust. Can you guarantee that
the square-bracketed email keywords will show up in column one of
the first non-whitespace line in an email? What about replies that
include the keywords of a previous message? What about the keywords
after they have been processed into an email archive (HTML) message?
If adding the five characters "keys:" is really too much to ask
to disambiguate keywords, then I don't think there's much chance
that this idea will catch on. One simply can't have any square-
bracketed content in plaintext be considered keywords without
introducing more problems than are solved. The use and reuse of
that text will put it into many more contexts than that first non-
whitespace line of an email message. If people were more regular in
their use, this might work, but we can hardly advocate something
that doesn't survive the introduction of ">>" or markup around it.
Remember, we want the keys to survive multiple replies, so that
once a thread is started people don't have to retype them, or
manipulate quoted text (which is a no-no).    (03)
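
To make the robustness point concrete, here is a minimal sketch of what
matching on the explicit "[keys:" marker could look like. The pattern
name KEYS_RE and the function extract_keys are hypothetical, invented
for this illustration, and nothing here is an agreed format. The sketch
assumes keywords are comma-separated and that the marker is matched
without regard to case. It shows the marker being found behind ">>"
quoting or inside HTML markup, while ordinary bracketed prose such as
[Humbert, 1999] is ignored.

```python
import re

# Hypothetical pattern: "[keys:" in any letter case, then everything up
# to the closing "]". It is deliberately not anchored to the start of a
# line, so quote prefixes (">>") or surrounding markup do not matter.
KEYS_RE = re.compile(r"\[keys:\s*([^\]]*)\]", re.IGNORECASE)


def extract_keys(text):
    """Return the keywords from every [keys: ...] section in text.

    Keywords are lowercased and de-duplicated, keeping first-seen order.
    """
    found = []
    for match in KEYS_RE.finditer(text):
        for word in match.group(1).split(","):
            word = word.strip().lower()
            if word and word not in found:
                found.append(word)
    return found


if __name__ == "__main__":
    sample = (
        ">> [KEYS: indexing, Email ]\n"
        "> as noted in [Humbert, 1999]\n"
        "<p>[keys: email, archive]</p>\n"
    )
    print(extract_keys(sample))
```

A bare "[" test on the first non-whitespace line would have missed both
the quoted and the marked-up occurrence in the sample, and would have
treated the citation as keywords had it come first.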

The biggest problem I see with "[keys:" is that it's in English,
but I hesitate to suggest a symbolic solution because it's harder
to remember (uh, is it "{[" or "[{"?, etc.). I suppose we could
come up with an i18n solution... *sigh*    (04)

Murray    (05)

......................................................................
Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK    (06)

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu    (07)