
Re: [ba-ohs-talk] Keyword Indexing


At 09:59 AM 4/30/02 +0100, Murray Altheim wrote:
>Alex Shapiro wrote:
>>Ok, Murray, come on now.  I read the email spec, and I see that 
>>separating headers from the email body might not be so trivial as to be 
>>accomplishable by using grep, (you can grep for the first occurrence of a 
>>double new line, but then you have to look at the next line), but why 
>>should we let that stop us?
>>So the parsing program is going to have to be a bit longer, but so 
>>what.  I think that the small bit of effort that it's going to take to 
>>write a longer parser far outweighs the collective nuisance of having to 
>>type "keys:" before every keyword section.
>>If the first line of a post starts with a "[" then I think that we can just 
>>assume that it's the start of a keyword section.  (A check for a closing 
>>bracket would make this even more certain)  I do not recall any emails 
>>that have started with an open square bracket.  Maybe there will be a 
>>slight bit of noise from some unusual posts, but I think that we can manually 
>>delete any keywords generated in this way.
>
>
>If we're trying to develop technologies that are useful outside of
>this small group, they need to be robust.    (01)

Are we developing technologies with the purpose that they be useful outside 
of this group?  Certainly, some more active groups, like this one 
http://www.info-arch.org/lists/sigia-l/0201/ could benefit from this 
technology.  However, I think that we should start with the smaller goal of 
making this list better, and then if the solution works we can think about 
how to generalize it, and package it so that others could use it.    (02)

>  Can you guarantee that
>the square-bracketed email keywords will show up in column one of
>the first non-whitespace line in an email? What about replies that
>include the keywords of a previous message? What about the keywords
>after they have been processed into an email archive (HTML) message?    (03)

I can guarantee that when someone sends out an e-mail for the first time, 
and they want to put in keywords, they will put those keywords as the top 
line.  The instruction to put keywords in the top line is pretty 
simple to follow.    (04)

I see where you are going: a reply (or the delivered message) may no 
longer have the keywords as the top line.  One could see this as a 
problem.  However, my vision of this tool's use is that only the initial 
key-worded posts need to be catalogued.  The rest of the posts in the 
thread that follows could be reached by clicking through to the root 
post, and from there to the discussion archived by thread.    (05)
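
A rough Python sketch of cataloguing only the initial posts. It assumes that replies carry an In-Reply-To or References header (mail clients usually add one, but nothing guarantees it), and that keywords have already been pulled out of each message by some other step. The names `is_root_post` and `build_keyword_index` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from email import message_from_string


def is_root_post(msg):
    """Assumption: a post with no reply headers starts a new thread."""
    return msg.get("In-Reply-To") is None and msg.get("References") is None


def build_keyword_index(posts):
    """Map each keyword to the Message-IDs of the root posts that use it.

    posts is an iterable of (raw_message, keywords) pairs, where keywords
    is the list already extracted from that message.
    """
    index = defaultdict(list)
    for raw_message, keywords in posts:
        msg = message_from_string(raw_message)
        if not is_root_post(msg):
            continue  # replies are reached through the thread, not the index
        for keyword in keywords:
            # Lower-casing is a design choice so "Indexing" and "indexing" merge.
            index[keyword.lower()].append(msg.get("Message-ID", "(no id)"))
    return dict(index)
```

A reply that repeats the keywords at the top would be skipped by this sketch; to let an author deliberately re-catalogue a significant reply, the header check would have to be relaxed.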

>If adding the five characters "keys:" is really too much to ask
>to disambiguate keywords, then I don't think there's much chance
>that this idea will catch on. One simply can't have any square-
>bracketed content in plaintext be considered keywords without
>introducing more problems than are solved.    (06)

I agree that one simply can't have any square bracketed content be 
considered keywords, but one can do this for the first line.    (07)
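
A minimal sketch of the first-line rule in Python, not a finished parser. It assumes the standard library `email` module is acceptable for separating headers from the body, that keywords are comma-separated (the thread has not settled on a separator), and that multipart messages can simply be skipped. The function name `first_line_keywords` is made up for illustration.

```python
from email import message_from_string


def first_line_keywords(raw_message):
    """Return keywords from a leading "[...]" body line, or [] if there is none."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        return []  # multipart message; not handled in this sketch
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines before the first real line
        if stripped.startswith("[") and "]" in stripped:
            inner = stripped[1:stripped.index("]")]
            # Assumption: keywords are separated by commas.
            return [k.strip() for k in inner.split(",") if k.strip()]
        return []  # the first real line is not a keyword line
    return []
```

Because only the first non-blank body line is examined, a quoted "> [indexing]" in a reply and the "[ba-ohs-talk]" tag in the Subject header are both ignored.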

>  The use and reuse of
>that text will put it into many more contexts than that first non-
>whitespace line of an email message. If people were more regular in
>their use, this might work, but we can hardly advocate something
>that doesn't survive the introduction of ">>" or markup around it.
>Remember, we want the keys to survive multiple replies, so that
>once a thread is started people don't have to retype it, or
>manipulate quoted text (which is a no-no).    (08)

Right, I address this issue above.  We do not want keywords to survive 
multiple replies, because threads tend to be subject to topic creep.  If 
the response contributes a significant new idea, then the author can go the 
extra step and repeat the keywords at the top.  Most of the follow-up 
discussion in a thread would not have to be catalogued, however, because it 
contains information that is expected to be updated with time.  This 
discussion, for instance.  Later on, the back and forth that went on in 
coming up with a format will not be interesting, only the results will be.    (09)

>The biggest problem I see with "[keys:" is that it's in English,
>but I hesitate to suggest a symbolic solution because it's harder
>to remember (uh, is it "{[" or "[{"?, etc.). I suppose we could
>come up with an i18n solution... *sigh*    (010)

You mean that the biggest problem is that it's five letters.  (I don't 
think that French would be better :)    (011)

... Well, what about double square brackets? [[ ... ]] That's not too hard 
to remember, and they don't occur as frequently as single brackets.  I 
still say that if we only consider the first line of the post as a valid 
location for keywords, then single square brackets are fine.    (012)
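
For comparison, a small Python sketch of the double-bracket variant. It assumes comma-separated keywords and matches the first [[ ... ]] found anywhere on a line; `DOUBLE_BRACKETS` and `double_bracket_keywords` are illustrative names only.

```python
import re

# Two opening brackets, the shortest possible run of text, two closing brackets.
DOUBLE_BRACKETS = re.compile(r"\[\[(.+?)\]\]")


def double_bracket_keywords(line):
    """Return the keywords inside the first [[ ... ]] on the line, or []."""
    match = DOUBLE_BRACKETS.search(line)
    if match is None:
        return []
    # Assumption: keywords are separated by commas.
    return [k.strip() for k in match.group(1).split(",") if k.strip()]
```

Single-bracketed text such as the "[ba-ohs-talk]" list tag never matches, and the pattern still matches after ">>" quoting has been put in front of it, which is both the appeal and the risk of not restricting keywords to the first line.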

Cheers,
--Alex    (013)


>Murray
>
>......................................................................
>Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
>Knowledge Media Institute
>The Open University, Milton Keynes, Bucks, MK7 6AA, UK
>
>      In the evening
>      The rice leaves in the garden
>      Rustle in the autumn wind
>      That blows through my reed hut.  -- Minamoto no Tsunenobu    (014)