Re: Great? idea for improving this list (was Re: [ba-ohs-talk]Freezope learning environments)



[archive_access.practical]    (01)

On Fri, 26 Apr 2002, Peter Jones wrote:    (02)

> What I was suggesting was a system that:
>
> a) Reads an email and sucks out each word in turn.
> b) Each new word has a database record created, and the
> locations of occurrence of the term in another related table.
> Leaving aside the issue of polysemy for a moment, the
> record structure would be something like
> PK_ID, word_string <--relation--> FK_ID, location(s).
> c) To improve the scanning process, have a subroutine that
> discards the stop-words chosen, and clean the database of
> these.
> d) Repeat for each mail.
> e) If a word is re-encountered then only the new location for
> the word is inserted in the database in the appropriate new tuple.    (03)
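
For concreteness, steps (a) to (e) could be sketched roughly as
below. This is only an illustration in Python, with two
dictionaries standing in for the two related tables. The class
name WordIndex, the stop-word list, and the tokenizing regex are
all assumptions made for the example, not anything specified in
the proposal. It also filters stop-words on the way in rather
than cleaning them out of the database afterwards, which is a
simplification of step (c).

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the proposal does not say which words to drop.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


class WordIndex:
    """In-memory stand-in for the two related tables in step (b)."""

    def __init__(self):
        self.word_ids = {}                  # word_string -> PK_ID
        self.locations = defaultdict(list)  # FK_ID -> [(mail_id, position), ...]

    def add_mail(self, mail_id, text):
        # (a) pull out each word in turn; position is the token offset in the mail
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            # (c) discard stop-words before they reach the tables
            if word in STOP_WORDS:
                continue
            # (b) a word seen for the first time gets a new record and id
            pk_id = self.word_ids.setdefault(word, len(self.word_ids) + 1)
            # (e) a word seen before only gains a new location
            self.locations[pk_id].append((mail_id, position))

    def lookup(self, word):
        """Return every recorded (mail_id, position) for a word."""
        pk_id = self.word_ids.get(word.lower())
        return list(self.locations[pk_id]) if pk_id is not None else []
```

Step (d) is then just calling add_mail() once per message, and
lookup("index") returns every recorded location of that word
across the mails added so far.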

In what ways are you imagining this being different from a free
text index of the mail archive that gets reindexed every time a
new message comes in?    (04)

> What you then get is an index for every mail in the archive that
> contains all the interesting words in all the mails in the archive and
> the locations in the mails of all those words.    (05)

Is it that the list of words indexed is more limited?    (06)

> Sophistication could be added in the read-in phase.
> For example, polysemy might be attacked by some algorithm that
> makes guesses about the word type based on a grammar.
> Locations might be narrowed to paragraphs by chunking them beforehand.
> And so on.    (07)
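
Of the refinements listed, narrowing locations to paragraphs is
the simplest to picture. A minimal sketch in Python, assuming
that paragraphs are separated by blank lines and that a location
is a (mail_id, paragraph_number) pair; the function name and
both of those choices are hypothetical, not part of the
proposal:

```python
import re


def paragraph_locations(mail_id, text):
    """Yield (word, (mail_id, paragraph_number)) pairs for one mail."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Use a set so each word is reported once per paragraph,
        # sorted only to make the output order predictable.
        for word in sorted(set(re.findall(r"[a-z0-9']+", paragraph.lower()))):
            yield word, (mail_id, number)
```

The polysemy guessing is a much harder problem and is not
attempted here.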

You make this sound easy. After watching the list for a while it
is clear that we don't have the collective time for this degree
of complexity.  Are we talking about implementing something to
use now, experiment with, and develop, or are we talking about an
ideal eventual system that would work in a variety of capacities?    (08)

We can talk the theory (I'd love to) but that stuff has been
beaten to death here and elsewhere. How do we distinguish between
the speculative talk and the plans for action?    (09)

-- 
Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/
"Mediocrities everywhere--now and to come--I absolve you all! Amen!"
 -Salieri, in Peter Shaffer's Amadeus    (010)