
[ba-ohs-talk] Organic Structure to Organize and Retrieve the Record


Gary,    (01)

This is getting us into a useful subject.    (02)

Rod    (03)

*************    (04)

"Garold (Gary) L. Johnson" wrote:
> 
> FWIW,
> 
> By taking sequences of non-noise words to some number, you begin to build a
> phrase dictionary as well as a word dictionary, and that can prove more
> useful.
> 
> Given this phrase list and the tools to establish relationships within the
> list, it is possible to develop a faceted thesaurus, which is not too far
> distant from a topic map.
> One of Neil Larson's DOS programs does this and he used it to organize huge
> hypertext systems.
> 
> Single words are a start, but this extension should be easy to add (more so
> than the initial work).
> 
> Thanks,
> 
> Garold (Gary) L. Johnson
> 
> -----Original Message-----
> From: owner-ba-ohs-talk@bootstrap.org
> [mailto:owner-ba-ohs-talk@bootstrap.org]On Behalf Of Peter Jones
> Sent: Saturday, April 27, 2002 9:44 AM
> To: ba-ohs-talk@bootstrap.org
> Subject: Re: Great? idea for improving this list (was Re: [ba-ohs-talk]
> Freezope learning environments)
> 
> It's not that different from free text indexing except that the
> data connecting words to paragraph ids will be available.
> 
> In case it's of any interest, after a couple of hours work I now
> have a perl script that will call the archives and pull out
> all the new (non-reply) text from a message together with
> the relevant paragraph nid information for each paragraph.
> It will do this for all the messages currently in the web archive.
> 
> All I have to do now is:
> Grab a suitable list of stopwords off the net to feed the
> hashing exclusion.
> Knock together a few hash data structures to
> build the index data.
> Build an output routine to throw this into some neat HTML pages
> and bingo! we'll all have a keyword access to the archive, and
> secondly we will all be able to make whatever lovely graphs we
> all feel like making out of the lexical-locator data.
> 
> Maybe those bits are difficult. Maybe they aren't.
> But I'm not quitting yet.
> 
> Oh yeah, the end result should generalise to any mhonarc mail output.
> 
> Enough talking. I'm busy.
> 
> --
> Peter
> 
> ----- Original Message -----
> From: <cdent@burningchrome.com>
> To: <ba-ohs-talk@bootstrap.org>
> Sent: Friday, April 26, 2002 11:29 PM
> Subject: Re: Great? idea for improving this list (was Re: [ba-ohs-talk]
> Freezope learning environments)
> 
> >
> > [archive_access.practical]
> >
> > On Fri, 26 Apr 2002, Peter  Jones wrote:
> >
> > > What I was suggesting was a system that:
> > >
> > > a) Reads an email and sucks out each word in turn.
> > > b) Each new word has a database record created, and the
> > > locations of occurrence of the term in another related table.
> > > Leaving aside the issue of polysemy for a moment, the
> > > record structure would be something like
> > > PK_ID, word_string <--relation--> FK_ID, location(s).
> > > c) To improve the scanning process, have a subroutine that
> > > discards the stop-words chosen, and clean the database of
> > > these.
> > > d) Repeat for each mail.
> > > e) If a word is re-encountered then only the new location for
> > > the word is inserted in the database in the appropriate new tuple.
> >
> > In what ways are you imagining this being different from a free
> > text index of the mail archive that gets reindexed every time a
> > new message comes in?
> >
> > > What you then get is an index for every mail in the archive that
> > > contains all the interesting words in all the mails in the archive and
> > > the locations in the mails of all those words.
> >
> > Is it that the list of words indexed is more limited?
> >
> > > Sophistication could be added in the read-in phase.
> > > For example, polysemy might be attacked by some algorithm that
> > > makes guesses about the word type based on a grammar.
> > > Locations might be narrowed to paragraphs by chunking them beforehand.
> > > And so on.
> >
> > You make this sound easy. After watching the list for a while it
> > is clear that we don't have the collective time for this measure
> > of complexity.  Are we talking about implementing something to
> > use now and experiment and develop, or are we talking about an
> > ideal eventual system that would work in a variety of capacities?
> >
> > We can talk the theory (I'd love to) but that stuff has been
> > beaten to death here and elsewhere. How do we distinguish between
> > the speculative talk and the plans for action?
> >
> > --
> > Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/
> > "Mediocrities everywhere--now and to come--I absolve you all! Amen!"
> >  -Salieri, in Peter Shaffer's Amadeus
> >
> >    (05)
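
The index discussed in this thread (non-noise words mapped to paragraph
ids, extended to short phrases as Gary suggests) could be sketched
roughly as follows. This is only an illustration in Python, not Peter's
perl script: the STOPWORDS set, the function names, and the
max_phrase_len parameter are all made up for the example, and phrases
are assumed to be runs of words left over after the stop words have
been dropped.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real one would be far longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
             "that", "this", "for"}


def tokenize(text):
    """Lower-case the text and return its words in order."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_index(paragraphs, max_phrase_len=2):
    """Map each non-noise word or phrase to the paragraph ids containing it.

    paragraphs: dict of paragraph id -> paragraph text.
    max_phrase_len: longest run of non-noise words recorded as a phrase.
    """
    index = defaultdict(set)
    for pid, text in paragraphs.items():
        # Drop stop words first, so phrases are sequences of non-noise words.
        words = [w for w in tokenize(text) if w not in STOPWORDS]
        for n in range(1, max_phrase_len + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(pid)
    return index


# Two toy paragraphs keyed by made-up paragraph ids.
paragraphs = {
    "01": "The phrase dictionary extends the word dictionary.",
    "02": "A word dictionary is a start.",
}
index = build_index(paragraphs)
print(sorted(index["word dictionary"]))  # ['01', '02']
```

A re-encountered word only adds a new location to its existing entry,
which is what the sets give for free. Keeping the keys as plain strings
means the same table serves as both word dictionary and phrase
dictionary.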