
Re: [ba-ohs-talk] SUN's Conceptual Indexing Project for Precision Content Retrieval


This is worth a read
http://research.sun.com/research/knowledge/technology.html    (01)

"Key Ideas behind the Technology
------------------------------------------------------------------------

"Making a difference
  We have found that techniques from knowledge
representation and natural language processing can make a useful
contribution to solving the paraphrase problem. By searching a
structured conceptual taxonomy of the words and phrases extracted from a
collection of documents, our algorithms can effectively connect terms in
a query with appropriate related terms in document passages.    (02)

"The problem with synonyms
  A common approach to the paraphrase problem is to use tables of
synonyms to automatically expand queries by adding terms that are
recorded as "synonymous." However, there are few real synonyms in
English, so the common practice is to include related words as if they
were synonyms. However, treating terms this way when they are not really
synonyms introduces a level of granularity that trades off precision for
recall. There is no a priori correct level for this tradeoff - different
information needs require different levels of generality - so this
technique often degrades retrieval rather than improving it.
As an alternative to synonym classes, we use taxonomic subsumption
algorithms that exploit generality (subsumption) rather than synonymy to
connect terms in queries with passages that contain more specific terms
as well as the requested terms. These algorithms do not automatically
explore more general terms, so the level of generality is controlled by
your choice of query terms. For example, if you ask for "motor vehicles"
you would get trucks, buses, cars, etc., but if you ask for
"automobiles" you would get cars and taxicabs, but not trucks and buses.    (03)


"Taxonomies
  Using knowledge bases of general semantic facts, structured conceptual
taxonomies (a type of semantic network) can be constructed from words
and phrases. These words and phrases can be extracted automatically from
text and parsed into conceptual structures. The taxonomy can be
organized by the most-specific-subsumer (MSS) relationship, where each
concept is linked to the most specific concepts that subsume it - i.e.,
that are more general than it is. Terms in a query are individually
matched with corresponding concepts in the taxonomy together with their
subconcepts.
For example, given the general semantic facts that "washing" is a kind
of "cleaning" and "car" is a kind of "automobile", an algorithmic
classification system can automatically classify "car washing" as a kind
of "automobile cleaning". A query for "automobile cleaning" or
"automobile washing" will immediately retrieve hits for "car washing"."    (04)

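To make the subsumption idea in the quoted text concrete, here is a minimal Python sketch. It is not Sun's implementation: the KIND_OF table, the function names, and the word-by-word matching of phrases are all assumptions made for illustration, built only from the examples on the page ("motor vehicles", "automobiles", "car washing").

```python
# Hypothetical toy taxonomy: each term maps to its direct, more general parents.
# The entries mirror the examples in the quoted text; nothing here is Sun's data.
KIND_OF = {
    "car": {"automobile"},
    "taxicab": {"automobile"},
    "automobile": {"motor vehicle"},
    "truck": {"motor vehicle"},
    "bus": {"motor vehicle"},
    "washing": {"cleaning"},
}


def subsumes(general, specific):
    """Return True if `general` is `specific` or one of its ancestors."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in KIND_OF.get(specific, ()))


def expand(term):
    """Return the term plus every more specific term, never a more general one."""
    vocabulary = set(KIND_OF) | {p for parents in KIND_OF.values() for p in parents}
    return {t for t in vocabulary if subsumes(term, t)}


def phrase_subsumes(general, specific):
    """Compare two phrases, given as tuples of terms, position by position.

    This is a deliberate simplification: a real system would parse phrases
    into conceptual structures rather than align words one to one.
    """
    return len(general) == len(specific) and all(
        subsumes(g, s) for g, s in zip(general, specific)
    )


if __name__ == "__main__":
    print(sorted(expand("motor vehicle")))
    print(sorted(expand("automobile")))
    print(phrase_subsumes(("automobile", "cleaning"), ("car", "washing")))
```

In this sketch a query for "automobile" expands to automobile, car and taxicab, but not truck or bus, because expansion only walks downward; the query term itself sets the level of generality. Likewise ("automobile", "cleaning") subsumes ("car", "washing"), but not the other way round.
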
--
Peter    (05)


----- Original Message -----
From: "Murray Altheim" <m.altheim@open.ac.uk>
To: <ba-ohs-talk@bootstrap.org>
Sent: Monday, March 25, 2002 11:01 AM
Subject: Re: [ba-ohs-talk] SUN's Conceptual Indexing Project for
Precision Content Retrieval    (06)


> John J. Deneen wrote:
>
> > Anybody know if this is open source technology, like StarOffice?
> >
> > Knowledge Technology Group
> > Bill Woods, PI
> > <  http://research.sun.com/people/william.woods >
> >
> > Conceptual Indexing Project for Precision Content Retrieval
> > <
> >
http://research.sun.com/nova/cgi-bin/index.cgi?a=b&c=research&con=content&i=1016854275
>
>
> I can't speak from direct knowledge, but knowing that this is
> Jacek Ambrosiak's project, that there's a lot of invested time
> and effort, I'm guessing it might be distributed as a binary,
> but I would seriously doubt they'd dump it into open source, or
> even distribute the source. It's a possible money maker for Sun,
> and in today's tighter times I don't think so many people are as
> likely to be giving the show away as they were two years ago.
>
> Murray
>
> ......................................................................
> Murray Altheim                         <mailto:m.altheim @ open.ac.uk>
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK
>
>       In the evening
>       The rice leaves in the garden
>       Rustle in the autumn wind
>       That blows through my reed hut.  -- Minamoto no Tsunenobu
>
>    (07)