
[ba-ohs-talk] Greenstone as a HyperScope

http://www.greenstone.org/english/home.html    (01)

I have mentioned Greenstone before.  The more I play with it, the more I 
tend to think that it is a HyperScope.
Here is what I know about it from playing with it and reading its 
documentation.    (02)

It can suck up entire directories (including subdirectories) from your 
hard disk.
It can suck up entire web sites (including subdirectories, I think).    (03)

What it does:
It reads each file (types include pdf, ps, doc, txt, html, and some image 
files such as gif/jpg) and converts it to an intermediate file (gml).
It indexes the gml files.
It also appears to do n-gram and other statistical analysis.
It also appears to have some phrase detection tools.
It says (I haven't seen it yet) that it has a CORBA interface.    (04)
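
To make the convert-then-index flow above a bit more concrete, here is a 
rough Python sketch.  It is not Greenstone code: the function names 
(to_intermediate, build_index, ngram_counts) and the dict standing in for 
the intermediate record are made up for illustration, and the real gml 
format and indexer are certainly more involved.

```python
import re
from collections import Counter, defaultdict


def to_intermediate(doc_id, raw_text):
    """Hypothetical stand-in for the conversion step: normalize a source
    document into one uniform record (Greenstone writes gml files here)."""
    return {"id": doc_id, "text": " ".join(raw_text.split())}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Inverted index: term -> sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for rec in records:
        for term in tokenize(rec["text"]):
            index[term].add(rec["id"])
    return {term: sorted(ids) for term, ids in index.items()}


def ngram_counts(records, n=2):
    """Count word n-grams across all records, a crude basis for phrase
    detection (an assumption about what such tools might start from)."""
    counts = Counter()
    for rec in records:
        tokens = tokenize(rec["text"])
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


records = [
    to_intermediate("a.txt", "Open hyperdocument   systems\nneed indexing."),
    to_intermediate("b.html", "Indexing open hyperdocument systems."),
]
index = build_index(records)
bigrams = ngram_counts(records, n=2)
```

The point is only that once every file type has been reduced to one 
intermediate form, indexing and the statistical passes can ignore where 
the text came from.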

If you want to add file types for it to handle, you just write a small Perl 
script to do the job and include that script in your "collection" 
configuration file.    (05)
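
The plugin idea can be pictured roughly like this.  Greenstone's real 
plugins are Perl scripts named in the collection configuration file; the 
Python below only sketches the dispatch-by-file-type pattern, and every 
name in it (PLUGINS, register, convert, the .rtf handler) is hypothetical.

```python
import os

# Hypothetical registry: file extension -> converter function.
PLUGINS = {}


def register(*extensions):
    """Decorator that registers a converter for one or more extensions."""
    def wrap(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return wrap


@register(".txt")
def convert_text(data):
    """Plain text needs no conversion beyond decoding the bytes."""
    return data.decode("utf-8", errors="replace")


@register(".rtf")
def convert_rtf(data):
    # Placeholder for a newly added file type; a real converter would
    # actually parse the RTF markup instead of just tagging the content.
    return "[rtf] " + data.decode("ascii", errors="replace")


def convert(filename, data):
    """Pick a converter by file extension; return None for unsupported types."""
    ext = os.path.splitext(filename)[1].lower()
    plugin = PLUGINS.get(ext)
    return plugin(data) if plugin else None
```

For example, convert("notes.TXT", b"hello") goes through the text 
converter, while a file with an unregistered extension comes back as None 
until someone writes a handler for it.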

Greenstone and all its internal programs are licensed under the GPL.  With 
a CORBA interface, we can create a HyperScope interface and just let 
Greenstone do all the internal work.    (06)

There is another initiative behind Greenstone, that of doing data mining in 
the Greenstone collections.  That's precisely where I hope it will go soon, 
though Greenstone appears to be tied closely to some PhD projects, 
meaning it might be several years before it gets the data mining tools out 
for us to play with.    (07)

I suspect that Greenstone is a great candidate (I've said this before) for 
a prototype HyperScope infrastructure.  We just need to learn how to use it 
and to extend it.    (08)

Jack    (09)