Re: [ba-ohs-talk] Greenstone as a HyperScope
Sounds like a winner, Jack. Now that Java 1.4 has regular
expressions, though, I've done my last Perl hacking. I now
have the best of both worlds! (01)
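
As a small illustration of the Perl-style text munging that java.util.regex (new in Java 1.4) covers, here is a minimal sketch. The class and method names (RegexSketch, extractLinks) and the sample input are made up for this example; nothing here is taken from Greenstone or any OHS code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Hypothetical helper: collect the href targets found in a chunk of HTML-ish text.
    // Roughly what a Perl loop over /href="([^"]+)"/ig would do.
    static List<String> extractLinks(String text) {
        Pattern href = Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
        Matcher m = href.matcher(text);
        List<String> links = new ArrayList<>();
        while (m.find()) {
            // group(1) is the text captured between the quotes
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String sample = "<a href=\"a.html\">A</a> <A HREF=\"b/c.html\">C</A>";
        System.out.println(extractLinks(sample));
    }
}
```

A regex like this is only good enough for quick scraping; real HTML would want a proper parser.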
Jack Park wrote: (02)
> I have mentioned Greenstone before. The more I play with it, the more I
> tend to think that it is a HyperScope.
> Here is what I know about it from playing with it and reading its
> It can suck up entire directories (including sub directories) from your
> hard disk.
> It can suck up entire web sites (including sub directories <I think>).
> What it does:
> It reads the files (types include pdf, ps, doc, txt, html, and some gif/jpg
> type files) and converts them to an intermediate format (gml).
> It indexes the gml files.
> It also appears to do n-gram and other statistical stuff.
> It also appears to have some phrase detection tools.
> It says (I haven't seen it yet) it has a CORBA interface.
> If you want to add file types for it to handle, you just write a small Perl
> script to do the job and include that script in your "collection"
> configuration file.
> Greenstone and all its internal programs are GPL. With a CORBA interface,
> we can create a HyperScope interface and just let it do all the internal work.
> There is another initiative behind Greenstone, that of doing datamining in
> the Greenstone collections. That's precisely where I hope it will go soon,
> though Greenstone appears to be linked tightly into some PhD projects,
> meaning it might be several years before it gets the datamining tools out
> for us to play with.
> I suspect that Greenstone is a great candidate (I've said this before) for
> a prototype HyperScope infrastructure. We just need to learn how to use it
> and to extend it.
> Jack (03)
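
For readers unfamiliar with the "n-gram and other statistical stuff" Jack mentions, the basic idea can be sketched as below: slide a window of n words over the text and count how often each word sequence occurs. This is only an illustration of the general technique, not Greenstone's implementation; the names NGramSketch and countNGrams are hypothetical, and the whitespace tokenization is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class NGramSketch {
    // Count word n-grams in a text, keyed by the n words joined with single spaces.
    // Assumption: words are separated by whitespace and case is ignored.
    static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty() || n < 1) {
            return counts;
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of n words across the token array.
        for (int i = 0; i + n <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countNGrams("the cat sat on the cat", 2));
    }
}
```

Word sequences that recur with high counts are candidates for phrases, which is presumably the sort of signal a phrase detection tool would build on.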