
Re: [ba-ohs-talk] NekoHTML scanner-balancer


Jack Park wrote:

> http://www.apache.org/~andyc/nekohtml/doc/index.html
> Java, Apache
> 
> "NekoHTML is a simple HTML scanner and tag balancer that enables 
> application programmers to parse HTML documents and access the 
> information using standard XML interfaces. The parser can scan HTML 
> files and "fix up" many common mistakes that human (and computer) 
> authors make in writing HTML documents. NekoHTML adds missing parent 
> elements; automatically closes elements with optional end tags; and can 
> handle mismatched inline element tags.
> NekoHTML is written using the Xerces Native Interface (XNI) that is the 
> foundation of the Xerces2 implementation. This enables you to use the 
> NekoHTML parser with existing XNI tools without modification or 
> rewriting code."
>

> This is likely a required widget for any HyperScope project.


I'm currently using JTidy in Ceryle, but am not particularly
happy with it. I've sent a message to Andy Clark at Apache
asking if there are any plans to have NekoHTML produce XHTML
rather than simply an API to an HTML document. I'm currently
storing HTML in Ceryle as plaintext, so it'd be nice to have
well-formed HTML even if it's not XHTML. I'll check this out.
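
To make "tag balancing" concrete, here is a toy sketch in plain Java of
the kind of fix-up described in the quoted text: closing elements whose
end tags were omitted, repairing mismatched inline tags, and writing
empty elements in XML form. It is not NekoHTML's or JTidy's API. The
class name TagBalancerSketch, its balance() method, and the tiny
element lists are all made up for illustration, and it ignores
attributes, comments, and nearly everything else a real parser handles.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy class, not part of NekoHTML or JTidy.
public class TagBalancerSketch {
    // Elements that never have content; written as <br /> in the output.
    private static final Set<String> VOID = Set.of("br", "hr", "img");
    // Elements with an optional end tag: a new <p> implicitly closes an
    // open one. A real balancer knows many more (li, td, tr, ...).
    private static final Set<String> OPTIONAL_END = Set.of("p");
    // Matches attribute-less tags only, e.g. <p>, </b>, <BR>, <br/>.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\s*/?>");

    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of open elements
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy text between tags
            pos = m.end();
            String name = m.group(2).toLowerCase();
            boolean isEnd = !m.group(1).isEmpty();
            if (VOID.contains(name)) {
                // Emit start tags in empty-element form; drop stray </br>.
                if (!isEnd) out.append('<').append(name).append(" />");
            } else if (isEnd) {
                // Close down to the matching element; an end tag with no
                // matching open element is silently dropped.
                if (open.contains(name)) closeThrough(name, open, out);
            } else {
                if (OPTIONAL_END.contains(name) && open.contains(name)) {
                    closeThrough(name, open, out);
                }
                open.push(name);
                out.append('<').append(name).append('>');
            }
        }
        out.append(html.substring(pos)); // trailing text
        while (!open.isEmpty()) {        // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    // Pops and closes elements until `name` itself has been closed.
    private static void closeThrough(String name, Deque<String> open,
                                     StringBuilder out) {
        String top;
        do {
            top = open.pop();
            out.append("</").append(top).append('>');
        } while (!top.equals(name));
    }
}
```

With this sketch, TagBalancerSketch.balance("<p>one<p>two <b>bold<br>")
returns "<p>one</p><p>two <b>bold<br /></b></p>". The stack is the
whole trick: every start tag is pushed, and anything still on the stack
when an ancestor closes (or the input ends) gets an end tag written for
it.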

Murray

......................................................................
Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu