MSWord2k Cleanser Tool: Re: [ba-ohs-talk] (not) eating your own dogfood
I've had this sitting on my site for a while.
It works pretty well on the whole. The new (untested) alpha script is
probably better if it works (it should).
I haven't been able to test it because I don't have Word on this machine so
I can't check outputs properly.
I wrote it precisely because Raggett's Tidy totally killed the utility of
the files I needed to clean at the time.
It uses Perl, James Clark's SP and Michael Kay's InstantSaxon XSLT engine
(on Windows). Small tweaks are required to get it going on Linux.
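For anyone wondering what "cleaning" Word HTML involves, here is a minimal Python sketch. It is not the script described above (that one uses Perl, SP and XSLT), just a rough illustration. The function name strip_word_markup is made up, and the patterns it targets (conditional comments, Office-namespaced tags like <o:p>, Mso* classes, mso- style properties) are my assumptions about typical Word 2000 output, so treat it as a starting point only.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove a few Word-specific constructs from exported HTML.

    Illustrative only: regexes are not a real HTML parser, and the
    patterns below are assumptions about typical Word 2000 output.
    """
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.S | re.I)
    # Drop namespaced tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # Drop class attributes whose value starts with "Mso" (quoted or not)
    html = re.sub(r"""\s+class=(["']?)Mso\w*\1""", "", html, flags=re.I)
    # Drop quoted style attributes that contain mso- properties
    html = re.sub(r"""\s+style=(["'])[^"']*mso-[^"']*\1""", "", html,
                  flags=re.I)
    return html


sample = (
    '<p class=MsoNormal style="mso-pagination:none">Hello<o:p></o:p></p>'
    '<!--[if gte mso 9]><xml>x</xml><![endif]-->'
)
cleaned = strip_word_markup(sample)
print(cleaned)
```

Note that this throws away the whole style attribute whenever it mentions an mso- property, which may be more aggressive than you want. A parser-based approach would be safer for anything beyond a quick experiment.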
If you want to turn it into a Java App with greater control over whether it
cleans MSWord HTML or other HTML, feel free.
It's pretty well documented, but if you need extra explanations just let me
know.
----- Original Message -----
From: "Murray Altheim" <email@example.com>
Sent: Monday, February 25, 2002 10:45 AM
Subject: Re: [ba-ohs-talk] (not) eating your own dogfood
> Eugene Eric Kim wrote:
> > A fella in Finland decided to check the homepages of the W3C's 506
> > organizations for valid HTML or XHTML. Only 18 sites validated.
> > http://homepage.mac.com/marko/20020222.html
> The big problem in web design is that almost nobody hand edits their
> markup or even pays attention to it, and the GUI WYSIWYTYG (what you
> see is what you think you get) editors in general produce some really
> I challenge anyone to export "HTML" from MS Word and look at what it
> creates. Amazing.
> But I don't see that there's much to be done about this, given that
> the emphasis from the W3C has never been much along the lines of
> valid markup. It sometimes seems that they've done everything they
> could to kill the use of the DTD, such that as a DTD and validation
> advocate I often felt I was swimming upstream. While Tidy was initially
> produced by Dave Raggett of the W3C, it itself doesn't produce valid
> markup in many cases -- I've had to edit its output as well.
> My guess is that those 18 sites may be managed by a validation zealot
> like me, or had some type of company policy dictated by one. In the
> end all one can do is produce better tools, or agitate for them, such
> as this guy in Finland.
> With the existence of XHTML and XML tools, it's actually pretty easy
> to check one's markup nowadays, and even clean it up, so it's sad to
> see so many corporate sites with poor design under the counter,
> concentrating on flash rather than substance or interoperability.
> But that's not unusual in business, is it?
> Murray Altheim <mailto:m.altheim @ open.ac.uk>
> Knowledge Media Institute
> The Open University, Milton Keynes, Bucks, MK7 6AA, UK
> In the evening
> The rice leaves in the garden
> Rustle in the autumn wind
> That blows through my reed hut. -- Minamoto no Tsunenobu