transcode_email.pl

From: Eugene Eric Kim (eekim@eekim.com)
Date: Thu Aug 03 2000 - 12:06:07 PDT


Okay, I whipped up some Perl code to do some very preliminary transcoding
of RFC822 e-mail to the email.dtd schema. It's available at:

    http://www.eekim.com/ohs/devel/transcode_email.pl

Normal disclaimers apply.

Couple of things. For the purposes of bootstrapping, the only really
important function is transcode_body. There are already a bunch of good
e-mail archiving packages that parse RFC822. What we want to do is create
a function that can be easily integrated into these packages so that all
e-mail archiving programs can support our transcoded e-mail. The two
packages I currently have in mind are MHonArc (Perl) and Mailman (Python).

With this in mind, I didn't bother implementing some of the header parsing
code, specifically dates and miscellaneous headers. I also didn't implement
MIME attachment parsing code. Both of these should be easy enough using
the appropriate CPAN module, but I was in a hurry.
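
For the Mailman (Python) side, here is a rough sketch of how the skipped
pieces (dates, miscellaneous headers, attachment discovery) might be filled
in with Python's standard email package. This is not what
transcode_email.pl does; the function name, the KNOWN_HEADERS set, and the
sample message are made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical: headers assumed to be handled elsewhere by the transcoder.
KNOWN_HEADERS = {"from", "to", "cc", "subject", "date", "message-id"}


def parse_extras(raw):
    """Return (date, misc_headers, attachment_filenames) for a raw message."""
    msg = message_from_string(raw)

    # Parse the RFC 822 Date header into a datetime, if one is present.
    date = parsedate_to_datetime(msg["Date"]) if msg["Date"] else None

    # Everything not in KNOWN_HEADERS is treated as a miscellaneous header.
    misc = [(k, v) for k, v in msg.items() if k.lower() not in KNOWN_HEADERS]

    # walk() visits every MIME part; parts with a filename are attachments.
    attachments = [p.get_filename() for p in msg.walk() if p.get_filename()]

    return date, misc, attachments


SAMPLE = (
    "From: someone@example.com\n"
    "Date: Thu, 03 Aug 2000 12:06:07 -0700\n"
    "Subject: test\n"
    "X-Mailer: example\n"
    "\n"
    "Body text.\n"
)

date, misc, attachments = parse_extras(SAMPLE)
```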

The transcode_body function is very basic. "Paragraphs" are separated by
newlines. Statement IDs (SID) start at 0 and increment by one. I didn't
include any citation parsing code; that's something we can discuss more
today at the meeting and on the list. (It'll also require some clever
interaction with existing archiving programs so we don't have to reinvent
threading algorithms.)
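
To make the paragraph/SID behavior concrete, here is a minimal Python
sketch of the same idea. It is an illustration, not a port of the Perl
code: it assumes a "paragraph" ends at one or more blank lines (the actual
script may split differently), and it returns plain (SID, text) pairs
rather than email.dtd markup, since the schema isn't reproduced here.

```python
import re


def transcode_body(body):
    """Split a message body into (sid, paragraph) pairs, SIDs starting at 0."""
    # Assumption: paragraphs are separated by one or more blank lines.
    chunks = re.split(r"\n\s*\n", body)
    # Drop empty chunks and surrounding whitespace before numbering.
    paragraphs = [c.strip() for c in chunks if c.strip()]
    # enumerate() starts at 0 and increments by one, like the SIDs.
    return list(enumerate(paragraphs))


body = "First paragraph,\nsecond line.\n\nSecond paragraph.\n\n\nThird.\n"
statements = transcode_body(body)
```

Citation handling is deliberately left out here too, so quoted text would
simply be numbered like any other paragraph.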

-Eugene

-- 
+=== Eugene Eric Kim ===== eekim@eekim.com ===== http://www.eekim.com/ ===+
|       "Writer's block is a fancy term made up by whiners so they        |
+=====  can have an excuse to drink alcohol."  --Steve Martin  ===========+


