An architecture for detecting and removing obfuscating clutter from the
subject and/or body of a message (e.g., an e-mail) before the message is
filtered to identify junk messages, commonly referred to as spam. The
technique uses the capabilities built into an HTML rendering engine to
strip the HTML instructions for all non-substantive aspects of the
message. Pre-processing includes pre-rendering the message into a final
format, that is, the format the rendering engine would display to the
user. The final-format message is then converted
to a text-only format to remove graphics, color, non-text decoration, and
spacing that cannot be rendered as ASCII-style or Unicode-style
characters. The result reduces each message to its common-denominator
essentials so that the junk mail filter can evaluate every message on an
equal basis.
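
The text-only conversion step described above can be illustrated with a
minimal sketch. It assumes Python's standard-library html.parser as a
stand-in for the HTML rendering engine; it does not actually render the
message, so clutter hidden through styling (for example, invisible or
zero-size text) would not be caught. The names TextOnlyExtractor and
to_text_only are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    """Keeps character data only; drops tags, comments, and script/style bodies.

    Assumption: a plain HTML parser approximates the rendering engine here.
    """

    SKIPPED = {"script", "style"}  # element bodies that are never displayed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so text split by them rejoins here.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join fragments directly, then collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def to_text_only(html):
    """Return the text-only form of an HTML message body (hypothetical helper)."""
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    cluttered = '<p>Fr<!-- zq -->ee <b></b>mo<font color="red">n</font>ey</p>'
    print(to_text_only(cluttered))  # Free money
```

In this sketch the comment, the empty bold element, and the font tag that
break up the words in the raw HTML all disappear, so a downstream junk
mail filter would see the same plain words it would see in an
unobfuscated message.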