The invention is a computer-based method for analyzing two versions of an HTML document that identifies new or changed areas of the document while preserving the original textual formatting, including embedded graphics. An HTML document is scanned and the information organized into groupings of HTML tags and text. A set of rules determines which HTML tags are permitted within a group, and which mark the start of a new group. Tags that mark the start of a new group are usually those that break the flow of text when an HTML page is rendered. As a result, the text that constitutes a paragraph, embedded hypertext links, and any associated HTML character-formatting elements are contained within a single group. A modified version of the same HTML document is similarly processed. At this point, the two can be compared group by group in order to detect differences. Any group that does not match the associated group in the original is considered to be a modified group. The modified groups can then be inserted as sections into a new HTML document, and these sections appear to have nearly all of the original formatting intact. Thus, they appear as clipped sections from the original HTML document, and are useful for depicting regions of interest.

 
Web www.patentalert.com

< Method and system for instruction tracing with enhanced interrupt avoidance

> Visual administrator providing java management bean support

> Method and system for analyzing source code

~ 00577