A method and apparatus is provided for converting a document in a first format essentially comprising a flat layout structure into a structured document in a hierarchical form in accordance with predetermined attributes identified from the input format. The process comprises fragmenting the input document into a plurality of document content elements in accordance with a predetermined set of document attributes identifiable from the input document format. The content elements are clustered into selective sets having similar document attributes. The clustered sets are validated with reference to common textual properties organizational content common in documents in the collection. The clustered sets are then categorized into predetermined categories comprising structured elements of the structured document format and the document content elements are organized by hierarchical dependency from the predetermined categories wherein the organized document elements comprise the desired structured document format.

 
Web www.patentalert.com

< Image-forming apparatus having an approach and separation mechanism

> High temperature silicone processing of fuser structure

> Imaging member

~ 00596