Disclosed is a method for customizable, schema-guided conversion of plain-text documents, rich-text documents, and textual data records to an XML-compatible structured form. The method makes substantial use of the element content model definitions in a chosen target XML schema or DTD to optimize, closely guide, and disambiguate element pattern matching and recognition. Highly granular structure can be inferred, in the best possible conformance with the schema. One embodiment is based on a finite state machine derived by recursive aggregation of the schema's element content models. Additionally disclosed is a method for automated document structuring within an XML-enabled word processor application. That method uses the host application's API to search for and match element patterns, and to apply markup to the document in accordance with the inferred XML structure. A GUI framework integrated into the word processor workspace can be provided for developing and executing document conversion and structuring definitions.
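
As an illustration only, the idea of letting content models guide pattern matching can be sketched in Python. This is a minimal greedy, recursive matcher, not the finite-state-machine embodiment named in the abstract. The toy schema, the element names, the line patterns, and the `parse` and `to_xml` helpers are all assumptions made up for this sketch; none of them come from the disclosure.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy schema: each element maps to its content model, an ordered
# list of (child, occurrence) pairs, with occurrence one of "1", "?", "*", "+".
SCHEMA = {
    "article": [("title", "1"), ("section", "+")],
    "section": [("heading", "1"), ("para", "+")],
}

# Hypothetical line patterns for the leaf elements. They overlap on purpose:
# an all-caps line satisfies both "title" and "para".
PATTERNS = {
    "title": re.compile(r"^[A-Z][A-Z ]+$"),
    "heading": re.compile(r"^\d+\.\s+\S"),
    "para": re.compile(r"^(?!\d+\.\s)\S"),
}


def parse(name, lines, i):
    """Match element `name` starting at lines[i].

    Returns (node, next_index) on success, or None if the element
    cannot be matched at this position.
    """
    if name in PATTERNS:  # leaf element: test a single line
        if i < len(lines) and PATTERNS[name].match(lines[i]):
            return (name, lines[i]), i + 1
        return None

    children = []
    for child, occurrence in SCHEMA[name]:
        count = 0
        while True:
            result = parse(child, lines, i)
            # Stop when nothing matches or no input was consumed.
            if result is None or result[1] == i:
                break
            node, i = result
            children.append(node)
            count += 1
            if occurrence in ("1", "?"):
                break
        if count == 0 and occurrence in ("1", "+"):
            return None  # a required child is missing
    return (name, children), i


def to_xml(node):
    """Serialize a (name, content) tree produced by parse() as XML text."""
    name, content = node
    if isinstance(content, str):
        return f"<{name}>{escape(content)}</{name}>"
    return f"<{name}>" + "".join(to_xml(c) for c in content) + f"</{name}>"


lines = [
    "SCHEMA GUIDED CONVERSION",
    "1. Overview",
    "Plain text goes in.",
    "NOTE",
    "2. Details",
    "Markup comes out.",
]
tree, consumed = parse("article", lines, 0)
print(to_xml(tree))
```

In this sketch the line `NOTE` matches the `title` pattern, yet it is tagged as `para`, because the content model of `section` only permits `para` at that position. That is the kind of disambiguation the abstract attributes to the schema. A real converter would also need backtracking, choice groups, and mixed content, all of which are omitted here.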

 
