The process generates a parser for extracting records from a set of documents, operating on a sample document from the set. The sample document is either an XML document or is converted to one. Simple XPaths of the XML document are identified. Complex extensions of the simple XPaths are clustered according to common substructures. The complex XPath clusters are scored according to the content of their instances or the differences in content among instances. Candidate parsers are created, each consisting of a single record XPath and one or more field value XPaths that are descendants of the record XPath. The candidate parsers are ranked using the XPath scores.
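
The pipeline described above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not specify the scoring function or the clustering of complex extensions, so the sketch skips clustering entirely and uses a hypothetical score (the count of distinct non-empty text values among the instances of a path). The sample document and the names `simple_xpaths`, `score`, and `candidate_parsers` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document standing in for one document of the set.
SAMPLE = """<catalog>
  <title>Spring list</title>
  <item><name>Lamp</name><price>12</price></item>
  <item><name>Desk</name><price>80</price></item>
  <item><name>Chair</name><price>45</price></item>
</catalog>"""


def simple_xpaths(root):
    """Map each index-free path (e.g. /catalog/item/name) to the text of its instances."""
    values = defaultdict(list)

    def walk(node, path):
        path = f"{path}/{node.tag}"
        values[path].append((node.text or "").strip())
        for child in node:
            walk(child, path)

    walk(root, "")
    return values


def score(texts):
    """Assumed score: distinct non-empty values, or 0 if the path occurs only once."""
    if len(texts) < 2:
        return 0
    return len({t for t in texts if t})


def candidate_parsers(values):
    """Pair each repeated path (record) with its scoring descendant paths (fields)."""
    candidates = []
    for record, texts in values.items():
        if len(texts) < 2:
            continue  # a record path must have several instances
        fields = sorted(p for p in values
                        if p.startswith(record + "/") and score(values[p]) > 0)
        if fields:
            total = sum(score(values[p]) for p in fields)
            candidates.append((total, record, fields))
    # Rank candidates by the summed score of their field paths, best first.
    return sorted(candidates, reverse=True)


ranked = candidate_parsers(simple_xpaths(ET.fromstring(SAMPLE)))
best_score, record_xpath, field_xpaths = ranked[0]
```

On this sample the top-ranked candidate uses `/catalog/item` as the record XPath, with `/catalog/item/name` and `/catalog/item/price` as field value XPaths. A fuller treatment would also need predicates and positional steps (the "complex extensions") and a clustering stage, neither of which is attempted here.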

 

