Method and apparatus for formatting OCR text

   
   

After a document image has been scanned and processed by optical character recognition (OCR), the output OCR text is processed to determine a text format (typeface and font size) that matches the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated that matches a rendering of the word in that typeface to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify tight clusters of scaling factors for a typeface. A tight cluster indicates that the typeface fits well at a constant scaling factor, which corresponds to the font size.
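The word-width matching described above can be illustrated with a minimal sketch. It is not the patented implementation: the per-character width tables in `FONT_WIDTHS`, the typeface names, and the function names are all hypothetical, and a simple median-absolute-deviation spread stands in for the cluster analysis, which the abstract does not specify.

```python
from statistics import median

# Hypothetical advance widths per character at a nominal font size of 1.0.
# A real system would read these from font metrics; these values are made up.
FONT_WIDTHS = {
    "serif": {"i": 0.28, "l": 0.28, "m": 0.78, "w": 0.72,
              "o": 0.50, "d": 0.50, "r": 0.33},
    "mono": {c: 0.60 for c in "ilmwodr"},
}


def rendered_width(word, typeface):
    """Width of the word when rendered in the typeface at size 1.0."""
    widths = FONT_WIDTHS[typeface]
    return sum(widths[c] for c in word)


def scaling_factors(words, typeface):
    """One scaling factor per word: scanned width / rendered width."""
    return [img_w / rendered_width(text, typeface) for text, img_w in words]


def cluster_spread(factors):
    """Relative median absolute deviation (assumed stand-in for clustering).

    A small value means the factors sit close together, i.e. a single
    font size explains most word widths for this typeface.
    """
    mid = median(factors)
    return median(abs(f - mid) for f in factors) / mid


def best_format(words, typefaces):
    """Return (typeface, estimated size, spread) for the tightest cluster."""
    best = None
    for tf in typefaces:
        factors = scaling_factors(words, tf)
        spread = cluster_spread(factors)
        if best is None or spread < best[2]:
            best = (tf, median(factors), spread)
    return best


# (OCR word, word width measured in the scanned image); illustrative values
# chosen to correspond to the hypothetical "serif" table at size 12.
words = [("mill", 19.44), ("word", 24.60), ("old", 15.36), ("doom", 27.36)]
typeface, size, spread = best_format(words, ["serif", "mono"])
```

With these illustrative measurements, the scaling factors for the hypothetical serif table all fall at essentially the same value, while the monospaced table yields factors scattered over a wide range, so the serif typeface is selected and the shared factor is taken as the font size.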

 