Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered near-duplicates if any one of their fingerprints matches.
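
The three steps above can be illustrated with a minimal sketch. It is not the patented method itself; several details are assumptions made only for illustration: parts are taken to be lowercased word tokens, each part is assigned to exactly one list by hashing, SHA-1 serves as the hash, and the names `NUM_LISTS`, `fingerprints`, and `near_duplicates` are hypothetical.

```python
import hashlib
import re

NUM_LISTS = 4  # assumed value for the "predetermined number of lists"


def _stable_hash(text: str) -> int:
    """Return a 64-bit integer derived from SHA-1 (stable across runs)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprints(document: str, num_lists: int = NUM_LISTS) -> list:
    """Return one fingerprint per populated list for the given document."""
    # (i) Extract parts. Assumption: parts are lowercased word tokens.
    parts = re.findall(r"\w+", document.lower())

    # (ii) Assign each part to one list. Assumption: the list is chosen
    # by hashing the part, so equal parts always land in the same list.
    lists = [set() for _ in range(num_lists)]
    for part in parts:
        lists[_stable_hash(part) % num_lists].add(part)

    # (iii) Generate a fingerprint from each populated list. Members are
    # sorted so the fingerprint does not depend on word order.
    return [
        _stable_hash("\n".join(sorted(members)))
        for members in lists
        if members
    ]


def near_duplicates(doc_a: str, doc_b: str, num_lists: int = NUM_LISTS) -> bool:
    """Treat two documents as near-duplicates if any fingerprint matches."""
    shared = set(fingerprints(doc_a, num_lists)) & set(fingerprints(doc_b, num_lists))
    return bool(shared)
```

Under these assumptions, a small edit to a document changes only the lists that the edited parts fall into, so the remaining lists may still produce matching fingerprints. Calling `near_duplicates(text_a, text_b)` returns `True` when at least one fingerprint is shared.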

 