A data processing method and system for retrieving a subset of k items from a database of n items (n ≥ k) first determines a limited set of bk items (b > 1) in the database that have the greatest similarity to an input query t according to a given similarity function S. A result subset is then constructed by including as its first member the item having the greatest similarity S to the query t, and then iteratively selecting each successive member as the remaining item of the bk items having the highest quality Q, where Q is a given function of both similarity to the input query t and relative diversity RD with respect to the items already in the result subset. In this way the diversity of the result subset is greatly increased relative to a simple selection of the k most similar items to the query t, with only a modest increase in processing requirements.
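
The selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the abstract only says that Q is "a given function" of similarity and relative diversity, so the product form Q = S × RD used as the default here is an assumption, as is the definition of RD as the mean dissimilarity (1 − S) to the items already selected. The function names `relative_diversity` and `bounded_greedy_select` are hypothetical, and the similarity function is assumed to return values in [0, 1].

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def relative_diversity(item: T, selected: Sequence[T],
                       sim: Callable[[T, T], float]) -> float:
    """Assumed RD: mean dissimilarity (1 - sim) between item and the selected items."""
    if not selected:
        return 1.0
    return sum(1.0 - sim(item, r) for r in selected) / len(selected)


def bounded_greedy_select(
    query: T,
    items: Sequence[T],
    k: int,
    b: float,
    sim: Callable[[T, T], float],
    quality: Optional[Callable[[float, float], float]] = None,
) -> List[T]:
    """Pick up to k items that balance similarity to the query and mutual diversity."""
    if quality is None:
        # Assumed quality function: similarity times relative diversity.
        quality = lambda s, rd: s * rd

    if k <= 0 or not items:
        return []

    # Step 1: restrict attention to the bk items most similar to the query.
    limit = min(len(items), int(b * k))
    candidates = sorted(items, key=lambda c: sim(query, c), reverse=True)[:limit]

    # Step 2: the first member is the single most similar item.
    result = [candidates.pop(0)]

    # Step 3: repeatedly add the remaining candidate with the highest quality Q.
    while len(result) < k and candidates:
        best_idx = max(
            range(len(candidates)),
            key=lambda i: quality(
                sim(query, candidates[i]),
                relative_diversity(candidates[i], result, sim),
            ),
        )
        result.append(candidates.pop(best_idx))

    return result


def line_similarity(a: int, b: int) -> float:
    """Toy similarity for integers in 0..100: 1.0 when equal, 0.0 when 100 apart."""
    return 1.0 - abs(a - b) / 100.0
```

With the toy `line_similarity`, a call such as `bounded_greedy_select(50, [50, 51, 52, 30, 70, 90], k=3, b=2, sim=line_similarity)` starts from the exact match 50 and then prefers candidates that are far from what has already been chosen, rather than the near-duplicates 51 and 52. Because only bk candidates are rescored in each round, the extra cost over a plain top-k retrieval stays bounded by the choice of b.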

 