A computer method, system and code, for representing a natural-language document in a vector form suitable for text manipulation operations are disclosed. The method involves determining (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term.

 
Web www.patentalert.com

< Transaction recognition and prediction using regular expressions

> System and method for inter-relating multiple data types relating to pharmaceutical products, therapies and companies

~ 00438