A method and apparatus are provided for determining when electronic documents stored in a large collection of documents are similar to one another. Several kinds of similarity information are derived from the documents. The similarity information may be based on a variety of factors, including hyperlinks in the documents, text similarity, user click-through information, similarity in the titles of the documents or their location identifiers, and patterns of user viewing. The similarity information is fed to a combination function that synthesizes the various measures into combined similarity information. Using the combined similarity information, an objective function is iteratively maximized to yield a generalized similarity value that expresses the similarity of particular pairs of documents. In an embodiment, the generalized similarity value is used to determine the proper category, among a taxonomy of categories in an index, cache or search system, to which certain documents belong.
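The pipeline described above (several similarity measures, a combination function, then iterative maximization of an objective) can be illustrated with a small sketch. The abstract does not specify the combination function or the objective, so everything concrete here is an assumption made for illustration: the combination is a weighted average, the objective is the negative squared error of a two-dimensional factorization of the combined matrix, and the names `combine`, `objective`, `generalized_similarity`, `link_sim` and `text_sim` are hypothetical, as are the sample values.

```python
from typing import Dict, List

Matrix = List[List[float]]


def combine(sources: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Assumed combination function: weighted average of the source matrices."""
    n = len(next(iter(sources.values())))
    total = sum(weights[name] for name in sources)
    return [
        [sum(weights[name] * m[i][j] for name, m in sources.items()) / total
         for j in range(n)]
        for i in range(n)
    ]


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def objective(c: Matrix, x: Matrix) -> float:
    """Assumed objective: negative squared error between c and the dot products of x."""
    n = len(c)
    return -sum((c[i][j] - dot(x[i], x[j])) ** 2
                for i in range(n) for j in range(n))


def generalized_similarity(c: Matrix, step: float = 0.02,
                           iterations: int = 2000) -> Matrix:
    """Gradient ascent on the objective; returns pairwise dot products."""
    n = len(c)
    # Deterministic starting point: one 2-d vector per document.
    x = [[0.1 * (i + 1), 0.1 * (n - i)] for i in range(n)]
    for _ in range(iterations):
        # Gradient of the objective with respect to each vector (c is symmetric).
        grad = [
            [4 * sum((c[i][j] - dot(x[i], x[j])) * x[j][k] for j in range(n))
             for k in range(2)]
            for i in range(n)
        ]
        x = [[x[i][k] + step * grad[i][k] for k in range(2)] for i in range(n)]
    return [[dot(x[i], x[j]) for j in range(n)] for i in range(n)]


# Hypothetical similarity data for four documents from two sources.
link_sim = [[1.0, 0.9, 0.1, 0.0],
            [0.9, 1.0, 0.0, 0.1],
            [0.1, 0.0, 1.0, 0.6],
            [0.0, 0.1, 0.6, 1.0]]
text_sim = [[1.0, 0.7, 0.1, 0.2],
            [0.7, 1.0, 0.2, 0.1],
            [0.1, 0.2, 1.0, 0.8],
            [0.2, 0.1, 0.8, 1.0]]

combined = combine({"links": link_sim, "text": text_sim},
                   {"links": 0.5, "text": 0.5})
generalized = generalized_similarity(combined)
```

In this toy data, documents 0 and 1 resemble each other in both sources, as do documents 2 and 3, so the resulting values should be higher within each pair than across pairs. A categorization step could then assign a document to the category whose known members have the highest generalized similarity to it, though the abstract does not say how that assignment is actually made.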

 