A computing system and method clean a set of hypertext documents to minimize violations of a Hypertext Information Retrieval (IR) rule set, and then perform an information retrieval operation on the resulting cleaned data. The cleaning process includes decomposing each page of the set of hypertext documents into one or more pagelets, identifying possible templates, and eliminating the templates from the data. Traditional IR search and mining algorithms can then be applied to the remaining pagelets, rather than to the original pages, to provide cleaner, more precise results.
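
The cleaning steps described above (decompose into pagelets, identify possible templates, eliminate them) can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how pages are decomposed or how templates are recognized, so this example assumes that a pagelet is a blank-line-separated block of text and that a template is any pagelet whose normalized text appears on at least `min_pages` distinct pages. The function names and the `min_pages` parameter are hypothetical.

```python
from collections import defaultdict


def split_into_pagelets(page_text):
    """Assumed decomposition: blank-line-separated blocks are pagelets."""
    blocks = [block.strip() for block in page_text.split("\n\n")]
    return [block for block in blocks if block]


def normalize(pagelet):
    """Lowercase and collapse whitespace so near-identical copies match."""
    return " ".join(pagelet.lower().split())


def find_templates(pages, min_pages=2):
    """Return normalized pagelets seen on at least min_pages distinct pages.

    This repetition threshold is an assumed heuristic, not the source's rule.
    """
    seen_on = defaultdict(set)
    for page_id, text in pages.items():
        for pagelet in split_into_pagelets(text):
            seen_on[normalize(pagelet)].add(page_id)
    return {key for key, ids in seen_on.items() if len(ids) >= min_pages}


def clean_pages(pages, min_pages=2):
    """Drop template pagelets, keeping the remaining pagelets per page."""
    templates = find_templates(pages, min_pages)
    return {
        page_id: [
            pagelet
            for pagelet in split_into_pagelets(text)
            if normalize(pagelet) not in templates
        ]
        for page_id, text in pages.items()
    }
```

Under these assumptions, the dictionary returned by `clean_pages` holds the non-template pagelets of each page, and a conventional search or mining algorithm would index those pagelets instead of the full pages.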

 

