Methods and systems for information extraction are disclosed. In one such method and system, a sample of related articles is obtained, and one article is selected as a seed article. Distances between the sample articles are calculated to determine a set of one or more articles closest to the seed article. This set of closest articles is used to identify information fields containing variable data within the seed article. This identification can be performed in several ways, one of which is to compute alignments between articles using dynamic programming. The information fields are labeled, and a template is generated from the labeled fields. The template is then used to extract data from a source article by comparing the source article with the template and associating the variable data of the source article with the labeled fields.
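The pipeline above (seed, nearest articles, alignment, labeled fields, template, extraction) can be illustrated with a minimal sketch. It is not the claimed implementation, and several details are assumptions chosen for illustration: articles are split into whitespace tokens, the dynamic-programming alignment is a token-level longest-common-subsequence (LCS) alignment, the distance is one minus the normalized LCS length, a seed token counts as constant only if it aligns with some token in every closest article, and each run of non-constant tokens becomes one field. All function names (`lcs_pairs`, `distance`, `closest_articles`, `build_template`, `label_fields`, `extract`) and the placeholder labels `field_0`, `field_1`, ... are hypothetical.

```python
from typing import Dict, List, Tuple

Tokens = List[str]
Template = List[Tuple[str, str]]  # ("const", token) or ("field", label)


def lcs_pairs(a: Tokens, b: Tokens) -> List[Tuple[int, int]]:
    """LCS dynamic-programming alignment; returns aligned (i, j) with a[i] == b[j]."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def distance(a: Tokens, b: Tokens) -> float:
    """Assumed dissimilarity in [0, 1]: 0 if identical, 1 if no tokens align."""
    total = len(a) + len(b)
    return 0.0 if total == 0 else 1.0 - 2.0 * len(lcs_pairs(a, b)) / total


def closest_articles(seed: Tokens, others: List[Tokens], k: int) -> List[Tokens]:
    """Return the k sample articles nearest to the seed article."""
    return sorted(others, key=lambda other: distance(seed, other))[:k]


def build_template(seed: Tokens, neighbours: List[Tokens]) -> Template:
    """Seed tokens aligned in every neighbour are constant; each run of the
    remaining (variable) tokens becomes one information field."""
    constant = [True] * len(seed)
    for nb in neighbours:
        matched = {i for i, _ in lcs_pairs(seed, nb)}
        constant = [c and (i in matched) for i, c in enumerate(constant)]
    template: Template = []
    n_fields = 0
    for tok, is_const in zip(seed, constant):
        if is_const:
            template.append(("const", tok))
        elif not template or template[-1][0] != "field":
            template.append(("field", f"field_{n_fields}"))  # placeholder label
            n_fields += 1
    return template


def label_fields(template: Template, labels: Dict[str, str]) -> Template:
    """Replace placeholder field labels with meaningful names (e.g. chosen by a person)."""
    return [(kind, labels.get(val, val) if kind == "field" else val) for kind, val in template]


def extract(template: Template, source: Tokens) -> Dict[str, str]:
    """Align the source with the template's constants; source tokens lying
    between aligned constants are assigned to the field in that gap."""
    const_idx = [t for t, (kind, _) in enumerate(template) if kind == "const"]
    consts = [template[t][1] for t in const_idx]
    # template index of each aligned constant -> its position in the source
    anchor = {const_idx[c]: s for c, s in lcs_pairs(consts, source)}
    result: Dict[str, str] = {}
    prev = -1  # source position of the most recent aligned constant
    for t, (kind, value) in enumerate(template):
        if kind == "const":
            prev = anchor.get(t, prev)
            continue
        nxt = next((anchor[u] for u in range(t + 1, len(template)) if u in anchor), len(source))
        result[value] = " ".join(source[prev + 1:nxt])
    return result


if __name__ == "__main__":
    sample = [s.split() for s in ["Price: 10 USD Seller: Alice", "Price: 25 USD Seller: Bob",
                                  "Price: 7 USD Seller: Carol", "Weather today is sunny"]]
    seed, others = sample[0], sample[1:]  # first article chosen as the seed
    neighbours = closest_articles(seed, others, k=2)
    template = label_fields(build_template(seed, neighbours),
                            {"field_0": "price", "field_1": "seller"})
    print(extract(template, "Price: 99 USD Seller: Dave Smith".split()))
    # {'price': '99', 'seller': 'Dave Smith'}
```

In this toy sample the unrelated "Weather" article is farthest from the seed and is excluded from the closest set, so only the two listing-like articles decide which seed tokens are variable. Requiring alignment in every neighbour is a deliberately strict assumption; a looser rule (for example, a majority of neighbours) may tolerate noisier samples.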

 