A scale-out supercomputing environment includes a plurality of interconnected nodes arranged in a three-dimensional cubic grid and configured to perform a method of duplicate detection. The method includes at least: computing a fingerprint of at least one document in the supercomputing environment to generate data packets from the at least one document and to generate a fixed-size tuple of information from the at least one document; distributing the data packets to each node of the plurality of nodes to ensure that all elements of the fixed-size tuple fit into the memory of the plurality of nodes; applying localized detection techniques to the data packets on each node of the plurality of nodes to remove data packet duplicates; redistributing the data packets to each node of the plurality of nodes based on the document fingerprint; and performing a global merge of the results of the localized detection techniques.
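
The sequence of steps in the abstract can be illustrated with a minimal single-process sketch in Python. It is only a simulation under stated assumptions, not the patented implementation: each "node" is a plain list, the fingerprint is assumed to be a SHA-1 digest of the document text, the fixed-size tuple is assumed to be `(fingerprint, doc_id, length)`, and the three-dimensional grid topology is represented only by the node count (8, as in a 2 x 2 x 2 cube). All function names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    # Assumption: SHA-1 of the UTF-8 text stands in for the document fingerprint.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def make_packet(doc_id: str, text: str) -> tuple:
    # Assumption: the fixed-size tuple is (fingerprint, document id, length).
    return (fingerprint(text), doc_id, len(text))


def distribute(packets: list, num_nodes: int) -> list:
    # Round-robin spread so each simulated node holds a similar share.
    nodes = [[] for _ in range(num_nodes)]
    for i, packet in enumerate(packets):
        nodes[i % num_nodes].append(packet)
    return nodes


def local_dedupe(packets: list) -> list:
    # Keep one packet per fingerprint; the smallest doc_id wins so the
    # result does not depend on arrival order.
    kept = {}
    for packet in packets:
        fp = packet[0]
        if fp not in kept or packet[1] < kept[fp][1]:
            kept[fp] = packet
    return list(kept.values())


def redistribute(nodes: list, num_nodes: int) -> list:
    # Route every packet to the node that owns its fingerprint, so all
    # copies of a given document end up on the same node.
    routed = [[] for _ in range(num_nodes)]
    for node in nodes:
        for packet in node:
            owner = int(packet[0], 16) % num_nodes
            routed[owner].append(packet)
    return routed


def global_merge(nodes: list) -> list:
    # After redistribution each fingerprint lives on exactly one node, so a
    # per-node dedupe followed by concatenation gives the global result.
    merged = []
    for node in nodes:
        merged.extend(local_dedupe(node))
    return sorted(merged, key=lambda packet: packet[1])


def detect_duplicates(docs: dict, num_nodes: int = 8) -> list:
    packets = [make_packet(doc_id, text) for doc_id, text in docs.items()]
    nodes = distribute(packets, num_nodes)
    nodes = [local_dedupe(node) for node in nodes]
    nodes = redistribute(nodes, num_nodes)
    return global_merge(nodes)
```

For example, calling `detect_duplicates({"a": "hello world", "b": "hello world", "c": "different"})` keeps one packet for the two identical documents and one for the distinct document. Routing by fingerprint is what lets the final merge be a simple concatenation, since no two nodes can hold the same fingerprint after redistribution.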

 