A method and apparatus provide a binary representation of a document
that stores unstructured data. A unique word identifier is obtained for each
word included in the document. A word select vector includes positions
identified by different word identifiers. A 1-bit value is stored at
positions identified by the word identifiers of the words included in the
document. A unique position identifier is further assigned to each instance
of a word appearing in the document. A word use set includes a vector for each
unique word identifier for which a 1-bit is stored in the word select
vector. Each vector in the word use set indicates the position
identifiers of the instances of a particular word included in the
document. Once the binary representation is generated, it may be
efficiently searched to determine whether particular words appear in the
document.
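
The structure described above can be illustrated with a minimal sketch. It is not the claimed implementation and rests on several assumptions: Python integers stand in for the bit vectors, word identifiers are assigned in order of first appearance in a hypothetical shared `vocabulary` dictionary, position identifiers are zero-based word offsets, and words are split with a simple regular expression. The names `build_representation`, `contains`, and `positions` are illustrative only.

```python
import re


def build_representation(text, vocabulary):
    """Build (word_select, word_use) for one document.

    vocabulary: dict mapping word -> unique integer word identifier.
    It is extended in place when a new word is seen (an assumption;
    the source does not say how identifiers are obtained).
    """
    word_select = 0  # bit i is 1 if the word with identifier i occurs
    word_use = {}    # word identifier -> bit vector of position identifiers
    words = re.findall(r"[a-z0-9']+", text.lower())
    for position, word in enumerate(words):
        word_id = vocabulary.setdefault(word, len(vocabulary))
        word_select |= 1 << word_id
        word_use[word_id] = word_use.get(word_id, 0) | (1 << position)
    return word_select, word_use


def contains(word_select, vocabulary, word):
    """Return True if the word's bit is set in the word select vector."""
    word_id = vocabulary.get(word.lower())
    return word_id is not None and bool((word_select >> word_id) & 1)


def positions(word_use, word_id):
    """List the position identifiers recorded for one word identifier."""
    vector = word_use.get(word_id, 0)
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


vocabulary = {}
word_select, word_use = build_representation("the cat saw the dog", vocabulary)
print(contains(word_select, vocabulary, "cat"))
print(contains(word_select, vocabulary, "bird"))
print(positions(word_use, vocabulary["the"]))
```

In this sketch, testing whether a word appears costs one dictionary lookup plus one bit test on the word select vector, and the word use set is consulted only when the positions of a word are needed.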