A method of estimating set-expression cardinalities over data streams with
guaranteed small maintenance time per data-element update. The method
only examines each data element once and uses a limited amount of memory.
The time-efficient stream synopsis extends 2-level hash-sketches by
randomly, but uniformly, pre-hashing data-elements prior to
logarithmically hashing them to a first-level hash-table. This generates
a set of independent 2-level hash-sketches. The set-union cardinality can
be estimated by determining the smallest hash-bucket index j at which
only a predetermined fraction of the b hash-buckets has a non-empty union
|A.orgate.B|. Once a set-union cardinality is estimated, general
set-expression cardinalities may be estimated by counting witness
elements for the set-expression, i.e., those first-level hash-buckets
that are both a singleton for the set-expression and a set-union
singleton. The set-expression cardinality is the set-union cardinality
times the number of witness elements divided by the number of
hash-buckets.