For context-based compression techniques, for example Context Based YK
compression, a method and system are provided for grouping contexts from
a given context model to create a new context model that has fewer
contexts but retains acceptable compression gains compared to the
original, larger context model. According to an exemplary
embodiment, a set of files that are correlated to the file to be
compressed (hereafter called training files) is read to determine, for
an initial context model, the empirical statistics of contexts and
symbols. In some embodiments, this includes determining the estimated
joint and conditional probabilities of the various contexts and symbols
(or blocks of symbols). The initial context model is then reduced to a
desired number of contexts, for example, by applying a grouping function
g to the original set of contexts to obtain a new and smaller set of
contexts. In some embodiments, the step of applying a grouping function
comprises iteratively grouping a pair of contexts together to form a
grouped context, wherein each grouped context represents a local minimum
based on the empirical statistics.
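
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed method. It assumes that the context of a symbol is simply the preceding symbol, and that the pair of contexts to group at each iteration is the one whose merge adds the least to the empirical conditional entropy of the training files. The text only says that the choice is based on the empirical statistics, so this cost is an assumption, and the names empirical_counts, weighted_entropy, group_contexts and prev_symbol are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_counts(training_data, context_of):
    """Count (context, symbol) occurrences over all training sequences."""
    counts = {}
    for seq in training_data:
        for i, sym in enumerate(seq):
            ctx = context_of(seq, i)
            counts.setdefault(ctx, Counter())[sym] += 1
    return counts


def weighted_entropy(symbol_counts):
    """Return n * H(symbol | context) in bits, computed from raw counts."""
    n = sum(symbol_counts.values())
    return -sum(c * math.log2(c / n) for c in symbol_counts.values() if c > 0)


def group_contexts(counts, target):
    """Greedily merge pairs of contexts until only `target` groups remain.

    Assumed cost: the increase in total conditional entropy caused by a merge.
    Returns the grouping function g as a dict: original context -> group.
    """
    groups = {frozenset([ctx]): Counter(c) for ctx, c in counts.items()}
    while len(groups) > max(target, 1):
        best = None
        for a, b in combinations(list(groups), 2):
            merged = groups[a] + groups[b]
            cost = (weighted_entropy(merged)
                    - weighted_entropy(groups[a])
                    - weighted_entropy(groups[b]))
            if best is None or cost < best[0]:
                best = (cost, a, b, merged)
        _, a, b, merged = best
        del groups[a], groups[b]
        groups[a | b] = merged
    return {ctx: group for group in groups for ctx in group}


# Toy training files; the context is the previous symbol ("" at the start).
training = ["abababab", "cdcdcdcd", "abcdabcd"]
prev_symbol = lambda seq, i: seq[i - 1] if i > 0 else ""
counts = empirical_counts(training, prev_symbol)
g = group_contexts(counts, target=2)
```

Each iteration picks the locally cheapest pair, so the result is a greedy reduction and need not be the globally best grouping for the requested number of contexts.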