A method for jointly optimizing language model performance and size is
presented. The method comprises developing a language model from a tuning
set of information; segmenting at least a subset of a received textual
corpus; calculating a perplexity value for each segment; and refining the
language model with one or more segments of the received corpus based, at
least in part, on the perplexity values calculated for those segments.
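The selection loop described above can be illustrated with a minimal sketch. The abstract does not specify the model type, the segmentation rule, or how perplexity is turned into a selection decision. The sketch therefore assumes an add-one-smoothed unigram model, fixed-size word segments, and a simple rule that keeps segments whose perplexity falls at or below a threshold. `UnigramLM`, `segment`, `refine`, `segment_size`, and `threshold` are all hypothetical names, not part of the described method.

```python
import math
from collections import Counter


class UnigramLM:
    """Hypothetical add-one-smoothed unigram model (illustrative stand-in only)."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.total = sum(self.counts.values())

    def prob(self, word):
        # Add-one smoothing; the extra vocabulary slot reserves mass for unseen words.
        vocab = len(self.counts) + 1
        return (self.counts[word] + 1) / (self.total + vocab)

    def perplexity(self, tokens):
        if not tokens:
            raise ValueError("cannot score an empty segment")
        log_sum = sum(math.log(self.prob(w)) for w in tokens)
        return math.exp(-log_sum / len(tokens))

    def update(self, tokens):
        # "Refining" here simply means adding the segment's counts to the model.
        self.counts.update(tokens)
        self.total += len(tokens)


def segment(tokens, size):
    """Split a token list into consecutive fixed-size segments (assumed rule)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def refine(lm, corpus_tokens, segment_size, threshold):
    """Score every segment against the current model, then add the low-perplexity ones."""
    segs = segment(corpus_tokens, segment_size)
    # Score all segments first so the result does not depend on segment order.
    scores = [lm.perplexity(s) for s in segs]
    selected = [s for s, pp in zip(segs, scores) if pp <= threshold]
    for s in selected:
        lm.update(s)
    return selected, scores


# Usage: build from a tuning set, then refine with a received corpus.
lm = UnigramLM("the cat sat on the mat".split())
corpus = "the cat sat on the mat stock prices fell sharply today".split()
selected, scores = refine(lm, corpus, segment_size=6, threshold=8.0)
print(len(selected))  # 1: only the first (in-domain) segment is kept
```

In this sketch, the threshold is the assumed knob that trades model size against fit. A lower threshold admits fewer segments and keeps the model smaller. A higher threshold admits more out-of-domain text.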