A programmable processor and system that improve processor performance by
incorporating an execution unit operable to decode and execute a single
instruction specifying three registers, each containing a plurality of
data elements. The execution unit multiplies corresponding elements of the
first and second registers and adds the corresponding elements of the third
register to produce a catenated result containing a plurality of data
elements. Additional
instructions provide group floating-point subtract, add, multiply, set
less, and set greater equal operations. The set less and set greater
equal operations produce either zero or an identity element for each
element of a catenated result, so that individual data elements can be
selected using bitwise Boolean operations, without requiring conditional
branch operations.
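
The operations described above can be illustrated with a minimal software sketch. It is not the disclosed hardware: it assumes a 128-bit register holding four 32-bit floating-point elements, takes the identity element to be an all-ones bit pattern (the identity for bitwise AND), and uses hypothetical function names (pack, unpack, group_multiply_add, group_set_less, select). The multiply-add here is computed in Python double precision and rounded once when packed, which may differ from the rounding of an actual execution unit.

```python
import struct

LANES = 4              # assumption: four 32-bit elements per 128-bit register
MASK32 = 0xFFFFFFFF    # assumption: all-ones pattern as the identity element
FULL = (1 << (32 * LANES)) - 1


def pack(values):
    """Catenate float32 elements into one integer register image (lane 0 lowest)."""
    reg = 0
    for i, v in enumerate(values):
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        reg |= bits << (32 * i)
    return reg


def unpack(reg):
    """Split a register image back into its float32 elements."""
    return [
        struct.unpack("<f", struct.pack("<I", (reg >> (32 * i)) & MASK32))[0]
        for i in range(LANES)
    ]


def group_multiply_add(a, b, c):
    """Per element: a * b + c, catenated into a single result register."""
    return pack([x * y + z for x, y, z in zip(unpack(a), unpack(b), unpack(c))])


def group_set_less(a, b):
    """Per element: all ones where a < b, otherwise zero."""
    mask = 0
    for i, (x, y) in enumerate(zip(unpack(a), unpack(b))):
        mask |= (MASK32 * (x < y)) << (32 * i)
    return mask


def select(mask, a, b):
    """Pick a where the mask is set and b elsewhere, using only Boolean operations."""
    return (a & mask) | (b & ~mask & FULL)
```

Under these assumptions, an element-wise minimum of registers x and y is select(group_set_less(x, y), x, y), with no conditional branch taken per element.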