A VLIW processor has a hierarchy of functional unit clusters that
communicate under explicit control of the instruction stream and store
data in register files at each level of the hierarchy. Explicit
instructions transfer values between sub-clusters through a cluster-level
switch network. Transfer instructions issue in dedicated instruction
issue slots in parallel with instructions that perform computation in
functional units. The switch network can perform permutations on the data
being moved. The switch network also allows operands to be broadcast
among the sub-clusters, the global register file, and memory.
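The behavior described above can be illustrated with a small Python simulation. It is a minimal sketch under assumptions that the abstract does not state. The hierarchy has two levels: a global register file and several sub-clusters, each with its own local register file. Memory is simplified to one more register-file-like location. Each instruction bundle has two compute slots and one transfer slot. All names (`Machine`, `Compute`, `Transfer`, `Bundle`, `step`), the slot counts, and the `add`/`mul` operations are hypothetical. The model shows three things. Transfer operations issue in their own slots in the same cycle as computation. A transfer can permute the lanes it moves. Naming several destinations broadcasts the same data to all of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model: every location ("global", "mem", "sc0", "sc1", ...)
# owns a small register file, represented as a list of integers.
@dataclass
class Machine:
    num_subclusters: int
    regs_per_file: int = 4
    files: Dict[str, List[int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        names = ["global", "mem"] + [f"sc{i}" for i in range(self.num_subclusters)]
        for name in names:
            self.files[name] = [0] * self.regs_per_file

@dataclass
class Compute:          # occupies a compute issue slot
    cluster: str        # local register file the functional unit uses
    op: str             # hypothetical ops: "add" or "mul"
    dst: int
    a: int
    b: int

@dataclass
class Transfer:                 # occupies a dedicated transfer issue slot
    src: str
    src_regs: Tuple[int, ...]
    dsts: Tuple[str, ...]       # more than one destination means broadcast
    dst_regs: Tuple[int, ...]
    perm: Tuple[int, ...]       # output lane i takes input lane perm[i]

@dataclass
class Bundle:
    compute: List[Compute]
    transfer: List[Transfer]

def step(m: Machine, bundle: Bundle, max_compute: int = 2, max_transfer: int = 1) -> None:
    """Execute one VLIW bundle: all slots read old state, then all writes commit."""
    if len(bundle.compute) > max_compute or len(bundle.transfer) > max_transfer:
        raise ValueError("bundle exceeds available issue slots")
    snap = {name: list(regs) for name, regs in m.files.items()}
    writes = []
    for c in bundle.compute:
        x, y = snap[c.cluster][c.a], snap[c.cluster][c.b]
        writes.append((c.cluster, c.dst, x + y if c.op == "add" else x * y))
    for t in bundle.transfer:
        if len(t.perm) != len(t.dst_regs) or any(p >= len(t.src_regs) for p in t.perm):
            raise ValueError("permutation does not match transfer width")
        lanes = [snap[t.src][r] for r in t.src_regs]
        permuted = [lanes[p] for p in t.perm]           # switch-network permutation
        for dst in t.dsts:                              # broadcast to each destination
            for reg, value in zip(t.dst_regs, permuted):
                writes.append((dst, reg, value))
    for name, reg, value in writes:                     # commit; later writes win on conflict
        m.files[name][reg] = value

# Usage: two sub-clusters fed from the global register file.
m = Machine(num_subclusters=2)
m.files["global"][:4] = [10, 20, 30, 40]

# Cycle 1: broadcast global r0..r3 to both sub-clusters, lane order reversed.
step(m, Bundle(compute=[], transfer=[
    Transfer("global", (0, 1, 2, 3), ("sc0", "sc1"), (0, 1, 2, 3), (3, 2, 1, 0))]))

# Cycle 2: both sub-clusters compute while sc1 r3 moves back to global r1.
step(m, Bundle(
    compute=[Compute("sc0", "add", 0, 0, 1), Compute("sc1", "mul", 2, 2, 3)],
    transfer=[Transfer("sc1", (3,), ("global",), (1,), (0,))]))

print(m.files["sc0"], m.files["sc1"], m.files["global"])
```

Reading a snapshot before committing any writes models the parallel-issue semantics of a VLIW bundle. A transfer and a computation in the same cycle both see the register values from before that cycle.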