A system and method are disclosed for using wider data paths within the
Processing Elements (PEs) of a Massively Parallel Processor (MPP) array to
increase the computational performance of the PEs and of the array as a
whole, while still allowing a simple 1-bit interconnection network to
transfer data between PEs. A register having a data width equal to the data
width of the PE, for holding data to be moved from one PE to another, is
provided in each PE. The register can be loaded in parallel within the
PE and then operated as a shift register to transfer a full data-width word
from one PE to another PE using a 1-bit-wide serial interconnection.
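
The parallel-load, serial-shift behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the 8-bit width, the ring arrangement of PEs, the least-significant-bit-first shift order, and the names `PE`, `load_parallel`, `serial_out`, `shift_in`, and `shift_transfer` are all hypothetical choices made for illustration.

```python
WIDTH = 8  # assumed PE data width in bits (illustrative only)


class PE:
    """Hypothetical model of one processing element's transfer register."""

    def __init__(self, width=WIDTH):
        self.width = width
        self.mask = (1 << width) - 1
        self.xfer = 0  # transfer register, as wide as the PE data path

    def load_parallel(self, word):
        # Load the whole word in a single step from inside the PE.
        self.xfer = word & self.mask

    def serial_out(self):
        # The least significant bit is the one driven onto the 1-bit link.
        return self.xfer & 1

    def shift_in(self, bit):
        # Shift right by one; the incoming bit enters at the top position.
        self.xfer = (self.xfer >> 1) | ((bit & 1) << (self.width - 1))


def shift_transfer(pes):
    """Move each PE's word to the next PE (assumed ring), one bit per cycle."""
    width = pes[0].width
    for _ in range(width):
        # Sample every link first, then shift, so all PEs move in lockstep.
        outs = [pe.serial_out() for pe in pes]
        for i, pe in enumerate(pes):
            pe.shift_in(outs[i - 1])  # receive from the previous PE


if __name__ == "__main__":
    pes = [PE() for _ in range(3)]
    for pe, word in zip(pes, [0xA5, 0x3C, 0xFF]):
        pe.load_parallel(word)
    shift_transfer(pes)
    print([hex(pe.xfer) for pe in pes])
```

In this model a full-width word crosses the 1-bit link in as many cycles as the register has bits, while the load into the register takes a single step inside the PE.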