A method (and structure) for executing a matrix operation includes, for a
matrix A, separating the matrix A into blocks, each block having a size
p-by-q. The blocks of size p-by-q are then stored in a cache or memory in
at least one of the following two ways. The elements in at least one of
the blocks are stored in a format in which elements of the block occupy
locations different from their original locations in the block, and/or the
blocks of size p-by-q are stored in a format in which at least one block
occupies a position different relative to its original position in the
matrix A.
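
As an illustration only, the two storage options can be sketched in Python. The helper names (split_into_blocks, store_block_column_major, store_blocks_reordered) and the choice of column-major ordering in both cases are assumptions made for this sketch; the description above does not prescribe any particular reordering, and the sketch further assumes that the dimensions of A are exact multiples of p and q.

```python
def split_into_blocks(A, p, q):
    """Separate matrix A (a list of rows) into blocks of size p-by-q.

    Returns a dict mapping (block_row, block_col) to a block (list of rows).
    Assumption: the dimensions of A are exact multiples of p and q.
    """
    m, n = len(A), len(A[0])
    assert m % p == 0 and n % q == 0, "sketch handles exact multiples only"
    blocks = {}
    for bi in range(m // p):
        for bj in range(n // q):
            blocks[(bi, bj)] = [
                A[bi * p + r][bj * q : bj * q + q] for r in range(p)
            ]
    return blocks


def store_block_column_major(block):
    """First way: elements inside a block change location.

    The block is flattened column by column, so most elements land at a
    different offset than they would in the original row-major order.
    """
    p, q = len(block), len(block[0])
    return [block[r][c] for c in range(q) for r in range(p)]


def store_blocks_reordered(blocks, block_rows, block_cols):
    """Second way: whole blocks change position relative to the matrix.

    Blocks are emitted in column-major block order instead of the
    row-major order in which they appear in A.
    """
    return [
        blocks[(bi, bj)]
        for bj in range(block_cols)
        for bi in range(block_rows)
    ]


A = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

blocks = split_into_blocks(A, 2, 2)
way1 = store_block_column_major(blocks[(0, 0)])  # elements moved within a block
way2 = store_blocks_reordered(blocks, 2, 2)      # blocks moved within the matrix
```

In this sketch, way1 holds the elements of the top-left block in column order, and way2 places the lower-left block second, where the upper-right block would sit in the original row-major block order. The two reorderings are independent and could be combined, which corresponds to the "and/or" in the description.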