Matrix multiplication in linear algebra provides a useful problem through which to investigate two optimizations: enforcing local access to memory in place of scattered access, and using pointers in place of array subscripting. Benchmarking results favor a pointer-based implementation in which the three loops of the definition are reordered.
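To make the two optimizations concrete, the following is a minimal sketch in C, not the implementation that was benchmarked. It assumes square n-by-n matrices of double stored in row-major order. The function names matmul_ijk and matmul_ikj_ptr are hypothetical, and the i-k-j ordering is an assumption, chosen because it is a common way to make the innermost loop walk through contiguous memory; the text above does not say which reordering was used.

```c
#include <stddef.h>

/* Baseline: the three loops in the order of the textbook definition,
   with array subscripting. The innermost loop reads b[k*n + j] for
   increasing k, so successive reads of b are n elements apart. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Reordered (i-k-j, an assumed ordering) and pointer-based: the
   innermost loop advances along one row of b and one row of c, so
   every access in that loop is to the next element in memory. */
void matmul_ikj_ptr(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        double *c_row = c + i * n;
        double *c_end = c_row + n;

        /* Clear row i of the result before accumulating into it. */
        for (double *p = c_row; p < c_end; p++)
            *p = 0.0;

        const double *a_ptr = a + i * n;      /* walks along row i of a */
        for (size_t k = 0; k < n; k++) {
            const double a_ik = *a_ptr++;     /* a[i][k], loaded once */
            const double *b_ptr = b + k * n;  /* start of row k of b */
            double *c_ptr = c_row;
            while (c_ptr < c_end)
                *c_ptr++ += a_ik * *b_ptr++;
        }
    }
}
```

Both functions compute the same product, so either can be called as matmul_ikj_ptr(n, a, b, c) on caller-allocated arrays of n*n doubles. Any speed difference depends on the compiler, optimization level, and matrix size, so it has to be measured rather than assumed.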