With GCC 14, unrolling the inner loop for larger sizes produces unnecessary move and lxvp instructions. Reducing the loop unroll factor restores performance to GCC 11 levels.
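The affected loop is not shown here, so the following is only an illustrative sketch. The kernel (`matvec`, a row-major matrix-vector product) and the unroll factor of 2 are hypothetical. It shows one way to cap the unroll factor of a single inner loop in C using GCC's `#pragma GCC unroll`, which applies to the loop that immediately follows it.

```c
#include <stddef.h>

/* Hypothetical kernel, not the loop from this report:
 * y = A * x for an n x n row-major matrix A. */
void matvec(size_t n, const double *restrict a,
            const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        /* Ask GCC to unroll the inner loop by a factor of at most 2.
         * The value 2 is an arbitrary example; 1 disables unrolling. */
#pragma GCC unroll 2
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}
```

The pragma limits unrolling for one loop only. To cap unrolling across a whole translation unit instead, GCC also accepts `--param max-unroll-times=N`. Whether either option removes the extra lxvp and move instructions for a given loop should be confirmed by inspecting the generated assembly.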