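Many SIMD instructions expect their memory operands to be aligned to the vector width (e.g. 16 bytes for SSE, 32 bytes for AVX). Aligned load/store forms such as SSE's movaps fault on a misaligned address, and the unaligned forms can be slower. As found in bug 4328, we do not always align our buffers properly. It may be worth investigating whether we can improve alignment and thereby improve performance.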
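A minimal sketch of what "aligned properly" means in practice: allocate buffers on a vector-width boundary and check pointers before using aligned SIMD loads. This assumes a 32-byte target (enough for both SSE and AVX) and C11 `aligned_alloc`; `SIMD_ALIGN`, `is_aligned`, and `simd_alloc` are hypothetical names, not existing helpers in our code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical alignment target: 32 bytes covers SSE (16) and AVX (32). */
#define SIMD_ALIGN 32

/* Returns nonzero if ptr lies on a multiple of `align` bytes. */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0);
    return ((uintptr_t)ptr % align) == 0;
}

/* Hypothetical helper: allocate `size` bytes aligned to SIMD_ALIGN.
 * aligned_alloc (C11) requires the size to be a multiple of the
 * alignment, so round it up first. Free the result with free(). */
static void *simd_alloc(size_t size)
{
    size_t rounded = (size + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    if (rounded == 0)
        rounded = SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, rounded);
}
```

Note that even an aligned buffer loses alignment once it is offset by a non-multiple of the vector width: with `float *buf = simd_alloc(...)`, `buf + 1` is only 4-byte aligned, so code that walks into the middle of a buffer needs the same care as the allocation itself.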
This bug is only an idea for investigation at this point, so I'm reducing it to a comment on bug 5106 instead.