we can see that they are now the packed variants, which simply means that we're processing multiple elements at once. These particular instructions use the XMM registers, which hold 128 bits, or 4 floats (32 bits each) at a time. But how is that possible if values need to be
4x Code Performance with SIMD