instructions, and most importantly, plenty of use of ZMM registers. The naming convention here is that XMM registers are 128 bits wide, YMM registers are 256 bits wide, and ZMM registers are 512 bits wide, which means that with ZMM we're up to processing 16 single-precision (32-bit) floats at once.
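The lane counts follow directly from the register widths. Here is a minimal sketch of that arithmetic; the helper name `lanes` is hypothetical (not part of any intrinsics header), and it assumes the usual x86 sizes of a 4-byte `float` and an 8-byte `double`:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper: how many elements of type T fit in a vector
// register that is `register_bits` wide.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
    return register_bits / (sizeof(T) * CHAR_BIT);
}

// Assumption: float is 32 bits wide, as it is on x86.
static_assert(sizeof(float) * CHAR_BIT == 32, "expected a 32-bit float");

// XMM = 128 bits, YMM = 256 bits, ZMM = 512 bits.
static_assert(lanes<float>(128) == 4,  "XMM holds 4 floats");
static_assert(lanes<float>(256) == 8,  "YMM holds 8 floats");
static_assert(lanes<float>(512) == 16, "ZMM holds 16 floats");
```

The same helper shows why doubles get half the throughput per instruction: `lanes<double>(512)` is 8 under the same size assumptions.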