even scalar versions of the code. This isn't always an issue, but it can increase binary size and potentially regress performance. Our toy example isn't complicated enough to show the problem, but we can look at another approach that mitigates it. We only have these tail