and potentially regress performance. Our toy example isn't complex enough to exhibit this issue, but we can look at another approach that mitigates it. We need these tail calculations only because we can't guarantee that every iteration has enough data to fill
4x Code Performance with SIMD