optimization at all shouldn't be much of a surprise. If we look at it with a profiler, we'll see that a significant share of the time is spent in iterator machinery alone. We could drill in deeper, but let's move on to a more reasonable compilation setup.
4x Code Performance with SIMD