CPU rabbit hole, we'll actually find that such calculations can be sped up nearly 4-fold, all while keeping the logic single-threaded and as portable as possible. Better yet, the ideas we'll cover apply to more than just a toy example and should give helpful considerations
4x Code Performance with SIMD