For Box2D version 3.0 I decided to finally try using SIMD as it is meant to be used for solving contacts. Making contacts solve faster could yield large performance gains so I decided it would be worth the effort.

But how can I gather 4 or 8 contact pairs to be solved simultaneously? The key is graph coloring. The idea is to have a handful of colors to be assigned to all the contact constraints. For example, suppose I have 6 colors and I want to assign all the contacts to one of those 6 colors. Contact constraints act upon two bodies at a time. With graph coloring the restriction is that within a single color a body can only appear once or not at all.

I did all this work to enable SIMD processing. Did it help? Box2D has a benchmarking console application to help answer this question. I implemented SSE2, Neon, and AVX SIMD instruction sets in the Box2D contact solver. I also implemented a scalar reference implementation. I have 5 benchmarks scenarios that push Box2D in various ways. See the benchmark results here.

The large pyramid benchmark with 4 workers has the following numbers:

  • AVX2 : 1117 fps = 0.90 ms
  • Neon : 1058 fps = 0.95 ms
  • SSE2 : 982 fps = 1.02 ms
  • scalar (AMD): 524 fps = 1.91 ms
  • scalar (M2): 679 fps = 1.47 ms

The bottom line is that making good use of SIMD can be a lot of work but it is worth the effort because it can make games run significantly faster and handle more rigid bodies.