Add SIMD for non-zero checks

Use SIMD both for counting non-zero values and for finding non-zero
values. This gives a large performance gain when decoding. The encoder
sees the same gain when it is not using the operation vector.

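A minimal sketch of the two checks, assuming the data are plain byte slices and an x86_64 target, where SSE2 is always available. The project's actual implementation is not shown here, and the names `nonzero_mask16`, `count_nonzero` and `first_nonzero` are hypothetical. Each 16-byte chunk is compared against zero in one instruction, and the result is collapsed into a 16-bit mask. Counting then becomes a popcount of the mask, and finding becomes a trailing-zero count. Other targets fall back to a scalar loop.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_setzero_si128,
};

/// Hypothetical helper: returns a mask with bit i set when byte i of
/// `chunk` is non-zero. `chunk` must be exactly 16 bytes long.
#[cfg(target_arch = "x86_64")]
fn nonzero_mask16(chunk: &[u8]) -> u32 {
    assert_eq!(chunk.len(), 16);
    // SAFETY: SSE2 is part of the x86_64 baseline, and the unaligned load
    // (`loadu`) reads exactly the 16 bytes checked above.
    unsafe {
        let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
        // Lanes equal to zero become 0xFF, all other lanes become 0x00.
        let zero_lanes = _mm_cmpeq_epi8(v, _mm_setzero_si128());
        // movemask gathers the top bit of each lane; invert it to flag non-zero lanes.
        !(_mm_movemask_epi8(zero_lanes) as u32) & 0xFFFF
    }
}

/// Scalar stand-in on other targets, so the sketch still runs there.
#[cfg(not(target_arch = "x86_64"))]
fn nonzero_mask16(chunk: &[u8]) -> u32 {
    chunk
        .iter()
        .enumerate()
        .fold(0u32, |m, (i, &b)| if b != 0 { m | (1u32 << i) } else { m })
}

/// Counts the non-zero bytes in `data` (hypothetical name).
fn count_nonzero(data: &[u8]) -> usize {
    let chunks = data.chunks_exact(16);
    let tail = chunks.remainder();
    // One popcount per 16-byte chunk, then a scalar pass over the leftover bytes.
    let full: usize = chunks.map(|c| nonzero_mask16(c).count_ones() as usize).sum();
    full + tail.iter().filter(|&&b| b != 0).count()
}

/// Returns the index of the first non-zero byte in `data`, if any (hypothetical name).
fn first_nonzero(data: &[u8]) -> Option<usize> {
    let chunks = data.chunks_exact(16);
    let tail = chunks.remainder();
    for (i, c) in chunks.enumerate() {
        let mask = nonzero_mask16(c);
        if mask != 0 {
            // The lowest set bit is the first non-zero lane in this chunk.
            return Some(i * 16 + mask.trailing_zeros() as usize);
        }
    }
    let tail_start = data.len() - tail.len();
    tail.iter().position(|&b| b != 0).map(|p| tail_start + p)
}
```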
Signed-off-by: Anders Martinsson <anders.martinsson@intinor.se>