A safer way to get/set parts of a compact number using contracts

by Finnian Reilly (modified: 2023 Feb 02)

Introduction

It is a common memory saving technique to combine multiple numeric parts into a single integer or natural number. A good example is compact_time: INTEGER from class TIME_VALUE which combines the seconds, minutes and hours into a single integer. This is achieved using bit-masks and some bit-arithmetic. However these techniques generally do not have adequate contracts to protect against incorrect mask values or shift-count values. The Eiffel-Loop base library now has some utility classes to make setting numeric parts, not only safer, but easier. They have no dependency outside of the Eiffel kernel classes so can easily be just copied into your project.

TIME_VALUE as an example

The following code fragment demonstrates getting and setting the hour part of the compact_time integer.

class TIME_VALUE feature -- Access hour: INTEGER do Result := (compact_time & hour_mask) |>> hour_shift end compact_time: INTEGER -- Hour, minute, second coded. feature -- Element change set_hour (h: INTEGER) do compact_time := compact_time & hour_mask.bit_not compact_time := compact_time | (h |<< hour_shift) end feature {NONE} -- Implementation Hour_mask: INTEGER = 0x00FF0000 Hour_shift: INTEGER = 16 end -- class TIME_VALUE

We can identify the following possible sources of programmer error which are not protected by any precondition or postcondition.

  1. hour_mask might not consist of a sequence of consecutive binary ones.
  2. hour_mask might not be big enough to accommodate the hour value
  3. The value of Hour_shift might not be consistent with the number of trailing binary zeros in hour_mask
  4. There is no "reversible" post-condition on hour function.

Of course it is possible to add all these contracts to TIME_VALUE but it would be tedious and add a lot of extra code.

A bit-routines class to the rescue

By using the class EL_INTEGER_32_BIT_ROUTINES the above code can be rewritten as follows, and as a bonus you get contracts for checking the mentioned sources of error for free. See ancestor class EL_NUMERIC_BIT_ROUTINES

class TIME_VALUE feature -- Access hour: INTEGER local i32: EL_INTEGER_32_BIT_ROUTINES do Result := i32.isolated (compact_time, hour_mask) end compact_time: INTEGER -- Hour, minute, second coded feature -- Element change set_hour (h: INTEGER) local i32: EL_INTEGER_32_BIT_ROUTINES do compact_time := i32.inserted (compact_time, hour_mask, h) end feature {NONE} -- Implementation Hour_mask: INTEGER = 0x00FF0000 end -- class TIME_VALUE

You will notice that there is no longer any need to specify the hour_shift as this is implicit in the number of trailing zeros of the mask. So this possible source of error is eliminated.

Thanks to a nice little algorithm found on Stackoverflow the shift count is calculated in only 7 steps. (8 for a 64-bit number).

Benchmark

I tried caching the shift_count in a hash table but it actually made the routine slower not faster. This shows how efficient the branching algorithm is. The iterative method uses repeated right-shifting but it is very slow compared to the branching one on Stackoverflow.

Output of class BIT_SHIFT_COUNT_COMPARISONPasses over 1000 millisecs (in descending order) NATURAL branching method : 360.6 times (100%) NATURAL cached branching method : 343.3 times (-4.8%) NATURAL iterative method : 169.7 times (-53.0%)

Class Hierarchy

To date there are only routines for 32-bit and 64-bit numbers.

EL_NUMERIC_BIT_ROUTINES*

EL_NATURAL_32_BIT_ROUTINES

EL_NATURAL_64_BIT_ROUTINES

EL_INTEGER_BIT_ROUTINES*

EL_INTEGER_32_BIT_ROUTINES

EL_INTEGER_64_BIT_ROUTINES

ADDENDUM

Since writing this article I discovered two other ways to calculate the position of an LSB. Using De Bruijn sequences was suggested to me by Chat GPT. I also had the idea to look for built-in compiler functions.

gcc: __builtin_ctz

MSVC (since 2005): _BitScanReverse

As you can see, built-in functions are way faster.

Passes over 1000 millisecs (in descending order) NATURAL gcc __built_in_ctz : 4713.0 times (100%) NATURAL de Bruijn method : 1682.3 times (-64.3%) NATURAL branching method : 412.8 times (-91.2%) NATURAL cached branching method : 398.2 times (-91.6%) NATURAL iterative method : 200.0 times (-95.8%)