Automating Object Data Compaction to Expanded Numeric Types

by Finnian Reilly (modified: 2023 Nov 03)

Introduction

A common memory saving technique in programming is to represent the expanded attributes (or fields) of an object as a single numeric type using bit-masks and bit-shifting. Compacted fields enable faster sorting as well as facilitate saving data to a binary file.

This method is error prone and tedious to code. This article presents the idea of using reflection mechanisms to automate the bit-twiddling code necessary to pack and unpack the attribute into a single numeric type.

Manual Compaction

The manual method of compaction is best illustrated by the DATE class from $ISE_LIBRARY/time/time.ecf which most Eiffel coders should be familiar with. As you can see in the listing, this method necessitates the introduction of 2 extra constants per attribute to specify the bit-mask and bit-shift. Even using a programming calculator, calculating the correct masks and shifts is error prone. You then have to write a lot of bit-twiddling code to convert to and from the compacted object. On top of that the fields may not necessarily be of the same type as the compacted number, requiring additional type conversion operations.

DATE inherits DATE_VALUE

class DATE_VALUE feature -- Access day: INTEGER -- Day of the current object do Result := ordered_compact_date & day_mask end month: INTEGER -- Month of the current object do Result := (ordered_compact_date & month_mask) |>> month_shift end year: INTEGER -- Year of the current object do Result := ((ordered_compact_date & year_mask) |>> year_shift) & 0x0000FFFF end ordered_compact_date: INTEGER -- Year, month, day coded for fast comparison between dates. feature -- Element change set_date (y, m, d: INTEGER) -- Set `year' with `y', `month' with `m' and `day' with `d'. local l_date: like ordered_compact_date do -- Same as l_date := l_date & day_mask.bit_not l_date := l_date | d l_date := l_date & month_mask.bit_not l_date := l_date | (m |<< month_shift) l_date := l_date & year_mask.bit_not l_date := l_date | (y |<< year_shift) ordered_compact_date := l_date end feature {NONE} -- Implementation day_mask: INTEGER = 0x000000FF month_mask: INTEGER = 0x0000FF00 year_mask: INTEGER = 0xFFFF0000 -- Mask used to extract/set `year', `month' and `day'. year_shift: INTEGER = 16 month_shift: INTEGER = 8 day_shift: INTEGER = 0 -- Shift needed to extract/set `year', `month' and `day'. end -- class DATE_VALUE

Automated Compaction

To save development time, what is needed is a way of mapping any type of expanded field by name to bit-masks specified as a 1 based range. The range 1 .. 8 for example, can be mapped to either a NATURAL_8 type or CHARACTER_8. The range 9 .. 9 can be mapped to type BOOLEAN. After the map is specified we should not have to write any bit-twiddling code at all.

The test class illustrates an automated technique using reflection mechanisms and design-by-contract to spot potential errors.

In this example, the Eiffel-Loop implementation of EL_REFLECTED_FIELD_BIT_MASKS is used in conjunction with COMPACTABLE_DATE to compact 3 INTEGER fields into one NATURAL_64. However the types can be any the 14 expanded types found in the ELKS library.

As you can see in the example the code for compact_value and make_by_compact comes for free.

class COMPACTABLE_DATE inherit EL_COMPACTABLE_REFLECTIVE rename compact_integer_32 as compact_date, make_from_integer_32 as make_from_compact_date, set_from_integer_32 as set_from_compact_date end create make, make_from_compact_date feature {NONE} -- Initialisation make (a_year, a_month, a_day: INTEGER) do year := a_year; month := a_month; day := a_day end feature -- Access day: INTEGER -- Day of the current object month: INTEGER -- Month of the current object year: INTEGER -- Year of the current object feature {NONE} -- Constants Field_masks: EL_REFLECTED_FIELD_BIT_MASKS once create Result.make (Current, "[ day := 1 .. 8 month := 9 .. 16 year := 17 .. 32 ]") end end -- class COMPACTABLE_DATE

Supporting Contracts

The classes EL_REFLECTED_FIELD_BIT_MASKS and COMPACTABLE_DATE contain the following design contracts to ensure that the bit-compaction and conversion is error-free.

  1. Require that bit-formatting text is correctly indented and the field names are valid
  2. Require that all expanded fields have been included in the formatting text.
  3. Require that ranges are properly formatted as either a .. b or a, where a and b are natural numbers.
  4. Require that any compacted NATURAL_64 value being converted to another NUMERIC type is less than or equal to 2^n - 1 where n is the upper-most mask-bit position (upper_bit).
  5. Require that any compact value used to initialise a compact-able object and conforming to NUMERIC must be less than or equal to 2^n - 1 where n is the upper-most mask-bit position (upper_bit).
  6. Ensure the generated bit-masks do not not overlap
  7. Ensure bit-ranges fit into a NATURAL_64 type
  8. Check that each bit-mask is wide enough to accommodate the field value currently being assigned.

A Real World Example

The class EL_HACKER_INTERCEPT_SERVLET is used to intercept hacking attempts into the websites myching.software and matryoshka.software. The servlet maintains a hash table of IP addresses mapped to a compact NATURAL_64 representation of class EL_FIREWALL_STATUS. This class has 3 attributes of types: INTEGER, NATURAL_16 and BOOLEAN. Any time you want to work with a table item, you must first initialise the shared constant of type EL_FIREWALL_STATUS. If you make any changes to the status you must then update the table with the new compacted value. This saves on garbage collection.

If the request url matches a configured pattern, the firewall is instructed to block packets from that IP address for a number of days.

(For the sake of brevity, many class features have been omitted in this source listing)

class EL_HACKER_INTERCEPT_SERVLET inherit FCGI_HTTP_SERVLET feature -- Access firewall_status_table: HASH_TABLE [NATURAL_64, NATURAL] -- map IP number to compact form of `EL_FIREWALL_STATUS' data feature {NONE} -- Constants Firewall_status: EL_FIREWALL_STATUS once create Result end end -- class EL_HACKER_INTERCEPT_SERVLET class EL_FIREWALL_STATUS inherit EL_COMPACTABLE_REFLECTIVE rename compact_natural_64 as compact_status, make_from_natural_64 as make_from_compact, set_from_natural_64 as set_from_compact end EL_SHARED_SERVICE_PORT create default_create, make_from_compact feature -- Access blocked_ports: EL_ARRAYED_LIST [NATURAL_16] -- list of blocked ports do create Result.make (3) if http_blocked then Result.extend (Service_port.HTTP) end if smtp_blocked then Result.extend (Service_port.SMTP) end if ssh_blocked then Result.extend (Service_port.SSH) end end feature -- Measurement compact_date: INTEGER -- date of intercept feature -- Status query http_blocked: BOOLEAN smtp_blocked: BOOLEAN ssh_blocked: BOOLEAN feature {NONE} -- Constants Field_masks: EL_REFLECTED_FIELD_BIT_MASKS once create Result.make (Current, "[ compact_date := 1 .. 32 http_blocked := 33 smtp_blocked := 34 ssh_blocked := 35 ]") end end -- class EL_FIREWALL_STATUS

Benchmark Comparison

However the convenience of automated compaction does come at the cost of performance, with manual compaction substantially outperforming automated compaction. The benchmark test uses a round-trip of initialising by component, then initialising by compacted value, and finally comparing the attributes to original inputs.

COMPACTABLE_DATE versus DATE_VALUE

Passes over 1000 millisecs (in descending order) Manual compaction : 6187.0 times (100%) Automated compaction : 1535.0 times (-75.2%)

Round-trip Test Code

feature {NONE} -- Implementation automated_date_compaction local date: COMPACTABLE_DATE; i, y, m, d: INTEGER compact_date: NATURAL_64 do y := 2023; m := 10; d := 25 from i := 1 until i > 1000 loop create date.make (y, m, d) compact_date := date.compact_date create date.make_from_compact_date (compact_date) if date.day /= d or date.month /= m or date.year /= y then lio.put_line ("Conversion failed") i := 10_000 else i := i + 1 end end end