The five ways to compute something

by Bernd Schoeller (modified: 2014 Sep 11)

Contents

Example: Statistics on DOUBLE
Pattern 1 - Functional programming
Pattern 2 - Compute on create
Pattern 3 - Compute on demand
Pattern 4 - Explicit compute
Pattern 5 - Lazy compute
Discussion

Ideally, there should only ever be one way to do something. Or there should be only one obvious way, which should be the best. And if you diverge from that obvious way, it is "your gun, your foot, your choice".

But sometimes there are just many ways to achieve the same goal. This is when we have to step back and look at the options at our disposal, analyse the pros and cons of the different paths ahead, and develop insights and best practices.

Consider a class whose only job is to "compute something". Its purpose is not to model some abstract data type, or to have a longer lifespan within the system. Instead, its only purpose is to derive some output values from some input values, using computation.

There might be reasons for not implementing this computation in the class that abstracts the data. For example, it might be too problem-specific. Or we want to keep the interface of the class containing the data clean. Or we want to reduce dependencies. It might also be that the computation requires inputs from numerous abstractions and cannot easily be placed in any one of them.

Whatever the reason, we end up with one "calculator" class whose only job is to contain the mathematical function we need.

Example: Statistics on DOUBLE

A good example of such a problem is a class that computes statistics for a given ARRAY[DOUBLE]. We need three statistics on the array of doubles: minimum, maximum and average.

We do not want to subclass ARRAY[DOUBLE] to include our code, because this would not represent an abstraction by itself.

We might already have an application that uses ARRAY[DOUBLE] everywhere, and we have no control over object creation.

Adding it to ARRAY is not an option either, because not all generic arguments are numbers.

I have identified five different patterns to do such a computation in Eiffel. In the following sections, I will describe each one of these patterns, and list what I perceive as the pros and cons for using this pattern.

Pattern 1 - Functional programming

class
	STATISTICS

feature -- Support functions

	average (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Average of `a_array'
		do
			... Compute average ...
		end

	maximum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Maximum of `a_array'
		do
			... Compute maximum ...
		end

	minimum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Minimum of `a_array'
		do
			... Compute minimum ...
		end

end

PRO: Very simple
PRO: Can be used by creating an instance or inheriting from STATISTICS
PRO: Creates very compact code when used
CONTRA: No benefits of OO
CONTRA: Option/operand separation not possible
CONTRA: All values are computed individually
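A minimal sketch of Pattern 1 with the elided bodies filled in, to make the PRO and CONTRA points concrete. The loops and the `not_empty` preconditions are assumptions for illustration, not the author's code. Each function walks the whole array on its own, which is what "All values are computed individually" amounts to in practice.

```eiffel
class
	STATISTICS

feature -- Support functions (sketch: loop bodies and preconditions are assumptions)

	average (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Average of `a_array' (one full pass over the array)
		require
			not_empty: a_array.count > 0
		local
			i: INTEGER
			sum: DOUBLE
		do
			from
				i := a_array.lower
			until
				i > a_array.upper
			loop
				sum := sum + a_array [i]
				i := i + 1
			end
			Result := sum / a_array.count.to_double
		end

	maximum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Maximum of `a_array' (a separate pass)
		require
			not_empty: a_array.count > 0
		local
			i: INTEGER
		do
			Result := a_array [a_array.lower]
			from
				i := a_array.lower + 1
			until
				i > a_array.upper
			loop
				if a_array [i] > Result then
					Result := a_array [i]
				end
				i := i + 1
			end
		end

	minimum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Minimum of `a_array' (yet another pass)
		require
			not_empty: a_array.count > 0
		local
			i: INTEGER
		do
			Result := a_array [a_array.lower]
			from
				i := a_array.lower + 1
			until
				i > a_array.upper
			loop
				if a_array [i] < Result then
					Result := a_array [i]
				end
				i := i + 1
			end
		end

end
```

Since the class has no creation clause, a client can write `create stats` (which uses `default_create`) and then call `stats.average (values)`; alternatively, a class that inherits from STATISTICS can call `average (values)` directly.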

Pattern 2 - Compute on create

class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Create statistics for `a_array' of inputs.
		do
			... Compute average, minimum, maximum ...
		end

feature -- Access

	average: DOUBLE
			-- Average value

	maximum: DOUBLE
			-- Maximum value

	minimum: DOUBLE
			-- Minimum value

end

PRO: Simple structure
PRO: Values can be computed together
CONTRA: Option/operand separation not possible
CONTRA: Unintuitive to have complex computations on creation
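For comparison, a minimal sketch of Pattern 2 in which `make` gathers all three values in a single loop, which is the "Values can be computed together" advantage. The loop and the `not_empty` precondition are illustrative assumptions rather than the article's code.

```eiffel
class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Create statistics for `a_array' of inputs, computing all three
			-- values in a single pass (sketch: loop and precondition are assumptions).
		require
			not_empty: a_array.count > 0
		local
			i: INTEGER
			value, sum: DOUBLE
		do
			minimum := a_array [a_array.lower]
			maximum := minimum
			from
				i := a_array.lower
			until
				i > a_array.upper
			loop
				value := a_array [i]
				sum := sum + value
				if value < minimum then
					minimum := value
				end
				if value > maximum then
					maximum := value
				end
				i := i + 1
			end
			average := sum / a_array.count.to_double
		end

feature -- Access

	average: DOUBLE
			-- Average value

	maximum: DOUBLE
			-- Maximum value

	minimum: DOUBLE
			-- Minimum value

end
```

A client writes `create stats.make (values)` and can immediately read `stats.average`, `stats.maximum` and `stats.minimum`. There is no state in which the results are missing, but all the work happens inside the creation call.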

Pattern 3 - Compute on demand

class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Initialize statistics for `a_array' of inputs.
		do
			target := a_array
		ensure
			target_set: target = a_array
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Target of the computation

	average: DOUBLE
			-- Average value of `target'
		do
			... Compute average ...
		end

	maximum: DOUBLE
			-- Maximum value of `target'
		do
			... Compute maximum ...
		end

	minimum: DOUBLE
			-- Minimum value of `target'
		do
			... Compute minimum ...
		end

end

PRO: Readable OO code
PRO: Compute when the value is needed
PRO: Allows the underlying data to change
CONTRA: All values are computed individually
CONTRA: Needs to keep a reference to the target (prevents GC)

Pattern 4 - Explicit compute

class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Initialize statistics for `a_array' of inputs.
		do
			target := a_array
			computed := False
		ensure
			target_set: target = a_array
			not_computed: not computed
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Target of the computation

	computed: BOOLEAN
			-- Have the results been computed?

	average: DOUBLE
			-- Average value of `target'
		require
			computed: computed
		do
			Result := internal_average
		end

	maximum: DOUBLE
			-- Maximum value of `target'
		require
			computed: computed
		do
			Result := internal_maximum
		end

	minimum: DOUBLE
			-- Minimum value of `target'
		require
			computed: computed
		do
			Result := internal_minimum
		end

feature -- Computation

	compute
			-- Compute the statistics
		require
			not_computed: not computed
		do
			... Compute internal_average, internal_minimum, internal_maximum ...
			computed := True
			target := Void
		ensure
			computed: computed
		end

feature {NONE} -- Implementation

	internal_average: DOUBLE
			-- Average value

	internal_maximum: DOUBLE
			-- Maximum value

	internal_minimum: DOUBLE
			-- Minimum value

end

PRO: Very explicit when the computation is performed
PRO: Clean, contracted abstraction of the computation
PRO: Option/operand separation very intuitive
CONTRA: Verbose when written
CONTRA: Verbose when used
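A minimal sketch of Pattern 4 with `compute` filled in. Because `compute` assigns Void to `target`, a void-safe compiler only accepts it if `target` is declared `detachable` (as in the compact variant discussed in the comments), so the sketch makes that change and adds an invariant linking `target` to `computed`. The loop, the `has_values` precondition and the invariant are assumptions for illustration, not the author's code.

```eiffel
class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Initialize statistics for `a_array' of inputs.
		do
			target := a_array
		ensure
			target_set: target = a_array
			not_computed: not computed
		end

feature -- Access

	target: detachable ARRAY [DOUBLE]
			-- Target of the computation; Void once `compute' has run

	computed: BOOLEAN
			-- Have the results been computed?

	average: DOUBLE
			-- Average value of the former `target'
		require
			computed: computed
		do
			Result := internal_average
		end

	maximum: DOUBLE
			-- Maximum value of the former `target'
		require
			computed: computed
		do
			Result := internal_maximum
		end

	minimum: DOUBLE
			-- Minimum value of the former `target'
		require
			computed: computed
		do
			Result := internal_minimum
		end

feature -- Computation

	compute
			-- Compute all three statistics in one pass, then release `target'.
			-- Sketch: the loop and `has_values' are assumptions, not the article's code.
		require
			not_computed: not computed
			has_values: attached target as l_values and then l_values.count > 0
		local
			i: INTEGER
			value, sum: DOUBLE
		do
			if attached target as l_target then
				internal_minimum := l_target [l_target.lower]
				internal_maximum := internal_minimum
				from
					i := l_target.lower
				until
					i > l_target.upper
				loop
					value := l_target [i]
					sum := sum + value
					if value < internal_minimum then
						internal_minimum := value
					end
					if value > internal_maximum then
						internal_maximum := value
					end
					i := i + 1
				end
				internal_average := sum / l_target.count.to_double
			end
			computed := True
			target := Void
		ensure
			computed: computed
			target_released: target = Void
		end

feature {NONE} -- Implementation (results cached by `compute')

	internal_average: DOUBLE
	internal_maximum: DOUBLE
	internal_minimum: DOUBLE

invariant
	target_until_computed: not computed implies target /= Void

end
```

A client first creates the object, then "pushes the compute button", and only then reads the results: `create stats.make (values)`, then `stats.compute`, then `stats.average`. Reading `stats.average` before `stats.compute` violates the `computed` precondition; this extra state is exactly the cost weighed against Pattern 2 in the comments.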

Pattern 5 - Lazy compute

class
	STATISTICS

create
	make

feature -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Initialize statistics for `a_array' of inputs.
		do
			target := a_array
			computed := False
		ensure
			target_set: target = a_array
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Target of the computation

	average: DOUBLE
			-- Average value of `target'
		do
			ensure_computed
			Result := internal_average
		end

	maximum: DOUBLE
			-- Maximum value of `target'
		do
			ensure_computed
			Result := internal_maximum
		end

	minimum: DOUBLE
			-- Minimum value of `target'
		do
			ensure_computed
			Result := internal_minimum
		end

feature {NONE} -- Implementation

	ensure_computed
			-- Ensure the results are available
		do
			if not computed then
				... Compute internal_average, internal_minimum, internal_maximum ...
				computed := True
			end
		ensure
			computed: computed
		end

	computed: BOOLEAN
			-- Have the results been computed?

	internal_average: DOUBLE
			-- Average value

	internal_maximum: DOUBLE
			-- Maximum value

	internal_minimum: DOUBLE
			-- Minimum value

end

PRO: Clean, contracted abstraction of the computation
PRO: Option/operand separation possible
CONTRA: Hides when the computation is performed
CONTRA: Extremely verbose when written

Discussion

I have seen all of these patterns being used somewhere in real code. From my personal experience, I try to use pattern 4 when possible. While this is the most verbose when used, it also makes it very explicit when the computation is performed. The benefit of this is that option/operand separation feels very natural and easy to understand from a client perspective (just configure everything and then "push the compute button").

Pattern 1 is nice if the computations are extremely simple and probably not going to change, for example for primitive mathematical operations like 'absolute' or 'square_root'.

I had to struggle with pattern 2 a few times, because it really does not feel intuitive that complex computations are done on object creation. Also, the inability to use option/operand separation hurts clean code in the long run. I try to avoid it.

Pattern 3 feels like a "view" on top of the data. It is nice if the different computations are independent. Aliasing creates its own issues and dangers, and it is also easy to unnecessarily compute the value again and again.

Last, but not least, I consider pattern 5 "over-engineered". While it looks beautiful from an OO point of view, it hides the point at which the values are computed. Option/operand separation becomes tricky.

What do you think?

Comments

David Le Bansais (11/9/2014)
Multithreading

You might want to consider your patterns from the concurrent access point of view as well. I see 3 concerns:
I guess the combination of concerns 2 and 3 means having 4 equally useful patterns. One would use the pattern that best fits their needs.
- Bernd Schoeller (11/9/2014)
  Your comments on MT are good, though a discussion of this topic would probably be worth an article by itself.
  
  I do not understand merging 2 and 3 - did you mean merging 2 (compute on make) and 4 (explicit compute)?
  - David Le Bansais (11/9/2014)
    Sorry, by 2 and 3 I meant concerns in my own post, not patterns in yours. My point was that finding a balance between performance and memory consumption is one concern, and deciding if the computation should be separated from collecting the result is another, and that it would mean possibly 4 different patterns.
    
    But, in fact, the pure functional approach (your pattern #1) works both in single-threaded and multi-threaded environments, so that makes 3 possibilities, not 4.
    
    I have another observation about your own approach: pattern #5 implicitly requires that data must be available before the computation result is requested, i.e. the call to, say, average, has no parameter. It makes sense. However, in pattern #4 you could have "a_array" be a parameter of "compute" and not stored at creation time. Typically, if your computation is explicit, you will perform it as soon as data is available, and therefore use pattern #2 compute on create.
    - Bernd Schoeller (12/9/2014)
      If the memory consumption and thread-safety are your criteria, then you are perfectly right with your argument.
      
      I would never write an Eiffel application using multi-threading; the language is just not made for that. SCOOP might be a possibility, though. And yes, I know that I have to enable multi-threading for some libraries, but that is not the point.
      
      My arguments are purely on questions of interface design, maintainability and simplicity.
      
      For example, there is a huge difference between pattern 2 and pattern 4. Even if we add a 'make_computed' to pattern 4, it is still much more complex than 2. This is because instances of pattern 4 can always be in one of two states: not computed and computed. This problem does not occur with pattern 2, so the contractual obligations are significantly reduced.
      
      And to be honest, few people write classes with two different modes of operation. Normally, you just want to get the job done and continue.
Victorien Elvinger (17/9/2014)
Simple creation procedure and API

Great article!

For simplicity and readability, I think that the creation procedures should only set up the context within which each of the queries then operates.

I like queries without preconditions on the object state. Pattern 4 makes the API more complex and more error-prone. However, I prefer it over pattern 5.

I prefer pattern 1 in simple cases and pattern 3 in more complex ones.

Note: Pattern 4 could be more compact:

class
	STATISTICS

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [REAL_32])
			-- Initialize statistics for `a_array' of inputs.
		do
			target := a_array
		ensure
			target_set: target = a_array
			not_computed: not computed
		end

feature -- Status report

	computed: BOOLEAN
			-- Have the results been computed?
		do
			Result := target = Void
		end

feature -- Access

	average: REAL_32
			-- Average value.
		require
			computed: computed
		attribute
		end

	maximum: REAL_32
			-- Maximum value.
		require
			computed: computed
		attribute
		end

	minimum: REAL_32
			-- Minimum value.
		require
			computed: computed
		attribute
		end

feature -- Computation

	compute
			-- Compute the statistics.
		require
			not_computed: not computed
		do
			... Compute statistics ...
			target := Void
		ensure
			computed: computed
		end

feature {NONE} -- Implementation

	target: detachable ARRAY [REAL_32]
			-- Target of the computation.

end
- Bernd Schoeller (17/9/2014)
  Thanks for the feedback - good comments.
  
  Hmm - it might be useful to expose 'make' to reinitialize the calculation for a second computation, avoiding the creation of a new object. But few people do that, so having an extra 'initialize (array)' that is called from make might be better. Undecided on that one ...
  
  While I was coding, I tried to remember the 'attribute' syntax, but kept to the old style. You are right, with 'attribute', it is much cleaner.