Fun with Generics
We all know about generics as a wonderful help to implement container data structures like lists, sets, bags etc. Much later than Eiffel, object-oriented programming languages like Java and C# have understood that generic parameters are a powerful extension to the type system and remove the need for many casts. So, what else can we do with generics beyond containers?
Lets first look at the basics. With generics, we have to differentiate between class and type. OO languages without generics use these two words interchangeably, but with generics they become different concepts.
A class is what you write: it is a piece of Eiffel code text, describing features, inheritance relations, invariants etc. Classes are instantiated into types. If a class is not generic, then there is exactly one type related to it. For example the class INTEGER defines exactly one type, also called INTEGER. We can think about a non-generic classes to be some kind of type singletons.
When we have generic parameters, we have to provide types to create a type from a class. So, we take the class LINKED_LIST[G], provide it with the type INTEGER and we get the type LINKED_LIST[INTEGER].
First observation: this definition is recursive. To get a type from a generic class, we have to provide it with other types. Recursive, that means that every new type can itself be used again to instantiate a class.
But how many types can we form from a fixed set of classes? There are two answers, depending on whether we talk about static types and dynamic types.
Static types are the types that we use while writing Eiffel code to declare types for entities (arguments, locals, return values) and when creating new objects. Should we reference a generic parameter of the surrounding type, then we replace it by the type of the constrain (or ANY if there is none).
How many static types are there? As the number of static types is bound by the program text, there is always a fixed, finite set of static types for a given Eiffel program. There might be more static types than classes (for example LIST[INTEGER] and LIST[BOOLEAN] creates two types from the single class LIST[G]), but a static analyzer can still easily compute the set of all static types for a given program.
Dynamic types on the other hand are created at run-time. Whenever a new object is created, it gets assigned to be of some type. This type information is then used to resolve dynamic binding of feature invocations and conformance checks on assignment attempts.
How many dynamic types are there? The number of dynamic types is unbounded: we can create any number of types during the execution of a program. Look at the following Eiffel program that creates arbitrary powersets:
Obviously the dynamic type of the resulting set will depend in the input n: if n = 3, the resulting type will be SET[SET[SET[SET[X]]]] (whatever X was when the set was initially created). Of course, the static type of the result is still just SET[ANY] ...
Counting with generics
Now that we have look at generics, lets have a look what we can do with it. We know that we can have an unbounded number of types, so lets use them to implement natural numbers using Peanos axioms:
Now we can use NAT[BOOLEAN] as zero, NAT[NAT[BOOLEAN]] as one, NAT[NAT[NAT[BOOLEAN]]] as two and so forth.
Is the successor of the successor of 1 really 3? Or is 3 the direct successor of one? The type checker will tell us:
The type accepts statement 1, but not statement 2, proving to us this trivial case of mathematics.
Of course, dynamically we can create any integer value now:
But we loose the precise value on the level of the static type, as integer_to_nat still returns only NAT[ANY]. But dynamically, we can recompute the value using assignment attempts:
Now we have managed to store arbitrary integers into objects without using fields. NAT is an immutable INTEGER_REF, as the dynamic type of an object cannot be changed.
Of course, I really do now want to know the overhead connected to this, but this example is made for fun, and purely academic (I am a researcher and I am allowed to do this, but should I ever discover production-level code using this encoding of naturals, I will personally shoot or hang the developer, or both).
Is it useful ?
When does it make sense to define such data-types? Specially as we know that the static type set, thus the part of the software that is statically checkable, remains always bound by the software text. Relying on assignment attempts is very close to the casting-mess we face with languages like Java or C++.
One possible application: we might use the static checker to check properties of data structures by encoding them into types. For example, imagine we have lots of cases where we work with trees and want to make sure that the structures passed in have certain properties.
A concrete example would be mathematical expressions made out of NUMBERs, PLUSes and MULTs. Here is a sketch:
We want to write a function applying distributivity laws. Of course, such code only works on very specific expressions. The signature would look like this:
Now, it would be impossible to pass an expression to distributed that would not have the required form.
Some last remarks
In ETL2, Bertrand Meyer wrote the following definition of the dynamic type set:
The set of possible dynamic types for an entity or expression x is called the dynamic type set of x. [...] It is possible to determine the dynamic type set of x through analysis of the classes in the system to which x belongs, by considering all the attachment and reattachment instructions involving x or its entities. (ETL2, p. 323)
The ECMA Eiffel standard does not make such a claim. We can now see that the dynamic type set cannot be computed at compile time, as it depends on the execution of the program. We have also seen how we can use dynamic and static types to improve the type safety of our code, and to produce horribly inefficient natural numbers ;-)
Last but not least: Although all code presented here is 100% legal Eiffel, do not try to compile this code with any other compiler than EiffelStudio. And do not store nats larger than 1000 using NAT. Bad things will happen ... we are here definitely at the border of what Eiffel compilers can handle.