Automorphism

by David Le Bansais (modified: 2010 Aug 15)

eiffel
conversion

Contents

Introduction
Automorphism brings nothing that Eiffel doesn't already have
Examples
1. Strings
2. Integers
Implementation challenges
Syntax
1. Anchored conversion
Practical use
1. Strings
2. Integers
3. Lists
Conclusion

Introduction

Practitioners of Eiffel are used to polymorphism, the separation between the view some piece of code may have over an object, and the actual implementation of that object.

Let's consider a quick example that hopefully will make things clearer for the reader:

class A feature some_query: INTEGER do result := 0 end end class B inherit A redefine some_query end feature some_query: INTEGER do result := 1 end end class CLIENT feature query_result (x: A): INTEGER do result := x.some_query end end

This classic example demonstrates what polymorphism is. When query_result is called with a parameter that is an object of class A, it returns {A}.some_query, i.e. zero. When query_result is called with a parameter that is an object of class B, it returns {B}.some_query, i.e. one.

The fact that, in this example, x is an object of class A or B is not under the control of A, B, or even CLIENT.

Automorphism is the ability for an object of class A to turn into an object of class B on its own will.

Automorphism brings nothing that Eiffel doesn't already have

Before defining precisely how automorphism works (as I have mixed types and classes in the introduction), it's worth mentioning that automorphism isn't new, and everything that it does could still be done before.

To understand this, note that for an object of class A to turn into an object of class B, the code of class A must be aware of the existence of B. It can, therefore, be ready to become a 'B' from the beginning.

class A feature some_query: INTEGER do if I_am_a_B then result := some_query_as_B else result := some_query_as_A end end feature I_am_a_B: BOOLEAN do result := attached {B} b; end feature -- Impersonating class A some_query_as_A: INTEGER do result := 0 end feature -- Impersonating class B some_query_as_B: INTEGER do result := b.some_query end b: detachable B end

I didn't write the creation features, but basically any time the object wants to turn into an object of class B, it can create a new B and attach it to 'b'. Subsequent calls to some_query will return the same result an object of class B would have returned.

In practice, however, it is of course better to just have two classes A and B, and pick the right one when creating objects. The code that turns a A into a B isn't optimized:

It has a call to I_am_a_B at the begining of every features.
It takes more than twice the memory an object of class A would take.

Automorphism is therefore an optimization:

It applies when the optimization becomes necessary during the lifetime of the object (otherwise, just creating an object of class B is easier)
It saves execution time: a conditional statement, and a feature call.
It might save memory, if implemented properly.

Examples

Strings

Eiffel has several STRING classes: STRING_8, STRING_16 and STRING_32. The idea is to associate them to CHARACTER_8, CHARACTER_16 and CHARACTER_32 respectively, and assume that each of these CHARACTER classes is mapped to a sized variant of INTEGER. STRING_8 would be made of characters that can be encoded on 8 bits, STRING_16 on 16 bits and so on.

The standard, then, defines that STRING is mapped to one of STRING_xx classes, left to the choice of the implementer. Nothing stops a programmer from manipulating STRING_32 objects directly, if they want to be sure their code uses unicode strings and is portable. A better option would be to have one STRING class, and let the code of that class choose which storage to use: 8-bits integers for simple English sentences, 16-bits or 32-bits integers for other languages, UTF-8 or UTF-16 encoding being an option.

A problem can arise, though, if a string storing characters on 8 bits needs to store characters from other languages, like Chinese. In practice, it requires the code of STRING to maintain some flag indicating which encoding is used, and the corresponding data structure.

This is typically what programmers are encouraged to avoid, by using polymorphism and classes like STRING_8, STRING_16, STRING_32, STRING_UTF_8, and so on. This is when automorphism can play a role.

With automorphism, STRING objects can begin their life as STRING_8 on an English system, or STRING_UTF_16 on a Chinese system, or any user-defined choice. They can subsequently evolve, becoming STRING_32 for quick pattern searches, STRING_UTF_8 to save memory or for exchange purpose over the Internet, and so on.

The benefit, from the programmer perspective, is the isolation from implementation details, and guaranteed portability from one implementation of Eiffel to another. Although, of course, the quality of optimization that these implementations offer may vary.

Integers

I mentioned it already, integers in Eiffel have sized variants: the INTEGER class is actually a mapping (a shortcut) to one of INTEGER_8, INTEGER_16, INTEGER_32,...

The mapping depends on the implementation, and usually the choice will be to use the size giving the best optimization for speed: INTEGER_16 on a 16-bits processor for instance. This can become an issue, however, if a programmer tests their code on one platform and it's ported to a different processor with smaller integers. It's also a problem if the program at some points generates integers too large to be stored on the selected INTEGER variant, and are silently rounded.

Consider the following code:

percent := (downloaded * 100) // total

It gives a percentage of downloaded data, from the Internet for instance. What isn't obvious except to the seasoned programmer, is that this code will break on most PC if the downloaded file is larger than 22Mb. When downloaded reaches 21 474 837, the value of (downloaded * 100) becomes too large for a 32-bits, signed integer and is rounded to a negative value. And therefore percent quickly becomes negative!

All these problems would disappear if INTEGER was allowed to store values of arbitrary size. There are plenty of libraries providing this support (check the library section), the Eiffel standard could change the definition of INTEGER and each implementation use the library of its choice.

Of course, such an implementation of INTEGER is much slower than the usual INTEGER_32, made to be fast for small numbers. This is when automorphism, again, comes to the rescue.

Most numbers will start their life as a small INTEGER_32. If, during execution, one of them exceeds the size a 32-bits integer can store, it can become an INTEGER_64. If necessary, it can even become an INTEGER, that can store any number, limited only by the memory available on the computer. Automorphism ensures that more than 99.99% of the time, the optimized INTEGER_32 will be used.

The biggest advantage of automorphism in the case of INTEGER is actually portability. The programmer no longer needs to be aware of the target processor, code will execute on 8-bits processor as well as 64-bits processor with no need to worry about it.

The use of an arbitrary-sized integer can be a problem, though, for programmers that expect the Eiffel runtime to generate a boundary overflow exception if an integer becomes too large. This case is handled easily though, if automorphism is turned off for sized variants of INTEGER, or set to generate an exception instead of growing.

Implementation challenges

Before I elaborate on the changes in the language to support automorphism, we need to make sure that it can actually be implemented, and what restrictions must be taken into account.

First of all, we must separate classes that can automorph into each other in several sets. If A can turn into B, and B into C, then A, B and C belong to the same set. As long as the automorphism feature is declarative, this step presents no particular challenge. This will be our first restriction. Once all classes are parsed and grouped into sets, the compiler associates to all classes of the set the minimum number of bytes required to store any object of any class of the set. Say, for instance, that the class that consumes the most memory is B, then every time a new object of class A, B or C is created enough memory is allocated to accomodate an object of class B.

Note that this can already be an argument against automorphism, since it consumes memory. In the case of INTEGER, potentially by a considerable amount. We will see below what improvements can be done.

Once automorphism is triggered, from A to B for instance, the following sequence of actions is executed:

A new object of class B is created, by way of a conversion feature that makes sure both objects contain the same information.
The content of the memory allocated for the B object is copied over the memory taken by the A object.
The type of the object is changed from A to B, usually this can be done at the same time as step 2.
The memory temporarily used for the B object is marked as discardable, if that can help the garbage collector.

Note that among the information associated to an object, is the result of once ("OBJECT") features. They must behave as already called even for the new object of class B, if appropriate.

When should the conversion be performed? Obviously, it must not happen while a feature of the converted object is being executed. Therefore, the implementation must be able to track the first call to a feature of that object, and when it ends. The proper time to perform the conversion it then either upon return of the feature, or upon the next call of a feature of that object.

The later case has some benefits:

The new object can be created concurrently in a concurrent execution environment.
If the object is never used again, we avoid a useless conversion.

It's probably more challenging to implement, though.

Syntax

The proposed syntax consists of reusing the convert keyword, since automorphism is a conversion. Anywhere in the body declaration of a feature, a programmer can add a convert {type} statement. For instance:

class A inherit PARENT feature some_feature do ... convert {B} ... endclass B inherit PARENT feature another_feature do ... convert {C} ... end

This syntax is valid, if the following conditions are met:

A, B and C all inherit from a common parent. It doesn't have to be a direct inheritance, but there has to be some common PARENT class. The set of classes that A, B and C have in common is called the parent set. Note that ANY satisfies this condition, so the parent set always include ANY. However, the next condition rules out ANY as a useful parent.
A, B, C and all the classes they inherit from, excluding the parent set, are not used as types in the code, other than in creation procedure and expressions.

For instance the following sentences are invalid:x: A some_feature(x: B) class X[G->A]And the following sentences are valid:x: PARENT; create {A} x.make y: ANY; y := create {B}.make x: PARENT; create x

Whenever a convert {Type} sentence is encountered, the object receives an invisible flag, and as soon as the out-most feature call to the object returns, the conversion is performed:

local x: PARENT do create {A} x.make ... -- x is of type A x.some_feature -- x is now of type B ... end

When designing a set of classes using automorphism, one can use either of two options:

Make the parent deferred. In that case, all objects must be created with an explicit type by clients.
Make the parent effective, but have a convert instruction in the creation procedure. This let the parent choose between A, B and C at creation, and is probably the preferred design for most uses of automorphism.

Anchored conversion

The following use of anchored types should be permitted:

some_feature(p: PARENT) do ... convert {like p} ... end

Usually, the declaration of an anchored type is evaluated at compile time to find out what the final type is. But since the convert {Type} statement is executed at run time, evaluation of the final type can be delayed as well.

Practical use

Strings

The example below demonstrates how the STRING object can be implemented to save memory by default, but use more memory for strings in which the application searches substrings, to make them fast.

class STRING create make_empty, make_from_other feature {NONE} -- Initialization make_empty do make_from_other(create {STRING_UTF8}.make_empty) end make_from_other (other: STRING) do typeless_data := other.typeless_data count := other.count convert {like other} end feature {STRING} -- Data storage typeless_data: ARRAY[NATURAL_8] -- Format depending on the descendant feature {STRING} -- Conversion features Seq32_to_Utf8 (other_data like typeless_data): like typeless_data -- Encode the data from 'other_data' as a sequence of NATURAL32 into a sequence of NATURAL_8 according to the UTF-8 format -- Reference: The Unicode Standard, Version 5.0, §3.9 D92, §3.10 D95 do ... end Utf8_to_Seq32 (other_data like typeless_data): like typeless_data -- Decode the data from 'other_data' as a sequence of NATURAL_8 encoded in UTF-8 format into a sequence of NATURAL_32 -- Each NATURAL_32 in the result is one character. do ... end feature count: NATURAL -- Number of characters in the string deferred end search (substring: STRING): like count -- Look for the first occurence of 'substring' -- Return the 1-based index of the first matching character, 0 if not found deferred end -- All other features of the STRING interface are skipped. endclass STRING_UTF8 inherit STRING redefine make_empty, count end create make_empty convert convert_from_typeless_data({STRING}), convert_from_seq32_data({STRING_32}) feature {NONE} -- Initialization make_empty do create typeless_data.make_empty char_count := 0 end feature {NONE} convert_from_typeless_data (other: STRING) do typeless_data := other.typeless_data.twin char_count := other.char_count end convert_from_seq32_data(other: STRING_32) do typeless_data := Seq32_to_Utf8(other.typeless_data) char_count := other.count end feature count: NATURAL -- Number of characters in the string -- Since UTF_8 uses a variable number of bytes per character, we maintain a separate character count in 'char_count'. do result := char_count end search (substring: STRING): like count -- Look for the first occurence of 'substring' -- Return the 1-based index of the first matching character, 0 if not found do ... if many_searches then convert {STRING_32} end end -- All other features of the STRING_UTF8 implementation are skipped. endclass STRING_32 inherit STRING redefine make_empty, count end create make_empty convert convert_from_typeless_data({STRING}), convert_from_utf8_data({STRING_UTF8}) feature {NONE} -- Initialization make_empty do create typeless_data.make_empty end feature {NONE} convert_from_typeless_data (other: STRING) do typeless_data := other.typeless_data.twin end convert_from_utf8_data(other: STRING_UTF8) do typeless_data := Utf8_to_Seq32(other.typeless_data) end feature count: NATURAL -- Number of characters in the string -- 'typeless_data' is the raw sequence of NATURAL_32, one per character. do result := typeless_data.count // {NATURAL_32}.size end search (substring: STRING): like count -- Look for the first occurence of 'substring' -- Return the 1-based index of the first matching character, 0 if not found do ... end -- All other features of the STRING_32 implementation are skipped. end

Some clarifications:

{STRING}.make_empty creates an object of type STRING_UTF8. Manifest strings could be based on different types, though, depending on their content.
Since {STRING}.make_from_other has the only conversion statement from STRING to one of the other classes, all implementations of convert_from_typeless_data are guaranteed to receive a string of the same type. Therefore, there is no need to parse, convert or manipulate in any way the content of the string itself. A simple call to twin takes care of everything.
The example uses NATURAL_8 instead of NATURAL yet this post is also about integer automorphism. I just didn't want to confuse the reader with needless complexity. In the example, the meaning of NATURAL_8 and NATURAL_32 is the current meaning.
{STRING_UTF8}.search has a convert instruction that will force the STRING_UTF8 object to be converted to a STRING_32 after several searches. STRING_32 doesn't have convert instructions, an indication that if most strings begin their life as STRING_UTF8 to save memory, once they become a STRING_32 they might stay that way until collectible by the GC. This is of course entirely up the implementation.

Integers

Integer automophism is very challenging to implement well, for several reasons:

It's a built-in type that is highly optimized. Adding support for automophism can easily break this optimization without bringing reasonable benefits.
The memory used to store integers is usually a significant percentage of the total memory used by applications. Increasing the amount of memory each integer takes would negatively impact most applications.

If automorphism can be implemented without loosing the core optimizations of integers, and keeping the usual 4 bytes of data per integer on a 32-bit computer, then its benefits might be worth considering.

There are a few avenues for investigations. One is that a processor exception triggers the conversion process, so the optimization for speed can probably be kept. A second consideration is that, in order to distinguish between regular INTEGER_32 attributes and big integers, there could be a separate, hidden bit table in the object data. Which means each integer only require the highest number of bytes between what is needed for an expanded INTEGER_32, and what is needed for a reference pointer. As far as I know, they are different only on 64-bits computers.

Lists

Currently, programmers select what kind of list they manipulate. It can be an ARRAYED_LIST, or a LINKED_LIST for instance. However, in practice all they are interested in is the concept of lists of items, and they choose the type based on operations performed on these lists.

For instance, if a programmer needs to insert or delete items in the middle of the list, they probably want to pick a LINKED_LIST. If, on the contrary, they perform random access to items within the list, they will choose an ARRAYED_LIST instead.

With automorphism, they could simply create a LIST item but with a creation procedure that provides an hint about what list type to choose. In the case of the two examples above, the code would look like this:

list1, list2: LIST create list1.make_for_insert_in_the_middle create list2.make_for_random_access

I believe that coding lists with automorphism provides a better abstraction, and allows the desired optimization in a more understandable way.

Note that LIST can also offer on-the-fly change of mind:list1.optimize_for_random_access -- Done with insertion, let's use an arrayed list now

Conclusion

My proposal for the language feature I called automorphism offers some benefits to programmers:

They don't have to specify which implementation to pick for classes that have a common conceptual interface and different implementations.
It allows runtime optimization of object storage or performance.
If an implementation is preferred, a programmer can clearly state their intent instead of choosing a specific implementation by name.

Unfortunately, implementation in the case of built-in expanded classes is far from trivial, but I believe it might be doable.