A case against inline agents
- Context: Programs as documents
- Textual representations of Trees
- Levels of the AST
- Criticizing Inline Agents
Context: Programs as documents
An Eiffel program is a complex data structure. It is made out of attributes, local variables, types, classes, instructions, feature calls, assignment attempts and a lot more. These constructs reference each other: attributes are used by features, features contain instructions, instructions use local variables, local variables reference types, types are built from classes, and so forth.
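When we look at the textual representation (the code text) of an Eiffel program, we can identify two major ways in which these relations are expressed: through textual context, that is, where a construct appears in relation to others, and through identifiers.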
The second way to express these relations is by the use of identifiers. Identifiers are arbitrary words (with a few limitations, for example they must not be the same as a keyword) that allow references to other parts of the code text. For example:
The local variable definition uses the identifier
These two ways of expressing relations in a program directly shape how we write compilers that read code text from a file. Reading is done in two steps.
The first step, understanding references through textual context, is called the syntactic analysis of a program and is performed by a scanner and a parser. The result is a tree, called the abstract syntax tree.
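To make the two halves of syntactic analysis tangible, here is a minimal sketch in Python (chosen so that it runs on its own; it is not how any Eiffel compiler is actually built). It assumes a hypothetical toy grammar of nested calls such as f (a, g (b)); the names Node, scan and parse are illustrative. The scanner only splits the text into words and symbols; the parser then recovers the tree purely from textual context, namely the position of the brackets and commas.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


# Scanner: identifiers, brackets and commas; surrounding whitespace is skipped.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[(),])")


def scan(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: list[str]) -> Node:
    """Recursive descent for the toy grammar: call ::= IDENT [ "(" call { "," call } ")" ]"""
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def expect_identifier() -> str:
        nonlocal pos
        token = peek()
        if not token or not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"identifier expected at token {pos}")
        pos += 1
        return token

    def call() -> Node:
        nonlocal pos
        node = Node(expect_identifier())
        if peek() == "(":  # an opening bracket starts the list of sub-nodes
            pos += 1
            node.children.append(call())
            while peek() == ",":
                pos += 1
                node.children.append(call())
            if peek() != ")":  # the matching closing bracket ends it
                raise SyntaxError("')' expected")
            pos += 1
        return node

    tree = call()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


tree = parse(scan("f (a, g (b))"))
```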
The second step is called the semantic analysis of a program. (I have always found the words syntax and semantics very misleading in this context, as both parts establish the structure of the code, and the second part has nothing to do with what that code actually means.) The semantic analysis enriches the nodes of the abstract syntax tree with references to other tree nodes, creating the actual graph that defines the program.
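The enrichment step can be sketched in the same style. The node layout (kind, name, children, decl) and the function resolve are hypothetical, and only local variables are resolved; a real Eiffel compiler also resolves features, types and classes. Before resolve runs, a use of a local is just a name in the tree; afterwards its decl field points to the declaring node elsewhere in the tree, which is what turns the tree into a graph.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str                        # "routine", "local", "assign" or "name" (hypothetical kinds)
    name: str = ""
    children: list["Node"] = field(default_factory=list)
    decl: Optional["Node"] = None    # filled in by the semantic pass


def resolve(routine: Node) -> list[str]:
    """Link each 'name' node to the 'local' node declaring it; return the unresolved names."""
    scope = {c.name: c for c in routine.children if c.kind == "local"}
    unresolved: list[str] = []

    def walk(node: Node) -> None:
        if node.kind == "name":
            node.decl = scope.get(node.name)  # a reference across the tree: now a graph edge
            if node.decl is None:
                unresolved.append(node.name)
        for child in node.children:
            walk(child)

    walk(routine)
    return unresolved


# Roughly models a routine with "local i: INTEGER" and the body "i := j"
i_decl = Node("local", "i")
i_use = Node("name", "i")
routine = Node("routine", "example", [i_decl, Node("assign", children=[i_use, Node("name", "j")])])
missing = resolve(routine)
```

Here i is linked to its declaration, while j is reported as unresolved because the sketch declares no local of that name.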
This blog article is about the code text and its way to express the abstract syntax tree, as established during syntactic analysis. We are looking at it from the perspective of a human coder who has to read the text, not from that of a compiler that has to parse it.
Textual representations of Trees
Developers use screens (I am talking about the physical thing in front of you) to develop code. A screen displays a two-dimensional image. The code text is a linear sequence of characters, but with the introduction of the NEWLINE character it can be broken into individual lines. This way, code text gets a 2D representation that uses both dimensions of the screen.
Now there is a challenge: how can we lay out the program as 2D text in such a way that the developer can easily see the underlying abstract syntax tree, and can thus understand the program?
In general, there are two ways this is done: indentation and bracketing.
Using indentation, we distribute the nodes of the abstract syntax tree over the rows in the text, each node on a new line. We do this in a depth-first fashion, by first writing the parent node, and then the child nodes. The child nodes are then indented in relation to the parent. Here is an example:
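A minimal sketch of this depth-first, indented layout in Python; the Node type and the four-space step are assumptions made for illustration. The example tree is the expression a + b * c.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def indented(node: Node, depth: int = 0, step: int = 4) -> list[str]:
    """Depth-first: the parent on its own line, then each child one indentation level deeper."""
    lines = [" " * (depth * step) + node.label]
    for child in node.children:
        lines.extend(indented(child, depth + 1, step))
    return lines


tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
print("\n".join(indented(tree)))
```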
The second way of expressing the tree structure is the use of bracket pairs. A bracket pair consists of two symbols or keywords that together mark the beginning and the end of a set of sub-nodes in the tree. Examples of bracket pairs are the parentheses
The use of parentheses makes it clear that
A tree defined using bracket pairs is much harder to understand than a tree using indentation levels. It requires the human eye to scan the complete line, to recognise individual characters and words, and to keep a mental “stack” counting how many opening and closing brackets we have seen.
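For comparison, here is the same kind of hypothetical Node tree written with bracket pairs on a single line, together with what the reader has to do to decode it: scan every character and keep a running counter of open brackets, the mental stack. Both functions are illustrative sketches, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def bracketed(node: Node) -> str:
    """The whole subtree on one line, each set of children enclosed in a bracket pair."""
    if not node.children:
        return node.label
    return node.label + "(" + ", ".join(bracketed(c) for c in node.children) + ")"


def max_open_brackets(text: str) -> int:
    """Scan every character, counting open brackets; return the deepest nesting seen."""
    depth = deepest = 0
    for ch in text:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
    if depth != 0:
        raise ValueError("unbalanced '('")
    return deepest


tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
flat = bracketed(tree)
```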
Most trees expressed in bracket pairs become confusing after a few levels. This is true even for parentheses, although we are highly trained to spot these pairs from years of math lessons at school.
So, why not use indentation for everything? We could write the expression above using (for example) the following:
We do not, because vertical real estate is precious and programs written this way would become very long, which has its own problems with readability. If we manage to keep the number of brackets low, brackets are a very compact way of expressing trees, growing horizontally instead of vertically.
Most modern programming languages (including Eiffel, with one exception) do the following: they use bracket pairs as the primary means of expressing the tree structure, but allow the free use of newlines and spaces (known as whitespace) so that the same structure can also be shown through indentation.
The language comes with a strong culture on how to indent, even though the developer is allowed to deviate from these rules (most of the time to prevent a line from becoming too long, or to write compact code). The culture makes code readable between different developers.
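The compromise described above, compact bracket pairs where a subtree is small and line breaks with indentation where it is not, can be approximated by a simple width-driven layout. The width limit, the step size and the function names are assumptions; real formatters and Eiffel's layout conventions are far more nuanced.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def bracketed(node: Node) -> str:
    """Compact form: the whole subtree on one line using bracket pairs."""
    if not node.children:
        return node.label
    return node.label + "(" + ", ".join(bracketed(c) for c in node.children) + ")"


def layout(node: Node, width: int = 40, depth: int = 0, step: int = 4) -> list[str]:
    """Keep a subtree in bracket form if it fits the width; otherwise break it and indent."""
    indent = " " * (depth * step)
    flat = bracketed(node)
    if not node.children or len(indent) + len(flat) <= width:
        return [indent + flat]
    lines = [indent + node.label]
    for child in node.children:
        lines.extend(layout(child, width, depth + 1, step))
    return lines


tree = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
```

With a generous width the whole tree stays on one line; as the width shrinks, only the subtrees that no longer fit are broken up and indented.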
Levels of the AST
When looking at tree structures, we can look at the recursion points, that is, the composition rules of the grammar that allow recursive structures because a construct may contain other constructs of its own kind.
There are two main recursive structures in Eiffel (as in most other programming languages): instructions and expressions.
Instructions contain the imperative part of the programming paradigm; they describe the computation by state transformations. Recursion is caused by rules for sequences, conditionals and loops.
Expressions contain the functional part of the programming paradigm; they describe the computation by function evaluations and their return values. Recursion is caused by the rules for function arguments.
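Eiffel (with the exception of inline agents, discussed below) does not allow expressions to contain instructions. As a result, the syntax tree always has the same form: instructions make up the upper levels of the tree and expressions the lower levels, and once a path descends from an instruction into an expression, it never meets an instruction again.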
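The invariant can be stated as a small check over a tree whose nodes are tagged as instructions or expressions. This two-level model (the kind tags, the Node type and the function name) is a hypothetical simplification of a real Eiffel syntax tree, sketched in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str   # "instruction" or "expression" (hypothetical two-level model)
    label: str
    children: list["Node"] = field(default_factory=list)


def instructions_above_expressions(node: Node, inside_expression: bool = False) -> bool:
    """True if no instruction node occurs anywhere below an expression node."""
    if node.kind == "instruction" and inside_expression:
        return False
    below = inside_expression or node.kind == "expression"
    return all(instructions_above_expressions(c, below) for c in node.children)


# Roughly models: if x > 0 then y := f (x) end
conditional = Node("instruction", "if ... then ... end", [
    Node("expression", "x > 0"),
    Node("instruction", "y := f (x)", [
        Node("expression", "f", [Node("expression", "x")]),
    ]),
])
```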
The imperative paradigm and the functional paradigm are very different, each one having its own set of challenges for the human brain. One of the challenges of programming language design is to find the right balance and relation between the two.
Eiffel, through the option/operand and command/query separation principles, emphasises instructions over expressions. Data structures have built-in cursors, and the number of arguments to feature calls is kept to a minimum.
This has a huge advantage: we can use the indentation purely for the imperative portion of the code. With expressions small, we have no problem using brackets to structure them, keeping them compact. The imperative control flow of the program is not interrupted by lines that are only continuations of the expressions from previous lines.
For the reader of the program, this is not only elegant: together with the rejection of return statements and other “hidden GOTOs”, the structure of an implementation becomes clear immediately. Code is easy to change, problems are easy to spot.
Criticizing Inline Agents
Inline agents allow instructions to appear in expressions. This breaks the invariant that expressions are always below instructions in the tree. Instead, we can now switch freely between instructions and expressions while going down the tree, so that instruction and expression levels may alternate any number of times along a single path.
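One way to quantify this is to count, along each root-to-leaf path, how often the tree switches between the instruction and expression levels. In the same hypothetical two-level model, ordinary Eiffel code never exceeds one switch, while each nested inline agent adds two more. The Eiffel call in the comment is modelled loosely; only its shape matters here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str   # "instruction" or "expression" (hypothetical two-level model)
    label: str
    children: list["Node"] = field(default_factory=list)


def max_level_switches(node: Node, parent_kind: Optional[str] = None) -> int:
    """Largest number of instruction/expression changes on any root-to-leaf path."""
    switch = 1 if parent_kind is not None and node.kind != parent_kind else 0
    deepest = max((max_level_switches(c, node.kind) for c in node.children), default=0)
    return switch + deepest


# Ordinary Eiffel, roughly: y := f (x)
plain = Node("instruction", "y := f (x)", [
    Node("expression", "f", [Node("expression", "x")]),
])

# Roughly models: list.do_all (agent (s: STRING) do print (s) end)
with_agent = Node("instruction", "list.do_all (...)", [
    Node("expression", "agent (s: STRING) do ... end", [
        Node("instruction", "print (s)", [Node("expression", "s")]),
    ]),
])
```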
Breaking this invariant has severe consequences for the structure and layout of the program. It starts with the obvious problem that the indentation rules are now thrown overboard completely: it becomes unclear how to indent the inline agent. If we write
then we are wasting a huge amount of space in front of the feature. Since horizontal scrolling is more tedious than vertical scrolling, this can make the code very hard to read. It also violates the guideline that expressions should be short and fit on a single line.
On the other hand, if we write
then we obfuscate the fact that the agent is indeed a parameter to the
The second criticism of inline agents is that we lose an important element of structured programming: the indentation of a feature visualizes the structure of the underlying algorithm. This structure is now “polluted” with the control flow of inline agents, which is code that is not actually executed as part of the routine, but might be executed much later.
This short article illustrates my main concerns about inline agents with respect to control flow and the readability of the programming language.
It is clear to me that inline agents are very dangerous to structured programming and to clean code. We have to understand how they can undermine our efforts to produce a culture of code that is highly readable, one of the main advantages of using Eiffel.
By advocating the use of inline agents, we are not doing ourselves a favor. There might be cases where an inline agent produces more readable and more compact code, but in the long run, inline agents are not going to improve the programming language.