Stumbling across errors in language design

by Franck Arnaud (modified: 2010 Feb 10)

The recently floated proposal to add the `across' looping construct to Eiffel seems seriously mistaken. I will go through the reasons that make it undesirable for me, and as a bonus end with a general theory of why such proposals occur.

What happened to the minimalism principle?

Eiffel used to have one good way of doing each thing, for clarity and homogeneity and common understanding in the context of team work, and it’s well worth the modest cost of the odd slightly painful border cases. We now have not one but 3 constructs: legacy loops, which I’ll call neanderthal loops in the rest of this post; across loops, let’s call them neo-neanderthal loops; and agent-based iterators, the one you use in modern Eiffel.

Not an everyday construct

The new construct is being justified by saying that loops in their most general form are an everyday programming construct. In fact, loops in regular programming almost all come from a very small family of very regular loops, usually over containers or similar structures such as INTEGER_INTERVAL, and these are very well captured by iteration agents (do_all and friends). A full-time Eiffel programmer would probably need to write an explicit loop once a year or less. The very rare cases where I have to write a neanderthal loop are due to omissions in libraries, which sometimes lack some obvious looping construct (almost always a general one; I’m not advocating adding convoluted iteration routines to containers that apply only to one specific case). Lazy readers might think “but I write loops all the time”, but if they look at all the loops they’ve written and think about how each could be replaced with an iteration agent, they will find very few cases that require an explicit loop. Neanderthal-loop fans often tell me that some loops can’t be written with a generic agent iterator, but when I ask them to show me an example all they can find is examples of where they were being lazy or used a sadly incomplete library (I don’t think anyone who promised me an example ever delivered a credible one, though I’m not saying a few cases don’t exist; they’re just very rare).

The great advantage of using agent-based iterators is that their behaviour (termination etc) is much easier to reason about, by hand or machine. If your system has a handful of generic loops (in the implementation of the container library) it’s much easier to reason about instances of such loops once you’ve proven and modelled the core library routines' behaviour.
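To make that concrete, here is a minimal sketch of the kind of library routine meant: a do_all-style iterator whose single loop carries the invariant and variant that every client call then relies on. The class and feature names are hypothetical, not EiffelBase's, and the agent type is written in the two-parameter PROCEDURE [ANY, TUPLE [G]] form of the EiffelBase of this period; more recent EiffelStudio releases may expect a single tuple parameter instead.

```eiffel
class
	ITERATION_SKETCH [G]

feature -- Iteration

	do_all_sketch (a_array: ARRAY [G]; a_action: PROCEDURE [ANY, TUPLE [G]])
			-- Apply `a_action' to every item of `a_array', in index order.
			-- Hypothetical stand-in for a library iterator such as `do_all':
			-- the loop, its invariant and its variant are written (and can be
			-- proven) once here, and every client call reuses them.
		require
			a_array_not_void: a_array /= Void
			a_action_not_void: a_action /= Void
		local
			i: INTEGER
		do
			from
				i := a_array.lower
			invariant
					-- `i' never runs past one beyond the last valid index.
				i_in_range: a_array.lower <= i and i <= a_array.upper + 1
			until
				i > a_array.upper
			loop
				a_action.call ([a_array.item (i)])
				i := i + 1
			variant
					-- Number of items still to visit; decreases by one per pass.
				items_left: a_array.upper + 1 - i
			end
		end

end
```

Reasoning about termination then reduces to checking this one variant, rather than one per hand-written loop in client code.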

Core language constructs shouldn't depend on libraries

The use of a keyword for library syntactic sugar is upside down. Core language constructs should have minimal dependencies, if any at all, on higher-level libraries in general, and least of all on individual routines and classes fairly far down the abstraction chain. What next, syntactic sugar for log routines?

This gratuitous syntactic sugar also pollutes the keyword namespace, which ideally should be kept small and manageable, and certainly not used for superfluous constructs that are already well handled by the core language.

Eiffel is verbose on purpose

The verbosity argument is also very un-Eiffel: Eiffel is about clarity, not line count. If we’re going that way, we might as well go all punctuation: {*o:i|print(i)*}. With a bit of effort, the gold standard of syntactic mediocrity, Perl, could be within our reach!

The verbosity comparisons in the proposal between fully expanded neanderthal loops and compact neo-neanderthal variants are disingenuous at best; the fair comparison would present neanderthal loops like this:

from c := list.new_cursor until c.after loop print (c.item) c.forth end

which is a style both used by leading practitioners and the one consistent with the style of the neo-neanderthal examples. This reduces the line count "advantage" to hardly anything.

A bad take at a known construct

If I were trying to add a loop construct of that kind, I’d hide the cursor completely and do something like:

across container as item loop print (item) end

which is what other languages with neo-neanderthal looping constructs do, for a good reason. We don’t need to be gratuitously worse than the common practice for the sake of originality.

The construct above could be generalised to the hash table case with something like ‘across table with item, key loop’ — were it not anyway totally superfluous given that do_all_with_key does it much better (the only reason it’s not in HASH_TABLE is that as far as I know the maintainer got arrested for earlier crimes against software quality — the class is a crime hotspot — so the class has been unmaintained since the lengthy court martial proceedings started). Low-crime hash table classes like the one in Gobo are well equipped.

Going past the Good Idea Entropy Axiom

To finish, I’ll set out my theory of why this proposal happened.

One of the fundamental laws of the universe is the Good Idea Entropy Axiom, which reads:

– The aggregate quality of ideas available in the universe must never increase.

and I think that Bertrand has applied this axiom to himself individually and, to compensate for the recently introduced and excellent theory of aliasing, has had to introduce neo-neanderthal loops so that his net contribution to the world of ideas is not positive. But the universe is full of people who produce more than their fair share of bad ideas, so to each according to their ability: I urge Bertrand to leave the job of creating junk language proposals to others and to keep to supplying good ideas.

Comments
  • Berend de Boer (9/2/2010)

    I can't say I have followed the debate, but really, are we getting another loop? Please no. Pascal had three loops, and one of them was used so infrequently that I had to look up its semantics when using it or reading code.

    The minimalist approach is there for a reason. Syntactic sugar has its place, but should be clearly warranted. Can Eiffel programmers vote on the proposal?

    • Grant Rettke (9/2/2010)

      Agreed

      While it is often hard to resist new syntax, hey, Eiffel is elegant (sure I don't use it daily, and maybe that has skewed my vision).

      One might say that Eiffel could make an even bigger contribution by adding statically-typed macros! :)

  • Colin LeMahieu (9/2/2010)

    Some people such as myself don't like to use iteration agents because they statically strip contracts; I prefer an adapter object.

    Almost all languages have a kernel library that's used, C without a C runtime, which includes malloc (), isn't very useful. To handle release of external resources, all GC'ed languages need to expose some type of finalizer class.

    Don't get me wrong, I wasn't a proponent of this language construct, but I don't find some of these arguments valid discourse.

    I also don't like the 'once' keyword. I stay away from 'once' and 'agent' as much as possible.

  • Neal Lester (9/2/2010)

    Neanderthal-loop problem

    Neanderthal-loop fans often tell me that some loops can’t be written with a generic agent iterator, but when I ask them to show me an example all they can find is examples of where they were being lazy or used a sadly incomplete library

    It seems to me that the neanderthal-loop approach to the following problem is more straightforward than the agent-iterator approach (but I concede it can be done with an agent-iterator).

    Given:

        a: ARRAY [STRING]
        a := <<"red", "blue", "green">>

    output a string containing the array contents in English list form, assuming a.count >= 3 (in this case "red, blue, and green").

    • Colin Adams (9/2/2010)

      Not to me

      To me this seems a clear win for the iterator:

          l_result := ""
          if a.count >= 3 then
              a.do_all (agent l_result.append_string)
          end

      How can a loop read neater than that? It's a direct translation of your specification.

      • Neal Lester (9/2/2010)

        Not quite

        This gives equal (l_result, "redbluegreen"), but it should satisfy assert_equal ("l_result correct", "red, blue, and green", l_result).

  • Bernd Schoeller (9/2/2010)

    No agent loops

    I disagree with Franck's criticism of the new loop construct. I think it is a valuable contribution to the language.

    • In all the languages I know that have a forall or across construct, it is heavily used, even more often than the old neanderthal loops. The usage pattern of doing something with all elements of a container is so common that it justifies a construct of its own in the language.
    • The use of agents is absolutely no alternative: agents are inherently difficult to reason about. And inline agents make the situation much worse: they blur the line between instructions and expressions, and create ugly code.
  • Colin Adams (9/2/2010)

    Pros and cons of agents

    I am surprised at the absolute denigration of agents.

    I can see pros-and cons in their use.

    Against them is the overhead in ISE's compiler (but this is a problem to be tackled - gec produced faster code with do_all the one time we measured it). Also the catcall/conformance problem. But SmartEiffel got this right, I seem to recall.

    More serious is the problem of writing a contract for agent-based iterators. I think this requires static checking of contracts. We have a problem with implementing runtime checking. But the use of any particular iterator at a given point in client code is not so problematic - a static check by the code reviewer should be straightforward. Certainly more so than for loops.

    And of course inline agents never have any justification whatsoever, but that is another issue.

    I occasionally code explicit loops. But I need to find a very strong reason to do so, or Franck will (rightly) laugh at me.

    Recently I coded a four-way loop nested three deep. My excuse was that I was transcribing a C routine whose workings I did not understand, and nor was I eager to attempt such a task on such code. So I equipped my routine with so many loop invariant and variant clauses (incidentally discovering that you can't have multiple variant assertions in a single loop - which was annoying as I had co-varying variables) that I would not be surprised if I broke the world record for the number of assertions in a routine. My intention was that once I had the thing fully working, I'd be able to understand it with the aid of the assertions, and translate it to an iterator version, so that other developers could understand it (I haven't reached that stage yet though - it doesn't always work correctly). Note that across would not have been usable at all in that particular case.

    The other chief reason I find for writing explicit loops is library deficiencies. These come in two varieties. The first is where the library just doesn't implement an obvious iterator. This is a simple defect, and if you are allowed to edit the library, that is not a problem. But sometimes you can't. The second is where the particular iterator you want is so obscure that it is hard to imagine anyone wanting to re-use it. In this case I code an explicit loop, and bemoan that I'm using Eiffel rather than Haskell. Note that in such a case across would be automatically ruled out.

    But on the question of adding a keyword I am 100% with Franck. Especially one tied to a particular library class. In addition to his arguments, which tree traversal order is it going to use? And in general, given a data structure that might be traversed in many ways, how do you make the decision that a particular route is most desirable?

    • Peter Gummer (10/2/2010)

      SmartEiffel agents

      I prefer SmartEiffel's covariant agents to standard Eiffel's contravariant (or did they end up being no-variant?) agents.

      But I wouldn't say SmartEiffel "got this right", because I recall that the objection to covariant agents was that it wouldn't work if the agent had an open target.

      As someone who's never used an agent with an open target, personally I'd rather have covariant agents and remove open targets from the language.

    • Alexander Kogtenkov (10/2/2010)

      Traversal order

      The traversal order issue is addressed by the library features that allow specifying the required one. For an indexable structure this is done by calling reversed on a cursor:

          across my_list.new_cursor.reversed as c loop print (c.item) end

      A similar technique is used for tree traversal: when the desired order does not match the default one, it can be specified explicitly, following the approach above, using the appropriate features for preorder, postorder, inorder, level-order and other variants.

  • Colin Adams (10/2/2010)

    I'll try again

    OK. Sorry I didn't read carefully enough.

    In that case it is necessary to write the agent:

        l_result := ""
        a.do_all_with_index (agent extend_english_list (l_result, ?, ?, a.count))

        extend_english_list (a_result: STRING; a_word: STRING; a_index, a_last: INTEGER) is
                -- Add `a_word' to end of `a_result' with English list separation.
                -- If `a_last' equals `a_index' then we are dealing with the last word.
            require
                a_result_not_void: a_result /= Void
                a_word_not_void: a_word /= Void
            do
                if a_index = a_last and a_index > 2 then
                    a_result.append_string (", and ")
                elseif a_index > 1 then
                    a_result.append_string (", ")
                end
                a_result.append_string (a_word)
            end

    Again, it is nice and neat, and no loop syntax to obscure the logic.

    • Peter Gummer (11/2/2010)

      Just agent syntax to obscure the logic ;-)

      Honestly, although I've written quite a few of these do_all loops, I find lines of code like this hard to read:

      a.do_all_with_index (agent extend_english_list (l_result, ?, ?, a.count))

      I have to think quite hard to figure out what those question marks are about, and why they are in that particular position in the argument list, cross-referencing with the routine signature in order to nut it out. And although I've now figured out why the word and the index are open arguments (they are the things that vary as you traverse the array), it's still not clear to me how they get to be passed in that order. I guess it would be clearer if I went and looked at do_all_with_index.

      So after spending five minutes understanding that line of code, I'm wiser about do_all_with_index. Heck, I might even get the bright idea of using it myself one day. But pity the poor schmuck (or more likely schmucks, plural, given that any line of code, although written only once, will be read many times) who has to comprehend my clever agent-based loop.

      Give me a neo-neanderthal loop please.
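      For comparison, a sketch of what such a neo-neanderthal version might look like. It assumes the proposed across ... as c loop ... end form, in which c is a cursor and c.item the current element, as in the reversed-cursor example above; the position is counted by hand rather than assuming any particular cursor feature for it, and the class name is made up.

```eiffel
class
	ENGLISH_LIST_ACROSS

create
	make

feature {NONE} -- Initialization

	make
			-- Print the example from this thread.
			-- Expected: red, blue, and green
		local
			a: ARRAY [STRING]
		do
			a := <<"red", "blue", "green">>
			print (as_english_list (a))
			print ("%N")
		end

feature -- Conversion

	as_english_list (a: ARRAY [STRING]): STRING
			-- Contents of `a' as an English list, e.g. "red, blue, and green".
		require
			a_not_void: a /= Void
			count_at_least_three: a.count >= 3
		local
			l_position: INTEGER
				-- 1-based position of the current item, counted by hand.
		do
			Result := ""
			across a as c loop
				l_position := l_position + 1
				if l_position = a.count then
						-- Last item: Oxford comma plus "and".
					Result.append_string (", and ")
				elseif l_position > 1 then
					Result.append_string (", ")
				end
				Result.append_string (c.item)
			end
		end

end
```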

    • Neal Lester (11/2/2010)

      Not definitively more straightforward

      The logic in extend_english_list makes sense only in the context of an iteration. I find that removing it from the context (a loop) highlights the logic but obscures the purpose of the logic. It is also hard to see how one would use extend_english_list outside of an iteration, and it is not desirable as a reusable procedure in the context of a call to do_all_with_index. For clarity and ease of use, one still has to encapsulate the call to do_all_with_index within a function:

          as_english_list (a: ARRAY [STRING]): STRING is
                  -- Return the contents of `a' with English list separation.
              require
                  valid_a: a /= Void
                  count_at_least_three: a.count >= 3
              do
                  Result := ""
                  a.do_all_with_index (agent extend_english_list (Result, ?, ?, a.count))
              ensure
                  valid_result: Result /= Void
              end

          extend_english_list (a_result: STRING; a_word: STRING; a_index, a_last: INTEGER) is
                  -- Add `a_word' to end of `a_result' with English list separation.
                  -- If `a_last' equals `a_index' then we are dealing with the last word.
              require
                  a_result_not_void: a_result /= Void
                  a_word_not_void: a_word /= Void
              do
                  if a_index = a_last and a_index > 2 then
                      a_result.append_string (", and ")
                  elseif a_index > 1 then
                      a_result.append_string (", ")
                  end
                  a_result.append_string (a_word)
              end

      So you end up with two features, only one of which (as_english_list) is really intended for reuse. Instead, we can do it with a single loop function:

          as_english_list (a: ARRAY [STRING]): STRING is
                  -- Return the contents of `a' with English list separation.
              require
                  valid_a: a /= Void
                  count_at_least_three: a.count >= 3
              local
                  index: INTEGER
              do
                  Result := ""
                  from
                      index := 1
                  until
                      index > a.count
                  loop
                      if index = a.count then
                          Result.append_string (", and ")
                      elseif index > 1 then
                          Result.append_string (", ")
                      end
                      Result.append_string (a [index])
                  end
              end

      I suppose one can argue that the agent-iterator version is cleaner and neater, but I don't think it is unambiguously so. One can also argue that one function is cleaner and easier to understand than two inter-related features. Rather than obscuring the logic, the loop syntax helps provide context for it.

      • Colin Adams (11/2/2010)

        The agent is more reusable

        First, in reply to Peter's comment, you have no need to understand what the question marks mean when reading that line of code (I never do) - you just read the agent name (unless you're reviewing the code for correctness, that is - if the two arguments have the same type, the compiler won't do that for you). Writing the code is a different matter. You need to learn the meaning of the agent. But you only have to do this once.

        To Neal's point about (perceived lack of) reuse. There is a second use for this agent - it's in do_parallel_with_index (hypothetical, but not too difficult to implement). The loop is inherently sequential. Also, you are re-writing the loop, as it is already present in the body of do_all_with_index, and extensively tested (I use that iterator all the time. It was added to EiffelBase at my request. Incidentally, this provides a nice illustrative story about the benefits of only writing the loop once. The first version I submitted to Eiffel Software had a bug - when lower did not equal one. The bug is fixed, and remains fixed if you use that iterator. But if you write the loop yourself each time, you might make the same mistake as I did.) If I had to choose between Bernd's position (no agents), and Franck's (no loops), then I'd go with Franck every time. In fact for my home programming I now use a loop-free language (Haskell).

        P.S. Neal, your loop lacks invariants and variants. If you added the variant, it would catch the non-termination bug quickly. In "A Touch of Class", Professor Meyer says don't even think of coding a loop without writing invariants and variants. Franck just drops the "without" clause.
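        For illustration, a sketch of Neal's function with those assertions in place and the missing increment restored. The invariant keeps the index in bounds and the variant counts the items left, so if the increment is dropped the variant stops decreasing and, with loop assertion monitoring enabled, the run reports a violation at the end of the first pass instead of hanging. The class name and assertion tags are invented, and the clause order (variant after the loop body) follows ECMA-367; the older ETL syntax placed variant before until.

```eiffel
class
	ENGLISH_LIST_LOOP

feature -- Conversion

	as_english_list (a: ARRAY [STRING]): STRING
			-- Contents of `a' as an English list, e.g. "red, blue, and green".
		require
			a_not_void: a /= Void
			count_at_least_three: a.count >= 3
		local
			index: INTEGER
		do
			Result := ""
			from
				index := a.lower
			invariant
				index_in_range: a.lower <= index and index <= a.upper + 1
			until
				index > a.upper
			loop
				if index = a.upper then
					Result.append_string (", and ")
				elseif index > a.lower then
					Result.append_string (", ")
				end
				Result.append_string (a.item (index))
					-- Without this increment the variant below would not decrease,
					-- and a run with loop assertions enabled would report it.
				index := index + 1
			variant
				items_left: a.upper + 1 - index
			end
		end

end
```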

        • Colin Adams (11/2/2010)

          Well, of course that particular agent couldn't be reused for a parallel computation. But the principle is right.

    • Franck Arnaud (15/2/2010)

      No new agent required

      You can also do it without writing any agent routine:

          as_english_list (a_array: ARRAY [STRING]): STRING
                  -- Comma-separated version of array with "and" for last item.
                  -- Assumes ARRAY.transformed exists (one of those generic iterators that should be there).
              require
                  a_array_not_empty: a_array /= Void and then not a_array.is_empty
              local
                  l_array: like a_array
              do
                      -- Prefix last item with "and", then every item but the first with a comma, then join.
                  l_array := a_array.subarray (2, a_array.count - 1)
                  l_array.extend ("and " + a_array.item (a_array.count))
                  l_array := l_array.transformed (agent (", ").plus)
                  Result := a_array.item (1).twin
                  l_array.do_all (agent Result.append_string)
              end

      It's more concise than even the variant-free loop version, and you can even run the comma-plussing in parallel! :-)