<p style="text-align: justify;"><sub><em>About this article: it originated as a series of posts on the Communications of the ACM blog. I normally repost such articles here. (Even though copy-paste is usually not good, there are three reasons for this duplication: the readership seems to be largely disjoint; I can use better formatting, since their blog software is more restrictive than WordPress; and it is good to have a single repository for all my articles, including both those who originated on CACM and those who did not.) The series took the form of nine articles, where each of the first few ended with a quiz, to which the next one, published a couple of days later, provided an answer. Since all these answers are now available it would make no sense to use the same scheme, so I am instead publishing the whole thing as a single article with nine sections, slightly adapted from the original.</em></sub></p>
<p style="text-align: justify;"><sub><em>I was too lazy so far to collect all the references into a single list, so numbers such as [1] refer to the list at the end of the corresponding section.</em></sub></p>
<hr />
<p style="text-align: justify;">A colleague recently asked me to present a short overview of axiomatic semantics as a guest lecture in one of his courses. I have been teaching courses on software verification for a long time (see e.g. <a href=" http://se.inf.ethz.ch/courses/2015b_fall/sv/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>), so I have plenty of material; but instead of just reusing it, I decided to spend a bit of time on explaining why it is good to have a systematic approach to software verification. Here is the resulting tutorial.</p>
<hr />
<p> </p>
<h2 style="text-align: justify;">1. Introduction and attempt #1</h2>
<p style="text-align: justify;">Say “software verification” to software professionals, or computer science students outside of a few elite departments, and most of them will think “testing”. In a job interview, for example, show a loop-based algorithm to a programmer and ask “how would you verify it?”: most will start talking about devising clever test cases.</p>
<p style="text-align: justify;">Far from me to berate testing [1]; in fact, I have always thought that the inevitable Dijkstra quote about testing — that it can only show the presence of errors, not their absence [2] — which everyone seems to take as an indictment and dismissal of testing (and which its author probably intended that way) is actually a fantastic advertisement <em>for </em>testing: a way to find bugs? Yes! Great! Where do I get it? But that is not the same as verifying the software, which means attempting to ascertain that it has <em>no </em>bugs.</p>
<p style="text-align: justify;">Until listeners realize that verification cannot just mean testing, the best course material on axiomatic semantics or other proof techniques will not attract any interest. In fact, there is somewhere a video of a talk by the great testing and public-speaking guru James Whittaker where he starts by telling his audience not to worry, this won’t be a standard boring lecture, he will not start talking about loop invariants [3]! (Loop invariants are coming in this article, in fact they are one of its central concepts, but in later sections only, so don’t bring the sleeping bags yet.) I decided to start my lecture by giving an example of what happens when you do <em>not </em>use proper verification. More than one example, in fact, as you will see.</p>
<p style="text-align: justify;">A warning about this article: there is nothing new here. I am using an example from my 1990 book <em>Introduction to the Theory of Programming Languages </em>(exercise 9.12). Going even further back, a 1983 “Programming Pearls” <em>Communications of the ACM</em> article by Jon Bentley [4] addresses the same example with the same basic ideas. Yet almost forty years later these ideas are still not widely known among practitioners. So consider these articles as yet another tutorial on fundamental software engineering stuff.</p>
<p style="text-align: justify;">The tutorial is a quiz. We start with a program text:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 1 ; j := n — <strong>Result </strong>initialized to 0.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i = j <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := (i + j) // 2 — Integer division</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] ≤ x <strong>then </strong>i := m <strong>else </strong>j := m <strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>x = t [i] <strong>then </strong>Result := i <strong>end</strong></span></p>
<p style="text-align: justify;">All variables are of integer type. <span style="color: #0000ff;">t</span> is an up-sorted array of integers, indexed from 1 to <span style="color: #0000ff;">n</span> . We do not let any notation get between friends. A loop <span style="color: #0000ff;"><strong>from </strong>p <strong>until </strong>e <strong>loop </strong>q <strong>end </strong></span>executes <span style="color: #0000ff;">p</span> then, repeatedly: stops if <span style="color: #0000ff;">e</span> (the exit condition) is true, otherwise executes <span style="color: #0000ff;">q</span>. (Like <span style="color: #0000ff;">{p ; <strong>while not </strong>e <strong>do</strong> {q}}</span> in some other notations.) “<span style="color: #0000ff;">:=</span>” is assignment, “<span style="color: #0000ff;">=</span>” equality testing. “<span style="color: #0000ff;">//</span>” is integer division, e.g. 6 <span style="color: #0000ff;">//</span>3 = 7 <span style="color: #0000ff;">//</span>3 = 2. <strong><span style="color: #0000ff;">Result</span> </strong>is the name of a special variable whose final value will be returned by this computation (as part of a function, but we only look at the body). <strong><span style="color: #0000ff;">Result</span> </strong>is automatically initialized to zero like all integer variables, so if execution does not assign anything to <strong><span style="color: #0000ff;">Result</span> </strong>the function will return zero.</p>
<p style="text-align: justify;">First question: what is this program trying to do?</p>
<p style="text-align: justify;">OK, this is not the real quiz. I assume you know the answer: it is an attempt at “binary search”, which finds an element in the array, or determines its absence, in a sequence of about log<sub>2</sub> (<span style="color: #0000ff;">n</span>) steps, rather than <span style="color: #0000ff;">n</span> if we were use sequential search. (Remember we assume the array is sorted.) <span style="color: #0000ff;"><strong>Result</strong> </span>should give us a position where <span style="color: #0000ff;">x</span> appears in the array, if it does, and otherwise be zero.</p>
<p style="text-align: justify;">Now for the real quiz: does this program meet this goal?</p>
<p style="text-align: justify;">The answer should be either yes or no. (If no, I am not asking for a correct version, at least not yet, and in any case you can find some in the literature.) The situation is very non-symmetric, we might say Popperian:</p>
<ul style="text-align: justify;">
<li>To justify a no answer it suffices of a <em>single </em>example, a particular array <span style="color: #0000ff;">t</span> and a particular value <span style="color: #0000ff;">x</span>, for which the program fails to set <strong><span style="color: #0000ff;">Result</span> </strong>as it should.</li>
<li>To justify a yes answer we need to provide a credible argument that for <em>every</em> <span style="color: #0000ff;">t</span> and <span style="color: #0000ff;">x</span> the program sets <strong><span style="color: #0000ff;">Result</span> </strong>as it should.</li>
</ul>
<h4>Notes to section 1</h4>
<p style="text-align: justify;">[1] The <a href="https://www.springerprofessional.de/en/tests-and-proofs/17196774" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">TAP conference series</span></a> (Tests And Proofs), which Yuri Gurevich and I started, explores the complementarity between the two approaches.</p>
<p style="text-align: justify;">[2] Dijkstra first published his observation in 1969. He did not need consider the case of infinite input sets: even for a trivial finite program that multiplies two 32-bit integers, the number of cases to be examined, 2<sup>64</sup>, is beyond human reach. More so today with 64-bit integers. Looking at this from a 2020 perspective, we may note that exhaustive testing of a finite set of cases, which Dijkstra dismissed as impossible in practice, is in fact exactly what the respected <em>model checking</em> verification technique does; not on the original program, but on a simplified — <em>abstracted —</em> version precisely designed to keep the number of cases tractable. Dijkstra’s argument remains valid, of course, for the original program if non-trivial. And model-checking does not get us out of the woods: while we are safe if its “testing” finds no bug, if it does find one we have to ensure that the bug is a property of the original program rather than an artifact of the abstraction process.</p>
<p style="text-align: justify;">[3] It is somewhere on YouTube, although I cannot find it right now.</p>
<p style="text-align: justify;">[4] Jon Bentley:<em> Programming Pearls: Writing Correct Programs</em>, in <em>Communications of the ACM</em>, vol. 26, no. 12, pp. 1040-1045, December 1983, available for example <a href="https://www.cs.tufts.edu/~nr/cs257/archive/jon-bentley/correct-programs.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<hr />
<h2 style="text-align: justify;">2. Attempt #2</h2>
<p style="text-align: justify;">Was program #1 correct? If so it should yield the correct answer. (An answer is correct if either <strong><span style="color: #0000ff;">Result</span> </strong>is the index in t of an element equal to <span style="color: #0000ff;">x</span>, or <strong><span style="color: #0000ff;">Result</span> </strong>= 0 and x does not appear in t.)</p>
<p style="text-align: justify;">This program is <em>not</em> correct. To prove that it is not correct it suffices of a single example (test case) for which the program does not “<em>yield the correct answer</em>”. Assume <span style="color: #0000ff;">x</span> = 1 and the array <span style="color: #0000ff;">t</span> has two elements both equal to zero (<span style="color: #0000ff;">n</span> = 2, remember that arrays are indexed from 1):</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">t = [0 0]</span></p>
<p style="text-align: justify;">The successive values of the variables and expressions are:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"> m i j i + j + 1</span></p>
<p style="padding-left: 30px; text-align: justify;">After initialization: <span style="color: #0000ff;"> 1 2 3</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≠ j</span>, so enter loop: <span style="color: #0000ff;">1 1 2 6 </span> — First branch of “if” since t [1] ≤ <span style="color: #0000ff;">x</span><br />
— so i gets assigned the value of <span style="color: #0000ff;">m</span></p>
<p style="text-align: justify;">But then neither of the values of i and j has changed, so the loop will repeat its body identically (taking the first branch) forever. It is not even that the program yields an incorrect answer: it does not yield an answer at all!</p>
<p style="text-align: justify;">Note (in reference to the famous Dijkstra quote mentioned in the first article), that while it is common to pit tests against proofs, <em>a test can actually be a proof</em>: a test that fails is a proof that the program is incorrect. As valid as the most complex mathematical proof. It may not be the kind of proof we like most (our customers tend to prefer a guarantee that the program is correct), but it is a proof all right.</p>
<p style="text-align: justify;">We are now ready for the second attempt:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">— Program attempt #2.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 1 ; j := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i = j <strong>or</strong> <strong>Result</strong> > 0<strong> loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := (i + j) // 2 — Integer division</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] ≤ x <strong>then</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>elseif </strong>t [m] = x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;"><strong>Result </strong>:= m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong><strong> </strong> — In this case t [m] > x</span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m – 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="text-align: justify;">Unlike the previous one this version always changes i or j, so we may hope it does not loop forever. It has a nice symmetry between i and j.</p>
<p style="text-align: justify;">Same question as before: does this program meet its goal?</p>
<hr />
<h2 style="text-align: justify;">3. Attempt #3</h2>
<p style="text-align: justify;">The question about program #2, as about program #1: was: it right?</p>
<p style="text-align: justify;">Again no. A trivial example disproves it:<span style="color: #0000ff;"> n</span> = 1, the array <span style="color: #0000ff;">t</span> contains a single element <span style="color: #0000ff;">t [1]</span> = 0, <span style="color: #0000ff;">x</span> = 0. Then the initialization sets both <span style="color: #0000ff;">i</span> and <span style="color: #0000ff;">j</span> to 1, <span style="color: #0000ff;">i = j</span> holds on entry to the loop which stops immediately, but <strong><span style="color: #0000ff;">Result</span> </strong>is zero whereas it should be 1 (the place where <span style="color: #0000ff;">x</span> appears).</p>
<p style="text-align: justify;">Here now is attempt #3, let us see it if fares better:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">— Program attempt #3.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 1 ; j := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i = j <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := (i + j + 1) // 2</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] ≤ x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>1 ≤ i <strong>and </strong>i ≤ n <strong>then Result </strong>:= i <strong>end<br />
</strong> — If not, <strong>Result </strong>remains 0.</span></p>
<p style="text-align: justify;">What about this one?</p>
<hr />
<h2 style="text-align: justify;">3. Attempt #4 (also includes 3′)</h2>
<p style="text-align: justify;">The first two program attempts were wrong. What about the third?</p>
<p style="text-align: justify;">I know, you have every right to be upset at me, but the answer is no once more.</p>
<p style="text-align: justify;">Consider a two-element array t = [0 0] (so n = 2, remember that our arrays are indexed from 1 by convention) and a search value x = 1. The successive values of the variables and expressions are:</p>
<p style="padding-left: 30px; text-align: justify;"> <span style="color: #0000ff;"> m i j i + j + 1</span></p>
<p style="padding-left: 30px; text-align: justify;">After initialization: 1 2 4</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≠ j</span>, so enter loop: 2 3 2 6 — First branch of “if” since <span style="color: #0000ff;">t [2] < x</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≠ j</span>, enter loop again: 3 ⚠ — Out-of-bounds memory access!<br />
— (trying to access non-existent <span style="color: #0000ff;">t [3]</span>)</p>
<p style="text-align: justify;">Oops!</p>
<p style="text-align: justify;">Note that we could hope to get rid of the array overflow by initializing <span style="color: #0000ff;">i</span> to 0 rather than 1. This variant (version #3′) is left as a bonus question to the patient reader. (<em>Hint</em>: it is also not correct. Find a counter-example.)</p>
<p style="text-align: justify;">OK, this has to end at some point. What about the following version (#4): is it right?</p>
<p style="text-align: justify;"><span style="color: #0000ff;">— Program attempt #4.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 0 ; j := n + 1</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i = j <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := (i + j) // 2</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] ≤ x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>1 ≤ i <strong>and </strong>i ≤ n <strong>then Result </strong>:= i <strong>end</strong></span></p>
<hr />
<h2 style="text-align: justify;">5. Attempt #5</h2>
<p style="text-align: justify;">Yes, I know, this is dragging on. But that’s part of the idea: witnessing how hard it is to get a program right if you just judging by the seat of your pants. Maybe we can get it right this time?</p>
<p style="text-align: justify;">Are we there yet? Is program attempt #4 finally correct?</p>
<p style="text-align: justify;">Sorry to disappoint, but no. Consider a two-element array <span style="color: #0000ff;">t = [0 0]</span>, so<span style="color: #0000ff;"> n</span> = 2, and a search value <span style="color: #0000ff;">x</span> = 1 (yes, same counter-example as last time, although here we could also use <span style="color: #0000ff;">x</span> = 0). The successive values of the variables and expressions are:</p>
<p style="padding-left: 30px; text-align: justify;"> <span style="color: #0000ff;"> m i j i + j</span></p>
<p style="padding-left: 30px; text-align: justify;">After initialization: 0 3 3</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≠ j</span>, so enter loop: 1 2 3 5 — First branch of “<strong>if</strong>”</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≠ j</span>, enter loop again: 2 3 3 6 — First branch again</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i = j</span>, exit loop</p>
<p style="text-align: justify;">The condition of the final “<span style="color: #0000ff;"><strong>if</strong></span>” is true, so <strong><span style="color: #0000ff;">Result</span> </strong>gets the value 3. This is quite wrong, since there is no element at position 3, and in any case <span style="color: #0000ff;">x</span> does not appear in <span style="color: #0000ff;">t</span>.</p>
<p style="text-align: justify;">But we are so close! Something <em>like </em>this <em>should </em>work, should it not?</p>
<p style="text-align: justify;">So patience, patience, let us tweak it just one trifle more, OK?</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">— Program attempt #5.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 0 ; j := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i ≥ j <strong>or Result</strong> > 0 <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := (i + j) // 2</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] < x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>elseif </strong> t [m] > x<strong> then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;"><strong>Result </strong>:= m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="text-align: justify;">Does it work now?</p>
<hr />
<h2 style="text-align: justify;">6. Attempt #6</h2>
<p style="text-align: justify;">The question about program #5 was the same as before: is it right, is it wrong?</p>
<p style="text-align: justify;">Well, I know you are growing more upset at me with each section, but the answer is still that this program is wrong. But the way it is wrong is somewhat specific; and it applies, in fact, to all previous variants as well.</p>
<p style="text-align: justify;">This particular wrongness (fancy word for “bug”) has a history. As I pointed out in the first article, there is a long tradition of using binary search to illustrate software correctness issues. A number of versions were published and proved correct, including one in the justly admired <em>Programming Pearls</em> series by Jon Bentley. Then in 2006 Joshua Bloch, then at Google, published a now legendary blog article [2] which showed that all these versions suffered from a major flaw: to obtain m, the approximate mid-point between i and j, they compute</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">(i + j) // 2</span></p>
<p style="text-align: justify;">which, working on computer integers rather than mathematical integers, might overflow! This in a situation in which both<span style="color: #0000ff;"> i</span> and<span style="color: #0000ff;"> j</span>, and hence <span style="color: #0000ff;">m</span> as well, are well within the range of the computer’s representable integers, 2<sup>-n</sup> to 2<sup>n</sup> (give or take 1) where <span style="color: #0000ff;">n</span> is typically 31 or, these days, 63, so that there is no conceptual justification for the overflow.</p>
<p style="text-align: justify;">In the specification that I have used for this article, i starts at 1, so the problem will only arise for an array that occupies half of the memory or more, which is a rather extreme case (but still should be handled properly). In the general case, it is often useful to use arrays with arbitrary bounds (as in Eiffel), so we can have even a small array, with high indices, for which the computation will produce an overflow and bad results.</p>
<p style="text-align: justify;">The Bloch gotcha is a stark reminder that in considering the correctness of programs we must include all relevant aspects and consider programs as they are executed on a real computer, not as we wish they were executed in an ideal model world.</p>
<p style="text-align: justify;">(Note that Jon Bentley alluded to this requirement in his original article: while he did not explicitly mention integer overflow, he felt it necessary to complement his proof by the comment that that “<em>As laborious as our proof of binary search was, it is still unfinished by some standards. How would you prove that the program is free of runtime errors (such as division by zero, word overflow, or array indices out of bounds)?</em>” Prescient words!)</p>
<p style="text-align: justify;">It is easy to correct the potential arithmetic overflow bug: instead of (i + j) // 2, Bloch suggested we compute the average as</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i + (j – i) // 2</span></p>
<p style="text-align: justify;">which is the same from a mathematician’s viewpoint, and indeed will compute the same value if both variants compute one, but will not overflow if both i and j are within range.</p>
<p style="text-align: justify;">So we are ready for version 6, which is the same as version 5 save for that single change:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">— Program attempt #6.</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 0 ; j := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i ≥ j <strong>or Result</strong> > 0 <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m :=<em><strong><span style="color: #993366;"> i + </span></strong></em><strong><span style="color: #993366;">(</span></strong><em><strong><span style="color: #993366;">j – i</span></strong></em><strong><span style="color: #993366;">) // 2</span></strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] < x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>elseif </strong> t [m] > x<strong> then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;"><strong>Result </strong>:= m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="text-align: justify;">Now is probably the right time to recall the words by which Donald Knuth introduces binary search in the original 1973 tome on <em>Sorting and Searching</em> of his seminal book series<em> The Art of Computer Programming</em>:<img class="alignright wp-image-5307 " src="https://bertrandmeyer.com/wp-content/upLoads/knuth.jpg" alt="knuth" width="70" height="125" srcset="https://bertrandmeyer.com/wp-content/upLoads/knuth.jpg 168w, https://bertrandmeyer.com/wp-content/upLoads/knuth-84x150.jpg 84w" sizes="(max-width: 70px) 100vw, 70px" /></p>
<blockquote><p><em>Although the basic idea of binary search is comparatively straightforward, the details can be somewhat tricky, and many good programmers have done it wrong the first few times they tried.</em></p></blockquote>
<p style="text-align: justify;">Do you need more convincing? Be careful what you answer, I have more variants up my sleeve and can come up with many more almost-right-but-actually-wrong program attempts if you nudge me. But OK, even the best things have an end. This is not the last section yet, but that was the last program attempt. To the naturally following next question in this running quiz, “<em>is version 6 right or wrong</em>”, I can provide the answer: it is, to the best of my knowledge, a correct program. Yes! [3].</p>
<p style="text-align: justify;">But the quiz continues. Since answers to the previous questions were all that the programs <span style="color: #993366;"><strong><em>were not</em></strong></span> correct, it sufficed in each case to find one case for which the program did not behave as expected. Our next question is of a different nature: can you find an argument why version #6 <strong><em><span style="color: #993366;">is</span> </em></strong>correct?</p>
<h4>References for section 6</h4>
<p style="text-align: justify;">[1] (In particular) Jon Bentley: <em>Programming Pearls — Writing Correct Programs</em>, in <em>Communications of the ACM</em>, vol. 26, no. 12, December 1983, pages 1040-1045, available <a href="https://dl.acm.org/doi/10.1145/358476.358484" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[2] Joshua Bloch:<em> Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken</em>, blog post, on the Google AI Blog, 2 June 2006, available <a href="https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[3] A caveat: the program is correct barring any typos or copy-paste errors — I am starting from rigorously verified programs (see the next posts), but the blogging system’s UI and text processing facilities are not the best possible for entering precise technical text such as code. However carefully I check, I cannot rule out a clerical mistake, which of course would be corrected as soon as it is identified.</p>
<hr />
<h2 style="text-align: justify;">7. Using a program prover</h2>
<p style="text-align: justify;">Preceding sections presented candidate binary search algorithms and asked whether they are correct. “Correct” means something quite precise: that for an array <span style="color: #0000ff;">t</span> and a value <span style="color: #0000ff;">x</span>, the final value of the variable <strong><span style="color: #0000ff;">Result</span> </strong>is a valid index of <span style="color: #0000ff;">t</span> (that is to say, is between 1 and <span style="color: #0000ff;">n</span>, the size of <span style="color: #0000ff;">t</span>) if and only if <span style="color: #0000ff;">x</span> appears at that index in <span style="color: #0000ff;">t</span>.</p>
<p style="text-align: justify;">The last section boldly stated that program attempt #6 was correct. The question was: why?</p>
<p style="text-align: justify;">In the case of the preceding versions, which were incorrect, you could prove that property, and I do mean prove, simply by exhibiting a single counter-example: a single <span style="color: #0000ff;">t</span> and <span style="color: #0000ff;">x</span> for which the program does not correctly set <span style="color: #0000ff;"><strong>Result</strong></span>. Now that I asserting the program to be correct, one example, or a million examples, do not suffice. In fact they are almost irrelevant. Test as much as you like and get correct results every time, you cannot get rid of the gnawing fear that if you had just tested one more time after the millionth test you would have produced a failure. Since the set of possible tests is infinite there is no solution in sight [1].</p>
<p style="text-align: justify;">We need a proof.</p>
<p style="text-align: justify;">I am going to explain that proof in the next section, but before that I would like to give you an opportunity to look at the proof by yourself. I wrote in one of the earlier articles that most of what I have to say was already present in Jon Bentley’s 1983 <em>Programming Pearls</em> contribution [2], but a dramatic change did occur in the four decades since: the appearance of automated proof system that can handle significant, realistic programs. One such system, AutoProof, was developed at the Chair of Software engineering at ETH Zurich [3] (key project members were <a href="https://bugcounting.net/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Carlo Furia</span></a>, <a href="http://se.inf.ethz.ch/people/nordio/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Martin Nordio</span></a>, <a href="https://cseweb.ucsd.edu/~npolikarpova/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Nadia Polikarpova</span></a> and <a href="http://se.inf.ethz.ch/people/tschannen/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Julian Tschannen</span></a>, with initial contributions by <a href="http://se.inf.ethz.ch/people/schoeller/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Bernd Schoeller</span></a>) on the basis of the Boogie proof technology from Microsoft Research).</p>
<p style="text-align: justify;">AutoProof is available for online use, and it turns out that one of the basic tutorial examples is binary search. You can go to the <a href="http://comcom.csail.mit.edu/e4pubs/#tut-binary_search_solution" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">corresponding page</span></a> and run the proof.</p>
<p style="text-align: justify;">I am going to let you try this out (and, if you are curious, other online AutoProof examples as well) without too many explanations; those will come in the next section. Let me simply name the basic proof technique: <strong>loop invariant</strong>. A loop invariant is a property INV associated with a loop, such that:</p>
<ul style="text-align: justify;">
<li><strong>A</strong>. After the loop’s initialization, INV will hold.</li>
<li><strong>B</strong>. <em>One </em>execution of the loop’s body, if started with INV satisfied (and the loop’s exit condition <em>not </em>satisfied, otherwise we wouldn’t be executing the body!), satisfies INV again when it terminates.</li>
</ul>
<p style="text-align: justify;">This idea is of course the same as that of a proof by induction in mathematics: the initialization corresponds to the base step (proving that P (0) holds) and the body property to the induction step (proving that from P (n) follows P (n + 1). With a traditional induction proof we deduce that the property (P (n)) holds for all integers. For the loop, we deduce that when the loop finishes its execution:</p>
<ul style="text-align: justify;">
<li>The invariant still holds, since executing the loop means executing the initialization once then the loop body zero or more times.</li>
<li>And of course the exit condition also holds, since otherwise we would still be looping.</li>
</ul>
<p style="text-align: justify;">That is how we prove the correctness of a loop: the conjunction of the invariant and the exit condition must yield the property that we seek (in the example, the property, stated above of <strong><span style="color: #0000ff;">Result</span> </strong>relative to<span style="color: #0000ff;"> t</span> and <span style="color: #0000ff;">x</span>).</p>
<p style="text-align: justify;">We also need to prove that the loop does terminate. This part involves another concept, the loop’s <em>variant</em>, which I will explain in the next section.</p>
<p style="text-align: justify;">For the moment I will not say anything more and let you look at the AutoProof example page (again, you will find it <a href="http://comcom.csail.mit.edu/e4pubs/#tut-binary_search_solution" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>), run the verification, and read the invariant and other formal elements in the code.</p>
<p style="text-align: justify;">To “run the verification” just click the <strong>Verify </strong>button on the page. Let me emphasize (and emphasize again and again and again) that clicking <strong>Verify</strong> will not run the code. There is no execution engine in AutoProof, and the verification does not use any test cases. It processes the text of the program as it appears on the page and below. It applies mathematical techniques to perform the proof; the core property to be proved is that the proposed loop invariant is indeed invariant (i.e. satisfies properties<strong> A</strong> and <strong>B</strong> above).</p>
<p style="text-align: justify;">The program being proved on the AutoProof example page is version #6 from the last section, with different variable names. So far for brevity I have used short names such as <span style="color: #0000ff;">i</span>, <span style="color: #0000ff;">j</span> and <span style="color: #0000ff;">m</span> but the program on the AutoProof site applies good naming practices with variables called <span style="color: #0000ff;">low</span>, <span style="color: #0000ff;">up</span>, <span style="color: #0000ff;">middle</span> and the like. So here is that version again with the new variable names:</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">— Program attempt #7 (identical to #6 with different variable names) .</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">low := 0 ; up := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>low ≥ up <strong>or Result</strong> > 0 <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">middle := low + ((up – low) // 2)</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>a [middle] < value <strong>then</strong><strong> </strong>— The array is now called a rather than t</span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">low := middle + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>elseif </strong><strong> a</strong> [middle] > value<strong> then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">up := middle</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;"><strong>Result </strong>:= middle</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="text-align: justify;">This is exactly the algorithm text on the AutoProof page, the one that you are invited to let AutoProof verify for you. I wrote “algorithm text” rather than “program text” because the actual program text (in Eiffel) includes <span style="color: #0000ff;"><strong>variant</strong></span> and <span style="color: #0000ff;">invariant</span> clauses which do not affect the program’s execution but make the proof possible.</p>
<p style="text-align: justify;">Whether or not these concepts (invariant, variant, program proof) are completely new to you, do try the prover and take a look at the proof-supporting clauses. In the next article I will remove any remaining mystery.</p>
<h4>Note and references for section 7</h4>
<p style="text-align: justify;">[1] Technically the set of possible <span style="color: #0000ff;">[array, value]</span> pairs is finite, but of a size defying human abilities. As I pointed out in the first section, the “model checking” and “abstract interpretation” verification techniques actually attempt to perform an exhaustive test anyway, after drastically reducing the size of the search space. That will be for some other article.</p>
<p style="text-align: justify;">[2] Jon Bentley:<em> Programming Pearls: Writing Correct Programs</em>, in <em>Communications of the ACM</em>, vol. 26, no. 12, pp. 1040-1045, December 1983, available for example <a href=" https://www.cs.tufts.edu/~nr/cs257/archive/jon-bentley/correct-programs.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[3] The <a href="http://se.inf.ethz.ch/research/autoproof/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">AutoProof page</span></a> contains documentations and numerous article references.</p>
<hr />
<h2 style="text-align: justify;">8. Understanding the proof</h2>
<p style="text-align: justify;">The previous section invited you to run the verification on the <a href="http://comcom.csail.mit.edu/e4pubs/#tut-binary_search_solution" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">AutoProof tutorial page</span></a> dedicated to the example. AutoProof is an automated proof system for programs. This is just a matter of clicking “Verify”, but more importantly, you should read the annotations added to the program text, particularly the loop invariant, which make the verification possible. (To avoid any confusion let me emphasize once more that clicking “Verify” does not run the program, and that no test cases are used; the effect is to run the verifier, which attempts to prove the correctness of the program by working solely on the program text.)</p>
<p style="text-align: justify;">Here is the program text again, reverting for brevity to the shorter identifiers (the version on the AutoProof page has more expressive ones):</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>from</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">i := 0 ; j := n</span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>until </strong>i ≥ j <strong>or Result</strong> > 0 <strong>loop</strong></span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := i + (j – i) // 2</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] < x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>elseif </strong> t [m] > x<strong> then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;"><strong>Result</strong> := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="text-align: justify;">Let us now see what makes the proof possible. The key property is the loop invariant, which reads</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">A: 1 ≤ i ≤ j ≤ n + 1</span><br />
<span style="color: #0000ff;"> B: 0 ≤ <strong>Result </strong>≤ n</span><br />
<span style="color: #0000ff;"> C: ∀ k: 1 .. i –1 | t [k] < x</span><br />
<span style="color: #0000ff;"> D: ∀ k: j .. n | t [k] > x</span><br />
<span style="color: #0000ff;"> E: (<strong>Result </strong>> 0) ⇒ (t [<strong>Result</strong>] = x)</span></p>
<p style="text-align: justify;">The notation is slightly different on the Web page to adapt to the Eiffel language as it existed at the time it was produced; in today’s Eiffel you can write the invariant almost as shown above. Long live Unicode, allowing us to use symbols such as <span style="color: #0000ff;">∀</span> (obtained not by typing them but by using smart completion, e.g. you start typing “forall” and you can select the <span style="color: #0000ff;">∀</span> symbol that pops up), <span style="color: #0000ff;">⇒</span> for “<span style="color: #0000ff;"><strong>implies</strong></span>” and many others</p>
<p style="text-align: justify;">Remember that the invariant has to be established by the loop’s initialization and preserved by every iteration. The role of each of its clauses is as follows:</p>
<ul style="text-align: justify;">
<li>A: keep the indices in range.</li>
<li>B: keep the variable <span style="color: #0000ff;"><strong>Result</strong></span>, whose final value will be returned by the function, in range.</li>
<li>C and D: eliminate index intervals in which we have determined that the sought value, <span style="color: #0000ff;">x</span>, does not appear. Before <span style="color: #0000ff;">i</span>, array values are smaller; starting at<span style="color: #0000ff;"> j</span>, they are greater. So these two intervals, <span style="color: #0000ff;">1..i</span> and <span style="color: #0000ff;">j..</span>n, cannot contain the sought value. The overall idea of the algorithm (and most other search algorithms) is to extend one of these two intervals, so as to narrow down the remaining part of <span style="color: #0000ff;">1..n</span> where <span style="color: #0000ff;">x</span> may appear.</li>
<li>E: express that as soon as we find a positive (non-zero) <span style="color: #0000ff;"><strong>Result</strong></span>, its value is an index in the array (see B) where <span style="color: #0000ff;">x</span> does appear.</li>
</ul>
<p style="text-align: justify;">Why is this invariant useful? The answer is that on exit it gives us what we want from the algorithm. The exit condition, recalled above, is</p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0000ff;">i ≥ j <strong>or Result</strong> > 0</span></p>
<p style="text-align: justify;">Combined with the invariant, it tells us that on exit one of the following will hold:</p>
<ul style="text-align: justify;">
<li><span style="color: #0000ff;"><strong>Result</strong> </span>> 0, but then because of E we know that <span style="color: #0000ff;">x</span> appears at position <strong>Result</strong>.</li>
<li><span style="color: #0000ff;">i < j</span>, but then A, C and D imply that <span style="color: #0000ff;">x</span> does not appear <em>anywhere </em>in <span style="color: #0000ff;">t</span>. In that case it cannot be true that <span style="color: #0000ff;"><strong>Result</strong> </span>> 0, but then because of B <strong><span style="color: #0000ff;">Result</span> </strong>must be zero.</li>
</ul>
<p style="text-align: justify;">What AutoProof proves, mechanically, is that under the function’s precondition (that the array is sorted):</p>
<ul style="text-align: justify;">
<li>The initialization ensures the invariant.</li>
<li>The loop body, assuming that the invariant is satisfied but the exit condition is not, ensures the loop invariant again after it executes.</li>
<li>The combination of the invariant and the exit condition ensures, as just explained, the postcondition of the function (the property that <strong><span style="color: #0000ff;">Result</span> </strong>will either be positive and the index of an element equal to x, or zero with the guarantee that x appears nowhere in t).</li>
</ul>
<p style="text-align: justify;">Such a proof guarantees the correctness of the program <em>if it terminates</em>. We (and AutoProof) must prove separately that it <em>does </em>terminate. The technique is simple: find a “loop <strong><em>variant</em></strong>”, an integer quantity <span style="color: #0000ff;">v</span> which remains non-negative throughout the loop (in other words, the loop invariant includes or implies <span style="color: #0000ff;">v</span> ≥ 0) and decreases on each iteration, so that the loop cannot continue executing forever. An obvious variant here is<span style="color: #0000ff;"> j – i + 1</span> (where the + 1 is needed because<span style="color: #0000ff;"> j – i</span> may go down to -1 on the last iteration if <span style="color: #0000ff;">x</span> does not appear in the array). It reflects the informal idea of the algorithm: repeatedly decrease an interval <span style="color: #0000ff;">i .. j – 1</span> (initially, 1 .. <span style="color: #0000ff;">n</span>) guaranteed to be such that <span style="color: #0000ff;">x</span> appears in<span style="color: #0000ff;"> t</span> if and only if it appears at an index in that interval. At the end, either we already found <span style="color: #0000ff;">x</span> or the interval is empty, implying that <span style="color: #0000ff;">x</span> does not appear at all.</p>
<p style="text-align: justify;">A great reference on variants and the techniques for proving program termination is a <em>Communications of the ACM</em> article of 2011: [3].</p>
<p style="text-align: justify;">The variant gives an upper bound on the number of iterations that remain at any time. In sequential search,<span style="color: #0000ff;"> j – i + 1</span> would be our best bet; but for binary search it is easy to show that <span style="color: #0000ff;"> log<sub>2 </sub>(j – i + 1)</span> is also a variant, extending the proof of correctness with a proof of performance (the key goal of binary search being to ensure a logarithmic rather than linear execution time).</p>
<p style="text-align: justify;">This example is, I hope, enough to highlight the crucial role of loop invariants and loop variants in reasoning about loops. How did we get the invariant? It looks like I pulled it out of a hat. But in fact if we go the other way round (as advocated in classic books [1] [2]) and develop the invariant and the loop together the process unfolds itself naturally and there is nothing mysterious about the invariant.</p>
<p style="text-align: justify;">Here I cannot resist quoting (thirty years on!) from my own book <em>Introduction to the Theory of Programming Languages</em> [4]. It has a chapter on axiomatic semantics (also known as <em>Hoare logic</em>, the basis for the ideas used in this discussion), which I just made available: see <a href="http://se.ethz.ch/~meyer/down/cacm/09-axiom.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a> [5]. Its exercise 9.12 is the starting point for this series of articles. Here is how the book explains how to design the program and the invariant [6]:</p>
<blockquote>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0a5710;">In the general case <span style="color: #000000;">[<em>of search, binary or not</em>]</span> we aim for a loop body of the form</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;">m := ‘‘Some value in 1.. n such that i ≤ m < j’’;</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>if </strong>t [m] ≤ x <strong>then</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">i := m + 1</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>else</strong></span></p>
<p style="padding-left: 90px; text-align: justify;"><span style="color: #0000ff;">j := m</span></p>
<p style="padding-left: 60px; text-align: justify;"><span style="color: #0000ff;"><strong>end</strong></span></p>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0a5710;">It is essential to get all the details right (and easy to get some wrong):</span></p>
<ul style="text-align: justify;">
<li><span style="color: #0a5710;">The instruction must always decrease the variant<span style="color: #0000ff;"> j – i</span>, by increasing <span style="color: #0000ff;">i</span> or decreasing <span style="color: #0000ff;">j</span>. If the the definition of <span style="color: #0000ff;">m</span> specified just <span style="color: #0000ff;">m ≤ j</span> rather than <span style="color: #0000ff;">m < j</span>, the second branch would not meet this goal.</span></li>
<li><span style="color: #0a5710;"> This does not transpose directly to<span style="color: #0000ff;"> i</span>: requiring<span style="color: #0000ff;"> i < m < j</span> would lead to an impossibility when <span style="color: #0000ff;">j – i</span> is equal to 1. So we accept<span style="color: #0000ff;"> i ≤ m</span> but then we must take <span style="color: #0000ff;">m + 1</span>, not <span style="color: #0000ff;">m</span>, as the new value of<span style="color: #0000ff;"> i</span> in the first branch.</span></li>
<li><span style="color: #0a5710;"> The conditional’s guards are tests on<span style="color: #0000ff;"> t [m]</span>, so m must always be in the interval <span style="color: #0000ff;">1 . . n</span>. This follows from the clause <span style="color: #0000ff;">0 ≤ i ≤ j ≤ n + 1</span> which is part of the invariant.</span></li>
<li><span style="color: #0a5710;"> If this clause is satisfied, then <span style="color: #0000ff;">m ≤ n</span> and <span style="color: #0000ff;">m > 0</span>, so the conditional instruction indeed leaves this clause invariant.</span></li>
<li><span style="color: #0a5710;">You are invited to check that both branches of the conditional also preserve the rest of the invariant.</span></li>
<li><span style="color: #0a5710;">Any policy for choosing <span style="color: #0000ff;">m</span> is acceptable if it conforms to the above scheme. Two simple choices are <span style="color: #0000ff;">i</span> and <span style="color: #0000ff;">j – 1</span>; they lead to variants of the sequential search algorithm<span style="color: #000000;"> [<em>which the book discussed just before binary search</em>]</span>.</span></li>
</ul>
<p style="padding-left: 30px; text-align: justify;"><span style="color: #0a5710;">For binary search, <span style="color: #0000ff;">m</span> will be roughly equal to the average of<span style="color: #0000ff;"> i</span> and <span style="color: #0000ff;">j</span>.</span></p>
</blockquote>
<p style="text-align: justify;">“Roughly” because we need an integer, hence the // (integer division).</p>
<p style="text-align: justify;">In the last section, I will reflect further on the lessons we can draw from this example, and the practical significance of the key concept of invariant.</p>
<h4>References and notes for section 8</h4>
<p style="text-align: justify;">[1] E.W. Dijkstra:<em> A Discipline of Programming</em>, Prentice Hall, 1976.</p>
<p style="text-align: justify;">[2] David Gries: <em>The Science of Programming</em>, Springer, 1989.</p>
<p style="text-align: justify;">[3] Byron Cook, Andreas Podelski and Andrey Rybalchenko: <em>Proving program termination</em>, in <em>Communications of the ACM</em>, vol. 54, no. 11, May 2011, pages 88-98, available <a href="https://dl.acm.org/doi/abs/10.1145/1941487.1941509" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[4] Bertrand Meyer, <em>Introduction to the Theory of Programming Languages</em>, Prentice Hall, 1990. The book is out of print but can be found used, e.g. on Amazon. See the next entry for an electronic version of two chapters.</p>
<p style="text-align: justify;">[5] Bertrand Meyer <em>Axiomatic semantics</em>, chapter 9 from [3], available <a href="http://se.ethz.ch/~meyer/down/cacm/09-axiom.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>. Note that the PDF was reconstructed from an old text-processing system (troff); the figures could not be recreated and are missing. (One of these days I might have the patience of scanning them from a book copy and adding them. Unless someone wants to help.) I also put online, with the same caveat, chapter 2 on notations and mathematical basis: see <a href="http://se.ethz.ch/~meyer/down/cacm/02-math.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[6] Page 383 of [4] and [5]. The text is verbatim except a slight adaptation of the programming notation and a replacement of the variables: <span style="color: #0000ff;">i</span> in the book corresponds to <span style="color: #0000ff;">i – 1</span> here, and <span style="color: #0000ff;">j</span> to <span style="color: #0000ff;">j – 1</span>. As a matter of fact I prefer the original conventions from the book (purely as a matter of taste, since the two are rigorously equivalent), but I changed here to the conventions of the program as it appears in the AutoProof page, with the obvious advantage that you can verify it mechanically. The text extract is otherwise exactly as in the 1990 book.</p>
<h2 style="text-align: justify;">9. Lessons learned</h2>
<p style="text-align: justify;">What was this journey about?</p>
<p style="text-align: justify;">We started with a succession of attempts that might have “felt right” but were in fact all wrong, each in its own way: giving the wrong answer in some cases, crashing (by trying to access an array outside of its index interval) in some cases, looping forever in some cases. Always “in some cases”, evidencing the limits of testing, which can never guarantee that it exercises all the problem cases. A correct program is one that works in <em>all </em>cases. The final version was correct; you were able to prove its correctness with an online tool and then to understand (I hope) what lies behind that proof.</p>
<p style="text-align: justify;">To show how to prove such correctness properties, I have referred throughout the series to publications from the 1990s (my own <em>Introduction to The Theory of Programming Languages</em>), the 1980s (Jon Bentley’s <em>Programming Pearls</em> columns, Gries’s <em>Science of Programming</em>), and even the 1970s (Dijkstra’s <em>Discipline of Programming</em>). I noted that the essence of my argument appeared in a different form in one of Bentley’s <em>Communications</em> articles. What is the same and what has changed?</p>
<p style="text-align: justify;">The core concepts have been known for a long time and remain applicable: assertion, invariant, variant and a few others, although they are much better understood today thanks to decades of theoretical work to solidify the foundation. Termination also has a more satisfactory theory.</p>
<p style="text-align: justify;">On the practical side, however, the progress has been momentous. Considerable engineering has gone into making sure that the techniques scaled up. At the time of Bentley’s article, binary search was typical of the kind of programs that could be proved correct, and the proof had to proceed manually. Today, we can tackle much bigger programs, and use tools to perform the verification.</p>
<p style="text-align: justify;">Choosing binary search again as an example today has the obvious advantage that everyone can understand all the details, but should not be construed as representative of the state of the art. Today’s proof systems are far more sophisticated. Entire operating systems, for example, have been mechanically (that is to say, through a software tool) proved correct. In the AutoProof case, a major achievement was the proof of correctness [1] of an entire data structure (collections) library, EiffelBase 2. In that case, the challenge was not so much size (about 8,000 source lines of code), but the complexity of both:</p>
<ul style="text-align: justify;">
<li>The scope of the verification, involving the full range of mechanisms of a modern object-oriented programming language, with classes, inheritance (single and multiple), polymorphism, dynamic binding, generics, exception handling etc.</li>
<li>The code itself, using sophisticated data structures and algorithms, involving in particular advanced pointer manipulations.</li>
</ul>
<p style="text-align: justify;">In both cases, progress has required advances on both the science and engineering sides. For example, the early work on program verification assumed a bare-bones programming language, with assignments, conditionals, loops, routines, and not much more. But real programs use many other constructs, growing ever richer as programming languages develop. To cover exception handling in AutoProof required both theoretical modeling of this construct (which appeared in [2]) and implementation work.</p>
<p style="text-align: justify;">More generally, scaling up verification capabilities from the small examples of 30 years ago to the sophisticated software that can be verified today required the considerable effort of an entire community. AutoProof, for example, sits at the top of a tool stack relying on the Boogie environment from Microsoft Research, itself relying on the Z3 theorem prover. Many person-decades of work make the result possible.</p>
<p style="text-align: justify;"><img class=" wp-image-5305 aligncenter" src="https://bertrandmeyer.com/wp-content/upLoads/tool_stack-500x346.jpg" alt="tool_stack" width="451" height="312" srcset="https://bertrandmeyer.com/wp-content/upLoads/tool_stack-500x346.jpg 500w, https://bertrandmeyer.com/wp-content/upLoads/tool_stack-150x104.jpg 150w, https://bertrandmeyer.com/wp-content/upLoads/tool_stack.jpg 616w" sizes="(max-width: 451px) 100vw, 451px" /></p>
<p style="text-align: justify;">Beyond the tools, the concepts are esssential. One of them, loop invariants, has been illustrated in the final version of our program. I noted in the first article the example of a well-known expert and speaker on testing who found no better way to announce that a video would not be boring than “relax, we are not going to talk about loop invariants.” Funny perhaps, but unfair. Loop invariants are one of the most beautiful concepts of computer science. Not so surprisingly, because loop invariants are the application to programming of the concept of mathematical induction. According to the great mathematician Henri Poincaré, all of mathematics rests on induction; maybe he exaggerated, maybe not, but who would think of teaching mathematics without explaining induction? Teaching programming without explaining loop invariants is no better.</p>
<p style="text-align: justify;">Below is an illustration (if you will accept my psychedelic diagram) of what a loop is about, as a problem-solving technique. Sometimes we can get the solution directly. Sometimes we identify several steps to the solution; then we use a sequence (<span style="color: #0000ff;">A ; B; C</span>). Sometimes we can find two (or more) different ways of solving the problem in different cases; then we use a conditional (<span style="color: #0000ff;"><strong>if </strong>c <strong>then </strong>A <strong>else </strong>B <strong>end</strong></span>). And sometimes we can only get a solution by getting closer repeatedly, not necessarily knowing in advance how many times we will have to advance towards it; then, we use a loop.</p>
<p style="text-align: center;"><img class="alignnone wp-image-5306" src="https://bertrandmeyer.com/wp-content/upLoads/loop_strategy-500x310.jpg" alt="loop_strategy" width="540" height="335" srcset="https://bertrandmeyer.com/wp-content/upLoads/loop_strategy-500x310.jpg 500w, https://bertrandmeyer.com/wp-content/upLoads/loop_strategy-150x93.jpg 150w, https://bertrandmeyer.com/wp-content/upLoads/loop_strategy-768x476.jpg 768w, https://bertrandmeyer.com/wp-content/upLoads/loop_strategy-1024x635.jpg 1024w, https://bertrandmeyer.com/wp-content/upLoads/loop_strategy.jpg 1461w" sizes="(max-width: 540px) 100vw, 540px" /></p>
<p style="text-align: justify;">We identify an often large (i.e. very general) area where we know the solution will lie; we call that area the loop invariant. The solution or solutions (there may be more than one) will have to satisfy a certain condition; we call it the exit condition. From wherever we are, we shoot into the invariant region, using an appropriate operation; we call it the initialization. Then we execute as many times as needed (maybe zero if our first shot was lucky) an operation that gets us closer to that goal; we call it the loop body. To guarantee termination, we must have some kind of upper bound of the distance to the goal, decreasing each time discretely; we call it the loop variant.</p>
<p style="text-align: justify;">This explanation is only an illustration, but I hope it makes the ideas intuitive. The key to a loop is its invariant. As the figure suggests, the invariant is always a generalization of the goal. For example, in binary search (and many other search algorithms, such as sequential search), our goal is to find a <em><strong>position</strong> </em>where either <span style="color: #0000ff;">x</span> appears or, if it does not, we can be sure that it appears nowhere. The invariant says that we have an <strong><em>interval</em> </strong>with the same properties (either <span style="color: #0000ff;">x</span> appears at a position belonging to that interval or, if it does not, it appears nowhere). It obviously includes the goal as a special case: if the interval has length 1, it defines a single position.</p>
<p style="text-align: justify;">An invariant should be:</p>
<ol style="text-align: justify;">
<li>Strong enough that we can devise an exit condition which in the end, combined with the invariant, gives us the goal we seek (a solution).</li>
<li>Weak enough that we can devise an initialization that ensures it (by shooting into the yellow area) easily.</li>
<li>Tuned so that we can devise a loop body that, from a state satifying the invariant, gets us to a new one that is closer to the goal.</li>
</ol>
<p style="text-align: justify;">In the example:</p>
<ol style="text-align: justify;">
<li>The exit condition is simply that the interval’s length is 1. (Technically, that we have computed <strong><span style="color: #0000ff;">Result</span> </strong>as the single interval element.) Then from the invariant and the exit condition, we get the goal we want.</li>
<li>Initialization is easy, since we can just take the initial interval to be the whole index range of the array, which trivially satisfies the invariant.</li>
<li>The loop body simply decreases the length of the interval (which can serve as loop variant to ensure termination). How we decrease the length depends on the search strategy; in sequential search, each iteration decreases the length by 1, correct although not fast, and binary search decreases it by about half.</li>
</ol>
<p style="text-align: justify;">The general scheme always applies. <em>Every </em> loop algorithm is characterized by an invariant. The invariant may be called the DNA of the algorithm.</p>
<p style="text-align: justify;">To demonstrate the relevance of this principle, my colleagues Furia, Velder, and I published a survey paper [6] in <em>ACM</em> <em>Computing Surveys </em>describing the invariants of important algorithms in many areas of computer science, from search algorithms to sorting (all major algorithms), arithmetic (long integer addition, squaring), optimization and dynamic programming (Knapsack, Levenshtein/Edit distance), computational geometry (rotating calipers), Web (Page Rank)… I find it pleasurable and rewarding to go deeper into the basis of loop algorithms and understand their invariants; like a geologist who does not stop at admiring the mountain, but gets to understand how it came to be.</p>
<p style="text-align: justify;">Such techniques are inevitable if we want to get our programs right, the topic of this article. Even putting aside the Bloch average-computation overflow issue, I started with 5 program attempts, all kind of friendly-looking but wrong in different ways. I could have continued fiddling with the details, following my gut feeling to fix the flaws and running more and more tests. Such an approach can be reasonable in some cases (if you have an algorithm covering a well-known and small set of cases), but will not work for non-trivial algorithms.</p>
<p style="text-align: justify;">Newcomers to the concept of loop invariant sometimes panic: “this is all fine, you gave me the invariants in your examples, how do I find my own invariants for my own loops?” I do not have a magic recipe (nor does anyone else), but there is no reason to be scared. Once you have understood the concept and examined enough examples (just a few of those in [6] should be enough), writing the invariant at the same time as you are devising a loop will come as a second nature to you.</p>
<p style="text-align: justify;">As the fumbling attempts in the first few sections should show, there is not much of an alternative. Try this approach. If you are reaching these final lines after reading what preceded them, allow me to thank you for your patience, and to hope that this rather long chain of reflections on verification will have brought you some new insights into the fascinating challenge of writing correct programs.</p>
<h2 style="text-align: justify;">References</h2>
<p style="text-align: justify;">[1] Nadia Polikarpova, Julian Tschannen, and Carlo A. Furia: <em>A Fully Verified Container Library</em>, in Proceedings of 20th International Symposium on Formal Methods (FM 15), 2015. (Best paper award.)</p>
<p style="text-align: justify;">[2] Martin Nordio, Cristiano Calcagno, Peter Müller and Bertrand Meyer: <em>A Sound and Complete Program Logic for Eiffel</em>, in <em>Proceedings of TOOLS 2009 (Technology of Object-Oriented Languages and Systems)</em>, Zurich, June-July 2009, eds. M. Oriol and B. Meyer, Springer LNBIP 33, June 2009.</p>
<p style="text-align: justify;">[3] Boogie page at MSR, see <a href="https://www.microsoft.com/en-us/research/project/boogie-an-intermediate-verification-language/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a> for publications and other information.</p>
<p style="text-align: justify;">[4] Z3 was also originally from MSR and has been open-sourced, one can get access to publications and other information from its <a href="https://en.wikipedia.org/wiki/Z3_Theorem_Prover" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">Wikipedia page</span></a>.</p>
<p style="text-align: justify;">[5] Carlo Furia, Bertrand Meyer and Sergey Velder: <em>Loop invariants: Analysis, Classification and Examples</em>, in <em>ACM Computing Surveys</em>, vol. 46, no. 3, February 2014. Available <a href="http://se.ethz.ch/~meyer/publications/methodology/invariants.pdf" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p style="text-align: justify;">[6] Dynamic programming is a form of recursion removal, turning a recursive algorithm into an iterative one by using techniques known as “memoization” and “bottom-up computation” (Berry). In this transformation, the invariant plays a key role. I will try to write this up some day as it is a truly elegant and illuminating explanation.</p>
<p>The post <a rel="nofollow" href="https://bertrandmeyer.com/2020/03/26/getting-program-right-nine-episodes/">Getting a program right, in nine episodes</a> appeared first on <a rel="nofollow" href="https://bertrandmeyer.com">Bertrand Meyer's technology+ blog</a>.</p>
<img src="http://feeds.feedburner.com/~r/BertrandMeyer/~4/NCa3Te6N4tI" height="1" width="1" alt=""/>

<p>Consider the following expression:</p>
<p style="padding-left: 30px;"><span style="color: #0000ff;">∃ c: s ¦ moisture (c) = soft</span></p>
<p>This is obviously mathematics. To express such a property in a programming language, you have to write a function containing a loop that iterates through the elements of <span style="color: #0000ff;">s</span>. Right?</p>
<p>Wrong. The above construct is valid Eiffel. It’s a consequence of recent syntax extensions that retain all the simplicity and consistency of the language but take full advantage of Unicode. Of course you do not have Unicode characters such as <code class="prettyprint lang-eiffel prettyprinted"><span class="pln">∃</span></code> on you keyboard, but EiffelStudio’s completion mechanism inserts them for you.</p>
<p>To see how this works, just read Alexander Kogtenkov’s <a href="https://www.eiffel.org/blog/Alexander%20Kogtenkov/2020/03/symbolic-forms-loops" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">recent blog post</span></a> on the topic.</p>
<p>The post <a rel="nofollow" href="https://bertrandmeyer.com/2020/03/17/notations-didnt-even-know-use/">Notations you didn’t even know you could use</a> appeared first on <a rel="nofollow" href="https://bertrandmeyer.com">Bertrand Meyer's technology+ blog</a>.</p>
<img src="http://feeds.feedburner.com/~r/BertrandMeyer/~4/Sv7OObURkoE" height="1" width="1" alt=""/>

<p>I am giving a “distinguished lecture” at the University of California, Santa Barbara, January 10 (Friday, tomorrow) at 14. The title is A<em> Comprehensive Approach to Requirements Engineering</em>.</p>
<p>The abstract and rest of the information are <a href=" https://ccs.ucsb.edu/events/comprehensive-approach-requirements-engineering" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>.</p>
<p>I will spend the last few minutes of the talk discussing other current developments (verification, concurrency).</p>
<p>The post <a rel="nofollow" href="https://bertrandmeyer.com/2020/01/09/talk-requirements-uc-santa-barbara-tomorrow/">Talk on requirements at UC Santa Barbara tomorrow</a> appeared first on <a rel="nofollow" href="https://bertrandmeyer.com">Bertrand Meyer's technology+ blog</a>.</p>
<img src="http://feeds.feedburner.com/~r/BertrandMeyer/~4/FXD36clI4PU" height="1" width="1" alt=""/>

<p>The “Morgenstern Colloquium” at the University of Nice / INRIA Sophia Antipolis invited me to give a talk, next Wednesday (18 December) at 11 in Sophia Antipolis, in the aptly named* “Kahn Building”. The announcement appears <a href="https://iww.inria.fr/colloquium/fr/bertrand-meyer-how-to-build-quality-software-the-eiffel-experience/" target="blog_illustrations"><span style="color: #0000ff; text-decoration: underline;">here</span></a>. I proposed various topics but (pleasant surprise) the organizers explicitly asked me to lecture about what I really want to talk about: the Eiffel approach. I will give a general presentation describing not specifically the language but the unified view of software construction embodied in Eiffel, from modeling to requirements to design, implementation and verification. Here is the abstract:</p>
<blockquote><p><em>With society’s growing reliance on IT systems, the ability to write high-quality software is ever more critical. While a posteriori verification techniques have their role, there is no substitute for methods and tools that provide built-in quality (“correctness by construction”</em>)<em> and scale up to very large systems. For several decades my colleagues and I have been building such a method, based in particular on the concept of Design by Contract, the associated tools and the supporting language, Eiffel. The scope is wide, encompassing all aspects of the software development process, from requirements and design to implementation and verification. I will present an overview of the approach, show what it can yield, and discuss remaining open issues.</em></p>
<p><em>This talk is meant for everyone, whether from industry or academia, with an interest in practical techniques for engineering high-quality software.</em></p></blockquote>
<p>No registration is required. The presentation will be in English.</p>
<h2>Note</h2>
<p>*Gilles Kahn, a brilliant computer scientist who died too young, was for a while director of INRIA.</p>
<p>The post <a rel="nofollow" href="https://bertrandmeyer.com/2019/12/14/wednesday-nice-survey-talk-eiffel-method/">This Wednesday in Nice: survey talk on the Eiffel method</a> appeared first on <a rel="nofollow" href="https://bertrandmeyer.com">Bertrand Meyer's technology+ blog</a>.</p>
<img src="http://feeds.feedburner.com/~r/BertrandMeyer/~4/Chj_M3Vnnmk" height="1" width="1" alt=""/>