Quick benchmark on loops

by Tao Feng (modified: 2011 May 11)

Contents

Source Code
1. Result 6.8.8.5940
Source Code
1. Result 6.8.8.5940

This is a quick benchmark result on loops: across as loop end and from untill loop end

Source Code

note description : "across_benchmark application root class" date : "$Date$" revision : "$Revision$" class APPLICATION inherit ARGUMENTS create make feature {NONE} -- Initialization make -- Run application. local l_array: ARRAYED_LIST [INTEGER] l_n: INTEGER l_time1, l_time2: TIME do l_n := 10_000_000 -- Across as loop end create l_array.make_filled (l_n) print ("Across " + l_n.out + " started: ") create l_time1.make_now across l_array as l_cursor loop l_cursor.item.do_nothing end create l_time2.make_now print (l_time1) print ("%NAcross " + l_n.out + " ended: " + l_time2.out) print ("%NDuration: " + (l_time2.relative_duration (l_time1)).fine_seconds_count.out) -- From until loop end create l_array.make_filled (l_n) print ("%NFrom until " + l_n.out + " started: ") create l_time1.make_now from l_array.start until l_array.after loop l_array.item.do_nothing l_array.forth end create l_time2.make_now print (l_time1) print ("%NFrom until " + l_n.out + " ended: " + l_time2.out) print ("%NDuration: " + (l_time2.relative_duration (l_time1)).fine_seconds_count.out) end end

Result 6.8.8.5940

Frozen Code (Default inlining)

 Across 10000000 started: 5:24:38.976 PM
 Across 10000000 ended: 5:25:13.118 PM
 Duration: 34.141999999999825
 From until 10000000 started: 5:25:13.166 PM
 From until 10000000 ended: 5:25:32.363 PM
 Duration: 19.197000000000116

Finalized Code (Default inlining)

 Across 10000000 started: 5:26:17.000 PM
 Across 10000000 ended: 5:26:17.638 PM
 Duration: 0.63799999999901047
 From until 10000000 started: 5:26:17.649 PM
 From until 10000000 ended: 5:26:17.838 PM
 Duration: 0.18800000000192085

Frozen Code (inlining size: 10)

 Across 10000000 started: 8:06:53.981 AM
 Across 10000000 ended: 8:07:24.103 AM
 Duration: 30.121999999999389
 From until 10000000 started: 8:07:24.152 AM
 From until 10000000 ended: 8:07:41.249 AM
 Duration: 17.096000000001368

Finalized Code (inlining size: 10)

 Across 10000000 started: 8:09:04.666 AM
 Across 10000000 ended: 8:09:05.277 AM
 Duration: 0.6099999999969441
 From until 10000000 started: 8:09:05.290 AM
 From until 10000000 ended: 8:09:05.426 AM
 Duration: 0.13599999999860302

Conclusion

from...until is much faster than across... in this release.
With inlining size 10, from...until gains more points.
The proportion of the instructions being iterated influences the result. The result is more close to the fact when the proportion of the instructions being iterated is close to 0%. I can infer when the proportion is large, the difference of from...until and across... can be ignored.

Source Code

The following code removed the `do_nothing' call in the loop, to evaluate how it could impact.note description : "across_benchmark application root class" date : "$Date$" revision : "$Revision$" class APPLICATION inherit ARGUMENTS create make feature {NONE} -- Initialization make -- Run application. local l_array: ARRAYED_LIST [INTEGER] l_n: INTEGER l_time1, l_time2: TIME l_sum: INTEGER do l_n := 10_000_000 -- Across as loop end create l_array.make_filled (l_n) print ("Across " + l_n.out + " started: ") create l_time1.make_now across l_array as l_cursor loop l_sum := l_sum + l_cursor.item end create l_time2.make_now print (l_time1) print ("%NAcross " + l_n.out + " ended: " + l_time2.out) print ("%NDuration: " + (l_time2.relative_duration (l_time1)).fine_seconds_count.out) -- From until loop end create l_array.make_filled (l_n) print ("%NFrom until " + l_n.out + " started: ") create l_time1.make_now from l_array.start until l_array.after loop l_sum := l_sum + l_array.item l_array.forth end create l_time2.make_now print (l_time1) print ("%NFrom until " + l_n.out + " ended: " + l_time2.out) print ("%NDuration: " + (l_time2.relative_duration (l_time1)).fine_seconds_count.out) end end

Result 6.8.8.5940

Frozen Code (Default inlining)

 Across 10000000 started: 8:30:43.087 AM
 Across 10000000 ended: 8:31:13.166 AM
 Duration: 30.080000000001746
 From until 10000000 started: 8:31:13.215 AM
 From until 10000000 ended: 8:31:30.012 AM
 Duration: 16.795999999998457

Finalized Code (Default inlining)

 Across 10000000 started: 8:31:36.646 AM
 Across 10000000 ended: 8:31:37.293 AM
 Duration: 0.64700000000084401
 From until 10000000 started: 8:31:37.304 AM
 From until 10000000 ended: 8:31:37.493 AM
 Duration: 0.18799999999828287

Frozen Code (inlining size: 10)

 Across 10000000 started: 8:18:20.350 AM
 Across 10000000 ended: 8:18:50.296 AM
 Duration: 29.945999999999913
 From until 10000000 started: 8:18:50.351 AM
 From until 10000000 ended: 8:19:07.328 AM
 Duration: 16.976000000002387

Finalized Code (inlining size: 10)

 Across 10000000 started: 8:19:55.654 AM
 Across 10000000 ended: 8:19:56.265 AM
 Duration: 0.6110000000007858
 From until 10000000 started: 8:19:56.277 AM
 From until 10000000 ended: 8:19:56.406 AM
 Duration: 0.12900000000081491

Conclusion

'+' operator and an assignment are almost the same as `do_nothing' call.

Comments

David Le Bansais (13 years ago 9/5/2011)
Thank you for posting this kind of information, it's always helpful to know all consequences of a coding decision.

On the same subject, I posted some time ago.
Larry Rix (11 years ago 10/4/2013)
Across Vs. From Loops

Testing of some of our own local code, which was created using the across loop and then compared with the from loop shows that the across is only slightly slower than the from (e.g. 1.3x to 1.6x and not the 4.5x demonstrated above). We have not performed an extensive or exhaustive test, but our initial conclusion is that the content that the loop is operating on plays some role.

Below is an example of the from-loop version. Below it is the across-loop version. children: ARRAYED_LIST [C] -- Access to `internal_children'. --| This is NOT the managed list of children --| itself and won't remain current. local l_internal_children: like children l_child: C do l_internal_children := internal_children create Result.make (l_internal_children.count) from l_internal_children.start until l_internal_children.after loop l_child := l_internal_children.item if (not l_child.is_deleted) and then (not is_child_filtered (l_child)) then Result.extend (l_child) end l_internal_children.forth end ensure result_not_internal_children: Result /= internal_children end

And the across ... children: ARRAYED_LIST [C] -- Access to `internal_children'. --| This is NOT the managed list of children --| itself and won't remain current. local l_internal_children: like children l_child: C do l_internal_children := internal_children create Result.make (l_internal_children.count) across l_internal_children as ic_children loop l_child := ic_children.item if (not l_child.is_deleted) and then (not is_child_filtered (l_child)) then Result.extend (l_child) end end ensure result_not_internal_children: Result /= internal_children end

Again, the results from the profiler on this code are:

Across-loop version Call count Descendants Total Percentage Secs per Call Sum Avg 70 0 0.124801 0.124801 1.159 0.001782871 70 0 0.0312 0.0312 0.295 0.000445714 70 0 0.093601 0.093601 0.889 0.001337157 0.003565743 0.001188581 From-loop version Call count Descendants Total Percentage Secs per Call Sum Avg Diff (Nx) 70 0 0.0312 0.0312 0.304 0.000445714 70 0 0.093601 0.093601 0.889 0.001337157 84 0 0.0312 0.0312 0.293 0.000371429 0.0021543 0.0007181 1.6551747x

As stated, the sample is too small to really get a sense of the differences, but as-is, the differences are not significant when compared to the readability of the code using the two versions of looping: across or from.
- Manu (11 years ago 10/4/2013)
  It is clear that there is an overhead to using across compared to using a simple loop as it first creates an iterator object. In addition, the iterator calls to traverse are causing more dynamic dispatch than the version of a loop using an container that is not polymorphic. We are working on those aspects to provide iterators where this becomes less of an issue. We believe that for ARRAYED_LIST we can get something that is faster than a traditional loop.
  
  Note that I would not use the output of the profiler to say one is faster than the other, because profiling actually modifies the generated code and thus any potential optimizations. Profiler is a great tool to figure out where things get slower, but not for benchmarking.