A Tool to Obtain Eiffel Code Metrics Across Many Libraries
- Eiffel Code Metrics
In mainstream languages, the extent of a codebase is usually quantified in lines of code. But this makes little sense in Eiffel, since Eiffel lines can be very long or very short depending on the personal style of the coder.
I propose measuring the extent of an Eiffel codebase by counting the occurrences of identifiers and Eiffel keywords within the body of a routine, i.e. between the do (or once) keyword and the corresponding end (or ensure) at the end of the routine. Identifiers and keywords inside code blocks introduced by the check or debug keyword are ignored, since this code is most likely not operational in the finalized executable. Naturally, comments and any quoted text or characters are also ignored. I did not consider numeric constants to add much to the overall picture, so these are not counted either.
This method takes into account that contract code does not contribute to the functionality of the codebase, and neither do local declarations, attribute declarations, or inheritance qualifiers, so none of these are counted.
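To make the counting rule concrete, here is a minimal Python sketch of it. It is not the tool's Eiffel implementation: the keyword list is a partial, assumed one, the names `count_body_tokens`, `KEYWORDS` and `OPENERS` are made up for illustration, and multi-line verbatim strings are not handled. Two further assumptions are that counting stops at `rescue` as well as at `ensure`, and that `end` keywords closing nested blocks inside a body are counted.

```python
import re

# Assumed, partial keyword list (lower case); the real tool's list may differ.
KEYWORDS = {
    "across", "agent", "all", "and", "as", "attached", "check", "create",
    "current", "debug", "do", "else", "elseif", "end", "ensure", "false",
    "from", "if", "implies", "inspect", "loop", "not", "old", "once", "or",
    "precursor", "rescue", "result", "retry", "some", "then", "true",
    "until", "variant", "void", "when", "xor",
}
# Keywords that open a block closed by a matching `end` inside a body.
OPENERS = {"if", "from", "inspect", "across", "check", "debug", "do"}

# Comments, strings, characters and numbers are matched first and discarded,
# so only real identifiers and keywords end up in the `word` group.
TOKEN = re.compile(r"""
      --[^\n]*                    # comment to end of line
    | "(?:%.|[^"%\n])*"           # single-line manifest string
    | '(?:%.|[^'%\n])*'           # character constant
    | \d\w*(?:\.\d\w*)?           # numeric constant
    | (?P<word>[A-Za-z]\w*)       # identifier or keyword
""", re.VERBOSE)


def count_body_tokens(source):
    """Return (keyword_count, identifier_count) found in routine bodies."""
    keywords = identifiers = 0
    depth = 0          # 0 means we are outside any routine body
    counting = False   # switched off once ensure/rescue is reached
    skip_depth = 0     # depth of the enclosing check/debug block, 0 if none
    for match in TOKEN.finditer(source):
        word = match.group("word")
        if word is None:
            continue
        lower = word.lower()
        if depth == 0:
            if lower in ("do", "once"):
                depth, counting = 1, True
            continue
        if lower == "end":
            depth -= 1
            if skip_depth and depth < skip_depth:
                skip_depth = 0
                continue          # `end` closing a check/debug block
            if depth == 0:
                continue          # `end` closing the routine itself
        elif lower in OPENERS:
            depth += 1
            if lower in ("check", "debug") and not skip_depth:
                skip_depth = depth
        elif depth == 1 and lower in ("ensure", "rescue"):
            counting = False
        if counting and not skip_depth:
            if lower in KEYWORDS:
                keywords += 1
            else:
                identifiers += 1
    return keywords, identifiers


SAMPLE = '''
class COUNTER
feature
    increment (step: INTEGER)
            -- Add step to total
        require
            positive: step > 0
        local
            i: INTEGER
        do
            check step_valid: step > 0 end
            if step > 10 then
                total := total + step
            else
                total := total + 1
            end
            print ("done%N")
        ensure
            increased: total > old total
        end
end
'''

print(count_body_tokens(SAMPLE))  # (4, 7)
```

In the sample, the four keywords are if, then, else and the end closing the if block. The seven identifiers are the occurrences of step, total and print outside the check block. The precondition, local declaration, check block, postcondition, quoted string and numeric constants contribute nothing.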
And so I created a tool to measure this, with a few other metrics thrown in for good measure (pun intended). Some example outputs are shown below.
Part of the motivation for this project was to test out an idea for a high-speed parsing technique that makes use of "jail-breaked" immutable strings of type IMMUTABLE_STRING_8. Obtaining Eiffel metrics seemed a good application of my idea, and one that could be useful and of interest. Besides, I am not happy with the metrics I am currently using for the Eiffel-Loop website. I am pleased with the results, as the parsing speeds exceeded my expectations.
Links to the source for the core classes for this codebase analyzer are shown in order of inheritance:
The class EIFFEL_SOURCE_READER could also easily be adapted into a tool that restricts a text search to one of the following categories:
- Only within indexing notes
- Only within comments
- Only within quoted strings that are not indexing notes
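As a rough illustration of such a category-restricted search, here is a minimal Python sketch. It does not use EIFFEL_SOURCE_READER and says nothing about how that class works. The function `find_in_category`, the sets `NOTE_START` and `NOTE_END`, and the heuristic that a note clause runs from `note` (or the older `indexing`) up to the next structural keyword are all assumptions made for this example. Multi-line verbatim strings are not handled.

```python
import re

SCAN = re.compile(r"""
      (?P<comment>--[^\n]*)           # comment to end of line
    | (?P<string>"(?:%.|[^"%\n])*")   # single-line manifest string
    | '(?:%.|[^'%\n])*'               # character constant (ignored)
    | (?P<word>[A-Za-z]\w*)           # identifier or keyword
""", re.VERBOSE)

# Assumed heuristic: a note clause starts at one of these keywords...
NOTE_START = {"note", "indexing"}
# ...and is considered finished at the next of these keywords.
NOTE_END = {"class", "deferred", "expanded", "frozen", "end", "require",
            "local", "do", "once", "external", "attribute"}


def find_in_category(source, needle, category):
    """Return 1-based line numbers where `needle` occurs inside text of
    the given category: "note", "comment" or "string"."""
    hits = []
    in_note = False
    for match in SCAN.finditer(source):
        kind = match.lastgroup
        if kind == "word":
            lower = match.group().lower()
            if lower in NOTE_START:
                in_note = True
            elif lower in NOTE_END:
                in_note = False
            continue
        if kind == "string":
            # A quoted string inside a note clause counts as a note.
            kind = "note" if in_note else "string"
        if kind == category and needle in match.group():
            hits.append(source.count("\n", 0, match.start()) + 1)
    return hits


SOURCE = '''note
    description: "Reads parser input"
class PARSER
feature
    run
            -- parser entry point
        do
            print ("parser started")
        end
end
'''

for category in ("note", "comment", "string"):
    print(category, find_in_category(SOURCE, "parser", category))
```

For the sample, the three searches report lines 2, 6 and 8 respectively, so the same word is found in a different place depending on the category requested.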
Some Example Readings
The following are some examples of this tool's console output, including performance benchmarking information. I edited out the progress bar indicator. One interesting comparative benchmark to pay attention to is the ratio of keywords to identifiers, expressed as percentages (labeled "Percentiles" in the output), and also the average number of keywords + identifiers per routine.
Eiffel Software Libraries (Ver. 16.05)
Reading manifest: SRC-ISE-libraries
Class count: 4766
Total mega bytes: 26.23
Keyword occurrences: 171912
Identifier occurrences: 473361
Keyword + identifier count: 645273
Routine count: 45414
External routine count: 11531
Average keywords + identifiers per routine: 14
Percentiles keywords:identifiers: 27%:73%
Execution time: 1 secs 405 ms
Previous runs: 2
Average Execution time: 1 secs 839 ms
GOBO Libraries distributed with ES 16.05
Reading manifest: SRC-GOBO-libraries
Class count: 2555
Total mega bytes: 25.18
Keyword occurrences: 279599
Identifier occurrences: 536832
Keyword + identifier count: 816431
Routine count: 29733
External routine count: 800
Average keywords + identifiers per routine: 27
Percentiles keywords:identifiers: 34%:66%
Execution time: 1 secs 126 ms
Previous runs: 3
Average Execution time: 1 secs 661 ms
Eiffel-Loop Libraries
Reading manifest: SRC-Eiffel-Loop-library
Class count: 2841
Total mega bytes: 5.91
Keyword occurrences: 52690
Identifier occurrences: 119240
Keyword + identifier count: 171930
Routine count: 14883
External routine count: 1042
Average keywords + identifiers per routine: 12
Percentiles keywords:identifiers: 31%:69%
Execution time: 2 secs 78 ms
Previous runs: 1
Average Execution time: 2 secs 56 ms
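The derived figures in the three readings above can be reproduced from the raw counts. The Python sketch below is only a cross-check, not code from the tool. The helper name `derived` is made up, and rounding to the nearest integer is an assumption, although it reproduces all three printed results.

```python
# (keyword occurrences, identifier occurrences, routine count) per manifest,
# copied from the console output above.
READINGS = {
    "SRC-ISE-libraries": (171912, 473361, 45414),
    "SRC-GOBO-libraries": (279599, 536832, 29733),
    "SRC-Eiffel-Loop-library": (52690, 119240, 14883),
}


def derived(keywords, identifiers, routines):
    """Return (total, average per routine, keyword %, identifier %)."""
    total = keywords + identifiers
    average = round(total / routines)          # assumed: round to nearest
    keyword_pct = round(100 * keywords / total)
    return total, average, keyword_pct, 100 - keyword_pct


for name, counts in READINGS.items():
    total, average, kw_pct, id_pct = derived(*counts)
    print(f"{name}: total {total}, average {average}, {kw_pct}%:{id_pct}%")
```

Note that the average divides by the full routine count, which in these readings matches the printed value only if external routines are included in the divisor.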
Format: ext4 vs NTFS
I was wondering why the Eiffel-Loop libraries take longer to parse despite being only about 6 megabytes. The only explanation I can think of is that they are located on an NTFS partition accessed from Linux, whilst the other code is on an ext4 partition, though both are on the same SSD. The reason for having an NTFS partition is that it makes cross-platform development easier.
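To put the gap in numbers, the readings above can be converted into throughput and time per class. The Python sketch below uses the single-run "Execution time" of each reading rather than the averages. The helper name `rates` is made up for illustration.

```python
# (total megabytes, class count, execution time in seconds) per manifest,
# copied from the console output above.
RUNS = {
    "SRC-ISE-libraries": (26.23, 4766, 1.405),
    "SRC-GOBO-libraries": (25.18, 2555, 1.126),
    "SRC-Eiffel-Loop-library": (5.91, 2841, 2.078),
}


def rates(megabytes, classes, seconds):
    """Return (megabytes per second, milliseconds per class)."""
    return megabytes / seconds, 1000 * seconds / classes


for name, run in RUNS.items():
    mb_per_sec, ms_per_class = rates(*run)
    print(f"{name}: {mb_per_sec:.1f} MB/s, {ms_per_class:.2f} ms per class")
```

On these figures the two ext4 readings come out at roughly 19 and 22 MB/s, against under 3 MB/s for the NTFS one. The numbers only quantify the difference. They do not by themselves show that the file system is the cause.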