Let's talk about state machines

by Finnian Reilly (modified: 2015 Jan 13)


Lately I wrote a small utility to augment the editing capabilities of EiffelStudio. At it's core is a simple finite state machine for parsing a files contents at the line level rather than the word level.

Concept of a finite state machine

In computer science, a finite state machine is a programming abstraction used to model the behaviour of a system that has a finite number of states, an elevator or a coffee machine for example. The machine can transit from one state to another, typically triggering some event or action. This change is refered to as a transition. For example the coffee machine detects that the cup is nearly full and stops the flow of liquid. One of the most common examples of a state machine is a code parser.

State machines and Eiffel agents

Eiffel agents are very useful for writing state machine programs and can be used to model both the state of the machine and the transition action. The following class found in Eiffel-Loop is a very simple state machine that is surprisingly useful for many applications. It traverses a linear sequence of data objects, starting in an initial state and finishing either when the state becomes final or when the end of the line has been reached. To change the state of the machine all you have to do is assign the state attribute another agent with an open operand of type G. This defines both the state of the machine and the action to be taken when in that state.

class EL_STATE_MACHINE [G] feature {NONE} -- Initialization make do state := agent final create tuple end feature -- Basic operations traverse (initial: like state; sequence: LINEAR [G]) -- local final_state: EL_PROCEDURE do create final_state.make (agent final) from sequence.start; state := initial until sequence.after or final_state.same_procedure (state) loop call (sequence.item) sequence.forth end end feature {NONE} -- Implementation call (item: G) -- call state procedure with item do tuple.put_reference (item, 1) state.set_operands (tuple) state.apply end final (item: G) -- do end state: PROCEDURE [like Current, TUPLE [G]] tuple: TUPLE [G] end

Some technical points

  • The class EL_PROCEDURE was written expressly for the purpose of detecting whether two procedures represent the same routine using only the 'encaps_rout_disp' pointer attribute. But you do need to remember to force a C compilation in EiffelStudio for it to work.
  • The operands are set using a class tuple attribute rather than state.call ([item]) in order to reduce garbage collection.

A state machine for line processing

A useful implementation of EL_STATE_MACHINE is the following class which can be used to process lines of text from a file or some other source.

class EL_PLAIN_TEXT_LINE_STATE_MACHINE inherit EL_STATE_MACHINE [EL_ASTRING] rename traverse as do_with_lines end feature -- Basic operations do_once_with_file_lines (initial: like state; lines: EL_LINE_SOURCE [FILE]) do do_with_lines (initial, lines) lines.close end end

A utility example

Here is presented a utility illustrating the use of EL_PLAIN_TEXT_LINE_STATE_MACHINE. It operates on an Eiffel source file expanding a shorthand notation for common coding patterns as follows.

1. Expand feature codes

Expand lines beginning with @f xx where xx is a two letter code representing one of the standard feature group catagories: Access, Element Change, etc. Lines beginning with @f {xx will be expanded as feature {NONE} -- <group name> For example: @f {in expands as feature {NONE} -- Initialization === 2. Expand argument assignments === Expand routines containing shorthand for attribute assignments, inserting an assignment line. For example this shorthand code feature {NONE} -- Initialization make (output_file_path:@; input_file_path:@) -- do end
will be expanded as feature {NONE} -- Initialization make (a_output_file_path: like output_file_path; a_input_file_path: like input_file_path) -- do output_file_path := a_output_file_path; input_file_path := a_input_file_path end
=== 3. Expand setter notation === Expand the notation for set attribute procedures so for example this code: feature -- Element change @set bit_rate_per_channel
feature -- Element change set_bit_rate_per_channel (a_bit_rate_per_channel: like bit_rate_per_channel) do bit_rate_per_channel := a_bit_rate_per_channel end
=== 4. Sort features alphabetically === Reorder features within a group so the names will appear in alphabetical order in the EiffelStudio feature view. === Feature editor class === This is code for the top level class that achieves all that using the EL_PLAIN_TEXT_LINE_STATE_MACHINE class. class EIFFEL_FEATURE_EDITOR inherit EL_COMMAND EL_PLAIN_TEXT_LINE_STATE_MACHINE rename make as make_machine redefine call end EL_MODULE_LOG FEATURE_CONSTANTS create make, default_create feature {EL_COMMAND_LINE_SUB_APPLICATTION} -- Initialization make (a_source_path: like source_path) do make_machine source_path := a_source_path create class_header.make (20) create feature_groups.make (8) create class_footer.make (1) create code_line.make_empty end feature -- Basic operations execute local input_lines: EL_FILE_LINE_SOURCE output: EL_PLAIN_TEXT_FILE output_lines: EL_ARRAYED_LIST [READABLE_STRING_GENERAL] do log.enter ("execute") create input_lines.make_latin_1 (source_path) do_once_with_file_lines (agent find_first_feature_block, input_lines) create output_lines.make (class_footer.count + class_header.count + feature_groups.count * 5) across class_header as line loop output_lines.extend (line.item) end across feature_groups as group loop across group.item.header as line loop output_lines.extend (line.item) end group.item.features.sort across group.item.features as l_feature loop l_feature.item.expand_inserts across l_feature.item.lines as line loop output_lines.extend (line.item) end end end across class_footer as line loop output_lines.extend (line.item) end create output.make_open_write (source_path) output.set_encoding_from_other (input_lines) output.put_lines (output_lines) output.close log.exit end feature {NONE} -- State handlers find_first_feature_block (line: EL_ASTRING) do if line.starts_with (Keyword_feature) then feature_groups.extend (create {CLASS_FEATURE_GROUP}.make (line)) state := agent find_first_feature else class_header.extend (line) end end find_first_feature (line: EL_ASTRING) -- find first feature in feature group do if is_feature_declaration (line) then feature_groups.last.features.extend (create {CLASS_FEATURE}.make (line)) state := agent find_next_feature else feature_groups.last.header.extend (line) end end find_next_feature (line: EL_ASTRING) -- find next feature in feature group do if Footer_start_keywords.has (line) then fill_class_footer (line) state := agent fill_class_footer elseif line.starts_with (Keyword_feature) then feature_groups.extend (create {CLASS_FEATURE_GROUP}.make (line)) state := agent find_first_feature elseif is_feature_declaration (line) then feature_groups.last.features.extend (create {CLASS_FEATURE}.make (line)) state := agent find_next_feature else feature_groups.last.features.last.lines.extend (line) end end fill_class_footer (line: EL_ASTRING) do class_footer.extend (line) end feature {NONE} -- Implementation call (line: EL_ASTRING) local white_space, code, label: EL_ASTRING parts: EL_ASTRING_LIST do line.right_adjust if line.starts_with (Feature_abbreviation) then create parts.make_with_words (line) if parts.first ~ Feature_abbreviation and parts.count = 2 then label := "Expanded " label.append (line) line.wipe_out line.append_string ("feature ") code := parts.i_th (2) if code [1] = '{' then line.append_string ("{NONE} ") code.remove_head (1) end line.append_string ("-- ") Feature_catagories.search (code) if Feature_catagories.found then line.append (Feature_catagories.found_item) else line.append (code) end log_or_io.put_labeled_string (label, line) log_or_io.put_new_line end end code_line := line.twin code_line.left_adjust white_space := line.substring (1, line.count - code_line.count) tab_count := white_space.occurrences ('%T') Precursor (line) end is_feature_declaration (line: EL_ASTRING): BOOLEAN -- True if line begins declaration of attribute or routine do Result := tab_count = 1 and not code_line.is_empty end code_line: EL_ASTRING tab_count: INTEGER class_header: EL_ASTRING_LIST class_footer: EL_ASTRING_LIST feature_groups: EL_ARRAYED_LIST [CLASS_FEATURE_GROUP] source_path: EL_FILE_PATH feature {NONE} -- Constants Feature_abbreviation: EL_ASTRING once Result := "@f" end Keyword_feature: EL_ASTRING once Result := "feature" end Keyword_invariant: EL_ASTRING once Result := "invariant" end Keyword_end: EL_ASTRING once Result := "end" end Keyword_note: EL_ASTRING once Result := "note" end Footer_start_keywords: ARRAY [EL_ASTRING] once Result := << Keyword_invariant, Keyword_end, Keyword_note >> Result.compare_objects end end
== A vCard splitter == This example is used for splitting a monolithic addressbook in vCard file format, into one file per contact. This is to satisfy the requirements of some Nokia phones like the Asha 300. class VCF_CONTACT_SPLITTER inherit EL_COMMAND EL_PLAIN_TEXT_LINE_STATE_MACHINE rename make as make_machine end EL_MODULE_LOG EL_MODULE_FILE_SYSTEM create default_create, make feature {EL_SUB_APPLICATION} -- Initialization make (a_vcf_path: EL_FILE_PATH) do make_machine vcf_path := a_vcf_path create output_dir.make (vcf_path.to_string) output_dir.remove_extension File_system.make_directory (output_dir) create record_lines.make (10) end feature -- Basic operations execute local source_lines: EL_FILE_LINE_SOURCE do log.enter ("execute") create source_lines.make (vcf_path) source_lines.set_encoding (source_lines.Encoding_iso_8859, 1) do_once_with_file_lines (agent find_begin_record, source_lines) log.exit end feature {NONE} -- State handlers find_begin_record (line: EL_ASTRING) do if line.starts_with ("BEGIN:") then state := agent find_end_record find_end_record (line) end end find_end_record (a_line: EL_ASTRING) local output_path: EL_FILE_PATH file: PLAIN_TEXT_FILE parts: LIST [EL_ASTRING] first_name, last_name: EL_ASTRING do record_lines.extend (a_line) if a_line.starts_with ("END:") then output_path := output_dir + record_id output_path.add_extension ("vcf") log_or_io.put_path_field ("Writing", output_path) log_or_io.put_new_line create file.make_open_write (output_path) across record_lines as line loop file.put_string (line.item.to_utf8) file.put_character ('%R'); file.put_new_line -- Windows new line end file.close record_lines.wipe_out state := agent find_begin_record elseif a_line.starts_with ("N:") then parts := a_line.split (';') last_name := parts.i_th (1); first_name := parts.i_th (2) last_name.remove_head (2) record_lines.finish record_lines.replace (Name_template #$ [first_name, last_name]) elseif a_line.starts_with ("X-IRMC-LUID:") then record_id := a_line.substring (a_line.index_of (':', 1) + 1, a_line.count) end end feature {NONE} -- Implementation vcf_path: EL_FILE_PATH output_dir: EL_DIR_PATH record_id: EL_ASTRING record_lines: ARRAYED_LIST [EL_ASTRING] feature {NONE} -- Constants Name_template: EL_ASTRING once Result := "N:$S;$S" end end
== EL_ASTRING == Anyone wondering what the EL_ASTRING class is about should read ISO-8859 is dead, long live ISO-8859. Last year I made the brave (or foolish) decision to use it throughout Eiffel-Loop. So far I am happy with the results but I am aware it may turn some people off using Eiffel-Loop, but I hope I am wrong about this. I will make news about EL_ASTRING the subject of another article. == Addendum == I feel I should apologise that this is not immediately downloadable from https://github.com/finnianr/eiffel-loop. This is because I use an NTFS partition for my code on Linux to make it easier to do cross platform development. The downside is I need to run a script to copy the source trees into a Linux partition and fix the file permissions etc. I need to fix this broken script before I can share the code. Will post a comment when this is done.

  • Finnian Reilly (2 years ago 13/1/2015)

    What about UTF-8?

    In anticipation of the question, what if the source is encoded as UTF-8?

    A. EL_FILE_LINE_SOURCE changes it's encoding if it finds a BOM at the file start. EL_PLAIN_TEXT_FILE encodes accordingly writing a BOM if required.