Let's talk about the encoding library

by Jocelyn Fiat (modified: 2014 Apr 01)

Do you know about the "encoding" library?

Overview

The library provides ways to convert text from one encoding to another. In addition, it provides a way to print Unicode to the console, which ANY.io.put_string (...) does not support by default.

Where can I find it?

  • with EiffelStudio: $ISE_LIBRARY/library/encoding
  • subversion: https://svn.eiffel.com/eiffelstudio/trunk/Src/library/encoding

The main interfaces

SYSTEM_ENCODINGS

This interface provides the most commonly used encodings, such as utf8, utf32, ISO-8859-1, ... and also offers a convenient way to get the encoding of the system or of the console.
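As a quick illustration, here is a minimal sketch that prints the code pages of a few of these encodings. It assumes that SYSTEM_ENCODINGS exposes the queries system_encoding and console_encoding, and that ENCODING has a code_page query returning a STRING; the class name ENCODING_INFO is made up for the example.

```eiffel
class
    ENCODING_INFO

inherit
    SYSTEM_ENCODINGS

create
    make

feature {NONE} -- Initialization

    make
            -- Print the code pages of a few predefined encodings.
            -- Assumption: `system_encoding', `console_encoding' and
            -- `code_page' are available as described in the lead-in.
        do
            print ("System encoding:  " + system_encoding.code_page + "%N")
            print ("Console encoding: " + console_encoding.code_page + "%N")
            print ("UTF-8 encoding:   " + utf8.code_page + "%N")
        end

end
```

The values printed for the system and the console depend on the platform and locale, so the output will differ from one machine to another.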

ENCODING

The main interface for converting text from one encoding to another, through the feature

    convert_to (a_to_encoding: ENCODING; a_string: READABLE_STRING_GENERAL)
            -- Convert `a_string' from current encoding to `a_to_encoding'.
            -- If either current or `a_to_encoding' is not `is_valid', or an error occurs during conversion,
            -- `last_conversion_successful' is unset.
            -- Conversion result can be retrieved via `last_converted_string' or `last_converted_stream'.
        require
            a_to_encoding_not_void: a_to_encoding /= Void
            a_string_not_void: a_string /= Void

Converting text from an encoding to another

For instance, if you want to convert a text from UTF-32 to ISO-8859-1 encoding:

    class TEST

    inherit
        SYSTEM_ENCODINGS

    feature

        test
            local
                s: STRING
            do
                utf32.convert_to (iso_8859_1, {STRING_32} "my unicode text")
                s := utf32.last_converted_string_8
            end

    end

There are a few useful queries to call after a conversion, such as:

  • {ENCODING}.last_conversion_successful: BOOLEAN : to ensure the conversion went well.
  • {ENCODING}.last_conversion_lost_data: BOOLEAN : to know if the last conversion lost data (this could happen, for instance, when converting true Unicode text to ISO-8859-1).
  • {ENCODING}.last_converted_string_32: STRING_32 : to get the converted text as a Unicode string.
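To show how these queries fit together, here is a small sketch that checks the status before reading the result. The class name CONVERSION_CHECK is made up, and the text deliberately contains a euro sign, which has no ISO-8859-1 equivalent. How such a character is handled (lost data or outright failure) is an assumption that may vary by platform, so the sketch covers both cases.

```eiffel
class
    CONVERSION_CHECK

inherit
    SYSTEM_ENCODINGS

create
    make

feature {NONE} -- Initialization

    make
            -- Convert a UTF-32 text to ISO-8859-1 and report what happened.
        local
            l_text: STRING_32
        do
            l_text := {STRING_32} "price: 10 "
                -- Append U+20AC (euro sign), which is not part of ISO-8859-1.
            l_text.append_code ((0x20AC).to_natural_32)
            utf32.convert_to (iso_8859_1, l_text)
            if utf32.last_conversion_successful then
                if utf32.last_conversion_lost_data then
                    print ("Converted, but some characters were lost.%N")
                end
                    -- Only read the result once the conversion is known to have succeeded.
                print (utf32.last_converted_string_8 + "%N")
            else
                print ("Conversion failed.%N")
            end
        end

end
```

Reading the converted string only inside the success branch avoids relying on a result that may not be available.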

You can also create a custom ENCODING by passing a code page. The best-known code pages are available via CODE_PAGE_CONSTANTS. Note that you can use the "i18n" library to look up a code page dynamically by its name.
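A custom encoding could be created along these lines. This is only a sketch: it assumes that ENCODING has a creation procedure make taking the code page as a STRING, and that CODE_PAGE_CONSTANTS defines a constant named iso_8859_1. The class name CUSTOM_ENCODING_DEMO is made up.

```eiffel
class
    CUSTOM_ENCODING_DEMO

inherit
    SYSTEM_ENCODINGS

create
    make

feature {NONE} -- Initialization

    make
            -- Build an encoding from a code page and use it as a conversion target.
        local
            l_latin1: ENCODING
        do
                -- Assumption: `make' takes the code page name, and the
                -- constant `iso_8859_1' exists in CODE_PAGE_CONSTANTS.
            create l_latin1.make ({CODE_PAGE_CONSTANTS}.iso_8859_1)
            if l_latin1.is_valid then
                utf32.convert_to (l_latin1, {STRING_32} "my unicode text")
                if utf32.last_conversion_successful then
                    print (utf32.last_converted_string_8 + "%N")
                end
            else
                print ("This code page is not supported here.%N")
            end
        end

end
```

Checking is_valid first matters because, as the comment of convert_to states, a conversion involving an invalid encoding leaves last_conversion_successful unset.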

Write unicode into the console

Thanks to the class LOCALIZED_PRINTER, it is possible to output Unicode to the console: use either localized_print (a_str: detachable READABLE_STRING_GENERAL) or localized_print_error (a_str: detachable READABLE_STRING_GENERAL) (to write to stderr). Both assume that `a_str' is a UTF-32 string.
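Here is a minimal sketch of how this might look in practice, inheriting from LOCALIZED_PRINTER to get both features. The class name HELLO_UNICODE is made up, and the non-ASCII character is written with a character code escape so that the example does not depend on the encoding of the source file.

```eiffel
class
    HELLO_UNICODE

inherit
    LOCALIZED_PRINTER

create
    make

feature {NONE} -- Initialization

    make
            -- Print a non-ASCII text to stdout and a message to stderr.
        do
                -- %/233/ is the code of "e" with an acute accent (U+00E9).
            localized_print ({STRING_32} "Caf%/233/%N")
            localized_print_error ({STRING_32} "Something went wrong%N")
        end

end
```

Whether the character shows up correctly still depends on what the console itself can display.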

Alternative solutions

Note that EiffelBase includes a UTF_CONVERTER class that is specialized for UTF-* conversions, and it may be enough for most applications' needs. The encoding library is still needed for other specific encodings, and also to output Unicode to the console.
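For comparison, a UTF-8 round trip with UTF_CONVERTER might look like the sketch below. It assumes the features utf_32_string_to_utf_8_string_8 and utf_8_string_8_to_string_32, and uses UTF_CONVERTER as an expanded local, so no creation instruction is needed. The class name UTF_ROUND_TRIP is made up.

```eiffel
class
    UTF_ROUND_TRIP

create
    make

feature {NONE} -- Initialization

    make
            -- Encode a Unicode string as UTF-8, then decode it back.
        local
            utf: UTF_CONVERTER
            l_utf8: STRING_8
            l_back: STRING_32
        do
                -- %/233/ is the code of "e" with an acute accent (U+00E9).
            l_utf8 := utf.utf_32_string_to_utf_8_string_8 ({STRING_32} "caf%/233/")
            l_back := utf.utf_8_string_8_to_string_32 (l_utf8)
            print ("UTF-8 byte count: " + l_utf8.count.out + "%N")
            print ("Characters after round trip: " + l_back.count.out + "%N")
        end

end
```

No encoding objects or status checks are involved here, which is why this route is convenient when only UTF-* conversions are needed.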

Related library

i18n: the internationalization and localization library

  • i18n stands for InternationalizatioN (I + 18 characters + N).
  • It provides internationalization and localization functionality.
  • Please see $ISE_LIBRARY/library/i18n (or subversion: https://svn.eiffel.com/eiffelstudio/trunk/Src/library/i18n)
  • Documentation: http://dev.eiffel.com/Internationalization/User_guide
  • Among other things, it can provide the code page value of an encoding, to be used with the encoding library.