We have an Eiffel for .NET dll which is called by a VB.NET application. We need to internationalize this application. "That should be easy", I thought, "because Eiffel's STRING is basically just a wrapper for the .NET String." So we fed our Eiffel dll some UTF-8 data, which it dutifully manipulated; but the VB.NET client application received garbage. Instead of displaying beautiful Farsi characters, it spat out wingding-dingbat-like droppings. What was wrong?
The Eiffel STRING class is not natively Unicode, but it can encode UTF-8 as a sequence of bytes.
It seems that STRING is heavily byte-oriented, even in .NET. I guess this is just something the Eiffel Software team hasn't got around to doing properly yet in EiffelStudio 5.7. How could I convert this sequence of bytes, supplied by STRING, into a true Unicode String? And could I do this without having to examine and modify all of our Eiffel classes, on the one side, and our VB classes, on the other side?
The answer lay in SYSTEM_STRING_FACTORY. This is a helper class for STRING; it converts to and from the .NET String. By modifying this class, I could transform those UTF-8 bytes into meaningful characters, with the help of the Gobo class UC_UTF8_STRING. The first step was to copy SYSTEM_STRING_FACTORY from the base.kernel.dotnet cluster to an override cluster of my own making.
Then I edited my override version of the class. The original version of the from_string_to_system_string function converts the bytes in the Eiffel STRING l_str8 to the .NET String Result with this one line of code:
create Result.make (l_str8.area.native_array, 0, a_str.count)
Now I really don't understand why this doesn't work. The native_array is declared as NATIVE_ARRAY [CHARACTER], and CHARACTER is simply a .NET Character as far as I can tell, which is a Unicode character. So why doesn't creating a Unicode string from an array of Unicode characters produce true Unicode text, rather than a mash of wingdings? I don't know. To fix it, however, I replaced the above line with this:
create utf8.make_from_utf8 (l_str8)
Note that utf8 is declared as UC_UTF8_STRING. Then I looped through utf8, copying each Unicode character to l_str via a .NET StringBuilder (using almost exactly the same code that the original version of SYSTEM_STRING_FACTORY uses to copy 32-bit strings, by the way). The full source for my version of SYSTEM_STRING_FACTORY is attached.
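A likely explanation for the garbage (an assumption, not something confirmed from the EiffelStudio source) is that the original line widens each UTF-8 byte into its own .NET character instead of decoding multi-byte sequences. Here is a minimal sketch of that effect, in Python rather than Eiffel, purely as an illustration. The function names are hypothetical and only mimic the two behaviours:

```python
def widen_bytes(raw: bytes) -> str:
    """Mimic the suspected original behaviour: one character per byte."""
    return "".join(chr(b) for b in raw)


def decode_utf8(raw: bytes) -> str:
    """Mimic the fix: interpret the byte sequence as UTF-8 first."""
    return raw.decode("utf-8")


# Four Farsi letters (seen, lam, alef, meem), written as escapes.
farsi = "\u0633\u0644\u0627\u0645"
raw = farsi.encode("utf-8")      # two bytes per letter

garbled = widen_bytes(raw)       # eight unrelated Latin-1 characters
fixed = decode_utf8(raw)         # the four original letters

print(len(raw), len(garbled), len(fixed))  # 8 8 4
print(fixed == farsi)                      # True
```

The garbled string has twice as many characters as the text it was meant to represent, which matches the kind of output the VB client was showing.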
This single change was enough to let the VB client classes display the strings that our Eiffel libraries create. No VB code had to change: each of the hundreds of places in our VB code that called STRING.to_cil now did the conversion automatically through my version of SYSTEM_STRING_FACTORY.from_string_to_system_string, and so our application displayed proper Farsi text.
But this wasn't enough to handle converting the other way. Passing VB strings to our Eiffel libraries still didn't work: I had to fix the creation routine STRING.make_from_cil. This was achieved by modifying the SYSTEM_STRING_FACTORY.read_system_string_into command, which converts the .NET String a_str to the Eiffel STRING l_str8 with this line of code:
a_str.copy_to (0, l_str8.area.native_array, 0, a_str.length)
Once again, I'm not sure why this doesn't work; but it doesn't. To fix it, I replaced the above line with two loops:
    from
        i := 0
        nb := a_str.length
    until
        i = nb
    loop
        utf8.append_character (a_str.chars (i))
        i := i + 1
    end
    from
        i := 1
        nb := utf8.byte_count
        l_str8 ?= a_result
    until
        i > nb
    loop
        l_str8.append_character (utf8.byte_item (i))
        i := i + 1
    end
So now make_from_cil and to_cil handle UTF-8 properly. In each case, they copy the contents of the input string to an instance of UC_UTF8_STRING, which is then copied to the result. This implementation is no doubt inefficient, but it does seem to be working ok. It requires minimal change on the Eiffel side (only one class, SYSTEM_STRING_FACTORY, has been overridden); and it's completely transparent, as far as I can tell, on the VB side.
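The round trip can be sketched in the same illustrative Python (again, not Eiffel). The sketch assumes the Eiffel STRING holds one UTF-8 byte per 8-bit character. Python's "latin-1" codec is used only as a stand-in for that one-byte-per-character storage, and both function names are hypothetical:

```python
def to_byte_string(text: str) -> str:
    """Stand-in for make_from_cil: encode as UTF-8, one byte per character."""
    return text.encode("utf-8").decode("latin-1")


def from_byte_string(byte_string: str) -> str:
    """Stand-in for to_cil: recover the bytes, then decode them as UTF-8."""
    return byte_string.encode("latin-1").decode("utf-8")


original = "\u0633\u0644\u0627\u0645"   # four Farsi letters
stored = to_byte_string(original)       # eight 8-bit characters
restored = from_byte_string(stored)

print(len(original), len(stored))       # 4 8
print(restored == original)             # True
```

Every character in the stored form fits in 8 bits, so byte-oriented string routines can carry it around unchanged; the text only needs decoding again at the boundary with .NET.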
Before arriving at this approach, I had several false starts. I tried doing all of the conversion on the VB side, but that required modifying hundreds of lines of code. I tried mapping the Eiffel STRING class to STRING_32, but that didn't even compile (and it probably wouldn't have worked anyway); and I tried making VB work directly with a .NET-enabled override of UC_UTF8_STRING, but that blew up at run time with type-cast errors in STRING.is_equal (CAT-calls, I think).
So all in all, I'm happy that it seems to be working; but I'm unhappy that it took a lot of work to figure out how to do it. I'm looking forward to the day when Unicode in Eiffel is as easy as it is in C# and VB.
Attached is my override of SYSTEM_STRING_FACTORY.
Note (May 17, 2007): I've written a follow-up to this, "UTF-8 in .NET, revisited", which includes a new override of SYSTEM_STRING_FACTORY.