Feedback to 'Parlez-Vous Francais?' - Murphy's Law, Feb 2001
This column led to considerable feedback - I knew I was not the
only one who had battled with this area. Some of our readers' solutions
are described below.
James Pettinato gives a detailed description of his method
of handling char *'s - mostly as a response to Nigel Jones' article.
He also describes an interesting feature which allows the sales
reps in each country to update the strings on the products with
no interaction with the original designers - it won't fit every
application, but sounds like it could save a lot of trouble when
it works.
Gentlemen,
I'd like to extend my appreciation for the articles in the Feb
2001 issue of Embedded Systems Programming (Vol 14, #2) related
to the support of multiple languages in embedded systems. This
is a concern that we have wrestled with here for some time, as
many of our products are marketed worldwide. Please excuse the
length of this email; obviously this is a topic that I've been
interested in for some time!
I was pleased to see our group had reached similar solutions
to many of the problems raised by Mr. Jones related to the issues
of translations, such as the use of enums to match the strings
in the array, etc. However, we took an alternate approach to the
problem of run time language swapping. In our case, we were working
from scratch on a new product and had ROM and also RAM to spare
(for a change!). Our approach used the same const char * const []
construct in ROM as described in the article for the default English
language pointers, but added an array of pointers in RAM which
were initialized to point to the English-language defaults:
enum {
string1,
string2,
/* ... */
LAST_STRING
};
static const char * const default_strings[LAST_STRING+1] =
{
"String 1",
"String 2",
/* ... */
"Last String"
};
static char *stringTable[LAST_STRING+1];
// initialize function called at powerup
// can also be called to reinstate default strings
// via diagnostic menu selection (for servicing)
void strTableInit(void)
{
int i;
for (i=0; i<=LAST_STRING; i++)
stringTable[i] = (char *) default_strings[i];
ASSERT (strcmp(stringTable[LAST_STRING], "Last String") == 0);
}
// GetString()... exported getter function
// (Use if desired to encapsulate stringTable array,
// alternately stringTable could be made global)
char *GetString(int index)
{
ASSERT (index < LAST_STRING);
return stringTable[index];
}
This approach adds one more benefit... it allowed us to implement
'run time' translation updates and additions. In fact, we give
our distributors and customers the ability to change any or all
of the strings in the table using a Windows-based companion application.
This program provides for the actual editing of the string table
using any of the character sets displayable on the embedded system,
and provides a mechanism for transferring a new language file
to the device via serial communications. The translation is then
stored in flash memory. The stringTable array is re-initialized
so that any translated (non-NULL) entries in the downloaded table
are used in place of the default string. With this approach, a
subset or all of the table can be translated. Very handy for us...
'do it yourself' translations!
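A minimal sketch of how such a re-initialization pass might look. The names `downloaded_strings` and `strTableApplyTranslation`, the tiny three-entry table, and the sample translation are my assumptions for illustration, not Mr. Pettinato's actual code; the idea is simply that non-NULL downloaded entries override the ROM defaults:

```c
#include <stddef.h>

#define NUM_STRINGS 3  /* tiny table, just for illustration */

/* ROM defaults, as in the listing above */
static const char * const default_strings[NUM_STRINGS] =
{
    "Start", "Stop", "Error"
};

/* RAM pointer table actually consulted by the display code */
static char *stringTable[NUM_STRINGS];

/* hypothetical table read back from flash after a download;
   NULL means "no translation supplied, keep the English default" */
static char *downloaded_strings[NUM_STRINGS] =
{
    NULL, (char *) "Arret", NULL
};

/* Re-initialize the table: translated (non-NULL) entries replace
   the defaults, the rest fall back to the ROM strings. */
static void strTableApplyTranslation(void)
{
    int i;
    for (i = 0; i < NUM_STRINGS; i++)
    {
        if (downloaded_strings[i] != NULL)
            stringTable[i] = downloaded_strings[i];
        else
            stringTable[i] = (char *) default_strings[i];
    }
}
```

Because untranslated slots simply fall through to ROM, a partial download works exactly like a complete one.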
This brings me to the area of most interest to me currently...
handling languages represented best by character sets other than
Latin-1 in embedded systems. I think that we have implemented
a rather novel approach to this problem as well. I will attempt
to briefly characterize the design. We first built a display driver
that allows text drawing in several 'fonts'. The fonts are actually
each a bitmap cache of character glyphs based on a rendering of
a font at a specific point size and weight. A tool I wrote allows
any font displayable by Windows (TrueType or raster) to be used
to produce an embedded font (the tool outputs 'C' source). For
the first release of our latest project, we included a small non-proportional
handcrafted font for the menus and a larger bold font for data
displays, both using the Latin-1 code page. For demonstration
purposes, we also included the equivalent typefaces using code
page PC-866 (the pre-Windows Russian Cyrillic) since we had already
done that font for another project and had the handcrafted version
done. The distributed companion application then provides the
ability for any language that can be represented in Latin-1 or
PC-866 Cyrillic to be represented on our device. Note that the
download does not have to consist of a complete set of strings.
If someone wants to change one string on one menu, they can. Since
our market (custody transfer of petroleum) is often tightly regulated
by local governing bodies, terminology requirements can be stringent.
This flexibility allows local agencies' requirements to be satisfied
without a custom build.
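One plausible shape for such a tool-generated font is sketched below. The structure layout, field names, and the two-glyph demo data are entirely my assumptions about what a bitmap-glyph-cache generator might emit, not the actual format described in the letter; the point is that a font is just a table of pre-rendered glyphs indexed by an 8-bit code-page value, so Latin-1 and PC-866 tables can share one display engine:

```c
#include <stdint.h>
#include <stddef.h>

/* one cached glyph: a small monochrome bitmap plus metrics */
typedef struct {
    uint8_t width;          /* advance width in pixels */
    uint8_t height;         /* rows in the bitmap */
    const uint8_t *bitmap;  /* one byte per row, MSB = leftmost pixel */
} Glyph;

/* a font: a run of glyphs covering part of an 8-bit code page */
typedef struct {
    const char *name;       /* e.g. "menu-latin1" or "menu-pc866" */
    uint8_t first_code;     /* first code-page value covered */
    uint8_t glyph_count;    /* number of consecutive glyphs */
    const Glyph *glyphs;    /* indexed by (code - first_code) */
} Font;

/* look up the glyph for a code-page byte, or NULL if absent */
static const Glyph *fontGetGlyph(const Font *f, uint8_t code)
{
    if (code < f->first_code ||
        (uint8_t)(code - f->first_code) >= f->glyph_count)
        return NULL;
    return &f->glyphs[code - f->first_code];
}

/* a hypothetical two-glyph demo font ('A' and 'B' only) */
static const uint8_t bmp_A[] = { 0x20, 0x50, 0x88, 0xF8, 0x88 };
static const Glyph demo_glyphs[2] = {
    { 5, 5, bmp_A }, { 5, 5, bmp_A }
};
static const Font demo_font = { "demo", 'A', 2, demo_glyphs };
```

Swapping languages then means no more than pointing the text-drawing code at a different `Font` table.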
Mr. Murphy described to some extent the difficulty in arranging
translation of source text via external resources. Many of the
described difficulties are avoided (as suggested in the article)
by allowing the translator to see the results of his work. We
had also implemented a PC-based emulation of our display, as he
suggests. Our German distributor simply took the emulator and
utility, translated sections of the string table, then downloaded
and verified the appearance on the display, with no intervention
by us. The coordination effort is quite simplified, as the translator
can actually see for himself how the new translations appear in
context. Even if a translator unfamiliar with the product or application
is used, fewer iterations have been required since they still
can 'see' how the translation looks as they progress.
The portion of this scheme that I think really makes this approach
powerful is the Windows companion application that allows for
run time translations. (I may be biased, as I wrote it). The utility
uses the same bitmap cache 'fonts' as the actual display so you
see what you get. Since ISO standard code pages are used, selecting
the proper keyboard mapping using Windows' built-in language support
results in the proper characters being inserted by just typing
normally. Additional code pages can be added to the system as
required, although this still requires a firmware change at this
time. It would be possible for even the bitmap cache 'fonts' to
be uploaded at run time if that was desired. (The original project
had a separate display processor board to allow for remote display
placement via EIA232, and so transferring the font images would
have been a two-step process, which we decided was not worth the
effort on that particular project.) The companion application
can read out previously downloaded translations, and also does
a nice job of importing translations from a previous revision
and keeping everything in the right location, so it is easy for
users who upgrade their firmware to keep their custom translations
accurate and up-to-date. Strings that are inserted into a new
revision need to be translated during the upgrade process by someone,
of course.
We are now in the process of adding additional code pages (Windows-1251
Cyrillic, Windows-1250 Eastern European, and Windows-1253 Greek)
to the third generation of our flagship product, since our European
distributors are clamoring for language support in those markets.
It will be simply a matter of adding some fonts to the display
firmware and updating the utility to be aware of them. The distributor
will happily handle the translation details if it will sell units
for him. We feel this is a vast improvement over the prior generations,
where custom versions were released for each language, and each
had to be maintained.
Some drawbacks to this approach include the fact that the companion
application must match the revision of the embedded system's firmware
(to ensure that the default string tables are synced). Currently
there is no CJK or multibyte support and the display engine also
does not support right-to-left scripts or combining of glyphs,
but our marketing requirements have not forced us to address these
issues as yet. I see no reason why this same architecture would
not work with these features added in as well.
Currently I'm working on updating the font conversion utility
to a 32-bit app (it was originally done as a 16-bit Win3.x app)
and enhancing it to be able to access characters in a Unicode
font from code pages other than the default. Then the utility
will be able to (for example) create a 'Greek' Arial from the
Unicode Arial font rather than having to dig up a Greek version
of Arial produced by someone else that may or may not be mapped
to the ISO standard.
I was also pleased to see Mr. Murphy include Roman Czyborra's
excellent site as a reference... I have found it to be invaluable
on numerous occasions and have referred overseas colleagues to
this site as well.
I am looking forward to Mr. Murphy's coming column on double-byte
character sets since I am sure that it will not be long before
I am asked to address these issues with the emerging Asian market
for our products.
Thanks again, keep them coming!
Jim Pettinato --
James M. Pettinato, Jr.
FMC Measurement Solutions
Smith Meter - Erie Operation
Cliff Smith gives another example of good use of the PC platform
to keep translators in check! His mail follows:
Having recently completed a project in which I was responsible
for obtaining translations, I enjoyed your article in Feb. 2001
'Embedded Systems Programming'.
I ran into all the problems you mentioned when getting translations
for our product which used a 96x48 pixel display and our own proportional
fonts. We 'solved' (if I can say that with a straight face) our
problems with the following strategy.
I wrote a PC app that had access to a database containing the
strings, field specifications, and font specifications. This program
presented the translator with the English string and the space
available (width and number of lines). As the translator entered
the translation, the program kept a running total of space used
so the translator could try different combinations to get the
best translation that would fit the available space.
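A sketch of the width bookkeeping such a tool might do for a proportional font on a 96-pixel-wide display. The `glyph_widths` table and its values are invented for illustration (the real program pulled font specifications from a database, which is not shown here):

```c
#include <stddef.h>

#define DISPLAY_WIDTH 96  /* pixels per line, per the letter */

/* assumed per-character advance widths, in pixels, for a
   proportional font covering ASCII 32..127 */
static const unsigned char glyph_widths[96] = {
    3, 2, 4, 6, 5, 7, 6, 2, 3, 3, 5, 5, 2, 4, 2, 4,
    5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 4, 5, 4, 5,
    7, 6, 6, 6, 6, 5, 5, 6, 6, 2, 4, 6, 5, 8, 6, 6,
    6, 6, 6, 5, 6, 6, 6, 8, 6, 6, 5, 3, 4, 3, 5, 5,
    3, 5, 5, 5, 5, 5, 4, 5, 5, 2, 3, 5, 2, 8, 5, 5,
    5, 5, 4, 4, 4, 5, 5, 7, 5, 5, 4, 4, 2, 4, 5, 3
};

/* pixel width of a candidate translation in this font */
static int stringPixelWidth(const char *s)
{
    int width = 0;
    while (*s)
    {
        unsigned char c = (unsigned char) *s++;
        if (c >= 32 && c < 128)
            width += glyph_widths[c - 32];
    }
    return width;
}

/* does the candidate still fit the field? */
static int fitsField(const char *s)
{
    return stringPixelWidth(s) <= DISPLAY_WIDTH;
}
```

Recomputing `stringPixelWidth` on every keystroke is cheap enough that the translator gets live feedback while trying alternative wordings.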
Also, I made arrangements with the translation agency to have
the translators come to our site to do the translations. This
gave the translators a chance to ask questions and get clarifications
quickly. We had about 540 words and phrases to translate, and each
translation was done in one 8-hour day. We are still in the review
process, but the quality of the translations has been very satisfying.
We did have to pay a little extra but it was money well spent.
Not all agencies were willing to do this. Several agencies turned
us down because "they don't work that way". It requires a degree
of trust that the agency will supply quality people, and that
you won't subsequently attempt to cut the agency out.
After having done it this way once, we will never do translations
any other way.
Best regards,
Cliff Smith
Mobile Communications Design Center
SY Wong gives an interesting, if not very directed, response
which visits topics as diverse as programming language choice
and architectures of various bit-widths. The mail follows:
Your language articles in ESP touched a subject of much interest
to me. There were about 20,000 Chinese characters in the authoritative
Kangxi dictionary, named after the emperor who commissioned the
work. I dare say a 4000-character set can include the English,
Latin, most commonly used Chinese words/characters plus the very
useful original IBM PC character set of 1980. No more than 16
escape characters can cover the entire ISO 8859 standard.
A bit of history: there never was any reason that computer word
lengths need be powers of 2 or divisible by 8. The 8-bit byte originated
with the STRETCH supercomputer project at IBM in the '50s; its 64-bit
word length was both a power of two and divisible by 8. The 12-bit
column on IBM cards may have similarly influenced the IBM 701
and 704, with their 36-bit word size, prior to STRETCH. About that time,
I also designed a 36-bit machine as specified by the funding agency
that used Teletype I/O with 6-bit codes which may have influenced
the choice of word-length. The two address (4k words) instruction
included a 12-bit command code. I next designed a 48-bit machine
for the Navy to control displays. 48 bits was chosen because it
is divisible by a large number of sub-groups. It was sawed in half
for a drone control system with 24-bit words. CDC later made similar
48- and 24-bit machines commercially. Even earlier, the Institute
for Advanced Study computer, now in the Smithsonian museum, used
40 bits. The binary keyboard was grouped in 4 buttons for one-hand
entry of hex symbols using neons on the computer register visible
through a glass window as display.
My conclusion is that for a 4k basic character set for Chinese
or Japanese, there is no need to slavishly hang on to the 16 bits
of the ISO standard, especially for embedded application-specific
appliances such as a web-email box. You cannot make a box to sell
for less than $100 by wasting memory space, however cheap memory
is. National Semiconductor used to have a 12-bit microcontroller.
If processors cost nothing, language processing can be a series
of fixed translators or filters that eliminates the resource-wasteful
multitasking operating system.
What do you think about the 4k character subsets for Asian languages?
Trade groups can define compatible subsets by standardizing the
compatibility part. ASCII, Latin and the original IBM PC fonts
of 1980 can be the upper part of 8-bit sub-subsets of the 12-bit
subset.
Several years ago, the Electronic Design editor sent me a few
samples when their Chinese editions first started. They were excellent
in quality. I don't think I will see practical machine translators
in my lifetime. GUI alone may not be optimum for Chinese data
inputs. The system of spelling English sounds (pin-yin) of the
roughly 100 Chinese phonemes and letting the user select from a row
of similar-sounding characters is a tedious process which even
I cannot stomach. It seems a trainable voice recognition of the 100
pin-yins might be achievable with current VR techniques. A bit of
extra intelligence using context, letting the machine move the
cursor and highlight the most likely character to reduce finger
movements, might also be achievable.
What do you think about the above conjecture?
I use a small safety critical subset of Ada as my hw/sw design
language to define an "almost zero cost" core processor design.
There is a similar subset in IEEE VHDL for IC design. This Ada
subset is really a language to define languages that supports
reusable components and should be also useful in the language
processing field. That subset I use is restricted according to
the Safety Annex in the ISO standard and not C extensions or YAL
(yet another language) so frequently suggested in magazines. Unfortunately,
Ada leaves a bad taste in the software community's mouth, and subsetting
is generally frowned upon.
Can the CS community ditch the assumption that C/C++ are too
entrenched to displace, and consider better alternatives that
remedy well-known C shortcomings?
SY Wong, Tarzana CA
Hello Niall,
I just read your article in Embedded Systems Programming magazine.
I found it quite helpful.
You may already know about Win-Trans, but if not, it is worth
checking out. The feature I have used myself is the ability to
convert .rc files to/from .xls files.
I guess that's only helpful if you're using Microsoft tools
for development. But if you are, Excel supports text entry in
nearly every language you might need.
Best regards,
Rex Baldon Sr.
Software Engineer
Newport Corporation
This note from Eric Lukac-Kuruc confirms everything that my
French teacher ever said about my command of that language!
Hello, Just a few notes about your French translation examples
in Embedded Systems.
In French, "a" used as a preposition, and not as the verb "avoir"
(to have), requires an accent (à), which is the case in your example.
Moreover, the sequence "à le" is not allowed in French.
The contraction for this meaning is "au", so that the sentence
becomes "Bienvenue au gadget".
On the other hand, the translation of "Welcome to this gadget"
is not "Bienvenue au gadget" but "Bienvenue à ce gadget". "Bienvenue
au gadget" would come from "Welcome to the gadget".
French is a tricky language, with traps at every corner. I wish
you a nice day.
Best regards,
Eric Lukac-Kuruc, R&D Manager Klavis Technologies, Belgium
Hello Mr. Murphy,
Thank you for your excellent article on embedded systems translations.
We run into this issue often with our products.
As you noted, translation context is essential. We have also
considered the feasibility of PC prototypes as substitutes for
reference products, but time & schedule constraints have limited
this approach. I look forward to your future articles on this
topic. What we have done is to create templates that show typical
screens, with screen elements clearly identified (e.g. "Menu item",
"Item status", "Help text"). The translator can then understand
the basic structure of the interface, as well as our internal
vocabulary for referring to each item. These templates are not
a substitute for an actual reference product or PC simulation,
but for very little time, they do provide a little context for
the words in a spreadsheet. In the spreadsheet, the words are
identified using the vocabulary on the template.
Another issue we sometimes come across is consistency across
products. If we have a translation in one product, we can often
re-use it in another (assuming the original was
error-free). A sort of "translation history" may be a useful starting
point for some translations, but without a good system, it can
be difficult to track, and also may impose unnecessary constraints,
since there's no reason to limit a 30-character interface to the
translation used for a 20-character interface, unless the shorter
translation cannot be improved.
Thanks again for your thoughts. So often, these types of issues
are faced and solved again and again, but the information isn't
shared as freely or clearly as you have done.
Best regards,
Sabrina Yeh
Sony Electronics
San Diego,
California
Niall's reply:
You are right about the importance of being consistent across
products. This can be an awkward issue if you change translation
company, or if the translation company changes the translator from
one product to the next (or even from one version of a product
to the next).
On the PC simulations point, I have never built them just
for translations work (and I do not think the effort would be
justified for that alone), but I generally do the prototypes to
allow the user interaction to be investigated before building
the final hardware. I also use the PC prototype to develop some
of the production code. This is a topic I will return to in my
column before the year is out.