Software Internationalization: Software Development Before Localization

Changes to apply to your code to simplify the localization (l10n) process

Given the constant competitive pressure on executives to expedite product time-to-market, many developers are given tight deadlines to deliver functional software. This software is often slated for localization once the source-language version is ready for release. But is the software internationalized?

Keeping these pressures in mind, developers can strive to ensure that basic software internationalization (i18n) principles are maintained while developing software to facilitate localization efforts – and meet time-to-market requirements for all the required languages, not just the source.

Here are 12 software i18n do’s and don’ts that all developers should read and apply in their work:

1. Do externalize messages in Message Catalogs, resource files, and configuration files

Messages are textual objects that are translatable components. These catalogs or files, such as Java resource bundle message files or Microsoft resource files, are installed in a locale-specific location or named with a locale-specific suffix. This i18n practice facilitates the localization process, since localizers can work on these resource bundles without modifying source code. It also permits a single code base for all languages, where only the resource bundles differ by language.
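
As a rough illustration of the idea, here is a minimal Python sketch of a keyed message lookup with a fallback to the source language. The `CATALOGS` dictionary, the message keys, and the `get_message` helper are all hypothetical; in a real product the catalogs would be loaded from locale-specific resource files rather than defined in code.

```python
# Hypothetical in-memory stand-ins for per-locale resource files
# (for example, one message file per language).
CATALOGS = {
    "en": {"greeting": "Welcome", "farewell": "Goodbye"},
    "fr": {"greeting": "Bienvenue"},  # "farewell" not translated yet
}
DEFAULT_LOCALE = "en"


def get_message(key: str, locale: str) -> str:
    """Look up a message by key, falling back to the default locale."""
    catalog = CATALOGS.get(locale, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[DEFAULT_LOCALE][key]


print(get_message("greeting", "fr"))  # Bienvenue
print(get_message("farewell", "fr"))  # Goodbye (fallback to English)
```

The calling code only ever refers to keys, so adding a language means adding a catalog, not touching the source.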

2. Don’t internationalize fixed textual objects

These are objects that you should not translate, such as comments, commands, and configuration settings. Only externalize the strings needing translation. If these objects appear in resource or configuration files, you should mark them “NOT_FOR_TRANSLATION.” Here are some examples of fixed textual objects not requiring i18n:

  • User names, group names, and passwords
  • System or host names
  • Names of terminals (/dev/tty*), printers, and special devices
  • Shell variables and environment variable names
  • Message queues, semaphores, and shared memory labels
  • UNIX commands and command line options (e.g., ls -l is still ls -l in all locales)
  • Commands such as /usr/bin/dos2unix and /usr/ccs/bin/gprof
  • XPG4-compliant commands (such as /usr/xpg4/bin/vi) and their non-XPG4 equivalents, which are not always fully internationalized. For example, /usr/bin/vi does not process non-EUC codesets, but /usr/xpg4/bin/vi is fully internationalized and can process characters in any locale.
  • Some GUI textual components, such as keyboard mnemonics and keyboard accelerators

3. Do allow for text expansion in messages (especially for GUI items)

Here are some Microsoft translations into German:

  • bullet    –>  Aufzählungszeichen
  • bundle  –>  Einzelvorgangsbündel
  • Link      –>  Verknüpfung
  • Login    –>  Anmeldung
  • Update –>  Aktualisierung
  • Undo     –>  Rückgängig (machen)
  • Geschäftsaktivitätsüberwachung replaces the acronym BAM (Business Activity Monitoring)!

Apply the following expansion rules when possible during i18n. When the source text is:

  • 0 – 10 characters: The expansion required is from 101 – 200%.
  • 11 – 20 characters: 81 – 100%
  • 21 – 30 characters: 61 – 80%
  • 31 – 50 characters: 41 – 60%
  • 51 – 70 characters: 31 – 40%
  • Over 70 characters: 30%

Keep source strings well below your length limit (usually 254 characters) so that the extra characters needed by the translation still fit.
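
The expansion rules above can be turned into a quick sizing check. The Python sketch below is only one possible reading of the table: it assumes the upper end of each percentage range (the conservative choice), and the names `EXPANSION_TABLE`, `max_expansion_percent`, and `space_to_reserve` are made up for this example.

```python
import math

# (upper bound of source length in characters, maximum expansion in percent)
EXPANSION_TABLE = [
    (10, 200),
    (20, 100),
    (30, 80),
    (50, 60),
    (70, 40),
]
OVER_70_EXPANSION = 30


def max_expansion_percent(source_length: int) -> int:
    """Return the upper end of the expansion range for a source length."""
    for upper_bound, percent in EXPANSION_TABLE:
        if source_length <= upper_bound:
            return percent
    return OVER_70_EXPANSION


def space_to_reserve(source: str) -> int:
    """Characters to reserve for the translated version of a string."""
    percent = max_expansion_percent(len(source))
    return math.ceil(len(source) * (100 + percent) / 100)


print(space_to_reserve("Update"))  # 18: 6 characters plus 200% expansion
```

By this estimate a six-character label such as "Update" needs room for 18 characters, which comfortably covers the 14 characters of "Aktualisierung".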

Try to place the labels above the controls, not beside them. An expanded label can make the form wider than the expected screen resolution, which forces horizontal scroll bars or causes truncation. Placing labels above the controls also simplifies localizing applications into bidirectional languages (languages whose text can run right to left [RTL] as well as left to right [LTR], such as Arabic and Hebrew).

4. Don’t use variables when you can avoid them

Variables create questions in the translator’s mind as to the gender of the term to be substituted, making it difficult to correctly translate the sentences that incorporate them. If you must use variables, offer a list of the possible replacements. Also allow for gender and plural variations in the translation of the sentences that incorporate the variable.

Incorrect example

<%
If err = 400 Then
    errtext = "server"
Else
    errtext = "connection"
End If
%>
<P>The <%=errtext%> is currently unavailable</P>

While this displays grammatically correct sentences in English, the translation into French will be problematic. In French, the word for “server” is masculine, while the word for “connection” is feminine. The translator cannot choose a single correct translation of the article “the,” because it depends on the gender of the word that is substituted at run time.

Correct example

<% If err = 400 Then %>
<P>The server is currently unavailable</P>
<% Else %>
<P>The connection is currently unavailable</P>
<% End If %>

At the same time and for similar reasons, don’t use composite strings. A composite string is an error message or other text that is dynamically generated from partial sentence segments and presented to the user in full sentence form. Use complete sentences instead, even at the expense of repeating segments. This will ensure the accuracy of the translation, regardless of gender, plurality, conjugation, or sentence structure.
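
Combined with externalized messages, the complete-sentence approach might look like the Python sketch below. The catalog layout, the message keys, and the `unavailable_message` helper are hypothetical, and the French sentences are illustrative translations added for this example.

```python
# Hypothetical catalog: one complete sentence per situation and language.
MESSAGES = {
    "en": {
        "server_unavailable": "The server is currently unavailable.",
        "connection_unavailable": "The connection is currently unavailable.",
    },
    "fr": {
        "server_unavailable": "Le serveur est actuellement indisponible.",
        "connection_unavailable": "La connexion est actuellement indisponible.",
    },
}


def unavailable_message(err: int, locale: str) -> str:
    """Pick a whole sentence by key instead of assembling one from parts."""
    key = "server_unavailable" if err == 400 else "connection_unavailable"
    return MESSAGES[locale][key]


print(unavailable_message(400, "fr"))  # Le serveur est actuellement indisponible.
```

Because each sentence is translated as a unit, the translator is free to pick the right article ("Le" or "La") for each noun.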

Also, avoid using identical placeholders when a string contains multiple variables, since word order changes from language to language. For example, <Total %s, %s of %s> (as in Total 5, 1 of 5) might read “5 of 1, Total 5” in the translated text, because the values are still filled in using the source order. Instead, use numbered placeholders (e.g., “Total %1, %2 of %3”) so the translator can reorder them safely.
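
Here is a small Python sketch of the same idea using the indexed fields of `str.format`, which play the role of the numbered placeholders. The reordered template is a made-up stand-in for a translation whose word order differs from the source.

```python
source_template = "Total {0}, {1} of {2}"
# Hypothetical translated template in which the word order changes.
reordered_template = "{1} of {2}, total {0}"

values = (5, 1, 5)  # total, current item, item count

print(source_template.format(*values))     # Total 5, 1 of 5
print(reordered_template.format(*values))  # 1 of 5, total 5
```

The calling code passes the values in one fixed order, and each template decides where they appear.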

5. Do perform pseudo-translation

Pseudo-translation is the process of replacing or adding characters to your software strings to detect character encoding issues and hard-coded text remaining in the source files. Here’s an example of a few strings from a C resource file, with their respective pseudo-translations in Japanese:

IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Slct Dvcイ本日"
IDS_MY_OPEN "&Open"
IDS_MY_OPEN "日本&Opn

In these strings, the vowels are removed from the English words and Japanese characters are added around the remaining text. After compilation, testers can easily detect corrupt characters (junk characters replacing the Japanese characters) or strings that remain fully in English (source strings still embedded in the code).
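
A pseudo-translation pass along these lines can be sketched in a few lines of Python. This is not the tool used to produce the strings above; it is an approximation that keeps the first letter of each word (and therefore the letter after an & mnemonic marker), drops the other vowels, and wraps the result in marker characters. The function names and the `PREFIX`/`SUFFIX` constants are assumptions made for this example.

```python
import re

PREFIX = "日本"
SUFFIX = "イ本日"


def pseudo_translate(text: str) -> str:
    """Drop vowels inside words and wrap the text in Japanese characters."""
    # \B restricts the match to vowels that follow another word character,
    # so a word-initial vowel (including one after the & marker) is kept.
    stripped = re.sub(r"\B[aeiouAEIOU]", "", text)
    return f"{PREFIX}{stripped}{SUFFIX}"


def looks_hard_coded(displayed: str) -> bool:
    """A displayed string without the markers bypassed the resource file."""
    return not (displayed.startswith(PREFIX) and displayed.endswith(SUFFIX))


print(pseudo_translate("Select Device"))  # 日本Slct Dvcイ本日
print(looks_hard_coded("Select Device"))  # True
```

Running `looks_hard_coded` over the text captured from a pseudo-translated build gives testers a quick list of strings that never went through the resource files.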

6. Don’t use IF Conditions or rely on a sort order in your code to evaluate a string value

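For example, avoid IF Gender = “Male” THEN. The comparison breaks as soon as the string is translated, and sort order differs from one language to another. Always depend on enumerations or unique IDs.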
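
As a sketch of what depending on an enumeration can look like in Python, the code below branches on an enum member while the display labels stay in a translatable table. The `Gender` enum, the `LABELS` table, the French labels, and the `salutation_key` helper are hypothetical names chosen for this illustration.

```python
from enum import Enum


class Gender(Enum):
    MALE = 1
    FEMALE = 2


# Display labels are data and can be translated freely (illustrative values).
LABELS = {
    "en": {Gender.MALE: "Male", Gender.FEMALE: "Female"},
    "fr": {Gender.MALE: "Homme", Gender.FEMALE: "Femme"},
}


def salutation_key(gender: Gender) -> str:
    """Branch on the enum member, never on the translated label."""
    if gender is Gender.MALE:
        return "salutation_male"
    return "salutation_female"


print(salutation_key(Gender.MALE))   # salutation_male
print(LABELS["fr"][Gender.FEMALE])   # Femme
```

The logic keeps working no matter which label the user sees on screen.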

7. Do use Unicode functions and methods to support all scripts

Applications that store and retrieve text data need to accept and display the characters from any given language. Using Unicode encoding solves the problem of unsupported character sets and the display of junk characters.
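
The short Python sketch below shows the difference in practice: text encoded as UTF-8 survives a round trip, while the same bytes read with a single-byte legacy codec turn into junk characters. The sample string is arbitrary.

```python
text = "Rückgängig 日本"

# Store the text as UTF-8 bytes and read it back with the same encoding.
encoded = text.encode("utf-8")
round_trip = encoded.decode("utf-8")
print(round_trip == text)  # True

# Reading the same bytes with a single-byte legacy codec yields junk
# characters instead of the original text.
garbled = encoded.decode("latin-1")
print(garbled == text)  # False
```

Declaring the encoding explicitly at every boundary (files, databases, network calls) avoids relying on platform defaults that differ between machines.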

8. Don’t insert hard carriage returns in the middle of sentences

Translation memory tools key off hard returns and assume that the sentence has ended. Inserting them in the middle of a sentence leads to incomplete sentences in the translation database and corrupts the sentence structure in the target language files. Instead, replace hard returns with soft returns (or better yet, use a break tag of some sort, such as <BR>). Also be aware that sentence structures change in different languages, as well as the length of sentence parts. So, additional breaks may be needed in target languages.
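
A simple pre-flight check can catch these breaks before the strings reach the translation tools. The Python sketch below is a heuristic only: it assumes that a single line break not preceded by sentence-ending punctuation falls in the middle of a sentence, and both helper names are invented for this example.

```python
import re

# A single newline that does not follow sentence-ending punctuation is
# treated as a break in the middle of a sentence. Blank lines (paragraph
# breaks) are left alone.
MID_SENTENCE_BREAK = re.compile(r"(?<![.!?:\n])\n(?!\n)")


def has_mid_sentence_break(text: str) -> bool:
    """Report whether a string contains a suspicious hard return."""
    return MID_SENTENCE_BREAK.search(text) is not None


def join_mid_sentence_breaks(text: str) -> str:
    """Replace suspicious hard returns with a space."""
    return MID_SENTENCE_BREAK.sub(" ", text)


sample = "The server is\ncurrently unavailable."
print(has_mid_sentence_break(sample))    # True
print(join_mid_sentence_breaks(sample))  # The server is currently unavailable.
```

Strings flagged by the check can then be reviewed by hand, since abbreviations and headings will occasionally trip the heuristic.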

9. Do choose your third-party software provider carefully

Insist they support Unicode and comply with the above internationalization (i18n) practices. Often problems are encountered with third-party software, and the fact that you don’t have control over their code to fix the problems makes the localization tasks particularly difficult.

10. Don’t use text in icons and bitmaps

The translated text may be too long to fit. Also, avoid using symbols with cultural connotations and locale-specific idioms.

11. Do use long dates or month abbreviations instead of numbers when identifying dates

The order of month and day varies around the world (e.g., mm/dd/yy in the US; dd/mm/yy in Europe), so an all-numeric date such as 03/04/24 can be read as either March 4 or April 3.
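
To illustrate, the Python sketch below formats the same date numerically and in long form. The `MONTHS` and `PATTERNS` tables are hypothetical stand-ins for data that would normally come from resource bundles or a locale library, and only English and French are included.

```python
from datetime import date

# Hypothetical month names and date patterns, one entry per language.
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
           "août", "septembre", "octobre", "novembre", "décembre"],
}
PATTERNS = {
    "en": "{month} {day}, {year}",
    "fr": "{day} {month} {year}",
}


def long_date(d: date, locale: str) -> str:
    """Format a date with the month spelled out, using the locale pattern."""
    month = MONTHS[locale][d.month - 1]
    return PATTERNS[locale].format(day=d.day, month=month, year=d.year)


d = date(2024, 3, 4)
print(d.strftime("%m/%d/%y"))  # 03/04/24 (ambiguous)
print(long_date(d, "en"))      # March 4, 2024
print(long_date(d, "fr"))      # 4 mars 2024
```

With the month spelled out, readers in any locale interpret the date the same way.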

12. Don’t alphabetically sort strings in string tables and resource bundles

Alphabetical sorting separates strings that belong to the same dialog or feature and strips away the context their grouping provides. Try to offer as much context as you can with the externalized strings. This will help the translator adapt the translation to that context. Without context, run-time QA will take much longer to correct the translations. For example, “Update” could be the action (to update) or the software update itself. “Check” in financial software could be the action, as a noun or a verb, or the payment instrument. “Email” could be a verb or a noun.
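
One way to carry that context is to store a translator note next to every string and to keep entries grouped by screen rather than sorted. The Python sketch below assumes a hypothetical `Entry` record and a `missing_context` check; the keys and notes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str      # unique ID, prefixed with the screen it belongs to
    text: str     # source string
    context: str  # note for the translator


# Entries stay grouped by screen instead of being sorted alphabetically.
ENTRIES = [
    Entry("menu.update", "Update", "Verb: command that installs the latest version"),
    Entry("menu.email", "Email", "Verb: command that sends the report by email"),
    Entry("payment.check", "Check", "Noun: payment method"),
    Entry("payment.verify", "Check", "Verb: button that validates the amount"),
]


def missing_context(entries: list[Entry]) -> list[str]:
    """Return the keys of entries that have no translator note."""
    return [entry.key for entry in entries if not entry.context.strip()]


print(missing_context(ENTRIES))  # []
```

A check like this can run before each hand-off so that no string reaches the translator without a note.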


Following these simple software internationalization principles will expedite product localization and reduce testing, rework, and quality assurance costs – ultimately allowing you to meet the strict time-to-market requirements expected from companies selling products worldwide.

To get proactive assistance in addressing the above software i18n issues during product localization as well as any technical translation services, contact our localization experts.