White Paper: Internationalization in GTK+



Future Directions

The current internationalization facilities in GTK+ are sufficient for creating applications that work well for a wide range of languages: the languages of both Western and Eastern Europe (including Cyrillic and Greek) and of East Asia (Chinese, Japanese, and Korean). However, there is still a considerable range of languages that are not covered. Most prominently, currently released versions of GTK+ cannot handle languages where the primary writing direction is right-to-left, such as Arabic and Hebrew. Handling these languages is a challenge because a single text usually mixes runs read right-to-left with runs read left-to-right, so a complicated reordering process is needed to turn the input text into what is displayed on the screen.
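The reordering step can be illustrated with a small sketch. It is not GTK+ code and not the full Unicode Bidirectional Algorithm: it assumes a left-to-right paragraph, and it ignores numbers, explicit embedding controls, combining marks, and mirrored punctuation. The function names are hypothetical and chosen only for this illustration.

```python
import unicodedata


def resolve_directions(text):
    """Return one flag per character: True for right-to-left, False otherwise.

    Strong characters take their direction from the Unicode database.
    Neutral characters (spaces, punctuation) become right-to-left only
    when the strong characters on both sides are right-to-left; otherwise
    they follow the paragraph direction, assumed here to be left-to-right.
    """
    dirs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):
            dirs.append(True)
        elif cls == "L":
            dirs.append(False)
        else:
            dirs.append(None)  # neutral, resolved below

    n = len(dirs)
    i = 0
    while i < n:
        if dirs[i] is None:
            j = i
            while j < n and dirs[j] is None:
                j += 1
            before = dirs[i - 1] if i > 0 else False
            after = dirs[j] if j < n else False
            for k in range(i, j):
                dirs[k] = before and after
            i = j
        else:
            i += 1
    return dirs


def visual_order(text):
    """Convert logical (typed) order to visual (displayed) order."""
    dirs = resolve_directions(text)
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        # Right-to-left runs are laid out in reverse; others are kept as-is.
        out.append(run[::-1] if dirs[i] else run)
        i = j
    return "".join(out)


# Hebrew letters alef, bet, gimel embedded in English text.
logical = "abc \u05d0\u05d1\u05d2 def"
displayed = visual_order(logical)
```

Even this reduced version has to look at the neighbours of every space before it can decide where a run ends, which hints at why the complete process is considerably more involved.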

Figure 2. Transformations while displaying complex-text languages.

Another class of languages for which support is currently being developed is the so-called complex-text languages. In the writing systems of South and South-East Asia, letters that are put together combine to form clusters, which can differ considerably in shape from the original letters. See Figure 2.
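A rough sketch of cluster formation, using Devanagari as the example script: it groups code points into clusters but does not compute the resulting shapes, which depend on font data. The rule used here (a combining mark joins the preceding cluster, and a virama also pulls in the following letter) is a simplifying assumption, not the full Unicode segmentation rules and not GTK+ code; the function name is hypothetical.

```python
import unicodedata

# Canonical combining class that Unicode assigns to virama signs.
VIRAMA_CLASS = 9


def clusters(text):
    """Group code points into display clusters (simplified)."""
    result = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        after_virama = (
            bool(result)
            and unicodedata.combining(result[-1][-1]) == VIRAMA_CLASS
        )
        if result and (is_mark or after_virama):
            result[-1] += ch  # extend the current cluster
        else:
            result.append(ch)  # start a new cluster
    return result


# Six code points: NA, MA, SA, VIRAMA, TA, VOWEL SIGN E.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"
groups = clusters(word)
```

The six code points collapse into three clusters, and the last one (SA, VIRAMA, TA, VOWEL SIGN E) is drawn as a single conjunct form rather than as four separate letters.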

To support these scripts, and also to make it easier for application developers to fully use the support GTK+ already has for international scripts, GTK+ will be moving to using Unicode to encode all strings, instead of the current system where the encoding is chosen per locale. Because the encoding will then be the same for all locales, code to manipulate strings will be easier to write and more efficient. In addition, the conversion will improve interoperability with the many other systems that are currently standardizing on Unicode.
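The difference between per-locale encodings and a single Unicode representation can be sketched as follows. The locale-to-encoding table is an assumption made for this illustration (a real system would query the locale), and the function name is hypothetical.

```python
# Assumed mapping from locale name to its traditional encoding.
LEGACY_ENCODINGS = {
    "el_GR": "iso8859-7",
    "ru_RU": "koi8-r",
    "ja_JP": "euc-jp",
}


def to_unicode(raw, locale_name):
    """Decode locale-encoded bytes into a Unicode string."""
    return raw.decode(LEGACY_ENCODINGS[locale_name])


# The same byte means a different character in each locale encoding,
# so code that works on raw bytes must always know the locale.
greek = to_unicode(b"\xc1", "el_GR")
russian = to_unicode(b"\xc1", "ru_RU")

# After conversion, one routine works for every locale. Counting
# characters no longer depends on how many bytes each one occupied.
japanese_bytes = "日本語".encode("euc-jp")
japanese = to_unicode(japanese_bytes, "ja_JP")
byte_count = len(japanese_bytes)
char_count = len(japanese)
```

Once every string has passed through a conversion like `to_unicode`, the rest of the program can treat text from any locale uniformly.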

Figure 3. Proposed future architecture of internationalization in GTK+.

Because the rules for writing each script are complex, it is not desirable to build all the necessary intelligence into GTK+ directly. Instead, we will use modules that contain it. A module will be written for a language or group of related languages and will contain the knowledge needed to input, process, and output text for that language. In fact, each module will be composed of multiple parts, so that the portions of code specific to one toolkit or output device can be separated from the portions that can be shared between them in a system-independent manner. The proposed architecture is shown in Figure 3.
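One way to picture this split is the sketch below. The paper does not define the module interface, so every name here (`ScriptModule`, `ModuleRegistry`, `analyze`, `render`) is hypothetical, and selecting a module by code point range is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScriptModule:
    name: str
    first: int  # first code point handled by this module
    last: int   # last code point handled by this module
    # Shared, system-independent part: text -> clusters in logical order.
    analyze: Callable[[str], List[str]]
    # Toolkit- or device-specific part: clusters -> output.
    render: Callable[[List[str]], str]

    def covers(self, ch):
        return self.first <= ord(ch) <= self.last


class ModuleRegistry:
    def __init__(self, fallback):
        self._modules = []
        self._fallback = fallback

    def register(self, module):
        self._modules.append(module)

    def module_for(self, ch):
        """Return the first registered module covering ch, else the fallback."""
        for module in self._modules:
            if module.covers(ch):
                return module
        return self._fallback

    def display(self, text):
        """Route text through the module chosen by its first character."""
        if not text:
            return ""
        module = self.module_for(text[0])
        return module.render(module.analyze(text))


# A fallback that passes text through, and a toy Hebrew module whose
# device-specific part lays clusters out right-to-left.
basic = ScriptModule("basic", 0, 0x10FFFF, list, "".join)
hebrew = ScriptModule("hebrew", 0x0590, 0x05FF, list,
                      lambda cl: "".join(reversed(cl)))

registry = ModuleRegistry(fallback=basic)
registry.register(hebrew)
```

Keeping `analyze` separate from `render` mirrors the separation described above: the analysis could be reused unchanged by another toolkit, while only the rendering part would need to be rewritten for a different output device.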

GTK+ already provides facilities that allow developers to internationalize their applications for a wide range of languages. When the changes described above are complete, GTK+ will be able to handle all of the world's languages in a sophisticated and flexible manner.

