|
What is internationalization?
·
Internationalization allows
software to be adapted to any language and cultural convention.
·
During the
internationalization process, the programmer isolates the parts of a program
that are dependent on language and culture
·
Abbreviated as i18n,
because there are 18 letters between the first "i" and the last "n."
What is localization?
·
Localization is the process
of adapting a program for use in a specific locale.
·
Localization includes the
translation of text such as GUI labels, error messages, and online
help.
·
It
also includes the culture-specific formatting of data items such as monetary
values, times, dates, and numbers.
·
Often abbreviated as
l10n, because there are 10 letters between the "l" and the
"n."
Types of data that vary with region or
language:
-
Messages
-
Labels on GUI components
-
Online help
-
Sounds
-
Colors
-
Graphics
-
Icons
-
Dates
-
Times
-
Numbers
-
Currencies
-
Measurements
-
Phone numbers
-
Honorifics and personal
titles
-
Postal addresses
-
Page layouts
-
Legal
rules, e.g. tax
calculations
-
Encryption techniques
-
Dictionary Sort Order
-
Usually most of the objects you need to isolate
in a
ResourceBundle are String objects. However, not all String objects are
locale-specific. For example, if a String is a protocol element used
by interprocess communication, it doesn't need to be
localized, because the end users never see it
-
Log file: If a log file is written by one program and read by another, both programs are using the log file as a buffer for communication, so there is no need for translation. On the other hand, if end users only rarely check the log file, the cost of translation may not be worthwhile.
Characteristics of internationalized program:
-
With the addition of localization data, the
same executable can run worldwide.
-
Support for new languages does not
require recompilation.
-
Textual elements, such as status messages and
the GUI component labels, are not hard-coded in the program.
Instead they are stored outside the source code and retrieved dynamically.
-
Culturally-dependent data, such as dates and
currencies, appear in formats that conform to the end user's region and
language.
-
It can be localized quickly.
Locale Object:
§
A Locale object represents a
specific geographical, political, or cultural region.
§
An
operation that requires a Locale to perform its task is called
locale-sensitive and uses the Locale to tailor information for the
user. For example, displaying a number is a locale-sensitive operation--the
number should be formatted according to the customs/conventions of the user's
native country, region, or culture.
§
If you intend to create
international Java applications, you'll definitely use the java.util.Locale class.
There's no getting around it
§
You create a Locale object
using one of the two constructors in this class:
o
Locale(String language,
String country)
o
Locale(String
language, String country, String variant)
§
The country and variant codes are optional. To omit the country code, pass an empty String (""); passing null throws a NullPointerException.
§
Although the Locale constructor accepts codes in either case, it normalizes them internally: the language code is converted to lowercase and the country code to uppercase.
§
The first argument to both
constructors is a valid ISO Language Code. These codes are
the lower-case two-letter codes as defined by ISO-639.
§
The second argument to both
constructors is a valid ISO Country Code. These codes are the
upper-case two-letter codes as defined by ISO-3166.
§
The second constructor
requires a third argument--the Variant. The Variant codes are
vendor and browser-specific.
§
Because a Locale object is
just an identifier for a region, no validity check
is performed when you construct a Locale.
§
If you want to see whether
particular resources are available for the Locale you construct, you must query
those resources. For example, ask the NumberFormat for the locales it supports using
its getAvailableLocales method.
§
Note:
When you ask for a resource for a particular locale, you get back the best
available match, not necessarily precisely what you asked for.
§
The
Locale class provides a number of convenient constants that you can use to
create Locale objects for commonly used locales. For example, the following
creates a Locale object for the United
States:
Locale.US
§
Once you've created a Locale
you can query it for information about itself.
o
Use
getCountry to get the ISO Country Code and getLanguage to get the ISO Language Code.
o
You can use getDisplayCountry to get the
name of the country suitable for displaying to the user. Similarly, you can
use getDisplayLanguage to get the name of the language suitable for displaying to the
user.
o
Interestingly,
the getDisplayXXX methods are themselves locale-sensitive and have two versions: one
that uses the default locale and one that uses the locale specified as an
argument.
§
The Java 2 platform provides
a number of classes that perform locale-sensitive operations.
o
the NumberFormat class formats
numbers, currency, or percentages in a locale-sensitive manner.
§
NumberFormat.getInstance()
§
NumberFormat.getCurrencyInstance()
§
NumberFormat.getPercentInstance()
o
These methods have two
variants; one with an explicit locale and one without; the latter using the
default locale.
§
A Locale is the mechanism for
identifying the kind of object (NumberFormat) that you would like to get. The locale
is just a mechanism for identifying objects,
not a container for the objects themselves.
§
A variant is an
optional extension to a Locale. Usually you specify variant codes to identify
differences caused by the computing platform
§
The variant codes
conform to no standard. They are arbitrary and specific to your
application.
§
Locale-sensitive classes support only
certain Locale
definitions
§
Although the Java compiler
and run-time environment won't complain if you make up your own language and
country identifiers, you should use the valid codes defined by ISO
standards
§
When the Java Virtual Machine (JVM) starts up, it queries the underlying OS for a default-locale setting. You can discover your default locale programmatically.
§
In a Java application, each
locale-sensitive object is responsible for its own locale-dependent behavior.
A Locale object doesn't enforce this behavior; it simply acts as an indicator
to other objects. Those objects are then responsible for using the Locale
appropriately.
§
By design,
locale-sensitive classes are independent of each other. That is,
the set of supported Locales in one class does not need to be the same as the
set in another class.
§
A Java application can
have multiple locales active at the same time. That is, it's possible
to use a French date format and a
U.S.
number format in the same application. Nothing limits you from creating truly
multicultural and multilingual Java applications. You can assign a different
Locale to every locale-sensitive object in your program. This flexibility
allows you to develop multilingual applications, which can
display information in multiple languages.
§
Scope of a
Locale: On the Java platform you
do not specify a global Locale by setting an environment variable before
running the application. Instead you either rely on the default Locale or
assign a Locale to each locale-sensitive object.
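The points above about creating a Locale, querying it, and using several locales side by side can be sketched as follows. This is a minimal illustration, not part of the original notes: the class name LocaleSketch and the helper formatNumber are made up for the example. Recent JDKs deprecate the Locale constructors in favor of factory methods, but the constructor is used here because it is the form the notes describe.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleSketch {
    // Hypothetical helper: formats a number according to the conventions of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        // Two-argument constructor: ISO-639 language code, ISO-3166 country code.
        Locale canadianFrench = new Locale("fr", "CA");
        System.out.println(canadianFrench.getLanguage()); // fr
        System.out.println(canadianFrench.getCountry());  // CA

        // The getDisplayXXX methods are themselves locale-sensitive.
        System.out.println(canadianFrench.getDisplayCountry(Locale.US)); // Canada
        System.out.println(canadianFrench.getDisplayCountry());          // uses the default locale

        // The default locale the JVM picked up from the operating system.
        System.out.println(Locale.getDefault());

        // Two locales active at the same time in one program.
        System.out.println(formatNumber(1234567.891, Locale.US));      // 1,234,567.891
        System.out.println(formatNumber(1234567.891, Locale.GERMANY)); // 1.234.567,891
    }
}
```

Each NumberFormat carries its own Locale, which is why no global setting is needed to mix formats.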
Resource Bundle:
·
Resource bundles contain locale-specific objects. When your program needs a locale-specific resource, a String for example, it can load it from the resource bundle that is appropriate for the current user's locale.
·
A ResourceBundle is an example of a
locale-sensitive object.
·
This allows you to write
programs that can:
§
be easily localized, or
translated, into different languages
§
handle multiple locales at
once
§
be easily modified later to
support even more locales
·
One resource bundle is, conceptually, a set of related classes that inherit from ResourceBundle. Each related subclass of ResourceBundle has the same base name plus an additional component that identifies its locale.
·
Each related subclass of ResourceBundle contains the same items, but the items have been translated for the locale represented by that ResourceBundle subclass.
·
In general, the objects
stored in a ResourceBundle are predefined and ship with the product. These objects are
not modified while the program is running
·
When your program needs a
locale-specific object, it loads the ResourceBundle class using the getBundle method:
§
ResourceBundle myResources = ResourceBundle.getBundle("MyResources", currentLocale);
§
The first argument specifies the family name of the resource bundle that contains the object in question. The second argument indicates the desired locale. getBundle uses these two arguments to construct the name of the ResourceBundle subclass it should load, as follows.
·
The resource bundle lookup
searches for classes with various suffixes on the basis of
§
the desired locale
and
§
the current default locale
as returned by Locale.getDefault(), and
§
the root resource bundle
(baseclass),
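in the following order, from lower-level (more specific) to parent-level (less specific). Here language1, country1, and variant1 come from the desired locale, and language2, country2, and variant2 come from the default locale:
baseclass + "_" + language1 + "_" + country1 + "_" + variant1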
baseclass + "_" + language1 + "_" + country1 + "_" + variant1 + ".properties"
baseclass + "_" + language1 + "_" + country1
baseclass + "_" + language1 + "_" + country1 + ".properties"
baseclass + "_" + language1
baseclass + "_" + language1 + ".properties"
baseclass + "_" + language2 + "_" + country2 + "_" + variant2
baseclass + "_" + language2 + "_" + country2 + "_" + variant2 + ".properties"
baseclass + "_" + language2 + "_" + country2
baseclass + "_" + language2 + "_" + country2 + ".properties"
baseclass + "_" + language2
baseclass + "_" + language2 + ".properties"
baseclass
baseclass + ".properties"
§
The baseclass
must be fully qualified (for example, myPackage.MyResources, not just MyResources).
It must also be accessible by your code; it cannot be a class that is
private to the package where ResourceBundle.getBundle is called.
§
Resource bundles
contain key/value pairs. The keys uniquely identify a
locale-specific object in the bundle. Here's an example of a ListResourceBundle that
contains two key/value pairs:
import java.util.ListResourceBundle;

class MyResource extends ListResourceBundle {
    public Object[][] getContents() {
        return contents;
    }
    static final Object[][] contents = {
        // LOCALIZE THIS
        {"OkKey", "OK"},
        {"CancelKey", "Cancel"},
        // END OF MATERIAL TO LOCALIZE
    };
}
-
Keys are always
Strings.
In this example, the keys are OkKey and CancelKey. In the above example, the
values are also Strings--OK and Cancel--but they don't have to be.
The values can be any type of object.
§
You retrieve an object from a resource bundle using the appropriate getter method:
button1 = new Button(myResourceBundle.getString("OkKey"));
§
The getter methods all
require the key as an argument and return the object if found. If the object
is not found, the getter method throws a MissingResourceException.
§
Besides getString, ResourceBundle supports a number of other methods for getting different types of objects, such as getStringArray. If you don't have an object that matches one of these methods, you can use getObject and cast the result to the appropriate type.
§
You should always
supply a baseclass with no suffixes. This will be the class of
"last resort", if a locale is requested that does not exist. In
fact, you must provide all of the classes in any given inheritance chain
that you provide a resource for. For example, if you provide MyResources_fr_BE, you
must provide both MyResources and MyResources_fr or the resource bundle lookup
won't work right.
§
The Java 2 platform provides
two subclasses of ResourceBundle, ListResourceBundle and PropertyResourceBundle, that provide a fairly simple way to
create resources. ListResourceBundle manages its resource as a List of key/value pairs.
§
PropertyResourceBundle uses a properties file
to manage its resources.
§
If ListResourceBundle or PropertyResourceBundle do not
suit your needs, you can write your own ResourceBundle subclass. Your subclasses must
override two methods: handleGetObject and getKeys().
§
In a ListResourceBundle, the keys must be String objects. In a PropertyResourceBundle, both the keys and the values must be String objects.
§
You can organize your ResourceBundle objects according to the category of objects they contain. For example, you might want to load all of the GUI labels for an order entry window into a ResourceBundle called OrderLabelsBundle.
o
Advantages: bundles are easier to read and maintain, they load into memory faster, and memory usage is reduced because only the required bundle is loaded.
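The bundle lookup and parent chain described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the class names ResourceBundleSketch, Labels, and Labels_fr and the helper label are made up, and the bundles are written as nested classes only to keep the example in one file. In a real project they would normally be top-level classes or properties files.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class ResourceBundleSketch {
    // Base ("last resort") bundle: no locale suffix.
    public static class Labels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"OkKey", "OK"},
                {"CancelKey", "Cancel"},
            };
        }
    }

    // French bundle: same base name plus "_fr". It overrides only one key.
    public static class Labels_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"CancelKey", "Annuler"},
            };
        }
    }

    // Hypothetical helper: loads the best-matching bundle and reads one String from it.
    static String label(String key, Locale locale) {
        // The base name is the fully qualified class name of the base bundle.
        ResourceBundle bundle = ResourceBundle.getBundle(Labels.class.getName(), locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        // fr_FR is requested; there is no Labels_fr_FR, so Labels_fr is the best match.
        System.out.println(label("CancelKey", Locale.FRANCE)); // Annuler
        // "OkKey" is missing from Labels_fr, so the lookup falls through to the base bundle.
        System.out.println(label("OkKey", Locale.FRANCE));     // OK
    }
}
```

Asking for a key that exists in no bundle of the chain would throw a MissingResourceException, as noted above.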
InputStreamReader
§
An InputStreamReader is
a bridge from byte streams to character streams: It reads bytes
and decodes them into characters using a specified
charset.
§
The charset
that it uses may be specified by name or may be given explicitly, or the
platform's default charset may be accepted.
§
Each invocation of one of an InputStreamReader's read() methods may cause one or more bytes to be read from the underlying byte-input stream.
§
To enable the
efficient conversion of bytes to characters, more bytes may be
read ahead from the underlying stream than are necessary to satisfy the
current read operation.
§
For top efficiency, consider wrapping an InputStreamReader within a BufferedReader. For example:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
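To make the role of the charset concrete, here is a small sketch that decodes the same two bytes with two different charsets. The class name DecodeSketch and the helper decode are made up for the illustration; an in-memory byte array stands in for a file or socket.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    // Hypothetical helper: decodes raw bytes into a String using an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // These two bytes are the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};

        // Decoded with the right charset: one character, U+00E9.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());      // 1
        // Decoded with the wrong charset: two characters, U+00C3 and U+00A9.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which can differ between machines.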
OutputStreamWriter
o
An OutputStreamWriter is a
bridge from character streams to byte streams:
o
Characters
written to it are encoded into bytes using a specified
charset.
o
The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
o
Each invocation of a
write() method causes the encoding converter to be invoked on
the given character(s). The resulting bytes are accumulated in a buffer before
being written to the underlying output stream. The size of this buffer may be
specified, but by default it is large enough for most purposes. Note that the
characters passed to the write() methods are not buffered.
o
For top efficiency, consider
wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent
converter invocations. For example:
Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
o
A surrogate
pair is a character represented by a sequence of two char
values: A high surrogate in the range '\uD800' to '\uDBFF'
followed by a low surrogate in the range '\uDC00' to '\uDFFF'. If the
character represented by a surrogate pair cannot be encoded by a given charset
then a charset-dependent substitution sequence is written to the output
stream.
o
A malformed surrogate element is a high surrogate that is not followed by a low surrogate, or a low surrogate that is not preceded by a high surrogate.
o
It is illegal to attempt to
write a character stream containing malformed surrogate elements. The behavior
of an instance of this class when a malformed surrogate element is written is
not specified.
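The encoding behavior described above, including substitution for characters a charset cannot represent, can be sketched as follows. The class name EncodeSketch and the helper encode are made up for the illustration, and a ByteArrayOutputStream stands in for a real output stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeSketch {
    // Hypothetical helper: encodes text into bytes using an explicitly chosen charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(new OutputStreamWriter(bytes, charset))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed all buffered bytes into the stream.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the last letter

        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 5
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 4

        // US-ASCII cannot represent U+00E9, so a substitution byte ('?') is written instead.
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        System.out.println((char) ascii[3]); // ?

        // A well-formed surrogate pair is one character and takes four bytes in UTF-8.
        System.out.println(encode("\uD83D\uDE00", StandardCharsets.UTF_8).length); // 4
    }
}
```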
Properties:
·
The Properties class represents a
persistent set of properties.
·
The Properties can be saved to a
stream or loaded from a stream.
·
Each key and its
corresponding value in the property list is a string.
·
Properties file stores
information about the characteristics of a program or
environment including internationalization/localization
information.
·
A properties file is in
plain-text format
·
The keys in a properties file must not change, because they are referenced when your program fetches the translated text
·
A property list can contain
another property list as its "defaults"; this second property list is searched
if the property key is not found in the original property list.
·
Because Properties inherits from
Hashtable, the put and putAll methods can be applied to a Properties object. Their use
is strongly discouraged as they allow the caller to insert entries whose keys
or values are not Strings. The setProperty method should be used instead.
·
If
the store or save method is called on a "compromised"
Properties object that contains a non-String key or value, the call
will fail.
·
When saving
properties to a stream or loading them from a stream, the ISO 8859-1
character encoding is used. For characters that cannot be
directly represented in this encoding,
Unicode
escapes are used; however, only a single 'u' character is allowed
in an escape sequence.
·
The native2ascii
tool can be used to convert property files to and from other character
encodings.
·
By creating a Properties object and using the load method, a program can read a properties file. The program can then access the values by using the key as follows:
o
Properties props = new Properties();
o
props.load(new BufferedInputStream(new FileInputStream("filename")));
o
String value = props.getProperty("key");
·
Alternatively, system properties can be specified on the command line at application startup time and read with System.getProperty, e.g.
java -Dmy.property=value MyApplication
·
If the key is not found, getProperty returns null.
·
PropertyResourceBundle is backed by a set of properties files. ListResourceBundle is backed by a class file.
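The defaults chain, Unicode escapes, and null-on-missing behavior described in this section can be sketched as follows. The class name PropertiesSketch, the helper loadFrom, and the keys greeting and farewell are made up for the illustration; the properties text is read from a String instead of a file so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesSketch {
    // Hypothetical helper: loads properties text on top of a set of defaults.
    static Properties loadFrom(String text, Properties defaults) {
        Properties props = new Properties(defaults);
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("greeting", "Hello");
        defaults.setProperty("farewell", "Goodbye");

        // The value uses Unicode escapes (\u00fc and \u00df), as in a real properties file.
        Properties german = loadFrom("greeting=Gr\\u00fc\\u00dfe\n", defaults);

        System.out.println(german.getProperty("greeting").length()); // 5 (escapes were decoded)
        System.out.println(german.getProperty("farewell"));          // Goodbye (from the defaults)
        System.out.println(german.getProperty("missing"));           // null
        System.out.println(german.getProperty("missing", "n/a"));    // n/a
    }
}
```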
Package java.text
·
Provides classes and interfaces for handling text, dates, numbers, and messages in a manner independent of natural languages. This means your main application or applet can be written to be language-independent, and it can rely upon separate, dynamically-linked localized resources. This allows the flexibility of adding new localizations at any time.
·
Most classes in the java.text package are locale-sensitive
·
These classes are capable of
o
formatting and parsing dates, numbers, and messages;
o
searching and sorting
strings;
o
Iterating over characters,
words, sentences, and line breaks.
·
This package contains three
main groups of classes and interfaces:
o
Classes for iteration over
text
o
Classes for formatting and
parsing
o
Classes for string collation
·
A CollationKey represents a String
under the rules of a specific Collator object.
·
The Collator class performs
locale-sensitive String comparison
·
An Annotation
object is used as a wrapper for a text attribute value
if the attribute has annotation characteristics.
·
Use the BreakIterator class only
with natural-language text. To tokenize a programming language, use the StreamTokenizer
class.
Unicode
·
Unicode is an international
effort to provide a single character set that everyone can
use.
·
Java uses the
Unicode 2.0 (or 2.1) character
encoding standard.
·
In the Java programming language, char values represent Unicode characters as 16-bit code units. Unicode was originally designed as a 16-bit character encoding that supports the world's major languages; characters outside that 16-bit range are represented in Java by surrogate pairs.
·
In this 16-bit form, every character occupies two bytes. Ranges of code values represent different writing systems or other special symbols. For example, Unicode characters in the range 0x0000 through 0x007F represent the basic Latin alphabet, and characters in the range 0x4E00 through 0x9FFF represent the Han characters used in China, Japan, Korea, Taiwan, and Vietnam.
·
UTF-8 is a multibyte encoding format, which stores some characters as one byte and others as two or three bytes (characters outside the 16-bit range take four). If most of your data is ASCII characters, it is more compact than 16-bit Unicode, but in the worst case, a UTF-8 string can be 50 percent larger than the corresponding 16-bit Unicode string. Overall, it is fairly efficient.
·
Despite the advantages of
Unicode, there are some drawbacks: Unicode support is limited on many
platforms because of the lack of fonts capable of displaying all the
Unicode characters.
·
UTF-8 stands for
UCS (Universal Character Set) Transformation Format, 8-bit encoding form. It
is a transmission format for Unicode that is suitable for use with many
network protocols and UNIX file systems.
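The size trade-off between UTF-8 and 16-bit Unicode can be checked directly. The sketch below is illustrative only: the class name Utf8LengthSketch and its two helpers are made up, and UTF-16BE is used as the 16-bit form so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthSketch {
    // Hypothetical helper: number of bytes the text occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: number of bytes the text occupies in UTF-16 (big-endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // basic Latin
            "\u00e9",       // Latin letter with accent
            "\u4e2d",       // a Han character
            "\uD83D\uDE00"  // a character outside the 16-bit range (surrogate pair)
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, "
                    + utf16Length(s) + " bytes in UTF-16");
        }
        // Prints 1/2, 2/2, 3/2 and 4/4 byte counts for the four samples, in that order.
    }
}
```

The third sample is the worst case mentioned above: three bytes in UTF-8 against two in the 16-bit form.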
Annotation
·
An Annotation object is used
as a wrapper for a text attribute value if the attribute has
annotation characteristics. These characteristics are:
o
The text range
that the attribute is applied to is critical to the semantics of the range.
That means, the attribute cannot be applied to subranges of the text
range that it applies to, and, if two adjacent text ranges have
the same value for this attribute, the attribute still cannot be
applied to the combined range as a whole with this value.
o
The attribute or its value
usually no longer applies if the underlying text is changed.
CollationKey
·
A CollationKey represents a
String under the rules of a specific Collator object.
·
Comparing two CollationKeys
returns the relative order of the Strings they represent.
·
Using CollationKeys to compare Strings is generally faster than using Collator.compare. Thus, when the Strings must be compared multiple times, for example when sorting a list of Strings, it's more efficient to use CollationKeys.
·
You cannot create CollationKeys directly. Rather, generate them by calling Collator.getCollationKey.
·
You can only compare
CollationKeys generated from the same Collator object.
·
Generating a CollationKey for a String involves examining the entire String and converting it to a series of bits that can be compared bitwise. This allows fast comparisons once the keys are generated.
·
The cost of generating keys
is recouped in faster comparisons when Strings need to be compared many
times.
·
Collator.compare examines only as many characters as it needs, which allows it to be faster when doing single comparisons.
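Sorting with CollationKeys, as described above, might look like the following sketch. The class name CollationKeySketch and the helper sort are made up for the illustration; each key is generated once and then compared many times during the sort.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollationKeySketch {
    // Hypothetical helper: sorts words by the collation rules of the given locale.
    static List<String> sort(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Generate one key per String, all from the same Collator.
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }

        // CollationKey is Comparable, so the keys can be sorted directly.
        Collections.sort(keys);

        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");
        System.out.println(sort(words, Locale.US)); // [apple, banana, Cherry]

        // For contrast: plain String ordering compares char values, so "Cherry" comes first.
        List<String> plain = new ArrayList<>(words);
        Collections.sort(plain);
        System.out.println(plain); // [Cherry, apple, banana]
    }
}
```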
Collator
·
The Collator class performs
locale-sensitive String comparison.
·
Use this class to build
searching and sorting routines for natural language text.
·
Collator is an
abstract base class. Subclasses implement specific collation
strategies. You can use the static factory method, getInstance,
to obtain the appropriate Collator object for a given
locale.
·
The Character comparison methods
use the Unicode standard to identify character properties.
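A single locale-sensitive comparison with Collator can be sketched as follows. The class name CollatorSketch and the helper compare are made up for the illustration; the strength setting controls which differences (base letters, accents, case) count as significant.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorSketch {
    // Hypothetical helper: compares two Strings under the given locale and strength.
    static int compare(String a, String b, int strength, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // At PRIMARY strength only base letters matter, so case is ignored.
        System.out.println(compare("abc", "ABC", Collator.PRIMARY, Locale.US)); // 0

        // At TERTIARY strength case differences are significant, so the result is non-zero.
        System.out.println(compare("abc", "ABC", Collator.TERTIARY, Locale.US) != 0); // true

        // The collator orders "apple" before "Banana"; String.compareTo does the opposite.
        System.out.println(compare("apple", "Banana", Collator.TERTIARY, Locale.US) < 0); // true
        System.out.println("apple".compareTo("Banana") > 0);                              // true
    }
}
```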
Character Encoding:
A character encoding is a
mapping between characters and code values.
Input method:
·
Lets users enter thousands
of different characters using keyboards with far fewer keys.
·
The user may have input methods for different languages, or input methods that accept various types of input.
·
Input method framework: enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods.
| Scenario | Solution |
| You need to find a localized value for a given key, for example, an error message. | Use java.util.Properties to load values from a stream (e.g. a java.io.FileInputStream) and then use a single lookup key to obtain a localized value. |
| You need to format and present numbers and currencies. | Use java.text.NumberFormat. |
| You need to format and present dates and times. | Use java.text.DateFormat. |
| You need to order and handle text data. | Use Collator and CollationKey for ordering, and MessageFormat, ResourceBundle, or PropertyResourceBundle to handle text. |
| You need to read and write files. | Use InputStreamReader for reading and OutputStreamWriter for writing. |
| You need to create localized JSPs. | Use Locale and the contentType and pageEncoding attributes. |
| You need to create localized servlets. | Use Locale and the ServletResponse.setContentType() and ServletResponse.setLocale() methods. |
| You are developing an application that will only execute in a single and very narrow geographic location. | There is no need to develop the application using Java's internationalization features. |
| You are creating an application for a company with offices in several countries and time zones. Where possible, the application needs to adapt its functionality and presentation to local customs and language. | Use Java's internationalization features to develop this application. |
| Converting a byte stream to a character stream (or a locale-specific encoding to Unicode) | InputStreamReader |
| Converting character streams to byte streams (or Unicode to a region-specific encoding) | OutputStreamWriter |
| Locale-sensitive string/character comparison and sorting | Use a Collator object |
| Repeated searching and sorting of strings | Use the CollationKey class |
| Isolating localizable elements from the rest of the application | ResourceBundle object |
| The bundle contains String objects that need to be translated into various languages | Use a PropertyResourceBundle object |
| Formatting a compound message in a locale-independent manner | Construct a pattern that you apply to a MessageFormat object, and store this pattern in a ResourceBundle. |
| Detecting character, word, sentence and line boundaries | BreakIterator class |
java.text.NumberFormat
-
Provides support for parsing/formatting numbers, currency and percentages in a locale-specific manner using pre-defined patterns
-
NumberFormat.getNumberInstance(LOCALE).format(NUM)
-
NumberFormat.getCurrencyInstance(LOCALE).format(NUM)
-
NumberFormat.getPercentInstance(LOCALE).format(NUM)
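The three factory methods listed above, plus parsing, can be sketched as follows. The class name NumberFormatSketch and its helpers are made up for the illustration; Locale.US is used only because its output is easy to read.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberFormatSketch {
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    // Parsing is the reverse operation and follows the same locale conventions.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for this locale: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number(1234.5, Locale.US));   // 1,234.5
        System.out.println(currency(1234.5, Locale.US)); // $1,234.50
        System.out.println(percent(0.25, Locale.US));    // 25%
        System.out.println(parse("1,234.5", Locale.US)); // 1234.5
    }
}
```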
java.text.DecimalFormat
-
Provides support for custom
parsing/formatting of numbers using format patterns
-
# is used to specify digits, "," for grouping and "." for the decimal point
-
0 is used to specify digits that are always shown, which gives leading or trailing zeros
-
e.g. 123456.789 with a pattern of #,##0.00 results in 123,456.79 in a US English locale
-
output symbols can be changed, e.g. "." can be rendered as any requested character
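Custom patterns and replaced output symbols can be sketched as follows. The class name DecimalFormatSketch and the helper format are made up for the illustration; the symbols are taken from Locale.US so the result does not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DecimalFormatSketch {
    // Hypothetical helper: formats a value with an explicit pattern and symbol set.
    static String format(double value, String pattern, DecimalFormatSymbols symbols) {
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        DecimalFormatSymbols us = DecimalFormatSymbols.getInstance(Locale.US);

        // Grouping every three digits, exactly two fraction digits.
        System.out.println(format(123456.789, "#,##0.00", us)); // 123,456.79

        // Five mandatory integer digits produce leading zeros.
        System.out.println(format(42, "00000", us)); // 00042

        // Same pattern, but with the decimal and grouping symbols swapped.
        DecimalFormatSymbols swapped = DecimalFormatSymbols.getInstance(Locale.US);
        swapped.setDecimalSeparator(',');
        swapped.setGroupingSeparator('.');
        System.out.println(format(123456.789, "#,##0.00", swapped)); // 123.456,79
    }
}
```

The pattern always uses "," and "." as placeholders; the symbol set decides which characters are actually printed.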
java.text.DateFormat
-
Provides support for parsing/formatting dates and times in a locale-specific manner using predefined patterns. The length of the output can be controlled, e.g. DEFAULT, SHORT, MEDIUM, LONG, FULL
-
DateFormat.getDateInstance(DateFormat.DEFAULT, LOCALE).format(DATE)
-
DateFormat.getTimeInstance(DateFormat.DEFAULT, LOCALE).format(DATE)
-
DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, LOCALE).format(DATE)
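The predefined styles can be compared with a small sketch like the one below. The class name DateFormatSketch and its helpers are made up for the illustration. No exact output is shown because the precise text for each style depends on the JDK's locale data.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DateFormatSketch {
    // Hypothetical helper: a fixed date (6 March 2002, 14:06:30) in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(2002, Calendar.MARCH, 6, 14, 6, 30).getTime();
    }

    static String formatDate(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    static String formatDateTime(Date date, int dateStyle, int timeStyle, Locale locale) {
        return DateFormat.getDateTimeInstance(dateStyle, timeStyle, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        int[] styles = {DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL};

        // The same date, from the shortest to the longest predefined style, in two locales.
        for (int style : styles) {
            System.out.println(formatDate(date, style, Locale.US)
                    + "  |  " + formatDate(date, style, Locale.FRANCE));
        }

        // Date and time styles can be chosen independently.
        System.out.println(formatDateTime(date, DateFormat.LONG, DateFormat.SHORT, Locale.US));
    }
}
```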
java.text.SimpleDateFormat
-
Provides support for custom
parsing/formatting of dates/times using format patterns
-
E.g. pattern dd/MM/yy HH:mm:ss results in 06/03/02 02:06:30
-
For correct rendering of dates and times, use locale + pattern (a pattern on its own could lead to inconsistent formatting in other languages)
-
date symbols can be changed (e.g. Mon can be
changed to MON)
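The pattern example above, and the advice to combine a pattern with a locale, can be sketched as follows. The class name SimpleDateFormatSketch and its helpers are made up for the illustration, and UTC is fixed on both the calendar and the formatter (an assumption of this sketch) so the result does not depend on the machine's time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class SimpleDateFormatSketch {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Hypothetical helper: 6 March 2002, 02:06:30 UTC.
    static Date sampleDate() {
        GregorianCalendar calendar = new GregorianCalendar(UTC);
        calendar.clear(); // drop the current time, including milliseconds
        calendar.set(2002, Calendar.MARCH, 6, 2, 6, 30);
        return calendar.getTime();
    }

    // Hypothetical helper: formats with an explicit pattern and an explicit locale.
    static String format(Date date, String pattern, Locale locale) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, locale);
        formatter.setTimeZone(UTC);
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();

        // A purely numeric pattern looks the same in any locale that uses ASCII digits.
        System.out.println(format(date, "dd/MM/yy HH:mm:ss", Locale.US)); // 06/03/02 02:06:30

        // Patterns with names (EEEE, MMMM) take their text from the locale.
        System.out.println(format(date, "EEEE d MMMM yyyy", Locale.US));
        System.out.println(format(date, "EEEE d MMMM yyyy", Locale.FRANCE));
    }
}
```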
java.text.MessageFormat
-
provides support for template-based rendering in a locale-specific manner, using a pattern string and an array of arguments, similar to placeholders in a SQL PreparedStatement
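A compound message built from a pattern and an argument array can be sketched as follows. The class name MessageFormatSketch, the helper render, and the message text are made up for the illustration; in a localized application the pattern string would typically come from a ResourceBundle instead of being hard-coded.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MessageFormatSketch {
    // Hypothetical helper: fills the numbered placeholders of a pattern for the given locale.
    static String render(String pattern, Locale locale, Object... arguments) {
        return new MessageFormat(pattern, locale).format(arguments);
    }

    public static void main(String[] args) {
        // {0} and {1} refer to positions in the argument array, not to their order in the text,
        // so a translator can reorder them freely.
        String pattern = "Disk {1} contains {0} files.";

        // Numeric arguments are formatted with the locale's number conventions.
        System.out.println(render(pattern, Locale.US, 1234, "C")); // Disk C contains 1,234 files.
    }
}
```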
java.text.BreakIterator
-
provides support for identifying breaks (by character, word, sentence or line) in text in a locale-specific manner
-
getCharacterInstance(), getWordInstance(), getSentenceInstance(), getLineInstance()
-
first(), next(), e.g. loop with while (iterator.next() != BreakIterator.DONE)
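The first()/next() loop mentioned above can be sketched as a word extractor. The class name BreakIteratorSketch and the helper words are made up for the illustration; filtering on the first character is one simple way, chosen for this sketch, to skip the punctuation and whitespace segments that the iterator also reports.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSketch {
    // Hypothetical helper: returns the words of a natural-language text.
    static List<String> words(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getWordInstance(locale);
        iterator.setText(text);

        List<String> words = new ArrayList<>();
        int start = iterator.first();
        // Each pair of consecutive boundaries delimits one segment of the text.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that start with a letter or digit.
            if (Character.isLetterOrDigit(segment.charAt(0))) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! How are you?", Locale.US));
        // [Hello, world, How, are, you]
    }
}
```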
|