Novell NETWARE 6-DOCUMENTATION Manual page 2145

Unicode and UTF8
216 Getting Results with Novell Web Services
encodings identified in the SearchServlet and PrintServlet properties files.
You can modify these settings using NetWare Web Search Manager.
Because the character set of most languages can be identified by several
encodings, NetWare Web Search Server supports a wide variety of character
set encodings and encoding aliases.
Some examples of character set encodings include iso-8859-1, shift_jis, big5,
and latin2. The official list of registered encodings is available from the
Internet Assigned Numbers Authority (see Table 16 on page 222). These are
the official names for character sets that can be used on the Internet and can be
referred to in Internet documentation. However, not all IANA-registered
character set encodings are supported by NetWare Web Search Server. Refer
to Table 16 on page 222 for a list of encodings and encoding aliases that are
supported by NetWare Web Search Server.
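The relationship between encodings and encoding aliases can be illustrated with a minimal sketch. It uses Python's built-in codec registry as a stand-in; that registry is an assumption for illustration only and is not the list of encodings supported by NetWare Web Search Server (see Table 16 for that). The helper names canonical_name and same_encoding are hypothetical.

```python
import codecs

def canonical_name(label):
    """Return the Python codec registry's canonical name for an
    encoding label, or None if the label is not recognized."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None

def same_encoding(a, b):
    """True if both labels are known and resolve to the same encoding."""
    name_a, name_b = canonical_name(a), canonical_name(b)
    return name_a is not None and name_a == name_b

# Several labels can identify one character set; an unknown label
# (not every registered name is supported) resolves to None.
for label in ("iso-8859-1", "shift_jis", "big5", "latin2", "no-such-charset"):
    print(label, "->", canonical_name(label))
```

In this sketch, latin2 and iso-8859-2 resolve to the same encoding, while a label the registry does not know is reported as unsupported rather than guessed at.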
Unicode is a character encoding standard developed by the Unicode
Consortium. In the 16-bit form described here, two bytes represent each
character, which enables almost all of the written languages of the world to be
represented using a single character set. In this form, no special processing is
required to access any character in any language.
This makes Unicode very easy to use when processing text from multiple
languages and scripts. This is the reason NetWare Web Search converts all
external files into Unicode for processing.
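The idea of converting external content into Unicode before processing can be sketched as follows. This is only an illustration in Python, not how NetWare Web Search is implemented; the helper name to_unicode and the sample strings are hypothetical.

```python
def to_unicode(raw, encoding):
    """Decode raw bytes from a named external encoding into a
    Unicode string (Python's str type)."""
    return raw.decode(encoding)

# The same kind of content, stored in two different external encodings.
latin_bytes = "Grüße".encode("iso-8859-1")   # one byte per character
sjis_bytes = "日本語".encode("shift_jis")      # two bytes per character

# Once decoded, text from both sources can be handled uniformly,
# for example joined into a single string.
combined = to_unicode(latin_bytes, "iso-8859-1") + " " + to_unicode(sjis_bytes, "shift_jis")
print(combined)
```

After decoding, the processing code no longer needs to know which encoding each file originally used.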
As already mentioned, 16-bit Unicode is two bytes wide for all characters.
Although this is ideal for computer processing, it doubles the size of text in
single-byte languages, which has a significant impact on Internet performance.
For this reason, NetWare Web Search also supports an alternate representation
of Unicode known as UTF-8. UTF-8 is a Unicode Transformation Format that
uses variable-length byte sequences to represent all the characters in the
Unicode standard: one to six bytes in the original definition, and one to four
bytes in the current one (RFC 3629). Most notably, ASCII characters are
transmitted without any conversion at all. This means that Internet content
consisting only of ASCII characters is already in the UTF-8 representation.
Many Asian languages, however, require three bytes per character in the UTF-8
format. Characters outside the 16-bit range require four bytes each.
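The size trade-off between the two representations can be checked with a short sketch. It assumes Python's utf-16-be codec as a stand-in for the two-byte form (no byte-order mark) and uses a hypothetical helper, encoded_sizes, with made-up sample strings.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in the
    two-byte form (UTF-16, big-endian, no BOM) and in UTF-8."""
    return {
        "utf-16": len(text.encode("utf-16-be")),
        "utf-8": len(text.encode("utf-8")),
    }

# ASCII text: UTF-8 is half the size of the two-byte form.
print("search", encoded_sizes("search"))

# Japanese text: UTF-8 needs three bytes per character here,
# so it is larger than the two-byte form.
print("日本語", encoded_sizes("日本語"))

# ASCII content is byte-for-byte identical in UTF-8.
print("search".encode("ascii") == "search".encode("utf-8"))
```

Content that is mostly ASCII favors UTF-8, while content that is mostly in an Asian script can be more compact in the two-byte form.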
You will have to decide whether Unicode or UTF-8 best meets your needs when
creating HTML content, Web Search templates, or search pages.
