The Java Unicode Character Encoding; Character Encoding Conversion Issues - MACROMEDIA COLDFUSION MX 61-DEVELOPING COLDFUSION MX Develop Manual

Computers often must convert between character encodings. In particular, the character
encodings most commonly used on the Internet are not the ones that Java and Windows use internally. Character sets
used on the Internet are typically single-byte or multiple-byte (including DBCS character sets
that allow single-byte characters). These character sets are most efficient for transmitting data,
because each character takes up the minimum necessary number of bytes. Currently, Latin
characters are most frequently used on the web, and most character encodings used on the web
represent those characters in a single byte.
Computers, however, process data most efficiently if each character occupies the same number of
bytes. Therefore, Windows and Java both use double-byte encoding for internal processing.
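The trade-off described above can be sketched in a few lines. Python is used here purely for illustration (it is not part of ColdFusion), and the helper name `char_at` is hypothetical. The sketch assumes text whose characters all fit in two bytes, which is the case the paragraph describes.

```python
# Illustration only: compare a compact single-byte encoding with a
# fixed-width two-byte encoding for the same Latin text.
text = "caf\u00e9"  # "café"

single_byte = text.encode("iso-8859-1")  # one byte per character
double_byte = text.encode("utf-16-le")   # two bytes per character, no byte-order mark

print(len(single_byte))  # 4 bytes: smaller, so cheaper to transmit
print(len(double_byte))  # 8 bytes: larger, but every character has the same width


def char_at(data: bytes, index: int) -> str:
    """Fetch the character at `index` from fixed-width two-byte data.

    Because every character occupies exactly two bytes, its position is
    simple arithmetic; no scan from the start of the data is needed.
    """
    return data[2 * index : 2 * index + 2].decode("utf-16-le")


print(char_at(double_byte, 3))  # é
```

The single-byte form is half the size, while the two-byte form allows any character to be located by offset alone.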

The Java Unicode character encoding

ColdFusion MX uses the Java Unicode Standard for representing character data internally. This
standard corresponds to UCS-2 encoding of the Unicode character set. The Unicode character set
can represent many languages, including all major European and Asian character sets. Therefore,
ColdFusion MX can receive, store, process, and present text from all languages supported by
Unicode.
The Java Virtual Machine (JVM) that is used to process ColdFusion pages converts between the
character encoding used on a ColdFusion page or other source of information and UCS-2. The
page or data encodings that ColdFusion supports depend on the specific JVM, but include most
encodings used on the web. Similarly, the JVM converts between its internal UCS-2
representation and the character encoding used to send the response to the client.
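The two conversions described above (source encoding to an internal Unicode representation, then internal representation to the response encoding) follow a general decode-then-encode pattern. The following is a minimal sketch of that pattern in Python, not the JVM's actual implementation; the function name `convert_encoding` and the sample text are assumptions made for illustration.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from the `source` encoding to the `target` encoding."""
    text = data.decode(source)   # external bytes -> internal Unicode text
    return text.encode(target)   # internal Unicode text -> bytes for the response


# Sample input: Japanese text as it might arrive in a Shift-JIS encoded page.
shift_jis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")  # "日本語"

utf8_bytes = convert_encoding(shift_jis_bytes, "shift_jis", "utf-8")

print(len(shift_jis_bytes))        # 6 (two bytes per character)
print(len(utf8_bytes))             # 9 (three bytes per character)
print(utf8_bytes.decode("utf-8"))  # 日本語
```

The text itself is unchanged by the round trip; only its byte representation differs.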
By default, ColdFusion MX uses UTF-8 to represent text data sent to a browser. UTF-8
represents the Unicode character set using a variable-length encoding. ASCII characters are sent
using a single byte. Most European and Middle Eastern characters are sent as two bytes, and
Japanese, Korean, and Chinese characters are sent as three bytes. One advantage of UTF-8 is that
it sends ASCII character set data in a form that can be recognized by systems designed to process
only single-byte ASCII characters, while it is flexible enough to handle multiple-byte character
representations.
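The byte counts given above can be checked directly. This short Python sketch is an illustration, not ColdFusion code; the sample characters and the helper name `utf8_length` are chosen here for the example.

```python
def utf8_length(character: str) -> int:
    """Return the number of bytes UTF-8 uses to encode `character`."""
    return len(character.encode("utf-8"))


samples = [
    ("A", "ASCII letter"),
    ("\u00e9", "European letter (e with acute accent)"),
    ("\u05d0", "Middle Eastern letter (Hebrew alef)"),
    ("\u65e5", "Japanese/Chinese character"),
    ("\ud55c", "Korean syllable"),
]

for character, description in samples:
    print(f"{description}: {utf8_length(character)} byte(s)")

# ASCII-only text is byte-for-byte identical in ASCII and in UTF-8, which is
# why systems that expect single-byte ASCII can still read it.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(ascii_compatible)  # True
```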
While the default format of text data returned by ColdFusion is UTF-8, you can have
ColdFusion return a page in any character set supported by Java. For example, you can return text
using the Japanese language Shift-JIS character set. Similarly, ColdFusion can handle data that is
in many different character sets. For more information, see "Determining the page encoding of
server output" on page 379.

Character encoding conversion issues

Because different character encodings support different character sets, you can encounter errors if
your application gets text in one encoding and presents it in another encoding. For example, the
Windows Latin-1 character encoding, Windows-1252, includes characters with hexadecimal
representations in the range 80-9F, while ISO 8859-1 does not include characters in that range.
As a result, under the following circumstances, characters in the range 80-9F, such as the euro
symbol (€), are not displayed properly:
A file encoded in Windows-1252 includes characters in the range 80-9F.
ColdFusion reads the file, specifying the Windows-1252 encoding in the cffile tag.
ColdFusion displays the file contents, specifying ISO-8859-1 in the cfcontent tag.
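The euro-symbol scenario above can be reproduced outside ColdFusion. The following Python sketch is an illustration under stated assumptions: it stands in for the cffile read and the cfcontent output with plain decode and encode calls, and the variable names are hypothetical.

```python
euro = "\u20ac"  # the euro symbol

# Step 1: the file stores the euro symbol in Windows-1252, where it is byte 0x80.
file_bytes = euro.encode("windows-1252")
print(file_bytes)  # b'\x80'

# Step 2: the file is read with the correct encoding, so the text is intact.
text = file_bytes.decode("windows-1252")
print(text == euro)  # True

# Step 3: the text is written out as ISO-8859-1, which has no euro symbol.
try:
    text.encode("iso-8859-1")
    can_encode = True
except UnicodeEncodeError:
    can_encode = False
print(can_encode)  # False

# A converter that substitutes instead of failing emits a placeholder.
substituted = text.encode("iso-8859-1", errors="replace")
print(substituted)  # b'?'

# Mislabeling the original bytes as ISO-8859-1 is no better: Python's codec
# maps 0x80 to an invisible control code, not to the euro symbol.
mislabeled = file_bytes.decode("iso-8859-1")
print(mislabeled == euro)  # False
```

Sending the output in an encoding that contains the character, such as UTF-8 or Windows-1252, avoids the loss.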