Monday, March 9, 2009

10/3/09 UNICODE

Unicode
From Wikipedia, the free encyclopedia
Jump to: navigation, search
This article contains special characters. Without proper rendering support, you may see question marks, boxes, or other symbols.

The Unicode Standard, Version 5.0
Unicode
Character encodings
Comparison
UTF-7, UTF-1
UTF-8, CESU-8
UTF-16/UCS-2
UTF-32/UCS-4
UTF-EBCDIC
SCSU, BOCU-1
Punycode (IDN)
GB 18030
UCS
Mapping
Bi-directional text
BOM
Han unification
Unicode and HTML
Unicode and E-mail
Unicode typefaces
Unicode is a computing industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts).[1]
The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments.
Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language, the Microsoft .NET Framework and modern operating systems.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters), the now-obsolete UCS-2 (which uses 2 bytes for all characters, but does not include every character in the Unicode standard), and UTF-16 (which extends UCS-2, using 4 bytes to encode characters missing from UCS-2).

No comments:

Post a Comment