Java™ by example!
What is UTF-8?
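Standard UTF-8 is a variable-length Unicode encoding that is backward compatible with ASCII, so old programs (text searching, etc.) keep working with the new format: every ASCII value is encoded as a single byte with the same value. Java also has a modified UTF-8 format, used for example in class files and by DataInput.readUTF and DataOutput.writeUTF. In that format, characters from '\u0001' to '\u007F' take one byte, characters from '\u0080' to '\u07FF' (a range that includes Greek, Hebrew and Arabic) take two bytes, and the remaining characters up to '\uFFFF' take three bytes. The modified format never uses sequences longer than three bytes: a supplementary character is written as its two surrogates, three bytes each. The other exception in Java is that '\u0000' is encoded in two bytes (0xC0 0x80) instead of one.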
For example: (I'll take the example of the RFC - see links).
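As a rough sketch of the byte counts described above, the class below encodes a few strings both ways. The class name Utf8Demo and its helper methods are made up for this illustration. It assumes that DataOutputStream.writeUTF is an acceptable stand-in for "Java's modified UTF-8" (it writes a two-byte length prefix, which is subtracted here), and the sample string U+0041 U+2262 U+0391 U+002E is the one used in RFC 3629, which may or may not be the example the answer had in mind.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares standard UTF-8 with the modified
// UTF-8 that DataOutputStream.writeUTF produces.
class Utf8Demo {

    // Number of bytes the string takes in standard UTF-8.
    static int standardLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string takes in modified UTF-8.
    // writeUTF emits a 2-byte length prefix first, so it is subtracted.
    static int modifiedLength(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    // Standard UTF-8 bytes as space-separated upper-case hex.
    static String standardHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] labels = {
            "ASCII 'A'", "Greek alpha", "U+2262", "U+0000", "U+1F600"
        };
        String[] samples = {
            "A", "\u0391", "\u2262", "\0", "\uD83D\uDE00"
        };
        for (int i = 0; i < samples.length; i++) {
            System.out.println(labels[i]
                + ": standard=" + standardLength(samples[i])
                + " modified=" + modifiedLength(samples[i]));
        }
        // Sample string from RFC 3629: U+0041 U+2262 U+0391 U+002E
        System.out.println(standardHex("A\u2262\u0391."));
    }
}
```

The two encodings only disagree for U+0000 and for characters outside the Basic Multilingual Plane; for the RFC sample string both give the same seven bytes, 41 E2 89 A2 CE 91 2E.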
Further Information
Author of answer: Joris Van den Bogaert