Unicode: Difference between revisions
From wikinotes
Revision as of 22:58, 4 August 2021
Unicode is a standard for text encoding.
It defines a mapping between integers (code points) and characters from many writing systems.
The various text encodings differ in how that integer is laid out across one or more bytes,
but the integer assigned to each character is the same in every encoding.
For example, UTF-8 uses the leading 1 to 5 bits of each byte to indicate the type of byte:
a single-byte character (0), a continuation byte (10), or the first byte of a 2-, 3-, or 4-byte sequence (110, 1110, 11110).
The remaining bits of each byte are concatenated into one integer, the code point, which may draw its bits from several bytes.
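
The bit layout described above can be sketched by hand in Python. This is a minimal illustration, not a replacement for the built-in codec; the function name `utf8_encode` is hypothetical and chosen only for this example.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    # Reject values outside the Unicode range and the surrogate block,
    # which UTF-8 does not encode.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")

    if code_point < 0x80:
        # 0xxxxxxx: one byte, 7 payload bits
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: 5 + 6 payload bits
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 payload bits
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 payload bits
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b00111111),
        0b10000000 | ((code_point >> 6) & 0b00111111),
        0b10000000 | (code_point & 0b00111111),
    ])


# Compare the hand-rolled encoder with Python's built-in UTF-8 codec.
for ch in "Aé€😀":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The prefix bits (0, 110, 1110, 11110, and 10 for continuation bytes) are what the decoder reads to know how many bytes belong to one character.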
UTF-1, UTF-7, UTF-8, UTF-16, and UTF-32 all map to the same character set defined by Unicode.
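
As a quick check of this, the sketch below encodes one character with several of Python's built-in codecs and decodes each result back to its code point. UTF-1 is left out because the Python standard library ships no codec for it; the choice of the euro sign as the sample character is arbitrary.

```python
text = "€"
code_point = ord(text)  # the integer Unicode assigns to this character

# Encode the same character with several encodings (big-endian variants
# are used for UTF-16/32 so that no byte-order mark is prepended).
encodings = ("utf-7", "utf-8", "utf-16-be", "utf-32-be")
encoded = {name: text.encode(name) for name in encodings}

# Decode each byte sequence and recover the code point again.
decoded = {name: ord(data.decode(name)) for name, data in encoded.items()}

for name in encodings:
    print(f"{name:10} bytes={encoded[name].hex(' ')}  code point=U+{decoded[name]:04X}")
```

The byte sequences differ from one encoding to the next, while the decoded code point is identical in every row.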
Documentation
wikipedia https://en.wikipedia.org/wiki/Unicode