Unicode: Difference between revisions
From wikinotes
Revision as of 23:01, 4 August 2021
Unicode is a standard for text encoding.
It defines a mapping of integers (code points) to characters from many languages.
Text encodings differ in how that integer is split across one or more bytes,
but regardless of its byte layout, the number assigned to each character is constant.
For example, UTF-8 uses the leading 1 to 5 bits of each byte to mark the byte's type: whether it stands alone, starts a multi-byte sequence, or continues one.
The remaining bits are concatenated into a single integer, which identifies a code point (character).
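The bit layout described above can be sketched as a small decoder. This is a minimal illustration, not a production decoder: the function name `decode_utf8_char` is made up for this note, and the sketch does not reject overlong encodings or surrogate code points the way a conforming decoder would.

```python
def decode_utf8_char(data: bytes) -> tuple[int, int]:
    """Decode the first code point in `data`.

    Returns (code_point, number_of_bytes_used).
    Hypothetical helper for illustration only.
    """
    first = data[0]

    # 0xxxxxxx: 1 marker bit, the byte stands alone (ASCII range).
    if first >> 7 == 0b0:
        return first, 1

    # Lead bytes: the marker bits say how long the sequence is,
    # the remaining low bits are the start of the payload.
    if first >> 5 == 0b110:        # 110xxxxx: 3 marker bits, 2 bytes
        length, value = 2, first & 0b00011111
    elif first >> 4 == 0b1110:     # 1110xxxx: 4 marker bits, 3 bytes
        length, value = 3, first & 0b00001111
    elif first >> 3 == 0b11110:    # 11110xxx: 5 marker bits, 4 bytes
        length, value = 4, first & 0b00000111
    else:
        raise ValueError("not a valid UTF-8 lead byte")

    if len(data) < length:
        raise ValueError("truncated UTF-8 sequence")

    # Continuation bytes: 10xxxxxx, 2 marker bits and 6 payload bits each.
    for byte in data[1:length]:
        if byte >> 6 != 0b10:
            raise ValueError("expected a continuation byte")
        value = (value << 6) | (byte & 0b00111111)

    return value, length


code_point, used = decode_utf8_char("€".encode("utf-8"))
print(hex(code_point), used)
```

For real work, Python's built-in `bytes.decode("utf-8")` does this and also performs the validation the sketch skips.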
UTF-1, UTF-7, UTF-8, UTF-16, and UTF-32 all map to the same character set defined by Unicode.
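A short sketch of this point: one character is encoded several ways, the bytes differ, and decoding always yields the same code point. It assumes Python's built-in codecs. UTF-1 is left out because Python ships no codec for it, and the big-endian variants are chosen only so that no byte-order mark is prepended.

```python
char = "€"
code_point = ord(char)  # the Unicode code point, independent of any encoding

# Codec names as accepted by Python's str.encode / bytes.decode.
encodings = ["utf-7", "utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: char.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Different byte sequences, same code point after decoding.
    assert ord(raw.decode(name)) == code_point
    print(f"{name:10} {raw.hex(' ')}")

print(f"code point: U+{code_point:04X}")
```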
Documentation
wikipedia https://en.wikipedia.org/wiki/Unicode