Revision as of 23:01, 4 August 2021

Unicode is a standard for text encoding.
It assigns an integer (a code-point) to each character across many writing systems.
Text encodings differ in how that integer is divided across one or more bytes,
but regardless of its byte layout, the number assigned to a character is constant.
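
As a minimal sketch of that idea, the Python snippet below looks up the code-point of one character and then encodes it three ways. The euro sign and the variable names are illustrative choices, not taken from this page.

```python
# Illustrative character; any character would do.
char = "€"

# The integer Unicode assigns to this character (its code-point).
code_point = ord(char)
print(hex(code_point))  # 0x20ac

# The same code-point laid out as bytes under three encodings.
# The "-be" variants are used so that no byte-order mark is prepended.
layouts = {
    encoding: char.encode(encoding).hex()
    for encoding in ("utf-8", "utf-16-be", "utf-32-be")
}
for encoding, hex_bytes in layouts.items():
    print(encoding, hex_bytes)
```

The byte sequences differ in length and content, yet each one decodes back to the code-point 0x20AC.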

For example, UTF-8 uses the leading 1-5 bits of each byte to indicate the type of byte, and whether the number spans multiple bytes.
The remaining bits of each byte are concatenated into one integer, which is the code-point of the character.
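
The bit layout described above can be sketched as a small hand-written decoder. This is an illustration only: the function name `decode_utf8_char` is hypothetical, and the sketch does not reject overlong forms or surrogate values the way a conforming decoder must. In practice, `bytes.decode("utf-8")` does this work.

```python
def decode_utf8_char(data: bytes) -> int:
    """Return the code-point of the single UTF-8 sequence at the start of data."""
    first = data[0]

    # The marker bits of the first byte give the sequence length.
    if (first & 0b1000_0000) == 0b0000_0000:    # 0xxxxxxx: one byte
        return first
    if (first & 0b1110_0000) == 0b1100_0000:    # 110xxxxx: two bytes
        length, code_point = 2, first & 0b0001_1111
    elif (first & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: three bytes
        length, code_point = 3, first & 0b0000_1111
    elif (first & 0b1111_1000) == 0b1111_0000:  # 11110xxx: four bytes
        length, code_point = 4, first & 0b0000_0111
    else:
        raise ValueError("invalid leading byte")

    if len(data) < length:
        raise ValueError("truncated sequence")

    # Each continuation byte (10xxxxxx) contributes six payload bits.
    for byte in data[1:length]:
        if (byte & 0b1100_0000) != 0b1000_0000:
            raise ValueError("expected a continuation byte")
        code_point = (code_point << 6) | (byte & 0b0011_1111)

    return code_point


print(hex(decode_utf8_char("€".encode("utf-8"))))  # 0x20ac
```

The marker bits counted here (1, 2, 3, 4, or 5 of them, including the terminating zero) are the "1-5 bits" that identify the byte type.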

UTF-1, 7, 8, 16, and 32 all map to the same character set defined by Unicode.
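
A quick way to check this is to round-trip one string through several encodings and compare the code-points that come back. The sample string is an arbitrary choice. UTF-1 is left out because Python's standard library ships no codec for it.

```python
text = "héllo €"
expected = [ord(ch) for ch in text]

# Encode and decode with each codec, collecting the resulting code-points.
round_trips = {}
for encoding in ("utf-7", "utf-8", "utf-16", "utf-32"):
    decoded = text.encode(encoding).decode(encoding)
    round_trips[encoding] = [ord(ch) for ch in decoded]

# Every encoding yields the same sequence of code-points.
print(all(points == expected for points in round_trips.values()))  # True
```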

Documentation

wikipedia https://en.wikipedia.org/wiki/Unicode


Unicode: Difference between revisions
