Unicode
Unicode is a standard for text encoding.<br>
It extends [[ASCII]], which expresses English in the first 7 bits of an 8-bit integer.<br>
It defines a mapping of integers to characters in various languages (''code-points'').<br>
Various text-encodings alter how the integer is divided across byte(s),<br>
but the number/character is consistent across encodings.<br>
In UTF-8, a byte whose first bit is 0 is a plain ASCII character, and the 8th bit is used to signal characters beyond the first 128 - which conveniently means all ASCII text is valid Unicode.
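A minimal Python sketch of the code-point idea (assuming a Python 3 interpreter; the example characters are arbitrary):

<syntaxhighlight lang="python">
# Code-points are just integers: ord() maps a character to its code-point,
# chr() maps the integer back to the character.
print(ord("A"))       # 65    - same value as in ASCII
print(hex(ord("é")))  # 0xe9  - code-point U+00E9, outside ASCII
print(chr(0x1F600))   # '😀'  - code-point U+1F600, far beyond one byte

# The first 128 code-points are exactly the ASCII characters.
assert all(chr(i).encode("ascii") == chr(i).encode("utf-8") for i in range(128))
</syntaxhighlight>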


Unicode can be represented using various text encodings.<br>
For example, UTF-8 uses the first 1-5 bits of a byte to indicate the type of byte,<br>
and whether the code-point continues across multiple bytes.<br>
The remaining bits are assembled into one large integer that may span several bytes' worth of bits.

UTF-1, UTF-7, UTF-8, UTF-16, and UTF-32 all map to the same character set defined by Unicode.
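A short Python sketch of that idea (assuming Python 3; '€' / U+20AC is just an illustrative character): the code-point stays the same, but each encoding lays it out in bytes differently.

<syntaxhighlight lang="python">
# One character, one code-point, but a different byte layout per encoding.
ch = "€"
print(hex(ord(ch)))            # 0x20ac - the Unicode code-point

print(ch.encode("utf-8"))      # b'\xe2\x82\xac'   (3 bytes)
print(ch.encode("utf-16-be"))  # b' \xac'          (2 bytes: 0x20 0xAC)
print(ch.encode("utf-32-be"))  # b'\x00\x00 \xac'  (4 bytes)

# UTF-8 leading bits: 1110xxxx starts a 3-byte sequence,
# 10xxxxxx marks a continuation byte.
for byte in ch.encode("utf-8"):
    print(f"{byte:08b}")
# 11100010
# 10000010
# 10101100
</syntaxhighlight>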


= Documentation =

wikipedia: https://en.wikipedia.org/wiki/Unicode