Total possible glyphs using UTF-8

2017-08-14

UTF-8 is an encoding for representing a large number of glyphs. It uses one, two, three, or four bytes to encode a given glyph, depending on the value of its code point. Wikipedia has a good table that shows how UTF-8 lays out those bytes:

Number of bytes	Code point bits	First code point	Last code point	Byte 1	Byte 2	Byte 3	Byte 4
1	7	U+0000	U+007F	0xxxxxxx
2	11	U+0080	U+07FF	110xxxxx	10xxxxxx
3	16	U+0800	U+FFFF	1110xxxx	10xxxxxx	10xxxxxx
4	21	U+10000	U+10FFFF	11110xxx	10xxxxxx	10xxxxxx	10xxxxxx
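To make the bit patterns in the table concrete, here is a minimal Python sketch of a hand-rolled encoder. The helper name `utf8_encode` is hypothetical (it is not part of any library). Each branch fills the `x` bits of the corresponding table row. The result is checked against Python's built-in `str.encode("utf-8")`, which is what you would actually use in practice.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the table's bit patterns."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    # Surrogates (U+D800..U+DFFF) are reserved and not valid in UTF-8.
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample code point per row of the table: 'A', e-acute, euro sign, grinning-face emoji.
for cp in [0x41, 0xE9, 0x20AC, 0x1F600]:
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
# U+0041 -> 41 (1 bytes)
# U+00E9 -> c3 a9 (2 bytes)
# U+20AC -> e2 82 ac (3 bytes)
# U+1F600 -> f0 9f 98 80 (4 bytes)
```

The emoji lands in the four-byte row, which is why most emojis cost four bytes in UTF-8.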

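There are 1,114,112 (17 x 2^16) total code points available: 17 planes of 65,536 code points each, running from U+0000 to U+10FFFF. The four-byte form has 21 payload bits, but UTF-8 stops at U+10FFFF, so those extra bits are never used. BabelStone reports that 276,337 code points (approximately 24.8%) are in use, which leaves 837,775 still available. That's a lot of room left for emojis.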
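The arithmetic can be checked directly from the code point ranges in the table. This short Python sketch counts how many code points each byte length covers, confirms that they add up to 17 x 2^16, and reproduces the percentage from the BabelStone figure quoted above. The 276,337 value is simply copied from the text, not looked up.

```python
# Code point ranges from the table, keyed by UTF-8 byte length.
RANGES = {
    1: (0x0000, 0x007F),
    2: (0x0080, 0x07FF),
    3: (0x0800, 0xFFFF),
    4: (0x10000, 0x10FFFF),
}

counts = {n: last - first + 1 for n, (first, last) in RANGES.items()}
total = sum(counts.values())
assigned = 276_337  # BabelStone figure quoted in the text

print(counts)                      # {1: 128, 2: 1920, 3: 63488, 4: 1048576}
print(total, total == 17 * 2**16)  # 1114112 True
print(f"{assigned / total:.1%} in use, {total - assigned:,} available")
# 24.8% in use, 837,775 available
```

Note that the three-byte range also contains the 2,048 surrogate code points (U+D800 to U+DFFF). These are not valid in UTF-8, so strictly speaking the number of encodable code points is 2,048 lower than the total.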

Tags:

Unicode

UTF