Total possible glyphs using UTF-8
UTF-8 is an encoding method for representing a large number of glyphs. It uses one, two, three, or four bytes to encode a given glyph, depending on the value of its code point. Wikipedia has a good table that shows how UTF-8 breaks down:
Number of bytes | Code point bits | First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
---|---|---|---|---|---|---|---|
1 | 7 | U+0000 | U+007F | 0xxxxxxx | | | |
2 | 11 | U+0080 | U+07FF | 110xxxxx | 10xxxxxx | | |
3 | 16 | U+0800 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
4 | 21 | U+10000 | U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
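
To make the ranges in the table concrete, here is a minimal Python sketch that picks the byte count from the code point and checks it against Python's built-in encoder. The helper name `utf8_length` and the four sample code points are my own choices for illustration, not something from the Wikipedia table.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a code point (hypothetical helper)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if code_point <= 0x7F:      # 7 bits fit in one byte
        return 1
    if code_point <= 0x7FF:     # 11 bits fit in two bytes
        return 2
    if code_point <= 0xFFFF:    # 16 bits fit in three bytes
        return 3
    return 4                    # up to 21 bits need four bytes


# One sample code point from each row of the table.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = chr(cp).encode("utf-8")
    # The built-in encoder should agree with the range check above.
    assert len(encoded) == utf8_length(cp)
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The `bytes.hex(' ')` call with a separator assumes Python 3.8 or newer.
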
There are 1,114,112 (17 x 2^16) total code points available, running from U+0000 to U+10FFFF. BabelStone reports that 276,337 (approximately 24.8%) code points are in use, which leaves 837,775 still available. That's a lot of room left for emojis.
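
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the in-use count is simply the BabelStone number quoted above, taken as given.

```python
PLANES = 17                      # the 17 in "17 x 2^16"
CODE_POINTS_PER_PLANE = 2 ** 16  # 65,536 code points in each

total = PLANES * CODE_POINTS_PER_PLANE
in_use = 276_337                 # figure quoted from BabelStone, not computed here
available = total - in_use

# The total should match the range U+0000 through U+10FFFF.
assert total == 0x10FFFF + 1

print(f"total:     {total:,}")
print(f"in use:    {in_use:,} ({in_use / total:.1%})")
print(f"available: {available:,}")
```
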