Patrik Fältström is the Technical Director and Head of Security at Netnod. He is also a member of ICANN’s Security and Stability Advisory Committee (SSAC). In this article, he explains the risks and problems associated with the use of emojis in domain names.
The IDNA2003 standard, created in 2003, enabled users to have non-ASCII characters in domain names.
To be technically correct, the DNS has always been able to carry every possible byte value (decimal 0-255) in a label. So as long as one knows what character set and encoding is in use, the domain name system can carry any data, including a period (‘.’), inside a label. However, most applications interpreted the bytes as ASCII, so the use of non-ASCII characters was not possible in practice.
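To make the point about raw bytes concrete, here is a minimal sketch of how a name is laid out on the wire: each label is a length byte followed by its bytes, so the period is a presentation convention rather than something stored in the packet. The helper name `encode_name` is made up for this illustration, and the sketch ignores name compression and the 255-byte limit on the whole name.

```python
from typing import Iterable


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels in DNS wire format: a length byte, then the label's raw bytes."""
    wire = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("a label must be 1 to 63 bytes long")
        wire.append(len(label))
        wire += label
    wire.append(0)  # the zero-length root label ends the name
    return bytes(wire)


# The first label contains a period and the second contains non-ASCII bytes;
# neither is a problem for the wire format itself.
name = encode_name([b"a.b", b"\xff\x00", b"example"])
print(name)
```

The difficulty described above sits entirely in the applications that read such labels back and assume ASCII.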
IDNA2003 did not change what was sent in the DNS, but instead specified what character set to use (Unicode) and what encoding to use (Punycode). Because Unicode was encoded into and decoded from ASCII, there was not only an agreed interpretation of the bytes, but the standard was also backward compatible with applications that only used ASCII.
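As a rough illustration of the encoding step, Python ships a `punycode` codec that can be used to build the ASCII form of a label by hand. The helper names `to_a_label` and `to_u_label` are invented for this sketch, and it deliberately performs no validation or mapping of the input; it only shows the Punycode step and the `xn--` prefix.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Return the ASCII form of a single label (no validation, no mapping)."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are sent unchanged
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Reverse of to_a_label for a single label."""
    if not a_label.startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

An application that only understands ASCII sees `xn--bcher-kva` as an ordinary label, which is what makes the scheme backward compatible.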
But IDNA2003 had issues. The selection of characters was made one by one, and the standard included a mapping between Unicode characters. Because of these features, a label with Unicode characters that was encoded according to IDNA2003 and then decoded could result in a different Unicode string. In short, the standard was not agnostic to the Unicode version, nor did it ensure a 1:1 mapping between the ASCII encoding and native Unicode.
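The round-trip problem can be observed with Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490). The sketch below encodes a name and decodes it again; the helper name `idna2003_round_trip` is made up for this example.

```python
def idna2003_round_trip(name: str) -> str:
    """Encode with the IDNA2003 rules in Python's stdlib codec, then decode again."""
    return name.encode("idna").decode("idna")


for original in ("Straße", "BÜCHER"):
    ascii_form = original.encode("idna")
    restored = idna2003_round_trip(original)
    print(original, "->", ascii_form, "->", restored)
```

Here `Straße` comes back as `strasse` and `BÜCHER` comes back as `bücher`: the mapping step changes the string before it is encoded, so the original spelling cannot be recovered from what is stored in the DNS.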
A new version of IDNA was created: IDNA2008. It changed many things. The table with code points was replaced by an algorithm with which one could calculate whether a Unicode character could be used or not. In IDNA2008, the character mapping was no longer part of the conversion between ASCII and Unicode, so the transformation became 1:1.
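To give a feel for what "calculating" means here, the following is a heavily simplified sketch of two of the rules in RFC 5892: a character must belong to a letter, digit or mark category, and it must be stable under normalization and case folding. This is not the full IDNA2008 algorithm. It leaves out the exception list, the rule that permits the ASCII hyphen, the contextual rules and several other categories, and the function names are invented for this example.

```python
import unicodedata

# General categories that RFC 5892 groups as "LetterDigits".
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_unstable(ch: str) -> bool:
    """True if NFKC(casefold(NFKC(ch))) differs from ch."""
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def simplified_status(ch: str) -> str:
    """Approximate derived property using only the two rules above."""
    if unicodedata.category(ch) in LETTER_DIGITS and not is_unstable(ch):
        return "PVALID"
    return "DISALLOWED"


for ch in ("a", "A", "ü", "7", "©"):
    print(repr(ch), unicodedata.category(ch), simplified_status(ch))
```

Because the result is computed from Unicode properties rather than looked up in a fixed table, the same rules can be applied to characters added in later versions of Unicode. Note that the uppercase `A` is rejected: with no mapping inside the protocol, only the already-folded form is a valid label.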
Neither IDNA2003 nor IDNA2008 allowed symbols to be used in a domain name. As emojis are symbols (general category “Symbol Other”, or “So”), they are not allowed according to the standard. In fact, the standard explicitly states that a registry should not allow every permissible Unicode character. So even if a character is allowed by the standard, it might not be allowed by a registry.
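The general category of a character can be looked up directly in the Unicode character database. The short sketch below lists the symbol characters found in a label; the helper name `symbols_in` is hypothetical, and checking for categories that start with "S" is only an illustration, not a complete registration policy.

```python
import unicodedata


def symbols_in(label: str) -> list:
    """Return (code point, general category) for every symbol character in a label."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in label
        if unicodedata.category(ch).startswith("S")
    ]


print(symbols_in("i\u2764pizza"))  # the heart is reported with category So
print(symbols_in("\U0001F600"))    # so is the grinning face
print(symbols_in("bücher"))        # letters only, nothing reported
```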
But just as a registry can restrict what characters it allows (which in fact is what the Internet Architecture Board suggests), it can also allow characters that are forbidden by the standard. Indeed, some TLDs have accepted registration of second-level domains with emojis, and a few are still doing that.
As a consequence of this acceptance, the registrant might get a domain name which does not work as expected, and the users of such domain names (as part of a URL for example) might become targets for phishing or simply be confused. These are serious consequences of having a domain name that the registrant might believe is “cute”.
The Security and Stability Advisory Committee (SSAC) of ICANN has written a report (SAC095 - SSAC Advisory on the Use of Emoji in Domain Names) on this topic that addresses a few issues with emojis. What follows is a summary of the main issues identified in the SSAC report:
The first issue is the similarity between emojis. Many emojis look alike: think of the smileys that differ only in the position of the eyes, or in whether the mouth is open or closed. One way of seeing the potential for confusion is to notice that different smileys on the same operating system can look more alike than the same smiley does on two different operating systems.
The second issue is that characters can be combined, which changes how they are rendered. For example, can three persons be joined into a group of three persons, or combined in some other way? Can three animals be combined, or two dogs and a ship? Another example is the rainbow flag, which is made by combining a white flag and a rainbow. All such combinations have the potential for confusion.
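Such combinations are built by placing a zero width joiner (U+200D) between existing characters. The sketch below spells out what is actually inside two of them; the constant and function names are chosen for this example only.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

# White flag + variation selector + joiner + rainbow: rendered as one rainbow flag.
RAINBOW_FLAG = "\U0001F3F3\uFE0F" + ZWJ + "\U0001F308"

# Man, woman and girl joined together: rendered as one family where supported.
FAMILY = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467"])


def describe(text: str) -> list:
    """List the code points that make up a string, with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for sequence in (RAINBOW_FLAG, FAMILY):
    print(len(sequence), "code points:")
    for line in describe(sequence):
        print("   ", line)
```

What a user perceives as a single picture is four or five code points, and a system that does not support a given sequence may draw the parts separately, so the same label can look different from one device to the next.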
The third issue is that some combinations are used as modifiers, for example to change the skin color of a person or of a “thumbs up”. The rendered results of such combinations are extremely difficult to tell apart.
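A small sketch of the modifier case: appending one of the five skin tone modifiers (U+1F3FB to U+1F3FF) to the thumbs up sign gives six strings that are all different as identifiers. The variable names are made up for this illustration, and Python's `punycode` codec is used only to show that each variant would become a separate ASCII label.

```python
THUMBS_UP = "\U0001F44D"

# The five skin tone modifiers, U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

# The unmodified sign plus one variant per modifier.
variants = [THUMBS_UP] + [THUMBS_UP + modifier for modifier in SKIN_TONE_MODIFIERS]

# Each variant is a distinct string, so each would be a distinct label.
a_labels = {"xn--" + v.encode("punycode").decode("ascii") for v in variants}

print(len(variants), "variants,", len(a_labels), "distinct labels")
for v in variants:
    print([f"U+{ord(ch):04X}" for ch in v])
```

Six registrations that differ only in a shade of color could therefore belong to six different parties.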
The fourth issue is related to universal acceptance. The simplest example is that there is no universal way of saying an emoji out loud. It is also extremely difficult for people with disabilities to use their tools to read and write emojis.
For reasons such as the ones cited above, the Unicode Consortium, the IETF and ICANN have all, in their various processes, concluded that emojis are not to be allowed in domain names.
From my perspective, the most problematic issue with emojis is that registrants confuse what works in words and text with what works as an identifier. This confusion is the key issue here. Just because emojis and many other characters are commonly used in email, text or chat messages, it does not follow that they make good identifiers in domain names.
Today, we only have these issues in some TLDs where domain names with emojis were registered before the registries added a good policy. Unfortunately, there are still some TLDs that allow new registrations of domain names with emojis.*
It might look fun to have emojis in domain names, but it is far from practical and has a whole array of negative consequences. Emojis are fun in text, and I like them too, so let’s keep them there, where they do no harm, and away from domain names, where they certainly cause problems.