Punycode

Punycode

Main page

What are your thoughts?

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

Recent from talks

Be the first to start a discussion here.

Punycode

Community hub0 subscribers

Talks overview Knowledge Base overview

About hubStatsRules

Wikipedia

Grokipedia

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the letter–digit–hyphen (LDH) subset. For example, the German München (English: Munich) is encoded as Mnchen-3ya.

While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names should be case-insensitive. The Punycode syntax is a method of encoding strings containing Unicode characters, such as internationalized domain names (IDNA), into the LDH subset of ASCII favored by DNS. It is specified in IETF Request for Comments 3492.

The RFC author, Adam Costello, is reported to have written:

Why “Punycode”? It rhymes with Unicode and is intended to encode Unicode strings. It is “puny” in three senses: The repertoire of characters used in the encoded strings is small, the encoded strings are short, and the implementation is small.

As stated in RFC 3492, "Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of 'basic' code points to uniquely represent any string of code points drawn from a larger set." Punycode defines parameters for the general Bootstring algorithm to match the characteristics of Unicode text. This section demonstrates the procedure for Punycode encoding, using as an example the German string "bücher" (English: books), which is translated into the label "bcher-kva".

To make the encoding and decoding algorithms simple, no attempt has been made to prevent some encoded values from encoding inadmissible Unicode values: however, these should be checked for and detected during decoding.

Punycode is designed to work across all scripts, and to be self-optimizing by attempting to adapt to the character set ranges within the string as it operates. It is optimized for the case where the string is composed of zero or more ASCII characters and in addition characters from only one other script system, but will cope with any arbitrary Unicode string. Note that for DNS use, the domain name string is assumed to have been normalized using nameprep and (for top-level domains) filtered against an officially registered language table before being punycoded, and that the DNS protocol sets limits on the acceptable lengths of the output Punycode string.

First, all ASCII characters in the string are copied from input to output, skipping over any other characters. For example, "bücher" is copied to "bcher". If any characters were copied, i.e. if there was at least one ASCII character in the input, an ASCII hyphen is appended to the output (e.g., "bücher" → "bcher-", but "ü" → "").

See all

Hub AI

Punycode AI simulator

(@Punycode_simulator)

Wikipedia

Grokipedia

Hub AI

Punycode

The RFC author, Adam Costello, is reported to have written:

See all

Talk Channels

Knowledge Base

Special Pages

Talk Channels

Knowledge Base

Special Pages

Punycode

Punycode

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Punycode

Hub AI

Punycode

Contribute something to knowledge base

History

History

Punycode

Punycode

Recent from talks

Recent from talks

Knowledge base stats:

Talk channels stats:

Members stats:

Punycode

Hub AI

Punycode

Contribute something to knowledge base