Can URLs have UTF-8 characters?

Question

I was curious whether I should encode URLs with ASCII or UTF-8. I was under the impression that URLs cannot contain non-ASCII characters, but someone told me they can contain UTF-8. I searched around and couldn't find out which one is true. Does anyone know?

Recommended answer

There are two parts to this, the domain name and the path, and both amount to "yes".

Domain name: with IDNA (Internationalized Domain Names in Applications), it is possible to register domain names using the full Unicode repertoire (with a few minor twists to prevent ambiguities and abuse).

Path: the path part is not strictly regulated, but it is possible to encode arbitrary strings in it. The browser could opt to display a human-readable rendering rather than the encoded path. However, this requires heuristics, as there is no way to specify the character set and encoding of the path.

So, http://xn--msic-0ra.example/mot%C3%B6rhead is a (fictional example, not entirely correct) computer-readable encoded URL which could be displayed to the user as http://müsic.example/motörhead. The domain name is encoded as xn--msic-0ra.example in something called Punycode, and the path contains the label "motörhead" encoded as UTF-8 and then URL encoded (the Unicode code point U+00F6 is represented by the two bytes 0xC3 0xB6 in UTF-8).
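As a rough illustration of the two encodings described above, here is a minimal Python sketch. The helper name `encode_url` is made up for this example, and it relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules and may differ from what current registries and browsers do for some names.

```python
from urllib.parse import quote


def encode_url(host: str, label: str) -> str:
    """Hypothetical helper: build an ASCII-only URL from a Unicode host and path label."""
    # Host: each non-ASCII label is converted to its "xn--" Punycode form.
    # Assumption: the stdlib "idna" codec (IDNA 2003) is good enough for a demo.
    ascii_host = host.encode("idna").decode("ascii")
    # Path: encode the text as UTF-8 bytes, then percent-encode those bytes.
    ascii_path = quote(label, encoding="utf-8")
    return f"http://{ascii_host}/{ascii_path}"


encoded = encode_url("müsic.example", "motörhead")
print(encoded)  # http://xn--msic-0ra.example/mot%C3%B6rhead

# U+00F6 (ö) is two bytes in UTF-8, which is why it becomes %C3%B6.
print("ö".encode("utf-8"))  # b'\xc3\xb6'
```

Going the other way (ASCII form back to a readable form) is what a browser would do for display, and for the path that only works if the encoding is guessed correctly.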

The path could also be mot%F6rhead, which is the same label in Latin-1. In this case, deducing a reasonable human-readable representation would be much harder, but perhaps the context of the surrounding characters could offer enough hints for a good guess.

In isolation, %F6 could be pretty much anything, and %C3%B6 could be, for example, a single UTF-16 code unit rather than two UTF-8 bytes.
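The ambiguity can be seen by decoding the same percent-encoded bytes under different assumed encodings. This is only a sketch using Python's `urllib.parse`; the variable names are illustrative, and a real browser would apply its own heuristics.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# The same label percent-encoded from Latin-1 instead of UTF-8.
latin1_path = quote("motörhead", encoding="latin-1")
print(latin1_path)  # mot%F6rhead

# Decoding only gives the right text if the guessed encoding is right.
as_latin1 = unquote(latin1_path, encoding="latin-1")
as_utf8 = unquote(latin1_path, encoding="utf-8", errors="replace")
print(as_latin1)      # motörhead
print(repr(as_utf8))  # 'mot\ufffdrhead' (0xF6 alone is not valid UTF-8)

# The two bytes behind %C3%B6 are valid under several encodings.
two_bytes = unquote_to_bytes("%C3%B6")
as_utf8_char = two_bytes.decode("utf-8")        # 'ö'
as_latin1_chars = two_bytes.decode("latin-1")   # 'Ã¶'
as_utf16_char = two_bytes.decode("utf-16-be")   # one code unit, U+C3B6
print(as_utf8_char, as_latin1_chars, hex(ord(as_utf16_char)))
```

Nothing in the URL itself says which of these readings was intended, which is why the display form has to be guessed.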
