Can URLs have UTF-8 characters?


Question

I was curious whether I should encode URLs with ASCII or UTF-8. I was under the impression that URLs cannot contain non-ASCII characters, but someone told me they can contain UTF-8. I searched around and couldn't quite find out which is true. Does anyone know?

Recommended answer

There are two parts to this, the domain name and the path, but they both amount to "yes".

For the domain name: with IDNA, it is possible to register domain names using the full Unicode repertoire (with a few minor twists to prevent ambiguities and abuse).

The path part is not strictly regulated, but it is possible to encode arbitrary strings in the path by percent-encoding their bytes. The browser can opt to display a human-readable rendering rather than the encoded path. However, this requires heuristics, as there is no way to specify the character set and encoding of the path.

So, http://xn--msic-0ra.example/mot%C3%B6rhead is a (fictional example, not entirely correct) computer-readable encoded URL which could be displayed to the user as http://müsic.example/motörhead. The domain name is encoded as xn--msic-0ra.example in something called Punycode, and the path contains the label "motörhead" encoded as UTF-8 and then URL-encoded (the Unicode code point U+00F6 is represented by the two bytes 0xC3 0xB6 in UTF-8).
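The two encoding steps described above can be reproduced with Python's standard library. This is only a sketch: it assumes Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is an acceptable stand-in for what a registrar or browser does, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Human-readable form of the fictional example from the answer.
host = "müsic.example"
path = "/motörhead"

# The built-in "idna" codec converts each non-ASCII label to Punycode
# and adds the "xn--" prefix.
ascii_host = host.encode("idna").decode("ascii")

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters by default
# and leaves "/" untouched.
ascii_path = quote(path)

url = f"http://{ascii_host}{ascii_path}"
print(url)  # http://xn--msic-0ra.example/mot%C3%B6rhead

# Both steps are reversible, which is what lets a browser show the readable form.
readable_host = ascii_host.encode("ascii").decode("idna")
readable_path = unquote(ascii_path)
print(f"http://{readable_host}{readable_path}")
```

Note that the reverse step only works here because the decoder assumes UTF-8 for the path; nothing in the URL itself says so.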

The path could also be mot%F6rhead, which is the same label encoded in Latin-1. In this case, deducing a reasonable human-readable representation would be much harder, but perhaps the context of the surrounding characters could offer enough hints for a good guess.
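One simple form such a heuristic could take is "try strict UTF-8 first, fall back to Latin-1". The helper below is hypothetical (the name guess_path_text is not from any library) and is not a description of what any particular browser does; it only illustrates the kind of guessing involved.

```python
from urllib.parse import unquote_to_bytes


def guess_path_text(encoded_path: str) -> str:
    """Hypothetical helper: decode a percent-encoded path for display."""
    # Undo the percent-encoding first; this yields raw bytes, not text.
    raw = unquote_to_bytes(encoded_path)
    try:
        # Strict UTF-8 decoding rejects byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this fallback never fails,
        # but it may still be the wrong guess.
        return raw.decode("latin-1")


print(guess_path_text("mot%C3%B6rhead"))  # motörhead
print(guess_path_text("mot%F6rhead"))     # motörhead
```

The fallback happens to work for this example because a lone 0xF6 byte is not valid UTF-8. A path written in some other single-byte encoding would be decoded without error and still come out wrong.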

In isolation, %F6 could be pretty much anything, and %C3%B6 need not be UTF-8 either: the same two bytes could be, for example, a single UTF-16 code unit.
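To make the ambiguity concrete, the sketch below decodes the same percent-encoded bytes under several encodings. The choice of encodings is arbitrary and purely for illustration.

```python
from urllib.parse import unquote_to_bytes

# %C3%B6 is just the two bytes 0xC3 0xB6; their meaning depends on the encoding.
two_bytes = unquote_to_bytes("%C3%B6")
two_byte_readings = {
    enc: two_bytes.decode(enc)
    for enc in ("utf-8", "latin-1", "utf-16-be", "utf-16-le")
}

# %F6 is the single byte 0xF6, which is a valid character in many 8-bit encodings.
one_byte = unquote_to_bytes("%F6")
one_byte_readings = {enc: one_byte.decode(enc) for enc in ("latin-1", "cp1251")}

for enc, text in {**two_byte_readings, **one_byte_readings}.items():
    # Print the code points so the differences are visible in any terminal.
    print(enc, [f"U+{ord(ch):04X}" for ch in text])
```

Every one of these decodings succeeds, so a successful decode on its own says nothing about which encoding the author of the URL intended.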
