What character set should I assume the encoded characters in a URL to be in?

Question

RFC 1738 specifies the syntax for URLs, and mentions that:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

It does not, however, say what code set these octets then represent.
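
To make the quoted rule concrete, here is a minimal Python sketch built around a hypothetical helper, `escape_non_graphic`, that percent-encodes every octet outside the graphic printable US-ASCII range (plus space and `%` itself). It works purely on bytes, so nothing in its output records which character set produced those bytes, which is exactly the gap described above. RFC 1738 also marks a few other characters (such as `<`, `>` and `"`) as unsafe; this sketch covers only the octet ranges in the quoted passage.

```python
def escape_non_graphic(data: bytes) -> str:
    """Hypothetical helper: percent-encode octets that are not graphic
    printable US-ASCII (21-7E hex), per the ranges quoted from RFC 1738.

    Space (20) and '%' (25) are also encoded; RFC 1738 lists both as unsafe.
    Other unsafe or reserved characters are deliberately left alone here.
    """
    out = []
    for b in data:
        if 0x21 <= b <= 0x7E and b != 0x25:
            out.append(chr(b))          # graphic ASCII passes through unchanged
        else:
            out.append(f"%{b:02X}")     # control chars, space, '%', and 80-FF
    return "".join(out)


# The same text encoded with two charsets yields two different escaped
# forms, and neither form says which charset was used.
print(escape_non_graphic("café".encode("utf-8")))    # caf%C3%A9
print(escape_non_graphic("café".encode("latin-1")))  # caf%E9
```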

RFC 2396 seems to try to improve on the situation, but:

For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification.

Is there any unambiguous way for a client to determine which character set to use when interpreting encoded octets, or for a server to determine which character set a client used to encode them?

It looks to me like most servers default to UTF-8, but this seems to be a de facto choice more than a specified one.
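
The ambiguity shows up directly with Python's standard `urllib.parse`: the percent-encoded octets are recovered exactly, but turning them into characters requires choosing a charset. The `decode_segment` function below is a hypothetical heuristic (try strict UTF-8, fall back to Latin-1) of the kind sometimes used in practice; no RFC specifies it, and it silently misreads Latin-1 text whose bytes happen to form valid UTF-8.

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "caf%C3%A9"

# The octets themselves are unambiguous...
print(unquote_to_bytes(encoded))             # b'caf\xc3\xa9'

# ...but their meaning depends on the charset the decoder assumes.
print(unquote(encoded, encoding="utf-8"))    # café
print(unquote(encoded, encoding="latin-1"))  # cafÃ©


def decode_segment(encoded: str) -> str:
    """Hypothetical heuristic, not a standard: prefer UTF-8, else Latin-1."""
    raw = unquote_to_bytes(encoded)
    try:
        return raw.decode("utf-8")           # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        return raw.decode("latin-1")         # maps every octet, so never fails


print(decode_segment("caf%C3%A9"))  # café (valid UTF-8)
print(decode_segment("caf%E9"))     # café (invalid UTF-8, Latin-1 fallback)
```

The fallback "works" for both inputs here only because of how these particular bytes fall; the server is still guessing rather than being told.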

Answer

As per your quote, URLs are ASCII. That's all.

URIs, on the other hand, allow larger character sets; usually UTF-8, as you said yourself.

The point to remember is that URLs are a subset of URIs. Therefore, the real question is, which of these is what you write in a browser?

I'd guess you can write a URI, and the browser should do its best to transform it into a URL (which is what HTTP/1.1 supports, AFAICR). For non-ASCII characters, that means percent-encoded hex codes, usually of the character's UTF-8 bytes.
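
A minimal sketch of that URI-to-URL step for a path, using Python's `urllib.parse.quote`; the wrapper name `to_ascii_path` is hypothetical and is not meant to describe what any particular browser does internally. `quote` first encodes the string with the given charset (UTF-8 by default) and then percent-encodes every byte outside the unreserved set, keeping `/` because it is passed as `safe`.

```python
from urllib.parse import quote, unquote


def to_ascii_path(path: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: turn a non-ASCII path into an ASCII-only one.

    quote() encodes `path` to bytes with `charset`, then percent-encodes
    every byte outside letters, digits, '_.-~' and the `safe` characters.
    """
    return quote(path, safe="/", encoding=charset)


ascii_path = to_ascii_path("/wiki/café")
print(ascii_path)            # /wiki/caf%C3%A9
print(unquote(ascii_path))   # /wiki/café  (unquote also assumes UTF-8 by default)

# Choosing another charset changes the octets, which is why the receiver
# has to guess (or be told) which one was used.
print(to_ascii_path("/wiki/café", charset="latin-1"))  # /wiki/caf%E9
```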
