Why do the encodings of a URL and of its query string part differ?

Question

I was researching why my query parameters contain plus signs (+) instead of %20, and why they contain strings like %C3%BC instead of a literal ü (UTF-8), as an encoded URL does.

After two hours of thinking that my web app was not compliant with the URL encoding standard, I found that the encoding scheme of a query string is not the same as the encoding of a URL (by which I mean the part without the query string).

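Example:

  • URL:
    • whitespace is encoded as %20
    • UTF-8 characters stay UTF-8 characters
  • Query string:
    • a space is encoded as +
    • UTF-8 characters are encoded as their hexadecimal representation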

So can someone tell me why the encoding schemes differ, given that the query parameters are part of the URL?

See also:

  • wiki: Percent-encoding
  • wiki: Query string

Recommended answer

URIs originated in RFC 1630, with percent-encoding as a method to allow "unsafe" characters to be represented. This original version actually mentioned the ISO Latin 1 character set as the encoding for non-ASCII characters. RFC 1738, published later that year, removed this reference to Latin-1 when defining URLs.

The query string format is actually a different but related encoding, application/x-www-form-urlencoded, defined in RFC 1866 along with HTML 2.0. It was based on RFC 1738, but specified that spaces (not all whitespace, just the character with ASCII code 0x20) are replaced by '+' and that line breaks are to be encoded as CRLF (i.e. %0D%0A). The former is likely because it saves 2 bytes for a very common character in form submissions, at the expense of an extra 2 bytes for a much less common one (a literal '+' must then be sent as %2B). The latter avoids problems when transferring data between systems that use different end-of-line conventions. Non-ASCII characters were left unconsidered.
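As a rough illustration of the two conventions described above, here is a small sketch using Python's standard urllib.parse module. The sample strings are made up for the example, and the CRLF normalization is done by hand here because it is normally the form-submitting client that performs it.

```python
from urllib.parse import quote, quote_plus, urlencode

text = "a b+c"  # hypothetical sample value

# Generic percent-encoding (as used in the path part): space -> %20
path_style = quote(text, safe="")

# application/x-www-form-urlencoded style (query string): space -> '+',
# so a literal '+' has to be escaped as %2B
form_style = quote_plus(text)

# Line breaks are sent as CRLF; normalize by hand before encoding
multiline = quote_plus("\r\n".join("x\ny".splitlines()))

print(path_style)             # a%20b%2Bc
print(form_style)             # a+b%2Bc
print(multiline)              # x%0D%0Ay
print(urlencode({"q": "a b"}))  # q=a+b
```

Note that both functions escape a literal '+' as %2B; only the treatment of the space character differs.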

UTF-8 coding in URIs came over a decade later, in RFC 3986, although individual protocols may have specified this or another encoding of non-ASCII characters earlier. To maintain backwards compatibility, all UTF-8 octets must be percent-encoded. The companion RFC 3987 defines "Internationalized Resource Identifiers" (IRIs), which are basically "URIs with most codepoints 160 and above allowed to appear unencoded", but many protocols still require URIs. Note that your statement above is incorrect: a URL may not contain an unencoded ü or any other non-ASCII character.
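To make the "UTF-8 octets must be percent-encoded" rule concrete, here is a minimal Python sketch. The path is a made-up example, and it only approximates the IRI-to-URI mapping for a path component; the full RFC 3987 procedure covers more than this.

```python
from urllib.parse import quote, unquote

# 'ü' is U+00FC; its UTF-8 encoding is the two octets C3 BC
encoded = quote("ü")          # quote() uses UTF-8 by default
print(encoded)                # %C3%BC
print(unquote(encoded))       # ü

# Hypothetical IRI-style path containing non-ASCII characters
iri_path = "/caf\u00e9/m\u00fcnchen"
uri_path = quote(iri_path)    # '/' is left unescaped by default
print(uri_path)               # /caf%C3%A9/m%C3%BCnchen
```

This is why the asker sees %C3%BC: a browser may display ü in the address bar, but what goes over the wire is the percent-encoded form.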

application/x-www-form-urlencoded has been internationalized in a different manner. The HTML5 specification of application/x-www-form-urlencoded explicitly allows any ASCII-compatible character set to be used for characters in the query string, and in fact different fields may use different character sets, but all non-ASCII octets must still be percent-encoded. When used in the query part of an IRI, these characters could be represented unencoded if properly normalized UTF-8 is being used as the character set, since conversion back to a URI would result in correct application/x-www-form-urlencoded data.
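The effect of the character set on the encoded octets can be sketched in Python as follows. The field name and value are hypothetical, and Latin-1 is used only as an example of an ASCII-compatible character set.

```python
from urllib.parse import urlencode, parse_qs

fields = {"name": "ü"}  # hypothetical form field

# Same character, different character sets, different octets
utf8_query = urlencode(fields)                       # UTF-8 is the default
latin1_query = urlencode(fields, encoding="latin-1")

print(utf8_query)    # name=%C3%BC
print(latin1_query)  # name=%FC

# The receiver must decode with the character set the sender used
print(parse_qs(latin1_query, encoding="latin-1"))    # {'name': ['ü']}

# Decoding Latin-1 octets as UTF-8 yields U+FFFD (replacement character)
print(parse_qs(latin1_query))
```

Since the query string itself carries no character set label, sender and receiver have to agree on it out of band, which is one practical reason to use UTF-8 throughout.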
