What characters must be escaped in an HTTP query string?
Question
This question concerns the characters in the query string portion of a URL, that is, the characters that appear after the ? character.
Per Wikipedia, certain characters are left as is and others are encoded (usually with a % escape sequence).
I've been trying to track this down to the actual specifications, so that I understand the justification behind every bullet point in that Wikipedia page.
Contradictory example 1:
The HTML specification says to encode space as + and defers the rest to RFC 1738. However, this RFC says that ~ is unsafe and furthermore that "[a]ll unsafe characters must always be encoded within the URL". This seems to contradict Wikipedia.
In practice, IE8 encodes ~ in the query strings it generates, while FF3 leaves it as is.
Contradictory example 2:
Wikipedia states that all characters that it does not mention must be encoded. ! is not mentioned in Wikipedia. But RFC 1738 states that ! is a "special" character and "may be used unencoded". This seems to contradict Wikipedia, which says that it must be encoded.
In practice, IE8 encodes ! in the query strings it generates, while FF3 leaves it as is.
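I understand that the moral of this is probably to encode any character on which Wikipedia and the specifications disagree, or perhaps even to encode everything that is not [A-Za-z0-9]. I would just like to know the actual standard on this.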
Conclusion
The algorithm described on Wikipedia encodes precisely those characters which are not RFC 3986 unreserved characters. That is, it encodes all characters other than alphanumerics and -._~ . As a special case, space is encoded as + instead of the %20 that RFC 3986 percent-encoding would produce; the + convention comes from HTML form encoding, not from RFC 3986.
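As a rough illustration of that rule, here is a minimal sketch using Python's standard `urllib.parse` module. It assumes Python 3.7 or later, where `quote` treats `~` as unreserved; the `sample` string is made up for the example.

```python
from urllib.parse import quote, quote_plus

# Hypothetical sample containing a space, an unreserved "~",
# and several characters outside the RFC 3986 unreserved set.
sample = "a b~!*/?"

# safe="" leaves only the unreserved characters (alphanumerics and -._~) unencoded.
rfc3986_style = quote(sample, safe="")

# quote_plus applies the same rule but writes a space as "+" instead of "%20".
form_style = quote_plus(sample)

print(rfc3986_style)  # a%20b~%21%2A%2F%3F
print(form_style)     # a+b~%21%2A%2F%3F
```

The only difference between the two results is how the space is written, which matches the special case described above.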
Some applications use an older RFC. For comparison, the RFC 2396 unreserved characters are alphanumerics and !'()*-._~ .
For comparison, the HTML5 working draft algorithm encodes all characters other than alphanumerics and *-._ . The special-case encoding for space remains + . The notable differences are that * is not encoded and ~ is encoded. (Technically, this handling of * is compatible with RFC 3986 even though * is in reserved, because it is in the sub-delims, which are allowed in the query production.)
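To make the three character sets easier to compare, the following sketch encodes one string under each of them. The `encode` helper and the `SAFE_SETS` table are hypothetical names written for this illustration, not part of any library; the sets themselves are the ones listed above.

```python
import string

ALNUM = string.ascii_letters + string.digits

# Characters left unencoded under each rule, as listed in the text above.
SAFE_SETS = {
    "RFC 3986": ALNUM + "-._~",
    "RFC 2396": ALNUM + "!'()*-._~",
    "HTML5 form": ALNUM + "*-._",
}


def encode(text: str, safe: str, space_as_plus: bool = True) -> str:
    """Percent-encode every UTF-8 byte of text that is not in safe."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and ch in safe:
            out.append(ch)               # allowed character, keep as is
        elif ch == " " and space_as_plus:
            out.append("+")              # form-style special case for space
        else:
            out.append(f"%{byte:02X}")   # everything else becomes %XX
    return "".join(out)


results = {name: encode("a b~!*", safe) for name, safe in SAFE_SETS.items()}
for name, encoded in results.items():
    print(f"{name}: {encoded}")
# RFC 3986: a+b~%21%2A
# RFC 2396: a+b~!*
# HTML5 form: a+b%7E%21*
```

The output shows the differences noted above: only the HTML5 rule encodes ~, and only the RFC 3986 rule encodes * .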
Recommended answer
The answer lies in RFC 3986, specifically Section 3.4.
The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.
...
The characters slash ("/") and question mark ("?") may represent data within the query component.
Technically, RFC 3986, Section 3.4 defines the query component as:
query = *( pchar / "/" / "?" )
This syntax means that a query can include all characters from pchar as well as / and ? . pchar refers to another rule, the one for path characters. Helpfully, Appendix A of RFC 3986 lists the relevant ABNF definitions, most notably:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Thus, in addition to all alphanumerics and percent-encoded characters, a query can legally include the following unencoded characters:
/ ? : @ - . _ ~ ! $ & ' ( ) * + , ; =
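The grammar above translates fairly directly into a regular expression. The following is a minimal sketch; `QUERY_RE` and `is_valid_query` are hypothetical names, and the check covers only the RFC 3986 query syntax, not any application-level meaning.

```python
import re

# Character classes transcribed from the ABNF in RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# query = *( pchar / "/" / "?" ), where pchar also allows ":" and "@"
# plus pct-encoded triplets ("%" followed by two hex digits).
QUERY_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*"
)


def is_valid_query(query: str) -> bool:
    """Return True if query (without the leading '?') matches the grammar."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("a=1&b=/x?y"))  # True: "/" and "?" may appear as data
print(is_valid_query("a=%20b"))      # True: valid percent-encoded triplet
print(is_valid_query("a b"))         # False: a raw space is not allowed
print(is_valid_query("a=%zz"))       # False: "%" must precede two hex digits
print(is_valid_query("x#y"))         # False: "#" terminates the query
```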
Of course, you may want to keep in mind that "=" and "&" usually have special significance within a query, since they conventionally delimit key=value pairs.
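For instance, when a value itself contains = or & , those characters need to be encoded even though the grammar permits them. A small sketch with Python's standard `urllib.parse` module (the parameter names and values are invented for the example):

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameters: the value of "q" contains both "=" and "&".
params = {"q": "a=b&c", "lang": "en"}

# urlencode percent-encodes "=" and "&" inside values so they are not
# mistaken for the pair and key/value delimiters.
query = urlencode(params)
print(query)  # q=a%3Db%26c&lang=en

# Parsing the query string recovers the original values.
decoded = parse_qs(query)
print(decoded)  # {'q': ['a=b&c'], 'lang': ['en']}
```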