When should an asterisk be encoded in an HTTP URL?
Question
According to RFC1738, an asterisk (*) "may be used unencoded within a URL":
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
However, w3.org's Naming and Addressing material says that the asterisk is "reserved for use as having special signifiance within specific schemes" and implies that it should be encoded.
Also, according to RFC3986, a URL is a URI:
The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").
It also specifies that the asterisk is a "sub-delim", which is part of the "reserved set" and:
URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.
It also explicitly states that it updates RFC1738.
I read all of this as requiring that asterisks be encoded in a URL unless they are used for a special purpose defined by the URI scheme.
Is RFC1738 the canonical reference for the HTTP URI scheme? Does it somehow exempt the asterisk from encoding, or is it obsolete in that regard due to RFC3986?
Wikipedia says that "[t]he character does not need to be percent-encoded when it has no reserved purpose." Does RFC1738 remove the reserved purpose of the asterisk?
Various resources and tools seem split on this question.
PHP's urlencode and rawurlencode, the latter of which purports to follow RFC3986, do encode the asterisk.
However, JavaScript's escape and encodeURIComponent do not encode the asterisk.
Java's URLEncoder does not encode the asterisk either:
The special characters ".", "-", "*", and "_" remain the same.
Popular online tools (top two results for a Google search for "online url encoder") also do not encode the asterisk. The URL Encode and Decode Tool specifically states that "[t]he reserved characters have to be encoded only under certain circumstances." It goes on to list the asterisk and ampersand as reserved characters. It encodes the ampersand but not the asterisk.
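As one more data point that is not part of the survey above, Python's standard library can be made to show both behaviours. The sketch below assumes Python 3 and uses a made-up sample value `a*b`: `urllib.parse.quote` percent-encodes the asterisk by default (as PHP does), and leaves it alone when it is listed in `safe` (as JavaScript and Java do).

```python
from urllib.parse import quote, unquote

raw = "a*b"  # hypothetical sample data containing an asterisk

# By default quote() leaves only letters, digits, "_.-~" and "/" unescaped,
# so the asterisk is percent-encoded.
encoded_default = quote(raw)

# Listing "*" in `safe` keeps it literal.
encoded_literal = quote(raw, safe="*")

print(encoded_default)
print(encoded_literal)

# Either spelling decodes back to the same data.
assert unquote(encoded_default) == unquote(encoded_literal) == raw
```

Both outputs decode to the same string, so the disagreement between tools is about which spelling to produce, not about what a receiver recovers after decoding.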
Other similar questions in the Stack Exchange community seem to have stale, incomplete, or unconvincing answers:
- urlencode() the 'asterisk' (star?) character: This question highlights the differences between Java's and PHP's treatment of the asterisk and asks which is "right". The accepted answer references only RFC1738; it neither mentions the more recent RFC3986 nor resolves the conflict. Another answer acknowledges the discrepancy and suggests that asterisks are different for URLs specifically, as opposed to other URIs, but it doesn't provide specific authority for that conclusion.
- Can an URL have an asterisk? One answer cites only the older RFC1738, and the accepted answer implies it's acceptable when being used as a delimiter, which one presumes is the "reserved purpose".
- Can I use asterisks in URLs? The accepted answer seems to discourage use of the asterisk without clarifying the rules governing its use. Another answer says you can use the asterisk "because it's a reserved character". But isn't that only true if you're using it for its reserved purpose?
- escaping special character in a url: One answer points out that "there is some ambiguity on whether an asterisk must be encoded in a URL". I'm trying to resolve that ambiguity with this question.
- Spring UriUtils and RFC3986: This question notes that UriUtils' encodeQueryParam purports to follow RFC3986, but it doesn't encode the asterisk. There are no answers to that question as of 2014-08-01 12:50 PM CDT.
- How to encode a URL in JavaScript? This seems to be the canonical JavaScript URL encoding question on Stack Overflow, and although the answers note that asterisks are excluded from the various methods, they don't address whether they should be.
With all this in mind, when should an asterisk be encoded in an HTTP URL?
Recommended answer
Short answer
The current definition of URL syntax indicates that you never need to percent-encode the asterisk character in the path, query, or fragment components of a URL.
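A quick way to see this in practice is to run a URL with literal asterisks through a parser. The sketch below uses Python's `urllib.parse.urlsplit` on a made-up URL; Python's parser is not an authority on the RFCs, so this only shows that unencoded asterisks in the path, query, and fragment pass through splitting and reassembly unchanged.

```python
from urllib.parse import urlsplit

# Hypothetical URL with an unencoded asterisk in the path, query and fragment.
url = "http://example.com/files/*.txt?q=a*b#sec*1"

parts = urlsplit(url)
print(parts.path)
print(parts.query)
print(parts.fragment)

# Reassembling the parts gives back the original URL, asterisks included.
assert parts.geturl() == url
```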
As @Riley Major pointed out, the RFC that HTTP 1.1 references for URL syntax has been obsoleted by RFC3986, which isn't as black and white about the use of asterisks as the originally referenced RFC was.
Under RFC2396, which HTTP 1.1 originally used to define URI syntax, an asterisk never needs to be encoded in HTTP 1.1 URLs, because * is listed as an "unreserved character". Unreserved characters are allowed in the path component of a URL.
2.3. Unreserved Characters
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
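The grammar quoted above can be written out as plain character sets. This is only a sketch: the `unreserved` and `mark` sets are copied from the quoted rules, the `reserved` set comes from RFC2396 section 2.2 (not quoted here), and the helper name `classify_2396` is made up for illustration.

```python
import string

# alphanum and mark, as in the RFC2396 rules quoted above
ALPHANUM = set(string.ascii_letters + string.digits)
MARK = set("-_.!~*'()")
UNRESERVED_2396 = ALPHANUM | MARK

# reserved set from RFC2396 section 2.2 (not quoted in the text above)
RESERVED_2396 = set(";/?:@&=+$,")


def classify_2396(ch: str) -> str:
    """Return the RFC2396 class of a single ASCII character (hypothetical helper)."""
    if ch in UNRESERVED_2396:
        return "unreserved"
    if ch in RESERVED_2396:
        return "reserved"
    return "other"


print(classify_2396("*"))
print(classify_2396("&"))
```

Under these sets the asterisk falls in the unreserved class, while the ampersand is reserved.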
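RFC3986 (the current URL syntax for HTTP)
RFC3986 modifies RFC2396 to make the asterisk a reserved character, on the grounds that it is typically unsafe to decode. My reading of this RFC is that unencoded asterisk characters are allowed in the path, query, and fragment components of a URL, because these components do not specify the asterisk as a delimiter (2.2. Reserved Characters):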
These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax... If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
Additionally, 3.3 Path confirms that a subset of reserved characters (sub-delims) can be used unencoded in path segments (parts of the path component broken up by /):
Aside from dot-segments ("." and "..") in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment. ... For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
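Read that way, a path-segment encoder only has to escape what falls outside RFC3986's `pchar` rule (unreserved, sub-delims, ":" and "@"). The sketch below is one possible reading, not a reference implementation: the helper name `encode_segment` and the sample values are made up, and it leans on Python's `urllib.parse.quote`, which already leaves the unreserved characters alone.

```python
from urllib.parse import quote

# sub-delims from RFC3986 section 2.2; note that "*" is among them
SUB_DELIMS = "!$&'()*+,;="
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# quote() never escapes unreserved characters, so only the rest is listed.
SEGMENT_SAFE = SUB_DELIMS + ":@"


def encode_segment(segment: str) -> str:
    """Percent-encode raw data for use as a single path segment (hypothetical helper)."""
    return quote(segment, safe=SEGMENT_SAFE)


print(encode_segment("name;v=1.1"))  # sub-delims are left as they are
print(encode_segment("a*b"))         # so is the asterisk
print(encode_segment("a/b c"))       # "/" and space would break the segment
```

The slash is escaped here because it delimits segments, which is exactly the kind of conflict with a delimiter that section 2.2 says must be percent-encoded; the asterisk has no such role in a path segment.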
HTTP 1.0
HTTP 1.0 references RFC1738 to define URL syntax. Through a series of updates and obsoletions, that means it now uses the same RFC as HTTP 1.1 for URL syntax.
As far as backwards compatibility goes, RFC1738 specifies the asterisk as a reserved character, though as HTTP 1.0 doesn't actually define any special meaning for an unencoded asterisk in the path component of a URL, it shouldn't break anything if you use one. This should mean you're still safe putting asterisks in the URLs pointing to the oldest of systems.
As a side note, the asterisk character does have a special meaning in a Request-URI in both HTTP specs, but it's not possible to represent it with an HTTP URL:
The asterisk "*" means that the request does not apply to a particular resource, but to the server itself, and is only allowed when the method used does not necessarily apply to a resource. One example would be
OPTIONS * HTTP/1.1
Disclaimer: I'm just reading and interpreting these RFCs myself, so I may be wrong.