What characters are allowed in HTTP header values?
Question
After studying the HTTP/1.1 standard, specifically page 31 and the related sections, I came to the conclusion that any 8-bit octet can appear in an HTTP header value, i.e. any character with a code in the [0, 255] range.
And yet the HTTP servers I tried refuse to accept anything with a code > 127 (or most US-ASCII non-printable characters).
Here is a condensed excerpt of the grammar used in the standard:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value and consisting of
either *TEXT or combinations of token, separators, and
quoted-string>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
CRLF = CR LF
LWS = [CRLF] 1*( SP | HT )
OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
TEXT = <any OCTET except CTLs, but including LWS>
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\"
| <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
quoted-pair = "\" CHAR
As you can see, field-content can be a quoted-string, which is a quoted sequence of TEXT (i.e. any 8-bit octet except " and values from the [0-8, 11-12, 14-31, 127] range) or quoted-pair (\ followed by any value from the [0, 127] range). That is, any 8-bit character sequence can be passed by quoting it and prefixing special symbols with \.
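To make that reading concrete, here is a minimal Python sketch of the quoting the question has in mind. The function name `quote_rfc2616` is hypothetical, and the choice to escape every CTL except HT (including CR and LF) is an assumption made for simplicity. It follows only the obsolete RFC 2616 grammar quoted above and does not describe what real servers accept.

```python
def quote_rfc2616(raw: bytes) -> bytes:
    """Wrap raw octets in a quoted-string, following the RFC 2616 grammar above.

    Hypothetical helper: illustrates the question's reading of the old grammar.
    """
    out = bytearray(b'"')
    for octet in raw:
        # CTLs are octets 0-31 and 127; HT (9) is left bare because it is LWS.
        is_ctl = (octet < 32 and octet != 9) or octet == 127
        # The double quote (0x22) and backslash (0x5C) also need a quoted-pair.
        if is_ctl or octet in (0x22, 0x5C):
            out.append(0x5C)  # backslash introduces the quoted-pair
        # Octets above 127 cannot be escaped (quoted-pair needs CHAR),
        # but the old TEXT rule allowed them unescaped.
        out.append(octet)
    out.append(0x22)
    return bytes(out)
```

Under this reading, UTF-8 bytes would pass through unchanged inside the quotes, which is exactly what the servers in the question reject.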
(Note that the standard doesn't treat the NUL (0x00) character in any special way.)
But evidently either all the servers I tried are non-conforming, or the standard has changed since 1999, or I am misreading it.
So... which characters are allowed in HTTP header values, and why?
P.S. The reason behind all of this: I am looking for a way to pass a UTF-8-encoded sequence in an HTTP header value (without additional encoding, if possible).
Recommended answer
RFC 2616 is obsolete (see https://www.rfc-editor.org/info/rfc2616); the relevant part has been replaced by RFC 7230 (see https://www.greenbytes.de/tech/webdav/rfc7230.html#rfc.section.A.2.p.9):
The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)
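As a rough illustration of the stricter rules, the sketch below classifies a field value by octet. It assumes the RFC 7230 Section 3.2 grammar, where field content is built from VCHAR (0x21-0x7E), SP, HTAB, and the deprecated obs-text range (0x80-0xFF). The function name `classify_field_value` is hypothetical, and the sketch ignores obs-fold line folding and the rule about leading and trailing whitespace.

```python
def classify_field_value(value: bytes) -> str:
    """Return 'ascii', 'obs-text', or 'invalid' for a header field value.

    Hypothetical helper based on the RFC 7230 field-content octet ranges.
    """
    result = "ascii"
    for octet in value:
        if octet in (0x09, 0x20) or 0x21 <= octet <= 0x7E:
            continue  # HTAB, SP, or a visible US-ASCII character
        if octet >= 0x80:
            result = "obs-text"  # tolerated by the grammar, but deprecated
        else:
            return "invalid"  # NUL, other control characters, or DEL
    return result
```

A value classified as `obs-text` is syntactically tolerated, but a recipient has no reliable way to know its character encoding, so raw UTF-8 in a header is not interoperable.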
In essence, RFC 2616 defaulted to ISO-8859-1, and this was both insufficient and not interoperable anyway. Thus, RFC 7230 has deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as defined in RFC 8187, or plain URI-percent-encoding).
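For the UTF-8 use case from the question, a minimal sketch of the RFC 8187 `ext-value` form (`charset'language'percent-encoded-value`) might look like the following. The helper names `encode_ext_value` and `decode_ext_value` are hypothetical. The sketch relies on Python's `urllib.parse.quote`, passing the remaining RFC 8187 attr-char punctuation as safe characters, and always uses UTF-8 as the charset.

```python
from urllib.parse import quote, unquote

# attr-char punctuation from RFC 8187 that quote() does not keep by default.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(text: str, language: str = "") -> str:
    """Encode text as an RFC 8187 ext-value using UTF-8 (hypothetical helper)."""
    return "UTF-8'" + language + "'" + quote(text, safe=ATTR_CHAR_EXTRA)


def decode_ext_value(ext_value: str) -> str:
    """Decode an ext-value produced by encode_ext_value (hypothetical helper)."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)


print(encode_ext_value("€ rates"))  # UTF-8''%E2%82%AC%20rates
```

The result contains only US-ASCII characters, so it passes through servers that reject octets above 127. Note that RFC 8187 defines this form for header parameters (such as `title*=` or `filename*=`); using it for a whole custom header value is a convention that both ends would have to agree on.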