What characters are allowed in HTTP header values?


Question

After studying the HTTP/1.1 standard, specifically page 31 and the related sections, I came to the conclusion that any 8-bit octet can appear in an HTTP header value, i.e. any character with a code in the range [0, 255].

And yet the HTTP servers I tried refuse to accept anything with a code > 127 (or most US-ASCII non-printable characters).

Here is a condensed excerpt of the grammar used in the standard:

message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value and consisting of
                  either *TEXT or combinations of token, separators, and
                  quoted-string>

CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>
CRLF           = CR LF
LWS            = [CRLF] 1*( SP | HT )
OCTET          = <any 8-bit sequence of data>
CHAR           = <any US-ASCII character (octets 0 - 127)>
CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
TEXT           = <any OCTET except CTLs, but including LWS>

token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\"
               | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext         = <any TEXT except <">>
quoted-pair    = "\" CHAR

As you can see, field-content can be a quoted-string, which is a quoted sequence of TEXT (i.e. any 8-bit octet except " and values in the ranges [0-8, 11-12, 14-31, 127]) or quoted-pair (\ followed by any value in the range [0, 127]). That is, any 8-bit character sequence can be passed by enclosing it in quotes and prefixing special symbols with \.

(Note that the standard doesn't treat the NUL (0x00) character in any special way.)

But obviously either all the servers I tried are non-conforming, or the standard has changed since 1999, or I can't read it properly.

So... which characters are allowed in HTTP header values, and why?

P.S. The reason behind all of this: I am looking for a way to pass a UTF-8-encoded sequence in an HTTP header value (without additional encoding, if possible).

Recommended answer

RFC 2616 is obsolete; the relevant part has been replaced by RFC 7230.

The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)
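To make the RFC 7230 rules concrete, here is a minimal Python sketch that sorts the octets of a field value into three buckets: plain visible US-ASCII plus SP/HTAB, values containing obs-text octets (0x80-0xFF, which the grammar still tolerates but treats as opaque), and values containing control characters that the grammar does not allow. The function name and the three labels are my own, not from any library, and the sketch ignores obs-fold and the stricter rules for quoted-string and comment.

```python
def classify_field_value(value: bytes) -> str:
    """Classify a header field value by the octets it contains.

    Returns "ascii" for VCHAR / SP / HTAB only, "obs-text" if any octet
    is in 0x80-0xFF, and "invalid" if any other control octet appears.
    The labels are hypothetical names chosen for this sketch.
    """
    has_obs_text = False
    for octet in value:
        # SP (0x20), HTAB (0x09) and VCHAR (0x21-0x7E) are always fine.
        if octet in (0x09, 0x20) or 0x21 <= octet <= 0x7E:
            continue
        # obs-text: accepted by the grammar, but deprecated and opaque.
        if octet >= 0x80:
            has_obs_text = True
            continue
        # Remaining octets are NUL, other CTLs, or DEL (0x7F).
        return "invalid"
    return "obs-text" if has_obs_text else "ascii"


if __name__ == "__main__":
    print(classify_field_value(b"text/html; charset=utf-8"))
    print(classify_field_value("€ rates".encode("utf-8")))
    print(classify_field_value(b"a\x00b"))
```

A raw UTF-8 value lands in the "obs-text" bucket, which matches the behaviour described in the question: the grammar does not forbid it outright, but recipients are free to reject or mangle it.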

In essence, RFC 2616 defaulted to ISO-8859-1, which was both insufficient and not interoperable anyway. Thus, RFC 7230 deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as the one defined in RFC 8187, or plain URI percent-encoding).
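As an illustration of the escaping approach, the following Python sketch encodes a UTF-8 string in the RFC 8187 ext-value form (charset, empty language tag, then percent-encoded octets) using only the standard library's urllib.parse. The helper names encode_ext_value and decode_ext_value are hypothetical, and the sketch assumes the receiving side knows the parameter uses this encoding; it handles only the UTF-8 charset.

```python
from urllib.parse import quote, unquote

# RFC 8187 attr-char characters beyond those quote() never escapes
# (letters, digits, "-", ".", "_", "~").
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(text: str) -> str:
    """Encode text as an ext-value with charset UTF-8 and no language tag."""
    return "UTF-8''" + quote(text, safe=ATTR_CHAR_EXTRA, encoding="utf-8")


def decode_ext_value(value: str) -> str:
    """Decode an ext-value produced by encode_ext_value (UTF-8 only)."""
    charset, _language, encoded = value.split("'", 2)
    if charset.upper() != "UTF-8":
        raise ValueError(f"unsupported charset: {charset!r}")
    return unquote(encoded, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    encoded = encode_ext_value("€ rates")
    print(encoded)  # UTF-8''%E2%82%AC%20rates
    print(decode_ext_value(encoded))
```

The result contains only US-ASCII characters, so it can be placed in a parameter such as filename*= without relying on how a server treats octets above 127. If the header has no parameter syntax, calling quote() on the whole value gives plain percent-encoding instead, provided both ends agree on that convention.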
