What's the correct encoding of HTTP GET request strings?


Question

Does the HTTP standard, or any other standard, define which character encoding should be applied to special characters before they are %XX-encoded in a URL? If not, is there a way to specify which encoding is used? Most browsers seem to send the data as UTF-8.

Recommended answer

Does the HTTP standard or something define which encoding should be used on special characters before they are encoded in url with %XXs?

The HTTP standard, no. But another standard, IRI, can come into play.

URIs are explicitly (once %-decoded) byte sequences. Neither the URI standard nor the HTTP standard specifies, for http:-scheme URIs, which Unicode characters those bytes map onto.
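To make the "bytes, not characters" point concrete, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `try_decode_utf8` is hypothetical and only there for illustration; the point is that %-decoding yields bytes, and turning those bytes into text requires an encoding the URI itself does not carry.

```python
from urllib.parse import unquote_to_bytes

# %-decoding gives raw bytes; no character encoding is implied.
raw = unquote_to_bytes("%E9")
raw_utf8 = unquote_to_bytes("%C3%A9")

# The receiver has to pick an encoding to get text back.
as_latin1 = raw.decode("iso-8859-1")
as_utf8 = raw_utf8.decode("utf-8")


def try_decode_utf8(data: bytes):
    """Hypothetical helper: return the UTF-8 text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

Both `as_latin1` and `as_utf8` come out as 'é', while `try_decode_utf8(raw)` returns `None`: the single byte 0xE9 is a valid ISO-8859-1 character but not a valid UTF-8 sequence.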

Specifically for query parameters: web browsers will use the encoding of the originating page to build the GET URL for a form submission. So if you have a page in ISO-8859-1 and you put ‘é’ in a search box you'll get ‘?search=%E9’, but if you do the same in a page encoded as UTF-8 you'll get ‘?search=%C3%A9’. If you don't serve your form page with an explicit charset the browser will guess one, which you don't want, because the server then cannot know which encoding the submission will arrive in.
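The two results above can be reproduced with Python's `urllib.parse`, which lets you choose the encoding applied before %-escaping. This is only a sketch of what a browser effectively does for a form field named `search` (the field name comes from the example above; the variable names are illustrative).

```python
from urllib.parse import quote, urlencode

# Same character, different page encodings, different %-escapes.
latin1_form = quote("é", encoding="iso-8859-1")  # '%E9'
utf8_form = quote("é", encoding="utf-8")         # '%C3%A9'

# Building the whole query string the way a GET form submission would.
query_latin1 = urlencode({"search": "é"}, encoding="iso-8859-1")  # 'search=%E9'
query_utf8 = urlencode({"search": "é"}, encoding="utf-8")         # 'search=%C3%A9'
```

On the receiving side, the server has to decode with the same encoding the form page was served in, which is why declaring the page charset explicitly matters.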

For the other parts of a URL, a browser won't generate them itself, but if you supply it with non-ASCII characters in links it will usually encode them as UTF-8. This is not reliable, as it depends on browser and locale settings, so it's best not to rely on it at the moment.

The standard that properly allows non-ASCII characters in links is IRI. An IRI is converted to a URI by UTF-8-%-encoding most of the URL, except that the hostname is converted using Punycode instead. For compatibility it is best not to rely on browsers understanding IRIs in links yet. Instead, UTF-8-then-%-encode your path and parameter characters yourself. They will still appear as the right characters in the address bar in modern browsers; unfortunately IE won't display the decoded-character IRI form in all cases, depending on language settings.

The Wiki IRI for the Greek gamma character is:

http://en.wikipedia.org/wiki/Γ

Encoded into a URI, it is:

http://en.wikipedia.org/wiki/%CE%93
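The IRI-to-URI conversion described above can be sketched in Python roughly as follows. `iri_to_uri` is a hypothetical helper, not a standard-library function, and it makes simplifying assumptions: it ignores userinfo in the authority, treats `%` as safe so existing escapes are not double-encoded, and uses Python's built-in `idna` codec (which implements the older IDNA 2003 rules) for the hostname.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: Punycode the host, UTF-8-%-encode everything else."""
    parts = urlsplit(iri)

    # Hostname: IDNA/Punycode rather than %-encoding.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path, query and fragment: UTF-8 bytes, then %-escape the non-ASCII ones.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&+%", encoding="utf-8")
    fragment = quote(parts.fragment, safe="%", encoding="utf-8")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://en.wikipedia.org/wiki/Γ"))
```

For the gamma example this prints the %CE%93 form shown above. A hostname such as `bücher.example` would instead come out as `xn--bcher-kva.example`.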
