What's the correct encoding of HTTP GET request strings?

Question

Does the HTTP standard, or any other standard, define which encoding should be applied to special characters before they are encoded in a URL as %XX sequences? If not, is there a way to specify which encoding is used? Most browsers seem to send the data as UTF-8.

Recommended answer


Does the HTTP standard or something define which encoding should be used on special characters before they are encoded in url with %XXs?

The HTTP standard, no. But another standard, IRI, can come into play.

URIs are explicitly (once %-decoded) byte sequences. Which Unicode characters those bytes map onto is not specified by the URI standard, nor by the HTTP standard for http:-scheme URIs.
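As a rough illustration of that point, the sketch below percent-decodes a URL component to raw bytes with Python's standard `urllib.parse.unquote_to_bytes` and then interprets the same bytes under two different charsets. The helper name `decode_component` is made up for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_component(component: str, charset: str) -> str:
    """Percent-decode to raw bytes, then interpret those bytes in `charset`.

    `decode_component` is a hypothetical helper for illustration only.
    """
    return unquote_to_bytes(component).decode(charset)


# %-decoding alone only yields bytes; no characters are implied yet.
print(unquote_to_bytes("%C3%A9"))                # b'\xc3\xa9'

# The same two bytes read as different text depending on the assumed charset.
print(decode_component("%C3%A9", "utf-8"))       # é
print(decode_component("%C3%A9", "iso-8859-1"))  # Ã©
```

The receiving side therefore has to know, or agree on out of band, which charset the sender used.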

Specifically for query parameters: when a web browser builds the GET URL for a form submission, it uses the encoding of the page that contains the form. If the page is in ISO-8859-1 and you type ‘é’ into a search box, you'll get ‘?search=%E9’; do the same in a page encoded as UTF-8 and you'll get ‘?search=%C3%A9’. If you serve the form page without declaring a charset, the browser will guess one, which you don't want, because you then cannot know which encoding the submission will arrive in.
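The two query strings above can be reproduced with Python's standard `urllib.parse.urlencode`, which accepts an `encoding` argument. This is only a sketch of what the browser does; the helper name `form_get_query` and the `page_charset` parameter are assumptions for the example.

```python
from urllib.parse import urlencode


def form_get_query(fields: dict, page_charset: str) -> str:
    """Build a GET query string the way a form on a page in `page_charset` would.

    Hypothetical helper: characters are first encoded to bytes in the page's
    charset, then each non-safe byte is written as %XX.
    """
    return "?" + urlencode(fields, encoding=page_charset)


print(form_get_query({"search": "é"}, "iso-8859-1"))  # ?search=%E9
print(form_get_query({"search": "é"}, "utf-8"))       # ?search=%C3%A9
```

Declaring the page charset explicitly (and using UTF-8) keeps the server side from having to guess between these two forms.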

For the other parts of a URL, a browser won't generate them itself, but if you supply non-ASCII characters in links it will usually encode them as UTF-8. This is not reliable, as it depends on browser and locale settings, so it's best not to rely on it at the moment.

The standard that properly allows non-ASCII characters in links is IRI. An IRI is converted to a URI by UTF-8-%-encoding most of the URL, but the hostname is converted using Punycode instead. For compatibility it is best not to rely on browsers understanding IRIs in links yet. Instead, UTF-8-then-%-encode your path and parameter characters yourself. They will still appear as the right characters in the address bar in modern browsers; unfortunately IE won't display the decoded-character IRI form in all cases, depending on language settings.

The Wikipedia IRI for the Greek gamma character is:

http://en.wikipedia.org/wiki/Γ

Encoded into a URI, it is:

http://en.wikipedia.org/wiki/%CE%93
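A minimal sketch of that IRI-to-URI conversion in Python, using only the standard library: the path, query and fragment are UTF-8-then-%-encoded with `urllib.parse.quote`, and the hostname goes through Python's built-in `idna` codec, which produces Punycode labels. The function name `iri_to_uri` is an assumption for this example, it ignores any userinfo part, and the `idna` codec follows the older IDNA 2003 rules, so treat it as an illustration and not a complete implementation of the IRI specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (simplified, hypothetical helper)."""
    parts = urlsplit(iri)

    # Hostname: Punycode via the stdlib "idna" codec (userinfo is dropped).
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = f"{host}:{parts.port}" if parts.port else host

    # Everything else: UTF-8 bytes, then %XX. "%" is kept safe so that
    # escapes already present in the input are not encoded a second time.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=")
    fragment = quote(parts.fragment, safe="/?%:@!$&'()*+,;=")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://en.wikipedia.org/wiki/Γ"))
# http://en.wikipedia.org/wiki/%CE%93
print(iri_to_uri("http://bücher.example/"))
# http://xn--bcher-kva.example/
```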
