Detecting the character encoding of an HTTP POST request
Question
I'm building a web service and have a node that accepts a POST to create a new resource. The resource expects one of two content-types - an XML format I'll be defining, or form-encoded variables.
The idea is that consuming applications can POST XML directly and benefit from better validation etc., but there's also an HTML interface that will POST the form-encoded stuff. Obviously the XML format has a charset declaration, but I can't see how I detect the form's charset just from looking at the POST.
A typical post to the form from Firefox looks like this:
POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41

field1=value1&field2=value2&field3=value3
Which doesn't seem to contain any useful indication of the character set.
From what I can see, the application/x-www-form-urlencoded type is entirely defined in HTML, which just lays out the %-encoding rules, but doesn't say anything about what charset the data should be in.
Basically, is there any way of telling the character set if I don't know which character set the HTML page containing the form was originally served in? Otherwise I'll have to try to guess the character set based on which characters are present, and that's always a bit iffy from what I can tell.
Answer
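The default encoding of an HTTP POST is ISO-8859-1 (the historical HTTP/1.1 default when no charset is declared). In practice, browsers usually encode form data in the character set of the page that contained the form.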
Otherwise, look at the Content-Type header, which will then look like this:
Content-Type: application/x-www-form-urlencoded ; charset=UTF-8
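On the server side, the check described above could be sketched roughly as follows. This is only an illustration in Python using the standard library; the helper names form_charset and decode_form are hypothetical, and the ISO-8859-1 fallback simply mirrors the default stated in this answer rather than anything the request itself guarantees.

```python
from email.message import Message
from urllib.parse import parse_qs

# Assumption: fall back to the default named in the answer when no charset is sent.
DEFAULT_CHARSET = "iso-8859-1"


def form_charset(content_type: str) -> str:
    """Return the charset parameter of a Content-Type value, or the fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or None if absent.
    return msg.get_content_charset() or DEFAULT_CHARSET


def decode_form(body: bytes, content_type: str) -> dict:
    """Decode a form-encoded body using the charset declared in the header."""
    charset = form_charset(content_type)
    # A well-formed urlencoded body is pure ASCII; the percent-escapes carry
    # the real bytes, which parse_qs decodes using the chosen charset.
    return parse_qs(body.decode("ascii"), encoding=charset, errors="replace")
```

For example, decode_form(b"name=caf%C3%A9", "application/x-www-form-urlencoded ; charset=UTF-8") decodes the value as UTF-8, while the same call without a charset parameter would interpret the escapes as ISO-8859-1.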
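You can use
<form enctype="application/x-www-form-urlencoded;charset=UTF-8">
or
<form accept-charset="UTF-8">
to force the encoding. Of the two, accept-charset is the standard attribute for this purpose; enctype is only defined for the bare media types, so browsers may ignore a charset parameter placed there.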
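To see why pinning the form's encoding matters, here is a small Python illustration (standard library only, variable names are just for the example) of how the same character is percent-encoded differently under the two charsets, and what happens when the server decodes with the wrong one:

```python
from urllib.parse import quote, unquote

# The same character yields different percent-escapes per charset.
utf8_form = quote("é", encoding="utf-8")          # two bytes: C3 A9
latin1_form = quote("é", encoding="iso-8859-1")   # one byte: E9

# Decoding UTF-8 escapes as ISO-8859-1 silently produces mojibake
# instead of raising an error, which is why guessing is unreliable.
mismatch = unquote(utf8_form, encoding="iso-8859-1")

print(utf8_form, latin1_form, mismatch)
```

Because the wrong decoding does not fail, the server cannot reliably tell the two cases apart from the body alone; forcing the form to a known charset avoids the ambiguity.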
Some references:
http://www.htmlhelp.com/reference/html40/forms/form.html
http://www.w3schools.com/tags/tag_form.asp