Detecting the character encoding of an HTTP POST request

Question

I'm building a web service and have a node that accepts a POST to create a new resource. The resource expects one of two content types: an XML format I'll be defining, or form-encoded variables.

The idea is that consuming applications can POST XML directly and benefit from better validation, etc., but there's also an HTML interface that will POST form-encoded data. The XML format obviously carries a charset declaration, but I can't see how to detect the form's charset just from looking at the POST.
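To make the two cases concrete, here is a minimal Python sketch of a handler that routes a POST body by its media type. The function name parse_post_body and the choice of application/xml and text/xml as the accepted XML media types are assumptions for illustration. It shows the asymmetry described above: the XML parser is handed raw bytes and reads the encoding from the XML declaration itself, while the form branch has nothing to go on and simply assumes UTF-8.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl


def parse_post_body(content_type: str, body: bytes):
    """Hypothetical dispatcher: route a POST body by its media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml"):
        # Pass raw bytes so the parser honours the XML encoding declaration.
        return ET.fromstring(body)
    if media_type == "application/x-www-form-urlencoded":
        # A urlencoded body is ASCII; the charset of the %XX-escaped bytes
        # is NOT carried here, so UTF-8 is only an assumption.
        return dict(parse_qsl(body.decode("ascii"), encoding="utf-8"))
    raise ValueError(f"unsupported media type: {media_type}")
```

The XML branch gets its charset for free; the form branch is exactly where the question's problem lies.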

A typical post to the form from Firefox looks like this:

POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41

field1=value1&field2=value2&field3=value3

Which doesn't seem to contain any useful indication of the character set.

From what I can see, the application/x-www-form-urlencoded type is entirely defined in HTML, which just lays out the %-encoding rules, but doesn't say anything about what charset the data should be in.
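A short Python sketch (standard library only) of why the %-encoding rules alone are not enough: the same percent-encoded bytes decode to different text depending on which charset you assume. The body name=caf%C3%A9 is a made-up example of "café" submitted from a UTF-8 page.

```python
from urllib.parse import parse_qsl

# Hypothetical form body: "café" percent-encoded from a UTF-8 page.
body = "name=caf%C3%A9"

# Same bytes (C3 A9), two different assumed charsets.
as_utf8 = parse_qsl(body, encoding="utf-8")
as_latin1 = parse_qsl(body, encoding="iso-8859-1")

print(as_utf8)    # [('name', 'café')]
print(as_latin1)  # [('name', 'cafÃ©')]
```

Nothing in the body itself says which of the two results is the intended one.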

Basically, is there any way of telling the character set if I don't know which character set the HTML was originally served in? Otherwise I'll have to guess the character set from the characters present, and that's always a bit iffy from what I can tell.
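If guessing is unavoidable, one common heuristic (not something the question or answer prescribes) is to try a strict UTF-8 decode first and fall back to ISO-8859-1 when that fails. A minimal Python sketch, with decode_form_guessing as a hypothetical name:

```python
from urllib.parse import parse_qsl


def decode_form_guessing(body: str) -> tuple[list[tuple[str, str]], str]:
    """Heuristic sketch: try strict UTF-8 first, then fall back to ISO-8859-1.

    Valid multi-byte UTF-8 sequences rarely occur by accident in Latin-1
    text, so a clean strict UTF-8 decode is a reasonable, not certain, signal.
    """
    try:
        # errors="strict" makes invalid UTF-8 raise instead of inserting U+FFFD.
        return parse_qsl(body, encoding="utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails,
        # but it may silently produce the wrong text.
        return parse_qsl(body, encoding="iso-8859-1"), "iso-8859-1"
```

This stays a guess: Latin-1 input that happens to form valid UTF-8 byte sequences (for example "Ã©", bytes C3 A9) would be misread as "é".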

Answer

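The default encoding of an HTTP POST is ISO-8859-1 (the original HTTP/1.1 specification's default for text media types, which servers such as Java servlet containers also apply to POST data when no charset is given).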

Otherwise, you have to look at the Content-Type header, which will then look like:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8
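A minimal Python sketch of that rule: read the charset parameter from Content-Type if present, otherwise fall back to ISO-8859-1 as the answer states. The helper names charset_from_content_type and decode_form are hypothetical, and the header parsing is deliberately simple (it does not handle every quoting case the HTTP grammar allows).

```python
import codecs
from urllib.parse import parse_qsl

DEFAULT_CHARSET = "iso-8859-1"  # the default assumed when no charset is sent


def charset_from_content_type(header: str, default: str = DEFAULT_CHARSET) -> str:
    """Return the normalised charset parameter of a Content-Type header."""
    _media_type, *params = header.split(";")
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')
            try:
                # Normalise the name; unknown charsets raise LookupError.
                return codecs.lookup(charset).name
            except LookupError:
                break  # unknown charset: use the default below
    return codecs.lookup(default).name


def decode_form(header: str, body: str) -> list[tuple[str, str]]:
    """Decode a urlencoded body using the charset the header declares."""
    return parse_qsl(body, encoding=charset_from_content_type(header))
```

For example, decode_form("application/x-www-form-urlencoded", "name=caf%E9") falls back to ISO-8859-1 and yields [('name', 'café')].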

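You can force the encoding with

<form enctype="application/x-www-form-urlencoded;charset=UTF-8">

or

<form accept-charset="UTF-8">

(accept-charset is the attribute HTML defines for this purpose; browsers generally ignore a charset parameter inside enctype.)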
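If you serve the HTML form yourself, forcing its charset means the server no longer has to guess. Below is a minimal WSGI sketch in Python, assuming every page and form is served as UTF-8; FORM_CHARSET, FORM_PAGE and app are illustrative names, while /path and field1 are borrowed from the question's example request.

```python
from urllib.parse import parse_qsl

# Assumption: every page and form this service serves is UTF-8.
FORM_CHARSET = "utf-8"

FORM_PAGE = f"""<!DOCTYPE html>
<html>
<head><meta charset="{FORM_CHARSET}"><title>New resource</title></head>
<body>
<form method="post" action="/path" accept-charset="{FORM_CHARSET}">
  <input name="field1"> <button>Create</button>
</form>
</body>
</html>"""


def app(environ, start_response):
    """Minimal WSGI app: serves the form on GET, decodes the form on POST."""
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # A urlencoded body is plain ASCII; the charset only matters
        # once the %XX escapes are turned back into bytes.
        raw = environ["wsgi.input"].read(length).decode("ascii")
        # We chose the form's charset, so we decode with that same charset.
        fields = dict(parse_qsl(raw, encoding=FORM_CHARSET))
        body = f"received {fields!r}".encode(FORM_CHARSET)
        content_type = f"text/plain; charset={FORM_CHARSET}"
    else:
        body = FORM_PAGE.encode(FORM_CHARSET)
        content_type = f"text/html; charset={FORM_CHARSET}"
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, you could serve it with wsgiref.simple_server.make_server("", 8000, app).serve_forever(). The design choice is simply that the server, not the request, is the source of truth for the charset.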

Some references:

http://www.htmlhelp.com/reference/html40/forms/form.html

http://www.w3schools.com/tags/tag_form.asp
