Detecting the character encoding of an HTTP POST request


Question

I'm building a web service and have a node that accepts a POST to create a new resource. The resource expects one of two content types: an XML format I'll be defining, or form-encoded variables.

The idea is that consuming applications can POST XML directly and benefit from better validation etc., but there's also an HTML interface that will POST the form-encoded data. Obviously the XML format has a charset declaration, but I can't see how to detect the form's charset just from looking at the POST.

A typical POST of the form from Firefox looks like this:

POST /path HTTP/1.1
Host: www.myhostname.com
User-Agent: Mozilla/5.0 [...etc...]
Accept: text/html,application/xhtml+xml, [...etc...]
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 41

field1=value1&field2=value2&field3=value3

This doesn't seem to contain any useful indication of the character set.

From what I can see, the application/x-www-form-urlencoded type is defined entirely in the HTML specification, which just lays out the %-encoding rules but doesn't say anything about what charset the data should be in.

Basically, is there any way of telling the character set if I don't know which character set the HTML was originally served in? Otherwise I'll have to guess the character set from the characters present, and from what I can tell that's always a bit iffy.

Recommended answer

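The default encoding of an HTTP POST is ISO-8859-1, at least according to the original HTTP/1.1 specification (RFC 2616), which assumes ISO-8859-1 for text media types sent without an explicit charset parameter. In practice, browsers generally encode form data using the character set of the page that contained the form.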

Otherwise, you have to look at the Content-Type header, which will then look like this:

Content-Type: application/x-www-form-urlencoded ; charset=UTF-8
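On the server side, reading that parameter could be sketched roughly as follows. This is a minimal illustration in Python, not part of the original answer: the function names charset_from_content_type and decode_form are hypothetical, and falling back to ISO-8859-1 when no charset parameter is present is an assumption taken from the default mentioned above.

```python
from urllib.parse import parse_qs

# Assumption: fall back to ISO-8859-1 when the header carries no charset parameter.
DEFAULT_CHARSET = "ISO-8859-1"


def charset_from_content_type(header_value, default=DEFAULT_CHARSET):
    """Return the charset parameter of a Content-Type header value, or the default."""
    # Everything after the first ';' is a list of name=value parameters.
    for param in header_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    return default


def decode_form(body, content_type):
    """Decode a form-encoded request body (bytes) into a dict of value lists."""
    charset = charset_from_content_type(content_type)
    # A well-formed urlencoded body is pure ASCII; the %XX escapes carry the real bytes.
    text = body.decode("ascii")
    return parse_qs(text, keep_blank_values=True, encoding=charset, errors="replace")


if __name__ == "__main__":
    print(decode_form(b"name=caf%C3%A9",
                      "application/x-www-form-urlencoded ; charset=UTF-8"))
    print(decode_form(b"name=caf%E9", "application/x-www-form-urlencoded"))
```

Both calls should yield the same decoded value even though the bytes on the wire differ, which is why the charset has to come from somewhere other than the body itself.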

You may be able to use

<form enctype="application/x-www-form-urlencoded;charset=UTF-8">

or

<form accept-charset="UTF-8">

to force the encoding.
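To illustrate why pinning the form's charset matters, the sketch below shows the same field value percent-encoded under two charsets, followed by a fallback heuristic for requests that arrive with no charset information at all. The heuristic (try strict UTF-8 first, then ISO-8859-1) is an assumption added here for illustration, not something the answer prescribes, and guess_decode is a hypothetical helper name.

```python
from urllib.parse import unquote_to_bytes, urlencode

fields = {"field1": "café"}

# The same value produces different bytes on the wire depending on the form's charset.
as_utf8 = urlencode(fields, encoding="utf-8")
as_latin1 = urlencode(fields, encoding="iso-8859-1")


def guess_decode(raw_value):
    """Decode one percent-encoded form value, guessing between two charsets."""
    # In form encoding '+' stands for a space; undo that before unescaping.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    try:
        # Non-ASCII ISO-8859-1 text is rarely valid UTF-8, so try UTF-8 strictly first.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this step cannot fail.
        return data.decode("iso-8859-1")


if __name__ == "__main__":
    print(as_utf8, as_latin1)
    print(guess_decode("caf%C3%A9"), guess_decode("caf%E9"))
```

A guess like this can still be wrong for other single-byte encodings, so declaring the charset in the form remains the more reliable option.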

Some references:

http://www.htmlhelp.com/reference/html40/forms/form.html

http://www.w3schools.com/tags/tag_form.asp
