Encoding gets messed up when including file input in form


Problem description

I'm having an issue where the encoding of input from form elements gets messed up when I include a file input in my form. I'm using jQuery and a servlet backend (with an Ajax call), but I don't see how that should have anything to do with it. The HTML page encoding is set to UTF-8, and I also set the servlet request's character encoding to UTF-8. When I remove the file input from the form, the encoding is fine.

When I investigate the request, I see the following payload in Firebug:

```
...
------WebKitFormBoundaryMxjJWBwBmPLxN623
Content-Disposition: form-data; name="createActivityTitleInputId"

Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥
...
```

The content of the input should be æøåæøåæøå, and I do not know what the webkitformboundary stuff is...?

I would very much appreciate it if someone could help me with this problem.

Thanks :)

----- EDIT------

So I made a small test project to narrow down the issue. When I post the form without Ajax, everything works fine. If, however, I submit the form with the jQuery Form Plugin, the encoding breaks...

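```javascript
// jQuery Form Plugin: submit `form` via Ajax. `form`, `data` and
// `successfunction` are defined elsewhere in the test page.
form.ajaxSubmit({
    dataType: 'json',   // expect JSON back from the servlet
    data: data,         // extra fields sent along with the form's own fields
    type: 'POST',
    success: function(response) {
        successfunction(response);
    }
});
```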

Does anyone have experience using this plugin?

Solution

> When I investigate the headers for the request I see the following payload in bugzilla:

Do you mean Firebug? Are you looking at the ‘Post’ tab in Firebug's Net panel?

Because if so, what Firebug does is take the entire form submission body and try to decode it as UTF-8, including the raw bytes of any uploaded files. If that fails, it falls back to the locale's default encoding, typically Windows code page 1252 (similar to ISO-8859-1), to display the form submission content.
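The display heuristic described above can be sketched with the standard `TextDecoder` API: decode strictly as UTF-8, and on failure decode the same bytes as windows-1252. This is a minimal illustration of the described behaviour, not Firebug's actual code; the function name `guessDisplay` and the sample bytes are assumptions chosen for the example.

```javascript
// Hypothetical re-creation of the guessing heuristic (not Firebug's code):
// try strict UTF-8 first, fall back to windows-1252 on any malformed byte.
function guessDisplay(bytes) {
  try {
    // fatal: true makes decode() throw instead of inserting U+FFFD
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (e) {
    return new TextDecoder('windows-1252').decode(bytes);
  }
}

const field = new TextEncoder().encode('æøå');  // UTF-8 bytes C3 A6 C3 B8 C3 A5
const binaryFile = Uint8Array.of(0xff, 0xd8);    // start of a JPEG, not valid UTF-8

// Field alone: the whole body is valid UTF-8, so it displays correctly.
console.log(guessDisplay(field)); // æøå

// Field plus binary file bytes: UTF-8 decoding fails, so the very same
// field bytes are shown through windows-1252 instead.
const body = new Uint8Array([...field, ...binaryFile]);
console.log(guessDisplay(body)); // Ã¦Ã¸Ã¥ÿØ
```

The field bytes are identical in both calls; only the decoder chosen for display changes.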

This doesn't change how the form was actually submitted! It's just Firebug's visualisation of it. Firebug doesn't actually know which character encoding was used to encode the form content; it's just guessing. In general, a form submission carries no information that tells the server (or Firebug) which encoding is in use.

So if you submit a form with no file upload, or with a file upload where the file content itself is a valid UTF-8 sequence (including any ASCII-only file), Firebug will display the whole form submission as UTF-8 and so display the posted content as the characters you expected. If, on the other hand, there is a sequence in the bytes of the file that is not a valid UTF-8 sequence (which is very likely indeed for any binary file such as an image), Firebug will try to decode the bytes as UTF-8, fail, and fall back to cp1252.
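Whether a file upload trips the fallback depends only on whether its bytes are well-formed UTF-8. A small check, again built on a strict `TextDecoder`, shows the difference between an ASCII-only text file and the first bytes of an image; `isValidUtf8` is a hypothetical helper name, and the PNG signature simply stands in for any binary file.

```javascript
// Hypothetical helper: report whether a byte sequence is well-formed UTF-8.
// A strict TextDecoder throws on the first malformed sequence.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// ASCII is a subset of UTF-8, so an ASCII-only file always passes.
const asciiFile = new TextEncoder().encode('plain notes\n');
// The 8-byte PNG signature; 0x89 is a continuation byte with no lead byte.
const pngHeader = Uint8Array.of(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a);

console.log(isValidUtf8(asciiFile)); // true
console.log(isValidUtf8(pngHeader)); // false
```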

This will give you a display of "Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥Ã¦Ã¸Ã¥", even though the server will read those same bytes as UTF-8 and get "æøåæøåæøå". Firebug doesn't distinguish between text form submission values (which are characters) and file upload contents (which are bytes; they might also represent characters, but if so there is no guarantee that the uploaded file uses the same encoding as the form).
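Because only the display is wrong, the original text can be recovered from the garbled string: map each displayed character back to the byte windows-1252 decoded it from, then decode those bytes as UTF-8. A minimal sketch follows; `undoCp1252Display` is a hypothetical name, and it only handles characters up to U+00FF, which is enough for this example.

```javascript
// Hypothetical helper: reverse a UTF-8-bytes-shown-as-windows-1252 display.
// In U+0000..U+00FF each displayed character's code equals its original byte
// for everything windows-1252 can produce; characters above U+00FF (e.g. the
// € that byte 0x80 decodes to) are rejected, as they would need a lookup table.
function undoCp1252Display(displayed) {
  const bytes = Uint8Array.from(displayed, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) throw new RangeError('not a single-byte character: ' + ch);
    return code;
  });
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

console.log(undoCp1252Display('Ã¦Ã¸Ã¥')); // æøå
```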

> I do not know what the webkitformboundary stuff is...?

In a MIME `multipart/*` structure, a boundary string separates the subparts. In `multipart/form-data`, each subpart is one form field. Each delimiter begins with a newline followed by `--`, and then comes an arbitrary string chosen as the boundary, usually a random sequence of characters unlikely to turn up in the data of any subpart.
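To see where the boundary lines in the payload come from, here is a hand-rolled sketch that serialises plain text fields as `multipart/form-data`. `buildMultipart` and the `----ExampleBoundary` prefix are made up for illustration; browsers do this internally, and real file parts would also carry a `filename` and their own `Content-Type`.

```javascript
// Hypothetical helper: serialise plain string fields as multipart/form-data.
function buildMultipart(fields, boundary) {
  let body = '';
  for (const [name, value] of Object.entries(fields)) {
    // Delimiter: the CRLF ending the previous part, then "--" + boundary
    body += `--${boundary}\r\n`;
    body += `Content-Disposition: form-data; name="${name}"\r\n\r\n`;
    body += `${value}\r\n`;
  }
  // The closing delimiter has an extra trailing "--"
  body += `--${boundary}--\r\n`;
  return body;
}

// A random suffix makes it unlikely the boundary occurs inside any value.
const boundary = '----ExampleBoundary' + Math.random().toString(36).slice(2);
const body = buildMultipart({ title: 'æøå', note: 'hello' }, boundary);
console.log(`Content-Type: multipart/form-data; boundary=${boundary}`);
console.log(body);
```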

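The boundary string can be anything; it is declared in the `boundary=` parameter of the `Content-Type: multipart/form-data` header. WebKit browsers always use a boundary string starting with `----WebKitFormBoundary`, which is why the delimiter lines in your payload begin with six dashes (the two delimiter dashes plus the four from the boundary itself).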
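Going the other way, a receiver reads the boundary from the `Content-Type` header and splits the body on it. The sketch below reuses the boundary from the question's payload with an illustrative field value; `boundaryFromContentType` and `splitParts` are hypothetical helpers, and the naive string split is only suitable for illustration, since a real parser must match delimiters at line starts and treat binary parts as bytes.

```javascript
// Hypothetical helper: read the boundary parameter from a Content-Type value.
// Accepts both bare and double-quoted parameter values.
function boundaryFromContentType(contentType) {
  const match = /;\s*boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(contentType);
  if (!match) throw new Error('no boundary parameter');
  return match[1] ?? match[2];
}

// Hypothetical helper: naively split a multipart body into raw parts.
function splitParts(body, boundary) {
  return body
    .split('--' + boundary)
    .slice(1, -1) // drop the preamble and the "--" after the closing delimiter
    .map((part) => part.replace(/^\r\n/, '').replace(/\r\n$/, ''));
}

const header = 'multipart/form-data; boundary=----WebKitFormBoundaryMxjJWBwBmPLxN623';
const boundary = boundaryFromContentType(header);
const body =
  '------WebKitFormBoundaryMxjJWBwBmPLxN623\r\n' +
  'Content-Disposition: form-data; name="createActivityTitleInputId"\r\n\r\n' +
  'æøåæøåæøå\r\n' +
  '------WebKitFormBoundaryMxjJWBwBmPLxN623--\r\n';

console.log(boundary); // ----WebKitFormBoundaryMxjJWBwBmPLxN623
console.log(splitParts(body, boundary));
```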
