International characters in filename in multipart form data

Question

I am using Apache HTTP components (4.1-alpha2) to upload files to Dropbox. This is done using multipart form data. What is the correct way to encode filenames that contain international (non-ASCII) characters in a multipart form?

If I use their standard API, the server returns an HTTP status Forbidden. If I modify the upload code so that the file name is URL-encoded:

MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
FileBody bin = new FileBody(file_obj, URLEncoder.encode(file_obj.getName(), HTTP.UTF_8), HTTP.UTF_8, HTTP.OCTET_STREAM_TYPE );
entity.addPart("file", bin);            
req.setEntity(entity);

The file is uploaded, but I end up with a filename that is still encoded. E.g. %D1%82%D0%B5%D1%81%D1%82.txt
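The encoded name above can be reproduced with the JDK alone. The following is only a sketch: the original name тест.txt is inferred by decoding the percent-escapes shown above, and the class and method names are made up for illustration. It shows that URL-encoding is reversible only if the receiver decodes it. A server that stores the `filename` parameter as received keeps the escaped form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of Apache HttpComponents.
class UrlEncodedFilenameDemo {

    // Percent-encodes the UTF-8 bytes of the name, as the question's code does.
    static String encode(String filename) {
        try {
            return URLEncoder.encode(filename, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // What a receiver would have to do to recover the original name.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "\u0442\u0435\u0441\u0442" is the Cyrillic word assumed to be the original name.
        String encoded = encode("\u0442\u0435\u0441\u0442.txt");
        System.out.println(encoded);          // %D1%82%D0%B5%D1%81%D1%82.txt
        System.out.println(decode(encoded));  // the original name again
    }
}
```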

Recommended answer

To solve this issue specifically for the Dropbox server, I had to encode the filename in UTF-8. To do this I had to declare my multipart entity as follows:

MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));

I was getting the Forbidden response because the OAuth-signed entity did not match the entity actually sent (it was being URL-encoded).

For those interested in what the standards have to say on this, I did some reading of the RFCs. If the standard is strictly adhered to, all headers should be 7-bit encoded, which would make UTF-8 encoding of the filename illegal. However, RFC 2388 states:

The original local file name may be supplied as well, either as a "filename" parameter either of the "content-disposition: form-data" header or, in the case of multiple files, in a "content-disposition: file" header of the subpart. The sending application MAY supply a file name; if the file name of the sender's operating system is not in US-ASCII, the file name might be approximated, or encoded using the method of RFC 2231.
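For reference, the RFC 2231 form mentioned in the quote looks like `filename*=charset'language'percent-encoded-value`. The sketch below builds such a parameter with the JDK only. The class and method names are hypothetical, the set of characters left unescaped is deliberately conservative (a subset of what the RFC allows), and nothing here is taken from Apache HttpComponents.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class illustrating the RFC 2231 extended-parameter syntax.
class Rfc2231FilenameDemo {

    // Returns charset''value with an empty language tag; every byte outside
    // a small safe set (letters, digits, '-', '.', '_') is written as %XX.
    static String encodeExtValue(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Builds a part header using the extended "filename*" parameter.
    static String contentDisposition(String fieldName, String filename) {
        return "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename*=" + encodeExtValue(filename);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("file", "\u0442\u0435\u0441\u0442.txt"));
    }
}
```

Unlike `URLEncoder`, this escapes a space as `%20` rather than `+`, which is what the extended-parameter syntax expects. Whether a given server understands `filename*` at all is a separate question.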

Many posts mention using either RFC 2231 or RFC 2047 to encode non-US-ASCII header values in 7 bits. However, RFC 2047 explicitly states in section 5 (item 3) that encoded words MUST NOT be used in a parameter of a Content-Disposition field. That leaves only RFC 2231, but it is an extension and cannot be relied upon to be implemented in all servers. In reality, most of the major browsers send non-US-ASCII characters in UTF-8 (hence the HttpMultipartMode.BROWSER_COMPATIBLE mode in Apache HTTP client), and because of this most web servers support it. Another thing to note is that if you use HttpMultipartMode.STRICT on the multipart entity, the library will actually substitute a question mark (?) for each non-ASCII character in the filename.
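The difference between the two outcomes can be illustrated with plain JDK string conversions. This is a sketch of the observable effect only, not of how the library implements either mode: encoding a name as US-ASCII replaces each unmappable character with `?`, while encoding it as UTF-8 keeps every character at the cost of sending 8-bit bytes in the header. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it mimics the effect described above with JDK calls only.
class HeaderCharsetDemo {

    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII.
    static String asAscii(String filename) {
        byte[] ascii = filename.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Number of bytes the name occupies when written to the header as UTF-8.
    static int utf8Length(String filename) {
        return filename.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String name = "\u0442\u0435\u0441\u0442.txt"; // four Cyrillic letters plus ".txt"
        System.out.println(asAscii(name));    // ????.txt
        System.out.println(utf8Length(name)); // 12 (two bytes per Cyrillic letter)
    }
}
```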
