Properly format multipart/form-data body

Question

I'm writing a script to upload stuff including files using the multipart/form-data content type defined in RFC 2388. In the long run, I'm trying to provide a simple Python script to do uploads of binary packages for github, which involves sending form-like data to Amazon S3.

This question has already asked about how to do this, but it is without an accepted answer so far, and the more useful of the two answers it currently has points to these recipes which in turn build the whole message manually. I am somewhat concerned about this approach, particularly with regard to charsets and binary content.

There is also this question, with its currently highest-scoring answer suggesting the MultipartPostHandler module. But that is not much different from the recipes I mentioned, and therefore my concerns apply to that as well.

RFC 2388 Section 4.3 explicitly states that content is expected to be 7 bit unless declared otherwise, and therefore a Content-Transfer-Encoding header might be required. Does that mean I'd have to Base64-encode binary file content? Or would Content-Transfer-Encoding: 8bit be sufficient for arbitrary files? Or should that read Content-Transfer-Encoding: binary?

Header fields in general, and the filename header field in particular, are ASCII only by default. I'd like my method to be able to pass non-ASCII file names as well. I know that for my current application of uploading stuff for github, I probably won't need that as the file name is given in a separate field. But I'd like my code to be reusable, so I'd rather encode the file name parameter in a conforming way. RFC 2388 Section 4.4 advises the format introduced in RFC 2231, e.g. filename*=utf-8''t%C3%A4st.txt.
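The standard library can already produce the RFC 2231 form quoted above. A quick check using `email.utils.encode_rfc2231` (the function mentioned further down); the `filename*=` prefix is simply concatenated by hand here:

```python
import email.utils

# encode_rfc2231(value, charset) percent-encodes the UTF-8 bytes of the value
# and prefixes "charset'language'"; the language part is left empty here.
value = email.utils.encode_rfc2231('täst.txt', 'utf-8')
param = 'filename*=' + value
print(param)  # filename*=utf-8''t%C3%A4st.txt
```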

As multipart/form-data is essentially a MIME type, I thought that it should be possible to use the email package from the Python standard library to compose my post. The rather complicated handling of non-ASCII header fields in particular is something I'd like to delegate.

So I wrote the following code:

#!/usr/bin/python3.2

import email.charset
import email.generator
import email.header
import email.mime.application
import email.mime.multipart
import email.mime.text
import io
import sys

class FormData(email.mime.multipart.MIMEMultipart):
    """A multipart/form-data message built on top of the email package."""

    def __init__(self):
        email.mime.multipart.MIMEMultipart.__init__(self, 'form-data')

    def setText(self, name, value):
        # Text field; the utf-8 charset makes MIMEText base64-encode the body.
        part = email.mime.text.MIMEText(value, _charset='utf-8')
        part.add_header('Content-Disposition', 'form-data', name=name)
        self.attach(part)
        return part

    def setFile(self, name, value, filename, mimetype=None):
        # File field; MIMEApplication base64-encodes the bytes by default.
        part = email.mime.application.MIMEApplication(value)
        part.add_header('Content-Disposition', 'form-data',
                        name=name, filename=filename)
        if mimetype is not None:
            part.set_type(mimetype)
        self.attach(part)
        return part

    def http_body(self):
        # Serialize with CRLF line endings; maxheaderlen=0 disables folding.
        b = io.BytesIO()
        gen = email.generator.BytesGenerator(b, False, 0)
        gen.flatten(self, False, '\r\n')
        b.write(b'\r\n')
        b = b.getvalue()
        # Drop the top-level headers: the body starts after the first blank line.
        pos = b.find(b'\r\n\r\n')
        assert pos >= 0
        return b[pos + 4:]

fd = FormData()
fd.setText('foo', 'bar')
fd.setText('täst', 'Täst')
fd.setFile('file', b'abcdef'*50, 'Täst.txt')
sys.stdout.buffer.write(fd.http_body())

The result looks like this:

--===============6469538197104697019==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name="foo"

YmFy

--===============6469538197104697019==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name*=utf-8''t%C3%A4st

VMOkc3Q=

--===============6469538197104697019==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: form-data; name="file"; filename*=utf-8''T%C3%A4st.txt

YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVm
YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVm
YWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJjZGVmYWJj
ZGVmYWJjZGVmYWJjZGVm

--===============6469538197104697019==--

It does seem to handle headers reasonably well. Binary file content will get base64-encoded, which might be avoidable but which should work well enough. What worries me are the text fields in between. They are base64-encoded as well. I think that according to the standard, this should work well enough, but I'd rather have plain text in there, just in case some dumb framework has to deal with the data at an intermediate level and does not know about Base64 encoded data.
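To confirm that the base64 text parts above carry nothing but the plain values, a quick check with the standard `base64` module, using payloads copied from the output. It also shows the size overhead on the 300-byte file part:

```python
import base64

# Text-field payloads copied from the generated output above.
foo = base64.b64decode('YmFy').decode('utf-8')
taest = base64.b64decode('VMOkc3Q=').decode('utf-8')
print(foo, taest)  # bar Täst

# base64 turns every 3 input bytes into 4 output characters:
# the 300-byte test file becomes 400 characters (before line wrapping).
print(len(b'abcdef' * 50), len(base64.b64encode(b'abcdef' * 50)))  # 300 400
```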

  • Can I use 8 bit data for my text fields and still conform to the specification?
  • Can I get the email package to serialize my text fields as 8 bit data without extra encoding?
  • If I have to stick to some 7 bit encoding, can I get the implementation to use quoted printable for those text parts where that encoding is shorter than base64?
  • Can I avoid base64 encoding for binary file content as well?
  • If I can avoid it, should I write the Content-Transfer-Encoding as 8bit or as binary?
  • If I had to serialize the body myself, how could I use the email.header package on its own to just format header values? (email.utils.encode_rfc2231 does this.)
  • Is there some implementation that already did all I'm trying to do?
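On the quoted-printable question: with the `MIMEText` class used above, the body encoding comes from the charset, which is base64 for utf-8, so choosing per part would have to be done by hand. A minimal sketch of such a choice using `binascii.b2a_qp` and `base64.b64encode`; the helper name is hypothetical, and the comparison only roughly reflects a real MIME body (base64 line wrapping is ignored):

```python
import base64
import binascii


def shorter_cte(data):
    """Hypothetical helper: pick the shorter of quoted-printable and base64."""
    qp = binascii.b2a_qp(data)
    b64 = base64.b64encode(data)
    if len(qp) <= len(b64):
        return 'quoted-printable', qp
    return 'base64', b64


print(shorter_cte('bar'.encode('utf-8')))   # ('quoted-printable', b'bar')
print(shorter_cte('Täst'.encode('utf-8')))  # ('base64', b'VMOkc3Q=')
```

Note that for a short word with two non-ASCII bytes, such as "Täst", quoted-printable (`T=C3=A4st`) already loses to base64.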

These questions are very closely related, and could be summarized as "how would you implement this". In many cases, answering one question either answers or obsoletes another one. So I hope you agree that a single post for all of them is appropriate.

Answer

This is a placeholder answer, describing what I did while waiting for some authoritative input to some of my questions. I'll be happy to accept a different answer if it demonstrates that this approach is wrong or unsuitable in at least one of the design decisions.

Here is the code I used to make this work according to my taste for now. I made the following decisions:

Can I use 8 bit data for my text fields and still conform to the specification?

I decided to do so. At least for this application, it does work.

Can I get the email package to serialize my text fields as 8 bit data without extra encoding?

I found no way, so I'm doing my own serialization, just like all the other recipes I have seen for this.

Can I avoid base64 encoding for binary file content as well?

Simply sending the file content in binary seems to work well enough, at least in my single application.

If I can avoid it, should I write the Content-Transfer-Encoding as 8bit or as binary?

As RFC 2045 Section 2.8 states that 8bit data is subject to a line length limitation of 998 octets between CRLF pairs, I decided that binary is the more general and thus the more appropriate description here.
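For illustration, the RFC 2045 definitions (sections 2.7 to 2.9) can be turned into a small classifier that reports which label a given payload actually satisfies. This is a hypothetical helper written from those definitions, not part of the email package; it shows why arbitrary file content generally only qualifies as binary:

```python
def transfer_encoding_class(data):
    """Classify bytes per RFC 2045 sections 2.7-2.9: '7bit', '8bit' or 'binary'.

    7bit and 8bit data must be CRLF-separated lines of at most 998 octets,
    with no NUL octets and no CR or LF outside CRLF pairs; 7bit additionally
    forbids octets above 127. Anything else is binary.
    """
    if b'\x00' in data:
        return 'binary'
    for line in data.split(b'\r\n'):
        # A bare CR/LF or an overlong line rules out both 7bit and 8bit.
        if b'\r' in line or b'\n' in line or len(line) > 998:
            return 'binary'
    if any(byte > 127 for byte in data):
        return '8bit'
    return '7bit'


print(transfer_encoding_class(b'hello\r\nworld'))        # 7bit
print(transfer_encoding_class('Täst'.encode('utf-8')))   # 8bit
print(transfer_encoding_class(b'unix\nline endings'))    # binary
```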

If I had to serialize the body myself, how could I use the email.header package on its own to just format header values?

As already edited into my question, email.utils.encode_rfc2231 is very useful for this. I try to encode using ASCII first, but fall back to that method in case of either non-ASCII data or ASCII characters which are forbidden inside a double-quoted string.
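The ASCII-first rule described here can be sketched as follows. `format_param` is a hypothetical name, and treating double quote, backslash and control characters as the "forbidden" set is a conservative assumption rather than the author's exact rule:

```python
import email.utils


def format_param(name, value):
    """Hypothetical helper: name="value" when possible, else RFC 2231 name*=..."""
    # Printable ASCII other than '"' and '\' can go into a plain quoted-string.
    if all(0x20 <= ord(c) < 0x7f and c not in '"\\' for c in value):
        return '%s="%s"' % (name, value)
    # Everything else is percent-encoded as UTF-8 in the RFC 2231 form.
    return '%s*=%s' % (name, email.utils.encode_rfc2231(value, 'utf-8'))


print(format_param('name', 'foo'))            # name="foo"
print(format_param('filename', 'Täst.txt'))   # filename*=utf-8''T%C3%A4st.txt
```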

Is there some implementation that already did all I'm trying to do?

Not that I'm aware of. Other implementations are invited to adopt ideas from my code, though.
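The answer's own code is not reproduced on this page. As a rough illustration of the decisions listed above (raw UTF-8 text fields, raw file content labelled binary, ASCII-first header parameters with an RFC 2231 fallback), here is a minimal sketch. It is not the author's implementation; the function names and boundary format are hypothetical, and labelling text parts `8bit` is an assumption that only holds for lines under 998 octets:

```python
import email.utils
import uuid


def _param(name, value):
    # Plain quoted-string for printable ASCII, RFC 2231 encoding otherwise.
    if all(0x20 <= ord(c) < 0x7f and c not in '"\\' for c in value):
        return '%s="%s"' % (name, value)
    return '%s*=%s' % (name, email.utils.encode_rfc2231(value, 'utf-8'))


def multipart_form_data(fields, files):
    """Build a multipart/form-data body; return (content_type, body_bytes).

    fields: list of (name, text) pairs, sent as raw UTF-8 labelled 8bit.
    files:  list of (name, filename, data, mimetype) tuples; data is bytes,
            sent unencoded and labelled binary.
    """
    parts = []
    for name, value in fields:
        headers = [
            'Content-Disposition: form-data; ' + _param('name', name),
            'Content-Type: text/plain; charset=utf-8',
            'Content-Transfer-Encoding: 8bit',
        ]
        parts.append((headers, value.encode('utf-8')))
    for name, filename, data, mimetype in files:
        headers = [
            'Content-Disposition: form-data; %s; %s'
            % (_param('name', name), _param('filename', filename)),
            'Content-Type: ' + (mimetype or 'application/octet-stream'),
            'Content-Transfer-Encoding: binary',
        ]
        parts.append((headers, data))

    # Choose a random boundary that does not occur inside any part.
    while True:
        boundary = 'form-boundary-' + uuid.uuid4().hex
        marker = boundary.encode('ascii')
        if not any(marker in data for _, data in parts):
            break

    out = []
    for headers, data in parts:
        out.append(b'--' + marker + b'\r\n')
        out.append('\r\n'.join(headers).encode('ascii') + b'\r\n\r\n')
        # The CRLF after the data belongs to the following delimiter line.
        out.append(data + b'\r\n')
    out.append(b'--' + marker + b'--\r\n')
    return 'multipart/form-data; boundary=' + boundary, b''.join(out)


content_type, body = multipart_form_data(
    [('foo', 'bar'), ('täst', 'Täst')],
    [('file', 'Täst.txt', b'abcdef' * 50, None)])
```

The result could then be sent with something like `urllib.request.Request(url, data=body, headers={'Content-Type': content_type})`, where `url` is whatever endpoint receives the form.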

Edit:

Thanks to this comment I'm now aware that the use of RFC 2231 for headers is not universally accepted: the current draft of HTML 5 forbids its use. It has also been seen to cause problems in the wild. But since POST headers do not always correspond to a specific HTML document (think of web APIs, for example), I'm not sure I'd trust that draft in that regard either. Perhaps the right way to go is giving both the encoded and the unencoded name, the way RFC 5987 Section 4.2 suggests. But that RFC is for HTTP headers, while a multipart/form-data header is technically part of the HTTP body. That RFC therefore doesn't apply, and I do not know of any RFC which would explicitly allow (or even encourage) the use of both forms simultaneously for multipart/form-data.
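What "giving both forms" would look like is sketched below, following the RFC 5987 pattern of a plain parameter as fallback next to an extended `*` parameter. Replacing unusable characters with an underscore is an arbitrary choice made for this hypothetical helper, and whether a given form-data receiver prefers one form over the other is exactly the open question raised above:

```python
import email.utils


def both_forms(name, value, fallback_char='_'):
    """Hypothetical helper: emit name="ascii fallback"; name*=utf-8''...

    Mirrors the RFC 5987 style used for HTTP headers; no RFC specifies
    this combination for multipart/form-data.
    """
    # ASCII-only fallback: keep printable ASCII, replace everything else.
    fallback = ''.join(
        c if 0x20 <= ord(c) < 0x7f and c not in '"\\' else fallback_char
        for c in value)
    extended = email.utils.encode_rfc2231(value, 'utf-8')
    return '%s="%s"; %s*=%s' % (name, fallback, name, extended)


print(both_forms('filename', 'Täst.txt'))
# filename="T_st.txt"; filename*=utf-8''T%C3%A4st.txt
```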
