Binary lines in multipart/form-data (file upload)

Question

I'm writing a simple web server in Python that allows a user to upload a file using multipart/form-data. As far as I can tell, multipart MIME data is supposed to be line-based. For instance, the boundary has to be at the beginning of a line.

I can't figure out how binary data is handled in this regard. My client (Firefox) is not encoding it into 7-bit ASCII or anything similar; it just sends the raw binary data. Does it split the data into lines at arbitrary locations? Is there a maximum line length specified for multipart data? I've tried looking through the RFC for multipart/form-data, but didn't find anything.

Recommended answer

After digging through the RFCs, I think I finally have it all straight in my head. A body part (i.e., the body content of an individual part in a multipart/* message) only needs to be line-based in one respect: the boundary delimiter at the end of the part begins with a CR+LF. Otherwise, the data need not be line-based. If the content happens to contain line breaks, there is no maximum distance between them, nor do they need to be escaped in any way (well, unless perhaps the Content-Transfer-Encoding is quoted-printable). The 7bit, 8bit, and binary values for Content-Transfer-Encoding don't actually indicate that any encoding has been applied to the data (so no encoding needs to be undone); they only indicate the kind of data you can expect to see in the body part.
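To make that concrete, here is a minimal sketch in Python that builds a one-part multipart/form-data body by hand and recovers the raw bytes by searching for the CR+LF-prefixed delimiter. The boundary string, the field and file names, and the helper `extract_single_part` are all hypothetical, chosen only for illustration; a real server would take the boundary from the request's Content-Type header and would not hold the whole body in memory.

```python
# Hypothetical boundary; a real one comes from the Content-Type header.
boundary = b"----example-boundary"

# Raw binary content with embedded CR, LF and CR+LF bytes and no line structure.
payload = b"\x00\xff\r\nmore\rraw\nbytes" + bytes(range(256))

# One-part body: delimiter line, part headers, blank line, content, closing delimiter.
body = (
    b"--" + boundary + b"\r\n"
    b'Content-Disposition: form-data; name="file"; filename="blob.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    + payload
    + b"\r\n--" + boundary + b"--\r\n"
)


def extract_single_part(body, boundary):
    """Return the content of the first part of an in-memory multipart body."""
    # The CR+LF in front of "--boundary" belongs to the delimiter, not the content.
    delimiter = b"\r\n--" + boundary
    # The part headers end at the first blank line.
    content_start = body.index(b"\r\n\r\n") + 4
    content_end = body.index(delimiter, content_start)
    return body[content_start:content_end]


recovered = extract_single_part(body, boundary)
```

The line breaks inside `payload` come back untouched, because only the exact sequence CR+LF, two hyphens, and the boundary marks the end of the part.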

What I was really getting at in my [poorly expressed] question was how to read and buffer the data from the socket so that I could be sure of catching the boundary without needing an arbitrarily large buffer (e.g., if the content happened to contain no line breaks, a readline would end up buffering the entire thing).

What I ended up doing was buffering from the socket with a readline that takes a maximum length, so the buffer is never longer than that limit, but the read still stops as soon as a line break is encountered. This ensures that when the boundary arrives (following a CR+LF), it is at the beginning of the buffer. I had to do a little extra work to avoid including that final CR+LF in the actual body content, because according to the RFC it is required before the boundary and is therefore not part of the content itself.
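The approach described above might look roughly like the following sketch. It is not the original code: the generator `iter_part_content`, its parameters, and the default limit of 8192 bytes are assumptions made for illustration. It expects a binary file-like object (for example, the result of `socket.makefile("rb")`) positioned just after the blank line that ends the part headers.

```python
import io


def iter_part_content(stream, boundary, max_line=8192):
    """Yield the content of one part in chunks, stopping at the next delimiter.

    Hypothetical helper: `stream` is a binary file-like object positioned at
    the first content byte, and `boundary` is the boundary without the
    leading "--".
    """
    delimiter = b"--" + boundary
    if max_line < len(delimiter):
        raise ValueError("max_line is too small to hold the delimiter")
    held = b""            # trailing CR or CR+LF withheld until we see what follows
    at_line_start = True  # the part headers ended with CR+LF
    while True:
        chunk = stream.readline(max_line)  # at most max_line bytes, stops at LF
        if not chunk:
            raise ValueError("stream ended before the closing boundary")
        if at_line_start and chunk.startswith(delimiter):
            return  # the withheld CR+LF belongs to the delimiter, so drop it
        data = held + chunk
        at_line_start = data.endswith(b"\r\n")
        if at_line_start:
            data, held = data[:-2], b"\r\n"
        elif data.endswith(b"\r"):
            data, held = data[:-1], b"\r"  # the LF may arrive in the next read
        else:
            held = b""
        if data:
            yield data


# Usage with an in-memory stream standing in for the socket.
demo_boundary = b"xyz123"
demo_payload = b"A" * 5000 + b"\r\n\x00\x01\r" + b"B" * 3000 + b"\n--xyz123 fake\ntail"
demo_stream = io.BytesIO(demo_payload + b"\r\n--" + demo_boundary + b"--\r\n")
demo_result = b"".join(iter_part_content(demo_stream, demo_boundary, max_line=64))
```

Each read returns at most `max_line` bytes, so memory use stays bounded even when the content has no line breaks. Withholding a trailing CR+LF until the next read is what keeps the delimiter's own CR+LF out of the content. A line that starts with the boundary after a bare LF is treated as content, since the delimiter requires a full CR+LF in front of it.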
