Am I parsing this HTTP POST request properly?

Views: 135 Published: 2017/11/7 20:57:17 Tags: python http parsing file-upload twisted.web

Problem description

Let me start off by saying, I'm using the twisted.web framework. Twisted.web's file uploading didn't work like I wanted it to (it only included the file data, and not any other information), cgi.parse_multipart doesn't work like I want it to (same thing, twisted.web uses this function), cgi.FieldStorage didn't work ('cause I'm getting the POST data through twisted, not a CGI interface -- so far as I can tell, FieldStorage tries to get the request via stdin), and twisted.web2 didn't work for me because the use of Deferred confused and infuriated me (too complicated for what I want).

That being said, I decided to try and just parse the HTTP request myself.

Using Chrome, the HTTP request is formed like this:

    ------WebKitFormBoundary7fouZ8mEjlCe92pq
    Content-Disposition: form-data; name="upload_file_nonce"

    11b03b61-9252-11df-a357-00266c608adb
    ------WebKitFormBoundary7fouZ8mEjlCe92pq
    Content-Disposition: form-data; name="file"; filename="login.html"
    Content-Type: text/html

    <!DOCTYPE html>
    <html>
    <head>

    ...

    ------WebKitFormBoundary7fouZ8mEjlCe92pq
    Content-Disposition: form-data; name="file"; filename=""


    ------WebKitFormBoundary7fouZ8mEjlCe92pq--

Is this always how it will be formed? I'm parsing it with regular expressions, like so (pardon the wall of code):

(Note: I snipped out most of the code to show only what I thought was relevant: the regular expressions (yeah, nested parentheses). This is an __init__ method (the only method so far) in an Uploads class I built. The full code can be seen in the revision history. I hope I didn't mismatch any parentheses.)

    if line == "--{0}--".format(boundary):
        finished = True

    if in_header == True and not line:
        in_header = False
        if 'type' not in current_file:
            ignore_current_file = True

    if in_header == True:
        m = re.match(
            "Content-Disposition: form-data; name=\"(.*?)\"; filename=\"(.*?)\"$", line)
        if m:
            input_name, current_file['filename'] = m.group(1), m.group(2)

        m = re.match("Content-Type: (.*)$", line)
        if m:
            current_file['type'] = m.group(1)

    else:
        if 'data' not in current_file:
            current_file['data'] = line
        else:
            current_file['data'] += line

You can see that I start a new "file" dict whenever a boundary is reached. I set in_header to True to say that I'm parsing headers. When I reach a blank line, I switch it to False -- but not before checking if a Content-Type was set for that form value. If not, I set ignore_current_file, since I'm only looking for file uploads.
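
For readers who want to try the approach described above, here is a minimal, self-contained sketch of the same state machine. It is a reconstruction, not the full code from the revision history: the function name `parse_uploads`, the `lines` argument (the body already split into lines with line endings removed), and the choice to rejoin data lines with "\n" are all assumptions made for illustration.

```python
import re

# The two patterns from the snippet above, precompiled.
DISPOSITION = re.compile(
    r'Content-Disposition: form-data; name="(.*?)"; filename="(.*?)"$')
CONTENT_TYPE = re.compile(r"Content-Type: (.*)$")


def parse_uploads(lines, boundary):
    """Collect file uploads from body lines (hypothetical helper)."""
    delimiter = "--" + boundary      # starts each part
    closing = delimiter + "--"       # ends the whole body
    uploads = {}
    current_file = None
    input_name = None
    in_header = False
    ignore_current_file = False

    for line in lines:
        if line in (delimiter, closing):
            # A boundary ends the previous part; keep it only if it was a file.
            if current_file is not None and input_name and not ignore_current_file:
                uploads[input_name] = current_file
            if line == closing:
                break
            current_file, input_name = {}, None
            in_header, ignore_current_file = True, False
        elif current_file is None:
            continue  # anything before the first boundary is skipped
        elif in_header and not line:
            # Blank line: headers are done. No Content-Type means a plain field.
            in_header = False
            if "type" not in current_file:
                ignore_current_file = True
        elif in_header:
            m = DISPOSITION.match(line)
            if m:
                input_name, current_file["filename"] = m.group(1), m.group(2)
            m = CONTENT_TYPE.match(line)
            if m:
                current_file["type"] = m.group(1)
        else:
            # Assumption: data lines are rejoined with "\n".
            if "data" not in current_file:
                current_file["data"] = line
            else:
                current_file["data"] += "\n" + line
    return uploads
```

Note that the boundary passed in is the value from the request's Content-Type header; each delimiter line in the body is that value prefixed with two extra hyphens.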

I know I should be using a library, but I'm sick to death of reading documentation, trying to get different solutions to work in my project, and still having the code look reasonable. I just want to get past this part -- and if parsing an HTTP POST with file uploads is this simple, then I shall stick with that.

Note: this code works perfectly for now; I'm just wondering if it will choke on/spit out requests from certain browsers.
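
One way to probe that concern is to run the Content-Disposition pattern against a few header lines that mean the same thing but are spelled differently. The variants below are hypothetical inputs written for illustration, not output captured from any particular browser; they rely only on the facts that header names are case-insensitive and that the pattern hard-codes spacing and line ending.

```python
import re

# The pattern from the question, unchanged.
PATTERN = "Content-Disposition: form-data; name=\"(.*?)\"; filename=\"(.*?)\"$"

# Hypothetical header lines; only the first copies the Chrome request above.
samples = {
    "chrome_style":
        'Content-Disposition: form-data; name="file"; filename="login.html"',
    "lowercase_header_name":
        'content-disposition: form-data; name="file"; filename="login.html"',
    "no_space_after_semicolon":
        'Content-Disposition: form-data;name="file";filename="login.html"',
    "trailing_carriage_return":
        'Content-Disposition: form-data; name="file"; filename="login.html"\r',
}

# True where the pattern recognises the line as a file-upload header.
results = {label: re.match(PATTERN, line) is not None
           for label, line in samples.items()}

for label, matched in results.items():
    print(label, "matched" if matched else "NOT matched")
```

Only the first line matches. Whether the last case matters depends on how the body is split into lines before the pattern is applied.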

Solution

You're trying to avoid reading documentation, but I think the best advice is to actually read:

RFC 2388, Returning Values from Forms: multipart/form-data

RFC 1867, Form-based File Upload in HTML

to make sure you don't miss any cases. An easier route might be to use the poster library.
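
As an illustration of letting a library handle those cases, here is a minimal sketch that parses a multipart/form-data body with Python 3's standard `email` package. This swaps in a different tool from the poster library mentioned above, and it is not the answer author's method. The function name `parse_multipart` is hypothetical, and the sketch assumes you already have the raw request body as bytes plus the request's Content-Type header value (which carries the boundary).

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart(body, content_type):
    """Return one dict per form part (hypothetical helper).

    body: raw request body as bytes.
    content_type: value of the request's Content-Type header,
                  e.g. "multipart/form-data; boundary=...".
    """
    # The email parser expects headers first, so prepend the Content-Type
    # header; that is where it learns the boundary.
    prefix = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + body)

    fields = []
    for part in message.iter_parts():
        fields.append({
            "name": part.get_param("name", header="content-disposition"),
            # None for ordinary fields, a string (possibly empty) for files.
            "filename": part.get_filename(),
            # Defaults to text/plain when the part has no Content-Type.
            "type": part.get_content_type(),
            "data": part.get_payload(decode=True),
        })
    return fields
```

A caller that only wants uploads could keep the entries whose "filename" is not None and not empty. The parser was designed for mail, so it is worth testing with real binary uploads before relying on it.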
