Multipart form encoding and posting with urllib3
Question
I'm attempting to upload a csv file to this site. However, I've encountered a few issues, and I think they stem from an incorrect mimetype (maybe).
I'm attempting to manually post the file via urllib2, so my code looks as follows:
import urllib
import urllib2
import mimetools, mimetypes
import os, stat
import sys  # needed for sys.exc_info() in http_request
from cStringIO import StringIO

#============================
# Note: I found this recipe online. I can't remember where exactly though..
#=============================

class Callable:
    def __init__(self, anycallable):
        self.__call__ = anycallable

# Controls how sequences are encoded. If true, elements may be given multiple values by
# assigning a sequence.
doseq = 1

class MultipartPostHandler(urllib2.BaseHandler):
    handler_order = urllib2.HTTPHandler.handler_order - 10 # needs to run first

    def http_request(self, request):
        data = request.get_data()
        if data is not None and type(data) != str:
            v_files = []
            v_vars = []
            try:
                for(key, value) in data.items():
                    if type(value) == file:
                        v_files.append((key, value))
                    else:
                        v_vars.append((key, value))
            except TypeError:
                systype, value, traceback = sys.exc_info()
                raise TypeError, "not a valid non-string sequence or mapping object", traceback

            if len(v_files) == 0:
                data = urllib.urlencode(v_vars, doseq)
            else:
                boundary, data = self.multipart_encode(v_vars, v_files)

                contenttype = 'multipart/form-data; boundary=%s' % boundary
                if(request.has_header('Content-Type')
                   and request.get_header('Content-Type').find('multipart/form-data') != 0):
                    print "Replacing %s with %s" % (request.get_header('content-type'), 'multipart/form-data')
                request.add_unredirected_header('Content-Type', contenttype)

            request.add_data(data)

        return request

    def multipart_encode(vars, files, boundary = None, buf = None):
        if boundary is None:
            boundary = mimetools.choose_boundary()
        if buf is None:
            buf = StringIO()
        for(key, value) in vars:
            buf.write('--%s\r\n' % boundary)
            buf.write('Content-Disposition: form-data; name="%s"' % key)
            buf.write('\r\n\r\n' + value + '\r\n')
        for(key, fd) in files:
            file_size = os.fstat(fd.fileno())[stat.ST_SIZE]
            filename = fd.name.split('/')[-1]
            contenttype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
            buf.write('--%s\r\n' % boundary)
            buf.write('Content-Disposition: form-data; name="%s"; filename="%s"\r\n' % (key, filename))
            buf.write('Content-Type: %s\r\n' % contenttype)
            # buffer += 'Content-Length: %s\r\n' % file_size
            fd.seek(0)
            buf.write('\r\n' + fd.read() + '\r\n')
        buf.write('--' + boundary + '--\r\n\r\n')
        buf = buf.getvalue()
        return boundary, buf
    multipart_encode = Callable(multipart_encode)

    https_request = http_request

import cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),
                              MultipartPostHandler)
opener.addheaders = [(
    'User-agent',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6'
)]

params = {
    "FILENAME": open("weather_scrape.csv", 'rb'),
    'CGIREF': '/calludt.cgi/DDFILE1',
    'USE': 'MODEL',
    'MODEL': 'CM',
    'CROP': 'APPLES',
    'METHOD': 'SS',
    'UNITS': 'E',
    'LOWTHRESHOLD': '50',
    'UPTHRESHOLD': '88',
    'CUTOFF': 'H',
    'COUNTY': 'AL',
    'ACTIVE': 'Y',
    'FROMMONTH': '3',
    'FROMDAY': '15',
    'FROMYEAR': '2013',
    'THRUMONTH': '5',
    'THRUDAY': '13',
    'THRUYEAR': '2013',
    'DATASOURCE': 'FILE'
}

response = opener.open("http://www.ipm.ucdavis.edu/WEATHER/textupload.cgi", params)
Now, when I post this, all seems to be fine until I click the submit button on the subsequent webpage that the first POST returns. I then get this error message:
ERROR (bad data) in file 'weather.csv' at line 135.
Data record = [--192.168.117.2.1.4404.1368589639.796.1--]
Too few values found. Check delimiter specification.
Now, upon investigating the POST request that gets made when I perform the same actions in the browser, I notice that the content-type is very specific, namely:
------WebKitFormBoundaryfBp6Jfhv7LlPZLKd
Content-Disposition: form-data; name="FILENAME"; filename="weather.csv"
Content-Type: application/vnd.ms-excel
I'm not entirely sure if the content-type is what's causing the error, but it's what I'm currently trying to rule out (as I don't know what is actually going wrong). I don't see any way to set the content type via urllib2, so after some googling, I stumbled upon urllib3.
Urllib3 has a built-in file posting capability, but I'm not entirely sure how to use it.
Filepost
urllib3.filepost.encode_multipart_formdata(fields, boundary=None)
Encode a dictionary of fields using the multipart/form-data MIME format.
Parameters:
fields –
Dictionary of fields or list of (key, value) or (key, value, MIME type) field tuples. The key is treated as the field name, and the value as the body of the form-data bytes. If the value is a tuple of two elements, then the first element is treated as the filename of the form-data section and a suitable MIME type is guessed based on the filename. If the value is a tuple of three elements, then the third element is treated as an explicit MIME type of the form-data section.
Field names and filenames must be unicode.
boundary – If not specified, then a random boundary will be generated using mimetools.choose_boundary().
urllib3.filepost.iter_fields(fields)
Iterate over fields.
Supports list of (k, v) tuples and dicts.
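To make the field formats concrete, here is a minimal sketch of encoding a form offline, with no request sent. It assumes a reasonably recent urllib3 release, where a file upload is written as a `(filename, data, MIME type)` tuple nested as the field's value and the data is passed as bytes rather than as an open file object; the excerpt above may describe an older development build that behaves differently. The CSV content and the fixed boundary are made up for illustration.

```python
from urllib3.filepost import encode_multipart_formdata

# Stand-in CSV content so the sketch runs without weather.csv on disk.
csv_bytes = b"date,high,low\n2013-03-15,61,42\n"

fields = {
    # (filename, file contents, MIME type): a file upload with an explicit type.
    "FILENAME": ("weather.csv", csv_bytes, "application/vnd.ms-excel"),
    # A plain string becomes an ordinary form field.
    "DATASOURCE": "FILE",
}

# A fixed boundary keeps the output reproducible; omit it to get a random one.
body, content_type = encode_multipart_formdata(fields, boundary="demo-boundary")

print(content_type)          # value for the request's Content-Type header
print(body.decode("utf-8"))  # the encoded parts, one per field
```

Printing the body is a quick way to compare the part headers against what the browser sends before involving the server at all.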
Using this library, I tried encoding the values as described in the docs, but I'm getting errors.
I tried initially, just to test things out, passing the fields as a dict.
params = {
    "FILENAME": open("weather.csv", 'rb'),
    'CGIREF': '/calludt.cgi/DDFILE1',
    'USE': 'MODEL',
    'MODEL': 'CM',
    'CROP': 'APPLES',
    'METHOD': 'SS',
    'UNITS': 'E',
    'LOWTHRESHOLD': '50',
    'UPTHRESHOLD': '88',
    'CUTOFF': 'H',
    'COUNTY': 'AL',
    'ACTIVE': 'Y',
    'FROMMONTH': '3',
    'FROMDAY': '15',
    'FROMYEAR': '2013',
    'THRUMONTH': '5',
    'THRUDAY': '13',
    'THRUYEAR': '2013',
    'DATASOURCE': 'FILE'
}
values = urllib3.filepost.encode_multipart_formdata(params)
However, this raises the following error:
values = urllib3.filepost.encode_multipart_formdata(params)
File "c:\python27\lib\site-packages\urllib3-dev-py2.7.egg\urllib3\filepost.py", line 90, in encode_multipart_formdata
body.write(data)
TypeError: 'file' does not have the buffer interface
Not sure what caused that, I tried passing in a list of tuples (key, value, mimetype), but that also throws an error:
params = [
    ("FILENAME", open("weather_scrape.csv"), 'application/vnd.ms-excel'),
    ('CGIREF', '/calludt.cgi/DDFILE1'),
    ('USE', 'MODEL'),
    ('MODEL', 'CM'),
    ('CROP', 'APPLES'),
    ('METHOD', 'SS'),
    ('UNITS', 'E'),
    ('LOWTHRESHOLD', '50'),
    ('UPTHRESHOLD', '88'),
    ('CUTOFF', 'H'),
    ('COUNTY', 'AL'),
    ('ACTIVE', 'Y'),
    ('FROMMONTH', '3'),
    ('FROMDAY', '15'),
    ('FROMYEAR', '2013'),
    ('THRUMONTH', '5'),
    ('THRUDAY', '13'),
    ('THRUYEAR', '2013'),
    ('DATASOURCE', 'FILE')
]
values = urllib3.filepost.encode_multipart_formdata(params)
>>ValueError: too many values to unpack
Recommended answer
If you wanted to use urllib3 for this, it would look something like this:
import urllib3

http = urllib3.PoolManager()
headers = urllib3.make_headers(user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')
url = "http://www.ipm.ucdavis.edu/WEATHER/textupload.cgi"

# Read the CSV contents up front: field values must be data, not open file objects.
with open("weather_scrape.csv", "rb") as f:
    csv_data = f.read()

params = {
    "FILENAME": csv_data,
    'CGIREF': '/calludt.cgi/DDFILE1',
    'USE': 'MODEL',
    'MODEL': 'CM',
    'CROP': 'APPLES',
    'METHOD': 'SS',
    'UNITS': 'E',
    'LOWTHRESHOLD': '50',
    'UPTHRESHOLD': '88',
    'CUTOFF': 'H',
    'COUNTY': 'AL',
    'ACTIVE': 'Y',
    'FROMMONTH': '3',
    'FROMDAY': '15',
    'FROMYEAR': '2013',
    'THRUMONTH': '5',
    'THRUDAY': '13',
    'THRUYEAR': '2013',
    'DATASOURCE': 'FILE',
}

# Pass fields and headers by keyword so they do not depend on the positional order of request().
response = http.request('POST', url, fields=params, headers=headers)
I couldn't test this with your target URL and CSV data set, so it may have some small bugs in it. But that's the general idea.
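If the server turns out to care about the part's filename and Content-Type, as the browser capture in the question suggests it might, one possible variation is to send the CSV as a `(filename, data, MIME type)` tuple instead of a bare string. The sketch below is untested against the target site; `build_fields` and `upload` are hypothetical helper names, and the default filename and MIME type are simply copied from the browser request shown in the question.

```python
import urllib3

UPLOAD_URL = "http://www.ipm.ucdavis.edu/WEATHER/textupload.cgi"


def build_fields(csv_bytes, form_values, filename="weather.csv",
                 mime_type="application/vnd.ms-excel"):
    """Return form fields with the CSV attached as a typed file upload."""
    fields = dict(form_values)  # copy so the caller's dict is not modified
    # A 3-tuple value tells urllib3 to emit filename= and Content-Type for this part.
    fields["FILENAME"] = (filename, csv_bytes, mime_type)
    return fields


def upload(csv_path, form_values):
    """POST the form as multipart/form-data and return the urllib3 response."""
    with open(csv_path, "rb") as f:
        fields = build_fields(f.read(), form_values)
    http = urllib3.PoolManager()
    # Passing fields= on a POST makes urllib3 encode the body as multipart/form-data.
    return http.request("POST", UPLOAD_URL, fields=fields)
```

Calling `upload("weather_scrape.csv", params)` with the remaining form values from the question (everything except FILENAME) would then send the request; checking `response.status` and `response.data` is the quickest way to see whether the server accepted the file.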