Dynamically assembling a scrapy GET request string
Problem description
I've been working with Firebug and I've got the following dictionaries to query an API:
url = "http://my_url.aspx#top"
querystring = {"dbkey":"x1","stype":"id","s":"27"}
headers = {
'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
'upgrade-insecure-requests': "1",
'user-agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.125"
}
With python requests, using these is as simple as:
import requests
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
How can I use these in Scrapy? I've been reading http://doc.scrapy.org/en/latest/topics/request-response.html and I know that the following works for POST:
r = Request(my_url, method="post", headers=headers, body=payload, callback=self.parse_method)
I tried:
r = Request("GET", url, headers=headers, body=querystring, callback=self.parse_third_request)
and I get:
r = Request("GET", url, headers=headers, body=querystring, callback=self.parse_third_request)
TypeError: __init__() got multiple values for keyword argument 'callback'
Changing it to:
r = Request(method="GET", url=url, headers=headers, body=querystring, callback=self.parse_third_request)
now gives:
File "C:\envs\r2\tutorial\tutorial\spiders\parker_spider.py", line 90, in parse_second_request
r = Request(method="GET", url=url, headers=headers, body=querystring, callback=self.parse_third_request)
File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\http\request\__init__.py", line 26, in __init__
self._set_body(body)
File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\http\request\__init__.py", line 68, in _set_body
self._body = to_bytes(body, self.encoding)
File "C:\envs\virtalenvs\teat\lib\site-packages\scrapy\utils\python.py", line 117, in to_bytes
'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got dict
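The traceback says it directly: body must be a unicode, str or bytes object, while querystring is a dict. Serializing the dict first avoids the TypeError. A minimal sketch, assuming Python 3's urllib.parse (the Python 2 equivalent used in the question's edit is urllib.urlencode):

```python
from urllib.parse import urlencode  # Python 2: from urllib import urlencode

querystring = {"dbkey": "x1", "stype": "id", "s": "27"}

# Request's body must be str/bytes; passing the dict itself raises the
# to_bytes TypeError shown above.
body = urlencode(querystring)
print(body)  # dbkey=x1&stype=id&s=27
```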
Edit 2:
I now have:
yield Request(method="GET", url=url, headers=headers, body=urllib.urlencode(querystring), callback=self.parse_third_request)
def parse_third_request(self, response):
from scrapy.shell import inspect_response
inspect_response(response, self)
print("hi")
return None
There are no errors, but in the shell, when I do "response.url" I only get the base URL with no GET parameters.
Recommended answer
Look at the signature of the Request initializer:

class scrapy.http.Request(url[, callback, method='GET', headers, body, cookies, meta, encoding='utf-8', priority=0, dont_filter=False, errback])

With Request("GET", url, ...), the positional arguments are shifted: "GET" is taken as url and your url as callback, which then clashes with the callback= keyword argument.

Use a keyword argument for method instead (though GET is the default):
r = Request(url, method="GET", headers=headers, body=querystring, callback=self.parse_third_request)
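Note that, as Edit 2 in the question observed, a body on a GET request does not show up in response.url. If the parameters should appear in the URL itself, they can be encoded into it before building the Request. A stdlib sketch, reusing the question's placeholder URL (with the "#top" fragment dropped, since fragments are never sent to the server):

```python
from urllib.parse import urlencode

base = "http://my_url.aspx"
querystring = {"dbkey": "x1", "stype": "id", "s": "27"}

# Put GET parameters in the URL itself rather than in body.
full_url = base + "?" + urlencode(querystring)
print(full_url)  # http://my_url.aspx?dbkey=x1&stype=id&s=27

# then, inside the spider:
# yield Request(full_url, headers=headers, callback=self.parse_third_request)
```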