Making a request to an API from within a scrapy function
Problem description
I'm working with Scrapy. I want to rotate proxies on a per-request basis, getting each proxy from an API I have that returns a single proxy. My plan is to make a request to the API, get a proxy, then use it to set the proxy based on:
http://stackoverflow.com/questions/4710483/scrapy-and-proxies
where I would assign it:
request.meta['proxy'] = 'your.proxy.address'
I have the following:
class ContactSpider(Spider):
    name = "contact"

    def parse(self, response):
        for i in range(1, 3, 1):
            PR = Request('http://myproxyapi.com', headers=self.headers)
            newrequest = Request('http://sitetoscrape.com', headers=self.headers)
            newrequest.meta['proxy'] = PR
but I'm not sure how to use the Scrapy Request object to perform the API call. I'm not getting a response to the PR request while debugging. Do I need to do this in a separate function and use a yield statement, or is my approach wrong?
Recommended answer
Do I need to do this in a separate function and use a yield statement or is my approach wrong?
Yes. Scrapy uses a callback model. You would need to:

- Yield the PR object back to the Scrapy engine.
- Parse the response of PR, and in its callback, yield newrequest.
A quick example:
def parse(self, response):
    for i in range(1, 3, 1):
        PR = Request(
            'http://myproxyapi.com',
            headers=self.headers,
            meta={'newrequest': Request('http://sitetoscrape.com', headers=self.headers)},
            callback=self.parse_PR,
        )
        yield PR

def parse_PR(self, response):
    newrequest = response.meta['newrequest']
    proxy_data = get_data_from_response(response)
    newrequest.meta['proxy'] = proxy_data
    yield newrequest
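The helper get_data_from_response is not defined in the answer; it stands for "extract the proxy address from the API's response body." Assuming the proxy API returns JSON shaped like {"proxy": "http://1.2.3.4:8080"} (both the field name and the format are assumptions about the hypothetical API), a minimal sketch might be:

```python
import json

def get_data_from_response(response):
    # Assumed helper: parses the proxy API's body. The {"proxy": ...}
    # JSON shape is a guess; adjust it to whatever your API actually returns.
    data = json.loads(response.text)
    return data['proxy']
```

Scrapy's TextResponse exposes the decoded body as response.text, so the same helper works unchanged inside parse_PR.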