Scrapy: make multiple requests and fill a single item
Problem description
I need to make 2 requests to different URLs and put the information from both into the same item. I have tried the method below, but the results are written to different rows. The callbacks return the item. I have tried many approaches, but none of them seems to work.
def parse_companies(self, response):
    data = json.loads(response.body)
    if data:
        item = ThalamusItem()
        for company in data:
            comp_id = company["id"]
            url = self.request_details_URL + str(comp_id) + ".json"
            request = Request(url, callback=self.parse_company_details)
            request.meta['item'] = item
            yield request
            url2 = self.request_contacts + str(comp_id)
            yield Request(url2, callback=self.parse_company_contacts, meta={'item': item})
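To see why this produces separate rows, here is a toy, stdlib-only model of the situation. It is not Scrapy: FakeRequest, FakeResponse, run and the URLs are hypothetical stand-ins. The "engine" processes requests from a FIFO queue and records a snapshot of every item a callback yields, roughly the way an exporter writes one row per yielded item. For simplicity it creates one item per company (the question's code shares a single item across all companies, which only makes things worse).

```python
from collections import deque
from typing import Callable, NamedTuple


class FakeRequest(NamedTuple):
    """Hypothetical stand-in for scrapy.Request: URL, callback and meta dict."""
    url: str
    callback: Callable
    meta: dict


class FakeResponse(NamedTuple):
    """Hypothetical stand-in for a response; meta is the request's meta dict."""
    url: str
    meta: dict


def run(start_requests):
    """Process requests FIFO; every yielded dict becomes one exported 'row'."""
    queue = deque(start_requests)
    rows = []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request.url, request.meta)
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                queue.append(result)  # schedule a follow-up request
            else:
                rows.append(dict(result))  # snapshot, like writing a row
    return rows


def parse_company_details(response):
    item = response.meta['item']
    item['details'] = 'details for ' + response.url
    yield item  # one row for this company, without contacts yet


def parse_company_contacts(response):
    item = response.meta['item']
    item['contacts'] = 'contacts for ' + response.url
    yield item  # a second row for the same company


def split_chain(company_ids):
    """Mirrors the question: two independent requests per company."""
    for comp_id in company_ids:
        item = {'id': comp_id}
        yield FakeRequest(f'/details/{comp_id}.json', parse_company_details, {'item': item})
        yield FakeRequest(f'/contacts/{comp_id}', parse_company_contacts, {'item': item})


rows = run(split_chain([1, 2]))
print(len(rows))  # 4: two rows per company instead of one
```

Because both callbacks yield the item, every company is exported twice, and the first copy is incomplete.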
Recommended answer
Since Scrapy is asynchronous, you need to chain your requests manually. To transfer data between requests, you can use the Request's meta attribute:
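from scrapy import Request

def parse(self, response):
    item = dict()
    item['name'] = 'foobar'
    # Pass the partially built item to the next callback through meta
    yield Request('http://someurl.com', callback=self.parse2,
                  meta={'item': item})

def parse2(self, response):
    # response.meta is the meta dict of the request that produced this response
    print(response.meta['item'])
    # {'name': 'foobar'}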
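In your case, you end up with a split chain: each company gets two independent requests, and each callback yields the item on its own, so every callback produces a separate row. What you need is one continuous chain, in which the second request is issued from the first request's callback and only the last callback yields the item.
Your code should look something like this: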
import json

from scrapy import Request

def parse_companies(self, response):
    data = json.loads(response.body)
    if not data:
        return
    for company in data:
        # A fresh item per company, so companies do not overwrite each other
        item = ThalamusItem()
        comp_id = company["id"]
        url = self.request_details_URL + str(comp_id) + ".json"
        url2 = self.request_contacts + str(comp_id)
        # First link of the chain: carry the item and the next URL in meta
        request = Request(url, callback=self.parse_details,
                          meta={'url2': url2, 'item': item})
        yield request

def parse_details(self, response):
    item = response.meta['item']
    url2 = response.meta['url2']
    item['details'] = ''  # add details
    # Second link: request the contacts, still carrying the same item
    yield Request(url2, callback=self.parse_contacts, meta={'item': item})

def parse_contacts(self, response):
    item = response.meta['item']
    item['contacts'] = ''  # add contacts
    # End of the chain: the item is complete, so it is yielded exactly once
    yield item
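The effect of chaining can be checked with a toy, stdlib-only model. This is not Scrapy: FakeRequest, FakeResponse, run, chained and the URLs are hypothetical stand-ins. The engine processes requests FIFO and records a snapshot of each yielded item as one exported row, so it counts how many rows each company produces.

```python
from collections import deque
from typing import Callable, NamedTuple


class FakeRequest(NamedTuple):
    """Hypothetical stand-in for scrapy.Request: URL, callback and meta dict."""
    url: str
    callback: Callable
    meta: dict


class FakeResponse(NamedTuple):
    """Hypothetical stand-in for a response; meta is the request's meta dict."""
    url: str
    meta: dict


def run(start_requests):
    """Process requests FIFO; every yielded dict becomes one exported 'row'."""
    queue = deque(start_requests)
    rows = []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request.url, request.meta)
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                queue.append(result)  # schedule a follow-up request
            else:
                rows.append(dict(result))  # snapshot, like writing a row
    return rows


def chained(company_ids):
    """Mirrors the answer: one request per company starts the chain."""
    for comp_id in company_ids:
        item = {'id': comp_id}
        yield FakeRequest(f'/details/{comp_id}.json', parse_details,
                          {'item': item, 'url2': f'/contacts/{comp_id}'})


def parse_details(response):
    item = response.meta['item']
    item['details'] = 'details for ' + response.url
    # Do not yield the item here; hand it on to the next request instead
    yield FakeRequest(response.meta['url2'], parse_contacts, {'item': item})


def parse_contacts(response):
    item = response.meta['item']
    item['contacts'] = 'contacts for ' + response.url
    yield item  # the only place the item is yielded


rows = run(chained([1, 2]))
print(len(rows))  # 2: one complete row per company
```

The only structural change from the split version is that the first callback yields a follow-up request rather than the item, so each item leaves the chain once, with both fields filled.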