Creating Scrapy array of items with multiple parse


Problem Description

I am scraping listings with Scrapy. My script first parses the listing URLs using parse_node, then parses each listing using parse_listing, and for each listing it parses that listing's agents using parse_agent. I would like to create an array that builds up as Scrapy parses through a listing and its agents, and that resets for each new listing.

Here is my parsing script:

def parse_node(self, response, node):
    yield Request('LISTING LINK', callback=self.parse_listing)

def parse_listing(self, response):
    yield response.xpath('//node[@id="ListingId"]/text()').extract_first()
    yield response.xpath('//node[@id="ListingTitle"]/text()').extract_first()
    for agent in string.split(response.xpath('//node[@id="Agents"]/text()').extract_first() or "", '^'):
        yield Request('AGENT LINK', callback=self.parse_agent)

def parse_agent(self, response):
    yield response.xpath('//node[@id="AgentName"]/text()').extract_first()
    yield response.xpath('//node[@id="AgentEmail"]/text()').extract_first()

I would like parse_listing to result in:

{
 'id':123,
 'title':'Amazing Listing'
}

and then parse_agent to add to the listing array:

{
 'id': 123,
 'title': 'Amazing Listing',
 'agent': [
  {
   'name': 'jon doe',
   'email': 'jon.doe@email.com'
  },
  {
   'name': 'jane doe',
   'email': 'jane.doe@email.com'
  }
 ]
}

How do I get the results from each level and build up the array?

Answer

This is a somewhat complicated issue: you need to form a single item from multiple different URLs.

Scrapy allows you to carry data over in a request's meta attribute, so you can do something like:

from collections import defaultdict

def parse_node(self, response, node):
    yield Request('LISTING LINK', callback=self.parse_listing)

def parse_listing(self, response):
    item = defaultdict(list)
    item['id'] = response.xpath('//node[@id="ListingId"]/text()').extract_first()
    item['title'] = response.xpath('//node[@id="ListingTitle"]/text()').extract_first()
    # find all agent urls and start with the first one
    agent_urls = (response.xpath('//node[@id="Agents"]/text()').extract_first() or "").split('^')
    url = agent_urls.pop(0)
    # we want to go through the agent urls one by one and update a single item with agent data
    yield Request(url, callback=self.parse_agent,
                  meta={'item': item, 'agent_urls': agent_urls})

def parse_agent(self, response):
    item = response.meta['item']  # retrieve the item built up by previous requests
    agent = dict()
    agent['name'] = response.xpath('//node[@id="AgentName"]/text()').extract_first()
    agent['email'] = response.xpath('//node[@id="AgentEmail"]/text()').extract_first()
    item['agents'].append(agent)
    # check whether we have any agent urls left
    agent_urls = response.meta['agent_urls']
    if not agent_urls:  # we crawled all of the agents!
        yield item
        return
    # if we do - crawl the next agent and carry over our current item
    url = agent_urls.pop(0)
    yield Request(url, callback=self.parse_agent,
                  meta={'item': item, 'agent_urls': agent_urls})
