Scrapy: yield items from multiple requests

Problem description

I am trying to yield items from different requests, as shown below. If I add items = PrintersItem() to each request, I get endless loops; if I take it out, other errors occur. I am not sure how to combine yielding a request with yielding items for each response.

import scrapy
from scrapy.http import Request, FormRequest
from ..items import PrintersItem
from scrapy.utils.response import open_in_browser

class PrinterSpider(scrapy.Spider):
    name = 'printers'
    start_urls = ['http://192.168.137.9', 'http://192.168.137.35', 'http://192.168.137.34', 'http://192.168.137.27', 'http://192.168.137.21' ]


    def parse(self, response):
            items = PrintersItem()
            token = response.xpath('//*[@name="CSRFToken"]/@value').extract_first()
            print(token)

            yield FormRequest.from_response(response, formnumber=1, formdata={
                'CSRFToken' : token,
                'B55d' : 'password',
                'loginurl' : '/general/status.html'
             }, callback=self.postlogin2)


    def postlogin2(self, response):
            items = PrintersItem()
            contact = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/ul[1]/li[1]/text()[last()]').extract()
            location = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/ul[1]/li[2]/text()[last()]').extract()
            items['contact'] = contact
            items['location'] = location

            yield Request(
            url = response.url.split('/general')[0] + "/general/information.html?kind=item",
            callback=self.action)

            # This re-invokes postlogin2 on the same response,
            # which is what produces the endless loop.
            for items in self.postlogin2(response):
                yield items

    def action(self,response):
            drum = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[7]/dl[1]/dd[1]/text()').extract()
            items['drum'] = drum  # 'items' is never defined in this method
            print(drum)
            printermodel = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/text()').extract()
            items['printermodel'] = printermodel
            yield Request(
            url = response.url.split('/general')[0] + "/net/wired/tcpip.html",
            callback=self.action2)
            # Same recursion problem: action() iterates over itself.
            for items in self.action(response):
                yield items

    def action2(self, response):
            tcpip = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[4]/dl[1]/dd[2]/input[1]/@value').extract()
            items['tcpip'] = tcpip  # 'items' is undefined here as well
            # And another endless self-recursion.
            for items in self.action2(response):
                yield items

Recommended answer

In your code, postlogin2(), action() and action2() each iterate over themselves on the same response, which is what causes the endless loop. If you want to send items from parse to postlogin2, etc., then add it as meta data in the Request:

yield Request( ..., meta={"items": items})

and retrieve it in the other functions:

items = response.meta["items"]

and yield it only in the last function:

yield items

Docs: Requests and Responses, Request.meta special keys

import scrapy
from scrapy.http import Request, FormRequest
from ..items import PrintersItem


class PrinterSpider(scrapy.Spider):
    name = 'printers'
    start_urls = ['http://192.168.137.9', 'http://192.168.137.35',
                  'http://192.168.137.34', 'http://192.168.137.27', 'http://192.168.137.21']

    def parse(self, response):
        token = response.xpath('//*[@name="CSRFToken"]/@value').extract_first()
        print(token)

        yield FormRequest.from_response(response, formnumber=1, formdata={
            'CSRFToken': token,
            'B55d': 'password',
            'loginurl': '/general/status.html'
        }, callback=self.postlogin2)

    def postlogin2(self, response):
        items = PrintersItem()

        contact = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/ul[1]/li[1]/text()[last()]').extract()
        location = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/ul[1]/li[2]/text()[last()]').extract()
        items['contact'] = contact
        items['location'] = location

        # Hand the partially filled item to the next callback via meta.
        yield Request(
            #url=response.urljoin("/general/information.html?kind=item"),
            url=response.url.split('/general')[0] + "/general/information.html?kind=item",
            callback=self.action,
            meta={"items": items})

    def action(self, response):
        # Pick up the item created in postlogin2.
        items = response.meta["items"]

        drum = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[7]/dl[1]/dd[1]/text()').extract()
        items['drum'] = drum
        print(drum)

        printermodel = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[5]/dl[1]/dd[1]/text()').extract()
        items['printermodel'] = printermodel

        yield Request(
            #url=response.urljoin("/net/wired/tcpip.html"),
            url=response.url.split('/general')[0] + "/net/wired/tcpip.html",
            callback=self.action2,
            meta={"items": items})

    def action2(self, response):
        items = response.meta["items"]

        tcpip = response.xpath('//html[1]/body[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]/div[1]/div[2]/form[1]/div[4]/dl[1]/dd[2]/input[1]/@value').extract()
        items['tcpip'] = tcpip

        # The item is complete only now, so this is the one place it is yielded.
        yield items
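
As a side note, Scrapy 1.7 and later also support cb_kwargs for the same purpose: the value is delivered to the callback as a keyword argument instead of going through response.meta. A minimal sketch of the same hand-off, assuming the same PrintersItem and URLs as above:

    def postlogin2(self, response):
        items = PrintersItem()
        # ... fill 'contact' and 'location' exactly as above ...
        yield Request(
            url=response.url.split('/general')[0] + "/general/information.html?kind=item",
            callback=self.action,
            cb_kwargs={"items": items})  # delivered as a callback argument

    def action(self, response, items):  # 'items' arrives as a keyword argument
        # ... fill the remaining fields, then either yield the item here
        # or forward it again with another cb_kwargs request ...
        yield items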
