Scrapy: getting values from multiple sites

Problem description

I'm trying to pass a value from one function to another.

I looked up the docs but didn't understand them. Reference:

def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    request = scrapy.Request("http://www.example.com/some_page.html",
                             callback=self.parse_page2)
    request.meta['item'] = item
    yield request

def parse_page2(self, response):
    item = response.meta['item']
    item['other_url'] = response.url
    yield item

Here is pseudocode of what I want to achieve:

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...)
        # what I'd like: get the price from second.com as a return value
        price = scrapy.Request('http://second.com/', callback=self.parse_check)
        yield (name, price)

    def parse_check(self, response):
        price = response.xpath(...)
        return price

Answer

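Creating a scrapy.Request does not download anything, so it cannot hand the price back as a return value; the page is fetched later and passed to the callback. Instead, attach the value (or link, etc.) to the request and read it back in the callback: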

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...).get()
        link = response.xpath(...).get()  # link to the second.com page where the price can be found
        # urljoin() turns a relative href into an absolute URL
        request = scrapy.Request(url=response.urljoin(link), callback=self.parse_check)
        request.meta['name'] = name  # attach the value so the next callback can read it
        yield request

    def parse_check(self, response):
        name = response.meta['name']  # same dict as request.meta in parse()
        price = response.xpath(...).get()
        yield {"name": name, "price": price}  # a plain dict works as an item; no items.py declaration is needed
