Empty variable within instance of a class, despite specifically setting it


Question

When I run the following code:

import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    search_url = ''

    def start_requests(self):
        print ('self.search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

test_spider = QuotesSpider()
test_spider.search_url='http://quotes.toscrape.com/page/1/'

process.crawl(test_spider)
process.start() # the script will block here until the crawling is finished

I get the following error:

self.search_url is currently:
...
   ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url:
...

Inside start_requests, self.search_url appears to be empty, even though I explicitly set it on the instance before starting the crawl. I cannot figure out why that is.

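Recommended answer

As far as I can tell, process.crawl() works from the spider class and Scrapy builds its own instance from it, so an attribute assigned to an instance you created yourself never reaches the spider that actually runs. That spider falls back to the class attribute search_url = '', which is why the request fails with "Missing scheme in request url". The neatest fix is to set the value in the constructor __init__(), but an easier one (and probably quicker for what you want) is to put the URL directly in the class-level definition of search_url. For example: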

import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):

    name = "quotes"
    search_url = 'http://quotes.toscrape.com/page/1/'

    def start_requests(self):
        print ('search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

# Pass the spider class itself; Scrapy creates the instance internally.
process.crawl(QuotesSpider)
process.start()
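
The reason the hand-made instance loses its value can be sketched without Scrapy at all. The snippet below is a simplified model, not Scrapy's real code: MiniSpider and mini_crawl are hypothetical stand-ins, written on the assumption that the runner builds the spider through a classmethod factory and that the base constructor stores keyword arguments as attributes.

```python
class MiniSpider:
    """Hypothetical stand-in for a spider base class (not Scrapy itself)."""

    search_url = ''  # class-level default, as in the question

    def __init__(self, **kwargs):
        # Keyword arguments become instance attributes.
        self.__dict__.update(kwargs)

    @classmethod
    def from_crawler(cls, **kwargs):
        # A classmethod receives the class even when called on an instance,
        # so it always builds a brand-new object.
        return cls(**kwargs)


def mini_crawl(spider_or_cls, **kwargs):
    """Hypothetical runner: builds its own spider and returns it."""
    return spider_or_cls.from_crawler(**kwargs)


# An attribute set on a hand-made instance is not carried over.
hand_made = MiniSpider()
hand_made.search_url = 'http://quotes.toscrape.com/page/1/'
lost = mini_crawl(hand_made)

# Passing the class plus a keyword argument keeps the value.
kept = mini_crawl(MiniSpider, search_url='http://quotes.toscrape.com/page/1/')

print(repr(lost.search_url))
print(repr(kept.search_url))
```

With real Scrapy, the corresponding call would be process.crawl(QuotesSpider, search_url='http://quotes.toscrape.com/page/1/'). To my knowledge the default Spider constructor stores such keyword arguments as attributes, so self.search_url would be available in start_requests without editing the class.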
