Scrapy Error: TypeError: __init__() got an unexpected keyword argument 'deny'


Question

I put together a spider, and it ran as intended until I added the keyword deny to the rules.

Here is my spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from bhg.items import BhgItem

class BhgSpider (CrawlSpider):
    name = 'bhg'
    start_urls = ['http://www.bhg.com/holidays/st-patricks-day/']
    rules = (Rule(LinkExtractor(allow=[r'/*'], ),
                  deny=('blogs/*', 'videos/*', ),
                  callback='parse_html'), )

    def parse_html(self, response):
        # The callback must be a method of the spider so that
        # callback='parse_html' can be resolved by name.
        hxs = Selector(response)
        item = BhgItem()

        item['title'] = hxs.xpath('//title/text()').extract()
        item['h1'] = hxs.xpath('//h1/text()').extract()
        item['canonical'] = hxs.xpath('//link[@rel="canonical"]/@href').extract()
        item['meta_desc'] = hxs.xpath('//meta[@name="description"]/@content').extract()
        item['url'] = response.request.url
        item['status_code'] = response.status
        return item

When I run this code I get:

deny=('blogs/', 'videos/', ),), )
TypeError: __init__() got an unexpected keyword argument 'deny'

What am I doing wrong? I guess some function was not expecting the extra argument (deny), but which function? parse_html()?

I did not define any other spiders, and there is no __init__() in my code.

Recommended answer

deny is supposed to be passed as an argument to LinkExtractor, but you put it outside those parentheses and passed it to Rule instead. Move it inside, so you have:

rules = (Rule(LinkExtractor(allow=[r'/*'], deny=('blogs/*', 'videos/*', )),
                  callback='parse_html'), )
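
One thing worth checking alongside this fix: allow and deny take regular expressions, not shell-style globs, so 'blogs/*' means "blogs" followed by zero or more slashes. The sketch below uses only the standard re module and assumes the patterns are applied as a search over the whole URL. is_denied is a hypothetical helper written for illustration, not a Scrapy function, and the example.com URL is made up.

```python
import re


def is_denied(url, patterns):
    """Hypothetical helper: True if any pattern is found anywhere in the URL.

    Assumption: patterns behave like regexes applied with re.search.
    """
    return any(re.search(pattern, url) for pattern in patterns)


deny = ('blogs/*', 'videos/*')

# A blog URL is rejected, as intended.
blog_hit = is_denied('http://www.bhg.com/blogs/some-post/', deny)

# The start URL from the question contains neither word, so it passes.
start_hit = is_denied('http://www.bhg.com/holidays/st-patricks-day/', deny)

# The trailing '*' makes the slash optional, so this also matches.
loose_hit = is_denied('http://example.com/myblogs-archive', deny)
```

If the intent is to block only the /blogs/ and /videos/ directories, a tighter pattern such as r'/blogs/' is probably closer to what was meant.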

__init__ is the method that receives the arguments you pass when instantiating a class, as you did here with the Rule and LinkExtractor classes. The TypeError was raised by Rule's __init__, which has no deny parameter; parse_html() was never involved.
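
To see why the traceback names __init__ even though the spider defines none, here is a minimal sketch with two stand-in classes. FakeRule and FakeLinkExtractor are hypothetical and only mimic the shape of the two constructors discussed above; they are not the real Scrapy classes, and accepts_keyword is an illustrative helper.

```python
import inspect


class FakeLinkExtractor:
    """Hypothetical stand-in: accepts allow and deny."""

    def __init__(self, allow=(), deny=()):
        self.allow = allow
        self.deny = deny


class FakeRule:
    """Hypothetical stand-in: accepts an extractor and a callback, but no deny."""

    def __init__(self, link_extractor, callback=None):
        self.link_extractor = link_extractor
        self.callback = callback


def accepts_keyword(cls, name):
    """Return True if cls.__init__ declares a parameter called name."""
    return name in inspect.signature(cls.__init__).parameters


def build_wrong():
    # deny sits outside the extractor's parentheses, so it is passed
    # to FakeRule.__init__, which raises TypeError.
    return FakeRule(FakeLinkExtractor(allow=[r'/*']),
                    deny=('blogs/*',),
                    callback='parse_html')


def build_right():
    # deny is inside the extractor's parentheses, where it is accepted.
    return FakeRule(FakeLinkExtractor(allow=[r'/*'], deny=('blogs/*',)),
                    callback='parse_html')
```

Calling build_wrong() raises a TypeError that mentions 'deny', while build_right() succeeds. Checking the signature with accepts_keyword is one quick way to find out which constructor a keyword belongs to.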
