Write functions for all Scrapy spiders


Problem description


So I'm trying to write functions that can be called from all my Scrapy spiders. Is there one place in my project where I can define these functions, or do I need to import them in each spider?

Thanks

Answer


You can't implicitly import code in Python (at least not without hacking around); after all, explicit is better than implicit, so that's not a good idea.


However, in Scrapy it's very common to have a base Spider class that holds common functions and methods.

Suppose you have this tree:

├── myproject
│   ├── __init__.py
│   ├── spiders
│   │   ├── __init__.py
│   │   ├── spider1.py
│   │   ├── spider2.py
├── scrapy.cfg


We can create a base spider in spiders/__init__.py:

from scrapy import Spider

class BaseSpider(Spider):
    def common_parse(self, response):
        # shared parsing logic, reusable by every subclass
        pass


And inherit from it in your spiders:

from myproject.spiders import BaseSpider

class Spider1(BaseSpider):
    def parse(self, response):
        # use the common method inherited from the base class
        if 'indicator' in response.body:
            self.common_parse(response)
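To see the inheritance mechanics without installing Scrapy, here is a minimal self-contained sketch. The `Spider` class below is a stub standing in for `scrapy.Spider`, and the response is a plain dict rather than a real Scrapy `Response` object; the names `common_parse` and `indicator` just mirror the example above.

```python
class Spider:
    """Stub standing in for scrapy.Spider, so this runs without Scrapy."""
    name = None

class BaseSpider(Spider):
    def common_parse(self, response):
        # Shared logic that every subclass can reuse.
        return f"parsed {len(response['body'])} bytes"

class Spider1(BaseSpider):
    name = "spider1"

    def parse(self, response):
        # Only fall back to the shared helper when the marker is present.
        if "indicator" in response["body"]:
            return self.common_parse(response)
        return None

spider = Spider1()
print(spider.parse({"body": "indicator found here"}))  # -> parsed 20 bytes
```

Any number of spiders can subclass `BaseSpider` this way, so the shared helpers live in exactly one place while each spider keeps its own `parse`.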

