How to import a variable of my spider class from my pipelines.py file?
Question
I found an interesting scraper on GitHub: https://github.com/apetz/email-scraper
The spider scrapes email addresses from a website.
This scraper needs to be called from the command line with a website as an argument:
scrapy crawl spider -a domain="your.domain.name" -o emails-found.csv
I want to edit this scraper so that the emails are stored in my database instead of a JSON file.
So I tried to get the "domain" argument, which is located in /spiders/thorough_spider.py in the class "ThoroughSpider".
In my pipelines.py file, I wrote:
import spiders.thorough_spider
in order to import the module thorough_spider, which contains the variable ThoroughSpider.domain.
But PyCharm tells me:
"No module named spiders".
So I tried this line:
from spiders import thorough_spider
This time PyCharm tells me:
"Unresolved reference spiders".
Here is the code of the spider thorough_spider.py, located in the folder "spiders":
class ThoroughSpider(scrapy.Spider):
    name = "spider"

    def __init__(self, domain=None, subdomain_exclusions=[], crawl_js=False):
        self.allowed_domains = [domain]
        start_url = "http://" + domain
        self.start_urls = [
            start_url
        ]
And here is the code in my pipelines.py, which is located one level above the folder "spiders":
from scrapy.exceptions import DropItem
import mysql.connector
import spiders.thorough_spider
from spiders import thorough_spider
Do you know how to pass the domain as an argument in pipelines.py?
Recommended answer
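If you want to import a module from the current directory, you can use a leading dot (a relative import). Since pipelines.py sits next to the "spiders" folder, the dot refers to the package that contains both.
So you can try: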
from .spiders.thorough_spider import ThoroughSpider
It should work.
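Importing the class gives access to ThoroughSpider itself, but the domain is set per instance in __init__, so the value passed with -a domain=... lives on the running spider object. As a complement to the import above, here is a minimal sketch of reading it inside a pipeline, assuming Scrapy's documented pipeline methods open_spider(self, spider) and process_item(self, item, spider). The names DomainPipeline and FakeSpider are hypothetical and exist only for illustration, and the sketch assumes dict-like items that accept a "domain" key.

```python
class DomainPipeline:
    """Hypothetical pipeline that reads the crawl domain from the running spider."""

    def open_spider(self, spider):
        # The spider in the question stores its argument as allowed_domains = [domain],
        # so the first entry is the domain given on the command line.
        self.domain = spider.allowed_domains[0]

    def process_item(self, item, spider):
        # Attach the domain so a later step (for example a database insert) can use it.
        # Assumption: the item behaves like a dict and accepts a "domain" key.
        item["domain"] = self.domain
        return item


class FakeSpider:
    """Hypothetical stand-in for ThoroughSpider, used only to exercise the sketch."""

    def __init__(self, domain):
        self.allowed_domains = [domain]


# Simulate what Scrapy would do during a crawl.
pipeline = DomainPipeline()
spider = FakeSpider("example.com")
pipeline.open_spider(spider)
result = pipeline.process_item({"email": "someone@example.com"}, spider)
```

In a real project the pipeline would also need to be enabled in ITEM_PIPELINES in settings.py, and the database write would go where the item is annotated.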