Scrapy custom exporter
Question
I am defining an item exporter that pushes items to a message queue. Below is the code.
    from scrapy.contrib.exporter import JsonLinesItemExporter
    from scrapy.utils.serialize import ScrapyJSONEncoder
    from scrapy import log
    from scrapy.conf import settings
    from carrot.connection import BrokerConnection, Exchange
    from carrot.messaging import Publisher

    log.start()


    class QueueItemExporter(JsonLinesItemExporter):

        def __init__(self, **kwargs):
            log.msg("Initialising queue exporter", level=log.DEBUG)
            self._configure(kwargs)

            host_name = settings.get('BROKER_HOST', 'localhost')
            port = settings.get('BROKER_PORT', 5672)
            userid = settings.get('BROKER_USERID', "guest")
            password = settings.get('BROKER_PASSWORD', "guest")
            virtual_host = settings.get('BROKER_VIRTUAL_HOST', "/")

            self.encoder = settings.get('MESSAGE_Q_SERIALIZER', ScrapyJSONEncoder)(**kwargs)

            log.msg("Connecting to broker", level=log.DEBUG)
            self.q_connection = BrokerConnection(hostname=host_name, port=port,
                                                 userid=userid, password=password,
                                                 virtual_host=virtual_host)
            self.exchange = Exchange("scrapers", type="topic")
            log.msg("Connected", level=log.DEBUG)

        def start_exporting(self):
            spider_name = "test"
            log.msg("Initialising publisher", level=log.DEBUG)
            self.publisher = Publisher(connection=self.q_connection,
                                       exchange=self.exchange,
                                       routing_key="scrapy.spider.%s" % spider_name)
            log.msg("done", level=log.DEBUG)

        def finish_exporting(self):
            self.publisher.close()

        def export_item(self, item):
            log.msg("In export item", level=log.DEBUG)
            itemdict = dict(self._get_serialized_fields(item))
            self.publisher.send({"scraped_data": self.encoder.encode(itemdict)})
            log.msg("sent to queue - scrapy.spider.naukri", level=log.DEBUG)
I'm having a few problems. The items are not being submitted to the queue. I've added the following to my settings:
    FEED_EXPORTERS = {
        "queue": 'scrapers.exporters.QueueItemExporter'
    }
    FEED_FORMAT = "queue"
    LOG_STDOUT = True
The code does not raise any errors, and I cannot see any of the logging messages either. I'm at my wits' end on how to debug this.
Any help would be greatly appreciated.
Recommended answer
"Feed Exporters" are quick (and somewhat dirty) shortcuts for calling some of the "standard" item exporters. Instead of setting up a feed exporter from settings, hard-wire your custom item exporter into your own pipeline, as explained here http://doc.scrapy.org/en/0.14/topics/exporters.html#using-item-exporters :
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import signals
    # Import path taken from the FEED_EXPORTERS setting in the question
    from scrapers.exporters import QueueItemExporter


    class MyPipeline(object):

        def __init__(self):
            ...
            # Create and tear down the exporter together with the spider
            dispatcher.connect(self.spider_opened, signals.spider_opened)
            dispatcher.connect(self.spider_closed, signals.spider_closed)
            ...

        def spider_opened(self, spider):
            self.exporter = QueueItemExporter()
            self.exporter.start_exporting()

        def spider_closed(self, spider):
            self.exporter.finish_exporting()

        def process_item(self, item, spider):
            # YOUR STUFF HERE
            ...
            self.exporter.export_item(item)
            return item
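
To check the call order without a broker or a running crawl, the lifecycle this pipeline drives can be sketched with plain Python stand-ins. This is only an illustration under stated assumptions: `RecordingPublisher`, `SketchQueueExporter`, `SketchPipeline` and `run_demo` are hypothetical names that are not part of Scrapy or carrot, the standard `json` module replaces `ScrapyJSONEncoder`, and items are plain dicts.

```python
import json


class RecordingPublisher:
    """Hypothetical stand-in for carrot's Publisher: keeps messages in a list."""

    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, message):
        self.sent.append(message)

    def close(self):
        self.closed = True


class SketchQueueExporter:
    """Hypothetical exporter exposing the same three methods as the answer's exporter."""

    def __init__(self, publisher):
        self.publisher = publisher

    def start_exporting(self):
        # The exporter in the question creates its Publisher at this point.
        pass

    def export_item(self, item):
        # Same message shape as the question: a dict wrapping the encoded item.
        payload = json.dumps(dict(item), sort_keys=True)
        self.publisher.send({"scraped_data": payload})

    def finish_exporting(self):
        self.publisher.close()


class SketchPipeline:
    """Hypothetical pipeline whose methods mirror the signal handlers in the answer."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.exporter = None

    def spider_opened(self, spider):
        self.exporter = SketchQueueExporter(self.publisher)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item


def run_demo():
    """Call the handlers by hand, in the order the signals would fire."""
    publisher = RecordingPublisher()
    pipeline = SketchPipeline(publisher)
    pipeline.spider_opened(spider="test")
    for item in ({"title": "a"}, {"title": "b"}):
        pipeline.process_item(item, spider="test")
    pipeline.spider_closed(spider="test")
    return publisher


if __name__ == "__main__":
    result = run_demo()
    print(len(result.sent), result.closed)
```

In a real project the pipeline only runs if it is listed in the `ITEM_PIPELINES` setting; the dotted path to use there (for example `scrapers.pipelines.MyPipeline`) depends on where the class is defined and is an assumption here.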