并行进程中 Scrapy Spider 的多处理 [英] Multiprocessing of Scrapy Spiders in Parallel Processes

查看:51
本文介绍了并行进程中 Scrapy Spider 的多处理的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经在 Stack Overflow 上阅读了几个类似的问题.不幸的是,我丢失了所有链接,因为我的浏览历史记录被意外删除.

There as several similar questions that I have already read on Stack Overflow. Unfortunately, I lost links of all of them, because my browsing history got deleted unexpectedly.

以上所有问题,都帮不了我.要么,他们中的一些人使用了 CELERY 或他们中的一些人 SCRAPYD,我想使用 MULTIPROCESSISNG 库.此外,Scrapy 官方文档展示了如何在单个进程上运行多个蜘蛛,而不是在多个进程上运行.

All of the above questions, couldn't help me. Either, some of them have used CELERY or some of them SCRAPYD, and I want to use the MULTIPROCESSISNG Library. Also, the Scrapy Official Documentation shows how to run multiple spiders on a SINGLE PROCESS, not on MULTIPLE PROCESSES.

他们都帮不了我,所以我决定问这个问题.

None of them couldn't help me, and hence I decided to ask this question.

经过多次尝试,我想出了这个代码.

After several try's, I came up with this code.

我的输出-:

Enter a product to search for: apple
2015-06-27 14:34:15 [scrapy] INFO: Scrapy 1.0.0 started (bot: scrapybot)
2015-06-27 14:34:15 [scrapy] INFO: Scrapy 1.0.0 started (bot: scrapybot)
2015-06-27 14:34:15 [scrapy] INFO: Optional features available: ssl, http11
2015-06-27 14:34:15 [scrapy] INFO: Optional features available: ssl, http11
2015-06-27 14:34:15 [scrapy] INFO: Overridden settings: {}
2015-06-27 14:34:15 [scrapy] INFO: Overridden settings: {}
2015-06-27 14:34:15 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-06-27 14:34:15 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-06-27 14:34:15 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-27 14:34:15 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-27 14:34:15 [scrapy] INFO: Enabled item pipelines: 
2015-06-27 14:34:15 [scrapy] INFO: Spider opened
2015-06-27 14:34:15 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-27 14:34:15 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-06-27 14:34:15 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-27 14:34:15 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-27 14:34:15 [scrapy] INFO: Enabled item pipelines: 
2015-06-27 14:34:15 [scrapy] INFO: Spider opened
2015-06-27 14:34:15 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-27 14:34:15 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6024
2015-06-27 14:34:15 [twisted] ERROR: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 619, in _doReadOrWrite
    why = selectable.doWrite()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1117, in doWrite
    "doWrite called on a %s" % reflect.qual(self.__class__))
exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port

Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 619, in _doReadOrWrite
    why = selectable.doWrite()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1117, in doWrite
    "doWrite called on a %s" % reflect.qual(self.__class__))
exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port
2015-06-27 14:34:16 [twisted] ERROR: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 619, in _doReadOrWrite
    why = selectable.doWrite()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1117, in doWrite
    "doWrite called on a %s" % reflect.qual(self.__class__))
exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port

Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 88, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 619, in _doReadOrWrite
    why = selectable.doWrite()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1117, in doWrite
    "doWrite called on a %s" % reflect.qual(self.__class__))
exceptions.RuntimeError: doWrite called on a twisted.internet.tcp.Port
2015-06-27 14:34:17 [scrapy] DEBUG: Crawled (200) <GET http://bigbasket.com/ps/?q=apple> (referer: None)
hello, world
Current second: 17
Current microsecond: 546862
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 170', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/10000007_18-fresho-apple-washington.jpg', 'product_link': 'http://bigbasket.com/pd/10000007/fresho-apple-washington-1-kg/', 'productname': 'Apple - Washington', 'current_price': 'Rs. 170'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 199', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/10000003_7-fresho-apple-fuji.jpg', 'product_link': 'http://bigbasket.com/pd/10000003/fresho-apple-fuji-1-kg/', 'productname': 'Apple - Fuji', 'current_price': 'Rs. 199'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 229', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/10000005_16-fresho-apple-royal-gala.jpg', 'product_link': 'http://bigbasket.com/pd/10000005/fresho-apple-royal-gala-1-kg/', 'productname': 'Apple - Royal Gala', 'current_price': 'Rs. 229'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 156.75', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/205988_2-american-garden-vinegar-apple-cider.jpg', 'product_link': 'http://bigbasket.com/pd/205988/american-garden-vinegar-apple-cider-473-ml-bottle/', 'productname': 'Vinegar - Apple Cider', 'current_price': 'Rs. 156.75'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 151', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/10000397_7-fresho-apple-green.jpg', 'product_link': 'http://bigbasket.com/pd/10000397/fresho-apple-green-500-gm/', 'productname': 'Apple - Green', 'current_price': 'Rs. 151'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 114', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/229785_5-tropicana-100-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/229785/tropicana-100-juice-apple-1-ltr-tetra/', 'productname': '100% Juice - Apple', 'current_price': 'Rs. 114'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 266', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/40015763_1-mylife-vinegar-apple-cider.jpg', 'product_link': 'http://bigbasket.com/pd/40015763/mylife-vinegar-apple-cider-300-ml/', 'productname': 'Vinegar - Apple Cider', 'current_price': 'Rs. 266'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 175', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/40015525_1-fresho-apple-chilli.jpg', 'product_link': 'http://bigbasket.com/pd/40015525/fresho-apple-chilli-1-kg/', 'productname': 'Apple - Chilli', 'current_price': 'Rs. 175'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 94.05', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/229791_3-tropicana-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/229791/tropicana-juice-apple-1-ltr-tetra/', 'productname': 'Juice - Apple', 'current_price': 'Rs. 94.05'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 93', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/40015526_1-fresho-apple-chilli.jpg', 'product_link': 'http://bigbasket.com/pd/40015526/fresho-apple-chilli-500-gm/', 'productname': 'Apple - Chilli', 'current_price': 'Rs. 93'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 94.05', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/265854_2-real-fruit-power-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/265854/real-fruit-power-juice-apple-1-ltr-carton/', 'productname': 'Fruit Power Juice - Apple', 'current_price': 'Rs. 94.05'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 143.10', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/252445_3-biotique-shampoo-and-conditioner-bio-green-apple.jpg', 'product_link': 'http://bigbasket.com/pd/252445/biotique-shampoo-and-conditioner-bio-green-apple-190-ml/', 'productname': 'Shampoo and Conditioner - Bio Green...', 'current_price': 'Rs. 143.10'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 250', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/30006470_1-fresho-apple-fuji-premium.jpg', 'product_link': 'http://bigbasket.com/pd/30006470/fresho-apple-fuji-premium-1-kg/', 'productname': 'Apple Fuji Premium', 'current_price': 'Rs. 250'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 19', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/282654_2-real-fruit-power-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/282654/real-fruit-power-juice-apple-200-ml-carton/', 'productname': 'Fruit Power Juice - Apple', 'current_price': 'Rs. 19'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 14.25', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/100535016_4-quaker-oats-strawberry-flavor-with-apple.jpg', 'product_link': 'http://bigbasket.com/pd/100535016/quaker-oats-strawberry-flavor-with-apple-40-gm-pouch/', 'productname': 'Oats - Strawberry Flavor with Apple', 'current_price': 'Rs. 14.25'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 12.60', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/265705_1-appy-apple-juice-drink-classic.jpg', 'product_link': 'http://bigbasket.com/pd/265705/appy-apple-juice-drink-classic-200-ml-carton/', 'productname': 'Apple Juice Drink - Classic', 'current_price': 'Rs. 12.60'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 19', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/40012961_2-candy-clouds-cotton-candy-orange-green-apple.jpg', 'product_link': 'http://bigbasket.com/pd/40012961/candy-clouds-cotton-candy-orange-green-apple-30-gm-cup/', 'productname': 'Cotton Candy - Orange & Green Apple', 'current_price': 'Rs. 19'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 96', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/229945_1-real-activ-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/229945/real-activ-juice-apple-1-ltr-carton/', 'productname': 'Activ Juice - Apple', 'current_price': 'Rs. 96'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 23.75', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/286759_1-minute-maid-juice-apple.jpg', 'product_link': 'http://bigbasket.com/pd/286759/minute-maid-juice-apple-400-ml-bottle/', 'productname': 'Juice - Apple', 'current_price': 'Rs. 23.75'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://bigbasket.com/ps/?q=apple>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 85.50', 'imageurl': 'http://bigbasket.com/media/uploads/p/s/40020508_1-fresho-freshly-baked-apple-pie.jpg', 'product_link': 'http://bigbasket.com/pd/40020508/fresho-freshly-baked-apple-pie-100-gm-pouch/', 'productname': 'Freshly Baked - Apple Pie', 'current_price': 'Rs. 85.50'}
2015-06-27 14:34:17 [scrapy] INFO: Closing spider (finished)
2015-06-27 14:34:17 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 222,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 54881,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 27, 9, 4, 17, 621449),
 'item_scraped_count': 20,
 'log_count/DEBUG': 22,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2015, 6, 27, 9, 4, 15, 879467)}
2015-06-27 14:34:17 [scrapy] INFO: Spider closed (finished)
2015-06-27 14:34:17 [scrapy] DEBUG: Crawled (200) <GET http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0> (referer: None)
hello, world
Current second: 17
Current microsecond: 734324
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 31,800', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/178153.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-4-16-GB-Unlocked-Mobile-Phone-(Black)-pc-19326-97.aspx', 'productname': 'Apple iPhone 4 16 GB Unlocked Mobile Phone (Black)', 'current_price': 'Rs. 26,999'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180356.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-5c-32-GB-GSM-Mobile-Phone-(White)-pc-20258-97.aspx', 'productname': 'Apple iPhone 5c 32 GB GSM Mobile Phone (White)', 'current_price': 'Rs. 53,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 53,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180360.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-5s-16-GB-GSM-Mobile-Phone-(Grey)-pc-20262-97.aspx', 'productname': 'Apple iPhone 5s 16 GB GSM Mobile Phone (Grey)', 'current_price': 'Rs. 44,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 53,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180362.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-5s-16-GB-GSM-Mobile-Phone-(Gold)-pc-20263-97.aspx', 'productname': 'Apple iPhone 5s 16 GB GSM Mobile Phone (Gold)', 'current_price': 'Rs. 44,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 53,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180376.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-5s-16-GB-GSM-Mobile-Phone-(Silver)-pc-20277-97.aspx', 'productname': 'Apple iPhone 5s 16 GB GSM Mobile Phone (Silver)', 'current_price': 'Rs. 44,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 31,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180443.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-4S-8-GB-GSM-Mobile-Phone-(Black)-pc-20318-97.aspx', 'productname': 'Apple iPhone 4S 8 GB GSM Mobile Phone (Black)', 'current_price': 'Rs. 16,990'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 31,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/180444.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-4S-8-GB-GSM-Mobile-Phone-(White)-pc-20319-97.aspx', 'productname': 'Apple iPhone 4S 8 GB GSM Mobile Phone (White)', 'current_price': 'Rs. 16,990'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 53,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185039.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-5S-16-GB-GSM-Mobile-Phone-(Gold)-pc-23555-97.aspx', 'productname': 'Apple iPhone 5S 16 GB GSM Mobile Phone (Gold)', 'current_price': 'Rs. 49,999'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 53,500', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185802.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-16-GB-GSM-Mobile-Phone-(Silver)-pc-23996-97.aspx', 'productname': 'Apple iPhone 6 16 GB GSM Mobile Phone (Silver)', 'current_price': 'Rs. 52,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185805.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-64-GB-GSM-Mobile-Phone-(Silver)-pc-23999-97.aspx', 'productname': 'Apple iPhone 6 64 GB GSM Mobile Phone (Silver)', 'current_price': 'Rs. 62,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185808.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-128-GB-GSM-(Silver)-pc-24002-97.aspx', 'productname': 'Apple iPhone 6 128 GB GSM (Silver)', 'current_price': 'Rs. 71,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185880.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-Plus-16-GB-GSM-(Space-Grey)-pc-24004-97.aspx', 'productname': 'Apple iPhone 6 Plus 16 GB GSM (Space Grey)', 'current_price': 'Rs. 62,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/185881.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-Plus-16-GB-GSM-(Silver)-pc-24005-97.aspx', 'productname': 'Apple iPhone 6 Plus 16 GB GSM (Silver)', 'current_price': 'Rs. 62,500'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': 'Rs. 56,000', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/189360.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-16-GB-(Gold)-pc-26002-97.aspx', 'productname': 'Apple iPhone 6 16 GB (Gold)', 'current_price': 'Rs. 53,499'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/189363.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-64-GB-(Gold)-pc-26005-97.aspx', 'productname': 'Apple iPhone 6 64 GB (Gold)', 'current_price': 'Rs. 65,000'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/189364.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-64-GB-(Grey)-pc-26006-97.aspx', 'productname': 'Apple iPhone 6 64 GB (Grey)', 'current_price': 'Rs. 65,000'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/189365.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-64-GB-(Silver)-pc-26007-97.aspx', 'productname': 'Apple iPhone 6 64 GB (Silver)', 'current_price': 'Rs. 65,000'}
2015-06-27 14:34:17 [scrapy] DEBUG: Scraped from <200 http://www.cromaretail.com/productsearch.aspx?txtSearch=apple&x=0&y=0>
{'outofstock_status': 'In Stock', 'offer': 'No additional offer available', 'mrp': '', 'imageurl': 'http://www.cromaretail.com/Images/Catalogue/Product/medium/189366.jpg', 'product_link': 'http://www.cromaretail.comApple-iPhone-6-128-GB-(Gold)-pc-26008-97.aspx', 'productname': 'Apple iPhone 6 128 GB (Gold)', 'current_price': 'Rs. 74,000'}
2015-06-27 14:34:17 [scrapy] INFO: Closing spider (finished)
2015-06-27 14:34:17 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 259,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 16851,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 27, 9, 4, 17, 764861),
 'item_scraped_count': 18,
 'log_count/DEBUG': 20,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2015, 6, 27, 9, 4, 15, 930386)}
2015-06-27 14:34:17 [scrapy] INFO: Spider closed (finished)
<Deferred at 0x7f02a29b7c68 current result: None>

如果你正确地看到了我的输出,最初会出现一些错误并且程序会在之前暂停一下

If you see my output correctly, initially some error comes and the program pauses just a bit before

2015-06-27 14:34:17 [scrapy] DEBUG: Crawled (200) <GET http://bigbasket.com/ps/?q=apple> (referer: None) 

——这行在我的输出中,然后运行,但也会产生一个输出.

- this line in my output, and then runs, but also produces an output.

我无法弄清楚以下两件事:

I am not able to figure out the following 2 things:

  • 为什么会出现这个错误?为什么程序在那里暂停(是因为scrapy正在抓取页面,所以是正常的,还是有什么问题?)?为了解决这个问题,我必须对代码进行哪些更改?
  • 如果您看到我正在打印当前秒和当前微秒,以检查我的两个函数何时被输入并开始被处理.我已经这样做了,基本上是为了检查我的两个函数是否实际上都在进行多处理.我知道,即使在使用 Multiprocessing 之后,也有几微秒的时间延迟,这在我看来是可以接受的.但是,我多次运行这个脚本,有时我注意到只有几微秒的时间延迟(我可以接受,但这种情况通常出现在网站只返回几个搜索结果时,所以我不知道是否它们实际上是否得到了多处理,因为通常我的函数在大约 1.3 秒内独立执行并产生大约 20 个结果.),有时我也注意到大约 1 或 2 秒的时间延迟(这肯定是说,我的函数没有得到了多处理,至少在阅读了时间之后 - 我的函数独立执行大约 1.3 秒 - 如果它们在不同的 python 脚本上).那么,为什么会出现这种时间变化呢?我如何检查我的函数是否真的得到了多处理?

请在我的代码中提供更正,并解释我的 2 个问题.

Please provide corrections in my code, and explanations to my 2 issues.

请帮忙!任何答案,将不胜感激!

Please do help! Any answers, shall be well appreciated!

谢谢!

推荐答案

Scrapy 是用 Twisted 创建的,这个框架已经有了运行多个进程的方式.这里有一个很好的问题.在您的方法中,您实际上是在尝试将两个不兼容且相互竞争的库(Scrapy/Twisted + multiprocessing)结合起来.这可能不是最好的主意,您可能会遇到很多问题.

Scrapy is created with Twisted, and this framework already has its way of running multiple processes. There is nice question about this here. In your approach you are actually trying to marry two incompatible and competing libraries (Scrapy/Twisted + multiprocessing). This is probably not best idea, you can run into lots of problems with that.

如果您想在多个进程中运行 Scrapy 蜘蛛,那么使用 Twisted 会容易得多.您可以阅读 Twisted 文档 spawnProcess 和其他调用并尝试使用这些工具来实现您的目标.例如,这里是在两个进程中运行两个蜘蛛的快速而肮脏的实现

If you would like to run Scrapy spiders in multiple processes it will much easier to just use Twisted. You could just read Twisted docs for spawnProcess and other calls and try to those tools for your goal. For example here's quick and dirty implementation that runs two spiders in two processes

from twisted.internet import defer, protocol, reactor
import os


class SpiderRunnerProtocol(protocol.ProcessProtocol):
    def __init__(self, d, inputt=None):
        self.deferred = d
        self.inputt = inputt
        self.output = ""
        self.err = ""

    def connectionMade(self):
        if self.inputt:
            self.transport.write(self.inputt)
        self.transport.closeStdin()

    def outReceived(self, data):
        self.output += data

    def processEnded(self, reason):
        print(reason.value)
        print(self.err)
        self.deferred.callback(self.output)

    def errReceived(self, data):
        self.err += data


def run_spider(cmd, *args, **kwargs):
    d = defer.Deferred()
    pipe = SpiderRunnerProtocol(d)
    args = [cmd] + list(args)
    env = os.environ.copy()
    x = reactor.spawnProcess(pipe, cmd, args, env=env)
    print(x.pid)
    print(x)
    return d


def print_out(result):
    print(result)

d = run_spider("scrapy", "crawl", "reddit")
d = run_spider("scrapy", "crawl", "dmoz")
d.addCallback(print_out)
d.addCallback(lambda _: reactor.stop())
reactor.run()

有一篇不错的博客文章 解释了 Twisted 子进程的用法这里

There's a nice blog post explaining usage of Twisted subprocesses here

这篇关于并行进程中 Scrapy Spider 的多处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆