如何在Heroku云上部署Scrapy蜘蛛 [英] How to deploy a Scrapy spider on Heroku cloud

查看:232
本文介绍了如何在Heroku云上部署Scrapy蜘蛛的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在scrapy&公司开发了几只蜘蛛。我想测试那些Heroku云。有人知道如何在Heroku云上部署Scrapy蜘蛛吗?

解决方案

是的,部署和运行相当简单您在Heroku上的Scrapy蜘蛛。

以下是以真正的Scrapy项目为例的步骤:


  1. 克隆项目(注意它必须有一个 requirements.txt 文件让Heroku识别它为Python项目):



    git clone https://github.com/scrapinghub/testspiders.git


  2. 将cffi添加到require.txt文件(例如cffi == 1.1.0)。

  3. 创建Heroku应用程序添加一个新的heroku git remote):
    $ b $ heroku create


  4. 部署项目(第一次创建slug时需要一段时间):

    git push heroku master $

  5. > heroku运行scrapy抓取followall

    $ b ul>
  6. Heroku磁盘是短暂的。如果您想将抓取的数据存储在持久的位置,则可以使用 S3 feed export (通过附加 -o s3://mybucket/items.jl )或使用插件(如MongoHQ或Redis To Go)并编写一个管道来存储您的项目

  7. 在Heroku上运行Scrapyd服务器将会很酷,但目前不可能,因为 sqlite3 模块(Scrapyd需要)在Heroku上无法使用

  8. 如果您想要一个更复杂的解决方案来部署Scrapy蜘蛛,可以考虑设置自己的 Scrapyd服务器或使用托管服务,如 Scrapy Cloud


  9. I developed few spiders in scrapy & I want to test those on Heroku cloud. Does anybody have any idea about how to deploy a Scrapy spider on Heroku cloud?

    解决方案

    Yes, it's fairly simple to deploy and run your Scrapy spider on Heroku.

    Here are the steps using a real Scrapy project as example:

    1. Clone the project (note that it must have a requirements.txt file for Heroku to recognize it as a Python project):

      git clone https://github.com/scrapinghub/testspiders.git

    2. Add cffi to the requirement.txt file (e.g. cffi==1.1.0).

    3. Create the Heroku application (this will add a new heroku git remote):

      heroku create

    4. Deploy the project (this will take a while the first time, when the slug is built):

      git push heroku master

    5. Run your spider:

      heroku run scrapy crawl followall

    Some notes:

    • Heroku disk is ephemeral. If you want to store the scraped data in a persistent place, you can use a S3 feed export (by appending -o s3://mybucket/items.jl) or use an addon (like MongoHQ or Redis To Go) and write a pipeline to store your items there
    • It would be cool to run a Scrapyd server on Heroku, but it's not currently possible because the sqlite3 module (which Scrapyd requires) doesn't work on Heroku
    • If you want a more sophisticated solution for deploying your Scrapy spiders, consider setting up your own Scrapyd server or using a hosted service like Scrapy Cloud

    这篇关于如何在Heroku云上部署Scrapy蜘蛛的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆