How to deploy a Scrapy spider on Heroku cloud
Question
I developed a few spiders in Scrapy and I want to test them on Heroku cloud. Does anybody know how to deploy a Scrapy spider on Heroku cloud?
Recommended answer
Yes, it's fairly simple to deploy and run your Scrapy spider on Heroku.
Here are the steps, using a real Scrapy project as an example:
Clone the project (note that it must have a requirements.txt file for Heroku to recognize it as a Python project):
git clone https://github.com/scrapinghub/testspiders.git
Add cffi to the requirements.txt file (e.g. cffi==1.1.0).
Create the Heroku application (this will add a new heroku git remote):
heroku create
Deploy the project (this will take a while the first time, while the slug is being built):
git push heroku main
Run your spider:
heroku run scrapy crawl followall
Some notes:
- Heroku disk is ephemeral. If you want to store the scraped data in a persistent place, you can use an S3 feed export (by appending -o s3://mybucket/items.jl to the crawl command) or use an addon (like MongoHQ or Redis To Go) and write a pipeline to store your items there.
- It would be cool to run a Scrapyd server on Heroku, but it's not currently possible because the sqlite3 module (which Scrapyd requires) doesn't work on Heroku.
- If you want a more sophisticated solution for deploying your Scrapy spiders, consider setting up your own Scrapyd server or using a hosted service like Scrapy Cloud.
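For the S3 feed export mentioned in the notes, the export target can also be placed in the project settings instead of being appended to every crawl command. The following is only a minimal sketch, not the testspiders project's actual configuration. It assumes a recent Scrapy version (the FEEDS setting was introduced in Scrapy 2.1), that the AWS credentials are available as environment variables, and it introduces a hypothetical FEED_BUCKET variable for illustration. The bucket name mybucket is the placeholder from the note above.

```python
# settings.py (excerpt): a sketch under the assumptions stated above
import os

# Scrapy reads these two settings for S3 access. Taking them from the
# environment keeps secrets out of the git repository pushed to Heroku.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")

# FEED_BUCKET is a hypothetical variable introduced for this example;
# "mybucket" is the placeholder bucket name used in the note above.
FEED_BUCKET = os.environ.get("FEED_BUCKET", "mybucket")

# Export all scraped items as JSON lines to S3, so nothing depends on
# Heroku's ephemeral disk.
FEEDS = {
    f"s3://{FEED_BUCKET}/items.jl": {"format": "jsonlines"},
}
```

On Heroku, environment variables of this kind can be set as config vars, for example with heroku config:set AWS_ACCESS_KEY_ID=... Scrapy's S3 storage also relies on an external AWS library (botocore in recent versions), which would need to be listed in requirements.txt.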