Running a Selenium browser on a server (Flask/Python/Heroku)


Question


I am scraping some websites that seem to have pretty good protection against scraping. The only way I can get it to work is to use Selenium to load the page and then scrape the content from that.

Currently this works on my local computer (a Firefox window opens and closes when I access my page, and its HTML is processed further in my script). However, I need my scraper to be accessible on the web. The scraper is embedded within a Flask app on Heroku. Is there a way to make the Selenium browser work on Heroku servers? Or are there any hosting providers where it can work?

Solution

Heroku, wonderful as it is, has a major limitation: you cannot install custom software or, in many cases, libraries. To provide an easy-to-use, centrally controlled, managed stack, Heroku strips its servers down to prevent other usage.

What this boils down to is that there is no Xorg on a Heroku dyno. The lack of Xorg, combined with the inability to install custom software, means no xvfb either, and no way to run the browser that Selenium expects to exist. Further, the browser itself is not generally available.

You'll have better luck with a cloud offering like AWS, where you can install custom software, including Firefox, xvfb (so you don't need all the Xorg overhead), and of course the rest of your scraping stack. This answer, to a question about being unable to call Firefox from Python Selenium on AWS, explains how to do it properly.
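As a rough illustration of the xvfb approach described above, here is a minimal sketch, not the linked answer's method. It assumes a Linux server where Xvfb, Firefox, and the Selenium Python package (plus a geckodriver that Selenium can find) are already installed. The helper names `xvfb_command`, `virtual_display`, and `fetch_html`, the display number `:99`, and the one-second startup wait are all hypothetical choices made for this example.

```python
import os
import shutil
import subprocess
import time
from contextlib import contextmanager


def xvfb_command(display=":99", width=1024, height=768, depth=24):
    """Build the argv that starts a virtual X server on the given display."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


@contextmanager
def virtual_display(display=":99"):
    """Run Xvfb for the duration of the block and point DISPLAY at it."""
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb is not installed on this machine")
    proc = subprocess.Popen(xvfb_command(display))
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = display
    time.sleep(1)  # crude wait for Xvfb to come up (assumed to be enough)
    try:
        yield
    finally:
        # Restore the previous DISPLAY value and stop the virtual X server.
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous
        proc.terminate()
        proc.wait()


def fetch_html(url):
    """Load a page in Firefox on the virtual display and return its HTML."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    with virtual_display():
        driver = webdriver.Firefox()
        try:
            driver.get(url)
            return driver.page_source
        finally:
            driver.quit()
```

A Flask view could then call `fetch_html(url)` and pass the returned HTML to the existing parsing code. Starting a browser on every request is slow, so this is only meant to show how the pieces fit together.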
