Running selenium browser on server (Flask/Python/Heroku)


Question

I am scraping some websites that seem to have pretty good protection against it. The only way I can get it to work is to use Selenium to load the page and then scrape stuff from that.

Currently this works on my local computer (a Firefox window opens and closes when I access my page, and its HTML is processed further in my script). However, I need my scraper to be accessible on the web. The scraper is embedded within a Flask app on Heroku. Is there a way to make the Selenium browser work on Heroku servers? Or are there any hosting providers where it can work?

Solution

Heroku, wonderful as it is, has a major limitation: one cannot install custom software or, in many cases, libraries. In providing an easy-to-use, centrally controlled, managed stack, Heroku strips its servers down to prevent other usage.

What this boils down to is that there is no Xorg on a Heroku dyno. Without Xorg, and without the ability to install custom software, there is no xvfb either, and no way to run the browser that Selenium expects to exist. Further, the browser itself is not generally available.

You'll have better luck with a cloud offering like AWS, where you can install custom software, including firefox, xvfb (to keep from needing all the Xorg overhead), and of course the rest of your scraping stack. This answer explains how to do it properly.
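
As a rough illustration of the xvfb approach on a machine where you can install software, here is a minimal sketch. It assumes that Xvfb, Firefox, geckodriver and the `selenium` Python package are already installed. The helper names `virtual_display` and `fetch_html`, the display number `:99` and the screen size are made up for this example and are not from the question or the linked answer.

```python
import os
import shutil
import subprocess
import time
from contextlib import contextmanager


@contextmanager
def virtual_display(display=":99", screen="1280x1024x24"):
    """Run Xvfb on `display` and point DISPLAY at it while the block runs.

    The display number and screen geometry are arbitrary example values.
    """
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb is not installed on this machine")
    proc = subprocess.Popen(["Xvfb", display, "-screen", "0", screen])
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = display
    time.sleep(1)  # crude wait so Xvfb is listening before the browser starts
    try:
        yield
    finally:
        # Stop the virtual X server and restore the old DISPLAY value.
        proc.terminate()
        proc.wait()
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous


def fetch_html(url, make_driver=None):
    """Load `url` in a browser and return the rendered page source.

    `make_driver` is any zero-argument callable returning a WebDriver-like
    object; by default a Selenium Firefox driver is created.
    """
    if make_driver is None:
        # Imported lazily so the module can be loaded without selenium.
        from selenium import webdriver
        make_driver = webdriver.Firefox
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the load fails
```

Inside a Flask view, the two pieces would be combined as `with virtual_display(): html = fetch_html(url)`. Starting a browser on every request is slow, so this is only a starting point, not a tuned setup.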
