Selenium Chrome Driver Limitations: Web Scraping at Scale


Question

I'm planning to use Selenium Chrome Driver in a project that scrapes multiple public websites (something like Kayak or Skyscanner). There will be a REST GET endpoint; on each request, my backend would launch headless Chrome, scrape several websites, and return the processed results as JSON.

I want to know how scalable Chrome Driver is, since it sounds like a headless Chrome instance has to be launched every time a request comes in.

Update: Question using Google Chrome Headless
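
One common way to avoid launching a browser on every request is to start a fixed number of drivers once and reuse them. The following is a minimal sketch of that idea, not something taken from the question or the answer: `DriverPool` and `make_headless_chrome` are hypothetical names, and the factory assumes Selenium 4 with a local Chrome install.

```python
import queue
from contextlib import contextmanager


class DriverPool:
    """Fixed-size pool of reusable browser drivers (hypothetical helper)."""

    def __init__(self, factory, size):
        # Create all drivers up front so requests never pay the launch cost.
        self._drivers = queue.Queue(maxsize=size)
        for _ in range(size):
            self._drivers.put(factory())

    @contextmanager
    def acquire(self, timeout=30):
        # Block until a driver is free; raises queue.Empty on timeout.
        driver = self._drivers.get(timeout=timeout)
        try:
            yield driver
        finally:
            # Always hand the driver back, even if scraping raised.
            self._drivers.put(driver)

    def close(self):
        # Quit every idle driver; call once at shutdown.
        while not self._drivers.empty():
            self._drivers.get_nowait().quit()


def make_headless_chrome():
    # Assumption: Selenium 4 and Chrome are installed locally.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

A request handler would then wrap its scraping in `with pool.acquire() as driver:` so that at most `size` Chrome instances exist at once, and excess requests wait or time out instead of spawning more browsers.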

Recommended answer

Here are the pros and cons of PhantomJS that I noticed during implementation. Hope this helps.

Cons:

1) It is less reliable than Chrome Driver at locating elements by id, XPath, or CSS selector.
2) If the site has a login mechanism, redirects won't behave the way they do with Chrome Driver.
3) If you need screenshots on test failures, you have to implement that logic yourself.
4) Switching between multiple drivers (Chrome, HTML, etc.) is very difficult.

Pros:

1) Test cases execute faster than with Chrome Driver.
2) No browser window is required; it runs without a GUI.
3) It needs little configuration compared to Chrome Driver.

You can also go with the HTML driver, which is considerably faster than PhantomJS, but it has its own limitations that you need to take care of before implementation.
