Selenium headless browser WebDriver: [Errno 104] Connection reset by peer

Question

I am trying to scrape data from the URLs below, but Selenium fails at driver.get(url). Sometimes the error is [Errno 104] Connection reset by peer, sometimes [Errno 111] Connection refused. On rare days it works fine, and on my Mac with a real browser the same spider works every single time, so this does not appear to be related to my spider.

I have tried many solutions, such as waiting for selectors on the page, implicit waits, and using selenium-requests to pass proper request headers, but nothing seems to work.

http://www.snapdeal.com/offers/deal-of-the-day
https://paytm.com/shop/g/paytm-home/exclusive-discount-deals

I am using Python, Selenium and a headless Firefox WebDriver to achieve this. The OS is CentOS 6.5.

Note: many AJAX-heavy pages are scraped successfully by the same setup; some are listed below.

http://www.infibeam.com/deal-of-the-day.html, http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals

I have already spent many days trying to debug the issue with no luck. Any help would be appreciated.

Recommended answer

After days of wrestling with this issue, I finally found the cause. Writing it here for the benefit of the community: the headless browser was failing due to a lack of RAM on the server. The strange error messages from WebDriver were a real PITA.
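Since the WebDriver errors give no hint that memory is the problem, one option is to check memory before starting the browser. The following is only a minimal sketch, not part of the original answer: the function names and the 512 MB threshold are hypothetical, and it assumes a Linux host that exposes /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_kb(info):
    """Estimate the memory available to new processes, in kB."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # Older kernels may not report MemAvailable; approximate it instead.
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def enough_memory(info, min_free_kb=512 * 1024):
    """Return True if the estimate meets the threshold (an assumed value)."""
    return available_kb(info) >= min_free_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory report (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

A spider could call enough_memory(read_meminfo()) before creating the driver and log a clear warning when it returns False, rather than waiting for an opaque connection error. The right threshold depends on the browser and the pages being loaded.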

The server had been running for 60 days straight without a reboot, and rebooting it did the trick. After tripling the swap space, I have not faced the issue for the past few days. I also scheduled a task to clean up the page cache (http://www.yourownlinux.com/2013/10/how-to-free-up-release-unused-cached-memory-in-linux.html).
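The scheduled cleanup task is not shown in the answer. As a rough sketch of what such a task might look like in Python, the function below flushes pending writes and then asks the kernel to drop its caches through /proc/sys/vm/drop_caches. The function name and defaults are hypothetical, and it assumes Linux and root privileges.

```python
import os

# Kernel interface for dropping caches (Linux only, writable by root).
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(level=3, path=DROP_CACHES_PATH):
    """Ask the kernel to drop clean caches.

    level 1 frees the page cache, 2 frees dentries and inodes, 3 frees both.
    The path parameter exists only so the function can be tried on a
    scratch file without root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write dirty pages to disk first so more cache can be released
    with open(path, "w") as fh:
        fh.write(f"{level}\n")
```

A script that calls drop_caches() could be run periodically from root's crontab. How often to run it, and whether it helps at all on a given server, is something to verify by watching memory usage.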
