PhantomJS: certain pages failing to open

Question

I am currently writing a web application that involves some web scraping. To help with this, I am using PhantomJS. However, certain (but not all) web pages return status "fail".

Here is the code. (Note: this is actually written in nodejs using the node-phantom library found here: https://github.com/alexscheelmeyer/node-phantom. While the syntax may be different, the library works directly with phantomjs, so it shouldn't be doing anything different.)

phantom.create(function (err,ph) {
    ph.createPage(function (err,page) {
        page.onResourceError = function(errorData) {
            console.log('Unable to load resource (URL:' + errorData.url + ')');
            console.log('Error code: ' + errorData.errorCode + '. Description: ' + errorData.errorString);
        };
        page.onLoadFinished = function(status) {
            console.log('Status: ' + status);
            if(status==='success') {
                page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js', function () {
                    if(fetch_results) {
                        //THIS IS WHERE YOU WILL DO RESULTS SHIT
                        console.log("results page stuff entered");
                        page.render('phantomjs-test2.png');
                        ph.exit();
                    } else {
                        page.evaluate(function () {
                            //page evaluate stuff
                        }, function(err, result) {
                            console.log("entering here");
                            page.render('phantomjs-test.png');
                            if(!err) fetch_results = true;
                        });
                    }
                });
            } else {
                console.log(
                    "Error opening url \"" + page.reason_url
                    + "\": " + page.reason
                );
                console.log("Connection failed.");
                ph.exit();
            }
        };
        //page.open("https://www.google.com",function (err,status) {});
        page.open("https://www.pavoterservices.state.pa.us/Pages/PollingPlaceInfo.aspx",function (err,status) {});
    });
}, {parameters:{'ignore-ssl-errors':'yes'}});

So for page.open with google.com, the page loads successfully. However, with the other URL listed, it returns the following error:

Unable to load resource (URL:https://www.pavoterservices.state.pa.us/Pages/PollingPlaceInfo.aspx)
Error code: 2. Description: connection closed
Error opening url "undefined": undefined

Any help as to why google will load but not the url listed would be greatly appreciated!

Recommended answer

Try calling phantomjs with --ssl-protocol=any
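
As a rough sketch of what that looks like in practice: the question already passes options to phantom.create() as a `parameters` object, so the flag can be added there. The helper `toPhantomArgs` below is hypothetical (it is not part of node-phantom) and only illustrates how such an object would read as command-line flags; `script.js` is a placeholder name, and the assumption is that the library forwards each key as a `--key=value` flag.

```javascript
// Hypothetical helper (not part of node-phantom): shows how an options
// object like the one passed to phantom.create() in the question would
// look as PhantomJS command-line flags.
function toPhantomArgs(parameters) {
    return Object.keys(parameters).map(function (key) {
        return '--' + key + '=' + parameters[key];
    });
}

// The question's options plus the flag suggested in this answer.
var parameters = {
    'ignore-ssl-errors': 'yes',
    'ssl-protocol': 'any'
};

var args = toPhantomArgs(parameters);
// 'script.js' is a placeholder script name.
var commandLine = ['phantomjs'].concat(args, ['script.js']).join(' ');
console.log(commandLine);

// With node-phantom, the same object would go where the question already
// passes its options (assumption: the library forwards each key as a flag):
// phantom.create(function (err, ph) { /* ... */ }, {parameters: parameters});
```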

I had exactly the same problem with an external site that had worked a week earlier.

So I searched and found a related issue described at "Qt QNetworkReply connection closed". It helped me look into PhantomJS's embedded Qt: by default it forces new connections to use SSLv3, which is either too new for old sites or too old for new sites (but was quite a reasonable default at the time Qt 4.8.4 was released).

With "any", you tell phantomjs to try all protocols, which should help you pass the test. It will try protocols more secure than SSLv3, but also less secure ones (SSLv3 sits in the middle of the range). So, if "any" works, you should then try to force a value more secure than SSLv3 instead of leaving "any". In my case, specifying --ssl-protocol=tlsv1 worked.
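
The "try the stricter value first, fall back to any" idea can be sketched roughly as below. This is only an illustration under assumptions: `tryOpen` is a hypothetical function that would launch PhantomJS with the given --ssl-protocol value and report the page status, and `fakeTryOpen` is a stub used here so the sketch runs without PhantomJS installed.

```javascript
// Protocols to try, strictest first; both values come from the answer above.
var PROTOCOLS = ['tlsv1', 'any'];

// tryOpen(protocol, callback) is a hypothetical function that would start
// PhantomJS with --ssl-protocol=<protocol>, open the page, and call back
// with 'success' or 'fail'.
function firstWorkingProtocol(protocols, tryOpen, done) {
    var index = 0;
    function next() {
        if (index >= protocols.length) {
            return done(null); // nothing worked
        }
        var protocol = protocols[index++];
        tryOpen(protocol, function (status) {
            if (status === 'success') {
                return done(protocol);
            }
            next();
        });
    }
    next();
}

// Stub standing in for a real PhantomJS run: pretends the server only
// accepts the connection when 'any' is used.
function fakeTryOpen(protocol, callback) {
    callback(protocol === 'any' ? 'success' : 'fail');
}

var chosen;
firstWorkingProtocol(PROTOCOLS, fakeTryOpen, function (protocol) {
    chosen = protocol;
    console.log('First protocol that worked: ' + protocol);
});
```

If the stricter value succeeds, the loop stops there and "any" is never used, which matches the advice to prefer a more secure setting whenever it works.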

My guess is that the recent issues with SSL (goto fail, Heartbleed, POODLE, and so on) led a whole lot of websites to upgrade their servers, which now refuse SSLv3 connections. But in case your server uses an older-than-SSLv3 protocol, keep "any" (along with all the associated security risks).
