Issue trying to use PhantomJS to process a web page


Question

I'm building a crawler for SEO purposes, but I can't get PhantomJS to even download this particular page: https://tablet.euroslots.com/home/

With cURL it works fine (though cURL obviously doesn't execute the JavaScript):

✓ 1344:0 /cherrytech/js-crawler root› curl https://tablet.euroslots.com/home/
<!doctype html><!--[if lt IE 7]><html class="no-js lt-ie9 lt-ie8 lt-ie7"> ...

My PhantomJS script:

var page = require('webpage').create();

page.onResourceRequested = function (request) {
  console.log('Request ' + JSON.stringify(request, undefined, 4));
};

page.onResourceReceived = function(response) {
  console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};

page.onResourceError = function(resourceError) {
  console.log('Unable to load resource (#' + resourceError.id + 'URL:' + resourceError.url + ')');
  console.log('Error code: ' + resourceError.errorCode + '. Description: ' + resourceError.errorString);
};

page.settings.userAgent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A406 Safari/8536.25';
page.open('https://tablet.euroslots.com/home/', function() {
  console.log(page.content);
  phantom.exit();
});

This is the result of running it:

✓ 1347:0 /cherrytech/js-crawler root› phantomjs crawler.js
Request {
    "headers": [
        {
            "name": "User-Agent",
            "value": "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A406 Safari/8536.25"
        },
        {
            "name": "Accept",
            "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
        }
    ],
    "id": 1,
    "method": "GET",
    "time": "2014-09-16T16:02:24.426Z",
    "url": "https://tablet.euroslots.com/home/"
}
Unable to load resource (#1URL:https://tablet.euroslots.com/home/)
Error code: 2. Description: Connection closed
Response (#1, stage "end"): {"contentType":null,"headers":[],"id":1,"redirectURL":null,"stage":"end","status":null,"statusText":null,"time":"2014-09-16T16:02:24.763Z","url":"https://tablet.euroslots.com/home/"}
<html><head></head><body></body></html>

Recommended answer

Try calling phantomjs with --ssl-protocol=any (for example, phantomjs --ssl-protocol=any crawler.js).
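
As a minimal sketch, here is the question's script reduced to the part that matters when testing the flag. It checks the `status` argument that `page.open` passes to its callback, so a failed load is reported instead of printing an empty document. The `exitCodeFor` helper and the non-zero exit code are illustrative additions, not part of the original script.

```javascript
// Run as: phantomjs --ssl-protocol=any crawler.js

// Hypothetical helper: map the status string passed to the page.open
// callback ('success' or 'fail') to a process exit code.
function exitCodeFor(status) {
  return status === 'success' ? 0 : 1;
}

// Only run the crawl inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();

  page.open('https://tablet.euroslots.com/home/', function (status) {
    if (status === 'success') {
      console.log(page.content);
    } else {
      console.log('Failed to load page, status: ' + status);
    }
    phantom.exit(exitCodeFor(status));
  });
}
```

If the load still fails with the flag set, the resource error handlers from the question's script can be added back to see the underlying error code.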

I had exactly the same problem with an external site that had worked a week earlier.

So I searched and found a related issue described in "Qt QNetworkReply connection closed". It led me to look into the Qt embedded in phantomjs: by default it forces new connections to use SSLv3, which is either too new for old sites or too old for new sites (although it was quite a reasonable default when Qt 4.8.4 was released).

With "any", you tell phantomjs to try all protocols, which should let the connection succeed. It will try protocols that are more secure than SSLv3, but also ones that are less secure (SSLv3 sits in the middle of the range). So if "any" works, you should then try forcing a value more secure than SSLv3 instead of leaving "any" in place. In my case, specifying --ssl-protocol=tlsv1 worked.
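
The "try any, then tighten" advice could be automated roughly as follows. This is a sketch for Node.js, not part of the original answer: it assumes `phantomjs` is on the PATH and that the script exits with a non-zero code when the page fails to load (the question's script always exits with 0, so it would need a status check first). The names `runPhantom` and `firstWorkingProtocol` are hypothetical.

```javascript
var spawnSync = require('child_process').spawnSync;

// Run phantomjs once with the given --ssl-protocol value.
// Returns true if the process exited with code 0.
function runPhantom(protocol, script) {
  var result = spawnSync('phantomjs', ['--ssl-protocol=' + protocol, script], {
    stdio: 'ignore'
  });
  return result.status === 0;
}

// Try protocols in order of preference and return the first one that works,
// or null if none does. `run` is injectable so the selection logic can be
// exercised without phantomjs installed.
function firstWorkingProtocol(script, protocols, run) {
  run = run || runPhantom;
  for (var i = 0; i < protocols.length; i++) {
    if (run(protocols[i], script)) {
      return protocols[i];
    }
  }
  return null;
}
```

For example, `firstWorkingProtocol('crawler.js', ['tlsv1', 'any'])` prefers tlsv1 and falls back to "any" only if tlsv1 fails.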

My guess is that the recent SSL security issues (goto fail, Heartbleed, POODLE, and so on) made a lot of websites upgrade their servers, which now refuse SSLv3 connections. But if your server uses a protocol older than SSLv3, keep "any" (along with all the associated security risks).
