PhantomJS querySelectorAll().textContent returns nothing

Question

I created a simple web scraper that uses PhantomJS to grab data from a website. It doesn't work when I use querySelectorAll to get the content I want. Here is my whole code:

var page = require('webpage').create();

var url = 'https://www.google.com.kh/?gws_rd=cr,ssl&ei=iE7jV87UKsrF0gSDw4zAAg';

page.open(url, function(status){

  if(status === 'success'){

    var title = page.evaluate(function(){
      return document.querySelectorAll('.logo-subtext')[0].textContent;
    });

    console.log(title);
  }
  phantom.exit();
});

Please help me solve this.

Thank you very much.

Recommended answer

By default the virtual screen size of PhantomJS is 400x300.

var page = require('webpage').create();
console.log(page.viewportSize.width);
console.log(page.viewportSize.height);

400
300

Some sites take note of that and, instead of the normal version you see in your desktop browser, serve a stripped-down mobile version of the HTML and CSS. We can fix that by setting the desired viewport size:

page.viewportSize = { width: 1280, height: 800 };

Other sites do user-agent sniffing and make decisions based on it. If they don't recognize your browser, they may show a mobile version to be on the safe side. If they don't want to be scraped, they may refuse to serve PhantomJS at all, because it honestly declares itself:

console.log(page.settings.userAgent);

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1

But we can set the desired user agent:

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0';
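Both fixes can be grouped into one step that runs before `page.open()`, so the request and the initial layout already use them. The viewport size and the Firefox user-agent string come from the answer. The helper name `configureAsDesktop` and the `FIREFOX_UA` constant are hypothetical, chosen only for this sketch.

```javascript
// Hypothetical helper (not part of PhantomJS): applies the answer's two fixes
// to a page object created with require('webpage').create().
// Call it before page.open() so the request and initial layout use them.
var FIREFOX_UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) ' +
                 'Gecko/20100101 Firefox/32.0';

function configureAsDesktop(page, width, height, userAgent) {
  // Desktop-sized viewport instead of PhantomJS's default 400x300.
  page.viewportSize = { width: width || 1280, height: height || 800 };
  // A common desktop browser UA instead of the default PhantomJS one.
  page.settings.userAgent = userAgent || FIREFOX_UA;
  return page;
}
```

In the question's script, this would replace the first line with `var page = configureAsDesktop(require('webpage').create());`. The `page.open(...)` call stays unchanged.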

When working with fragile things like web scraping, you really should take notice of every error and system message you can get.

So no PhantomJS script should be without onError and onConsoleMessage callbacks:

page.onError = function (msg, trace) {
    var msgStack = ['ERROR: ' + msg];
    if (trace && trace.length) {
      msgStack.push('TRACE:');
      trace.forEach(function(t) {
        msgStack.push(' -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function +'")' : ''));
      });
    }
    console.log(msgStack.join('\n'));
};   

page.onConsoleMessage = function (msg) {
    console.log(msg);
};   
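The question's own code can fail in exactly this way. If `.logo-subtext` matches nothing, `document.querySelectorAll('.logo-subtext')[0]` is `undefined`, and reading `.textContent` from it throws inside the page. The sketch below is more defensive. The name `firstTextContent` is hypothetical. It relies on `page.evaluate` forwarding extra arguments to the function it runs in the page, so the selector does not need to be hard-coded.

```javascript
// Hypothetical page-context function for page.evaluate. It runs inside the
// loaded page, so it may only use the page's globals (such as document)
// and its arguments, not variables from the surrounding PhantomJS script.
function firstTextContent(selector) {
  var el = document.querySelector(selector);
  // Return null instead of throwing when nothing matches, so the caller
  // can tell "element missing" apart from "element has empty text".
  return el ? el.textContent : null;
}
```

It would be used as `var title = page.evaluate(firstTextContent, '.logo-subtext');`. A `null` result then means the served page has no such element. That is the cue to check the viewport, the user agent and a screenshot.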

Another vital technique for debugging PhantomJS scripts is taking screenshots. Are you sure PhantomJS sees what you see in your Chrome?

page.render("google.com.png");

(Screenshot before setting the user agent)

(Screenshot after setting the Firefox user agent)
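A screenshot only helps if it is taken after the page has loaded and, ideally, after its scripts have run. The sketch below shows that order, with some assumptions. The helper name `openAndRender`, the fixed `waitMs` delay and the `done(err)` callback style are illustrative choices. They are not part of the answer or of the PhantomJS API.

```javascript
// Hypothetical helper: open a URL, wait a fixed time for client-side
// scripts, save a screenshot, then report back through done(err).
// `page` should already be configured (viewport, user agent, onError).
function openAndRender(page, url, shotFile, waitMs, done) {
  page.open(url, function (status) {
    if (status !== 'success') {
      // Nothing to render: the page did not load.
      done(new Error('Failed to load ' + url + ' (status: ' + status + ')'));
      return;
    }
    // A fixed delay is a simple heuristic: content injected later than
    // waitMs will still be missing from the screenshot.
    setTimeout(function () {
      page.render(shotFile);
      done(null);
    }, waitMs);
  });
}
```

A caller would typically run its `page.evaluate(...)` extraction and then `phantom.exit()` inside `done`. The extracted text then reflects the same page state as the saved image. Comparing images from runs before and after the user-agent change shows which version of the page PhantomJS actually received.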