PhantomJS and getting modified DOM

Problem description

I'm developing a tool that needs to download a web page from a third-party server, execute it as a browser would, and then parse the HTML. What I struggle with is that the tool needs to parse the HTML after all JavaScript has executed and the DOM has been modified. I'm trying to use PhantomJS for this purpose. It works on small snippets (just a tiny HTML document with external JavaScript that adds some nodes to the DOM), but when I do the same with a real site (http://www.dba.dk/), I don't get the final HTML with all the modifications made by the JS code.

I really need help on this as I have been stuck with it for more than a week.

My PhantomJS code is simple:

if (phantom.state.length === 0) {
    if (phantom.args.length === 0) {
        console.log('Usage: test.js <some URL>');
        phantom.exit();
    } else {
        var address = phantom.args[0];
        phantom.state = Date.now().toString();
        phantom.viewportSize = { width: 1280, height: 800 };
        phantom.open(address);
    }
} else {
    var elapsed = Date.now() - new Date().setTime(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!first_time) {
            var first_time = true;
            if (!document.addEventListener) {
                console.log('Not SUPPORTED!');
            }
            phantom.render('result.png');
            var markup = document.documentElement.innerHTML;
            console.log(markup);
            phantom.exit();
        }
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}

The HTML dumped to the console doesn't contain the dynamically generated content.

Solution

The problem was in the Flash plugin. The pages were detecting its absence. Once it was loaded correctly, the problem was gone.
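For anyone trying to diagnose the same symptom, here is a minimal sketch of how the rendered DOM and the plugins visible to the page could be dumped together. It is not the code from the question: it assumes a later PhantomJS release that provides the `webpage` and `system` modules instead of `phantom.state` / `phantom.open`. The names `dumpRenderedDom`, `snapshotPage`, and `SETTLE_MS` are made up for illustration, and the fixed delay is only a guess at how long the page's scripts need to finish.

```javascript
// Assumed delay (ms) to let the page's own scripts modify the DOM after load.
var SETTLE_MS = 3000;

// Runs inside the page sandbox via page.evaluate, so it may only touch page
// globals (document, navigator) and must not rely on outer variables.
function snapshotPage() {
    var names = [];
    for (var i = 0; i < navigator.plugins.length; i++) {
        names.push(navigator.plugins[i].name);
    }
    return {
        html: document.documentElement.outerHTML, // DOM as it is right now
        plugins: names                            // what plugin detection would see
    };
}

// Hypothetical helper: opens `url` in `page`, waits `settleMs`, then reports
// the snapshot through the Node-style callback `done(err, result)`.
function dumpRenderedDom(page, url, settleMs, done) {
    page.open(url, function (status) {
        if (status !== 'success') {
            done(new Error('FAIL to load the address'));
            return;
        }
        setTimeout(function () {
            done(null, page.evaluate(snapshotPage));
        }, settleMs);
    });
}

// Only run the command-line part when executed by PhantomJS itself.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    if (system.args.length < 2) {
        console.log('Usage: test.js <some URL>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        page.viewportSize = { width: 1280, height: 800 };
        dumpRenderedDom(page, system.args[1], SETTLE_MS, function (err, result) {
            if (err) {
                console.log(err.message);
                phantom.exit(1);
                return;
            }
            console.log('Plugins seen by the page: ' +
                (result.plugins.join(', ') || '(none)'));
            console.log(result.html);
            phantom.exit();
        });
    }
}
```

If the plugin list comes back empty and the site checks for Flash, that would be consistent with the cause described above. A fixed delay is the simplest choice here; polling for a specific element would be more robust when you know what the page is supposed to add.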
