Save the HTML output of a page after the page's JavaScript has executed


Question

There is a site I am trying to scrape that first loads an HTML/JS page, modifies the form input fields using JS, and then POSTs the form. How can I get the final HTML output of the POSTed page?

I tried to do this with PhantomJS, but it seems to only have an option to render image files. Googling around suggests it should be possible, but I can't figure out how. My attempt:

var page = require('webpage').create();
var fs = require('fs');
page.open('https://www.somesite.com/page.aspx', function () {
    page.evaluate(function(){

    });

    page.render('export.png');
    fs.write('1.html', page.content, 'w');
    phantom.exit();
});

This code will be used by a client, and I can't expect him to install too many packages (nodejs, casperjs, etc.).

Thanks.

Recommended answer

The output code you have is correct, but there is a timing issue: the output lines are being executed before the page is done loading. You can hook into the onLoadFinished callback to find out when that happens. See the full code below.

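    var page = require('webpage').create();
    var fs = require('fs');

    // Called each time the page finishes loading.
    page.onLoadFinished = function() {
      console.log("page load finished");
      page.render('export.png');              // screenshot of the rendered page
      fs.write('1.html', page.content, 'w');  // HTML after the page's scripts have run
      phantom.exit();
    };

    page.open("http://www.google.com", function() {
      page.evaluate(function() {
        // code placed here runs inside the page
      });
    });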
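
Since the page in the question loads once and then POSTs, onLoadFinished may fire more than once, and the handler above would exit after the first load. The following is a minimal sketch of one way to wait for a later load. It assumes the POSTed result is the second load; the helper name `saveAfterLoads`, the load count of 2 and the output file name are illustrative assumptions rather than part of the original answer.

```javascript
// Hypothetical helper: write page.content only once the Nth load has finished.
// `page` is expected to be a PhantomJS webpage object and `fs` the PhantomJS fs module.
function saveAfterLoads(page, fs, expectedLoads, outPath, done) {
  var loads = 0;
  page.onLoadFinished = function (status) {
    loads += 1;
    console.log('load ' + loads + ' finished: ' + status);
    if (loads !== expectedLoads) {
      return; // not the load we are waiting for (e.g. the initial form page)
    }
    fs.write(outPath, page.content, 'w');
    done();
  };
}

// This part only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  // Assumption: load 1 is the form page, load 2 is the POSTed result.
  saveAfterLoads(page, require('fs'), 2, '1.html', function () {
    phantom.exit();
  });
  page.open('https://www.somesite.com/page.aspx');
}
```

If the site redirects or loads more pages than expected, the count would need adjusting; logging each load, as the sketch does, is one way to find out how many occur.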

Testing against a site like Google can be deceiving: it loads so quickly that you can often grab the output inline, the way your code does. Timing is a tricky thing in PhantomJS; sometimes I test with setTimeout to see whether timing is the issue.
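
The setTimeout check can be sketched as below. It is meant as a diagnostic rather than a robust wait: if the saved HTML changes once a delay is added, timing is the likely culprit. The helper name `saveAfterDelay`, the 2000 ms delay and the optional `schedule` parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: save page.content after a fixed delay.
// `schedule` is optional and defaults to setTimeout; it exists only so the
// delay mechanism can be swapped out (for example in a test).
function saveAfterDelay(page, fs, outPath, delayMs, done, schedule) {
  var wait = schedule || function (fn, ms) { return setTimeout(fn, ms); };
  wait(function () {
    fs.write(outPath, page.content, 'w');
    done();
  }, delayMs);
}

// This part only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('https://www.somesite.com/page.aspx', function (status) {
    console.log('initial load: ' + status);
    // Assumption: 2 seconds is long enough for the page's script to POST
    // and for the result to load; adjust while experimenting.
    saveAfterDelay(page, require('fs'), '1.html', 2000, function () {
      phantom.exit();
    });
  });
}
```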
