Get javascript rendered html source using phantomjs
First of all, I am not looking for help with a development or testing environment. I am new to phantomjs, and all I want is to run phantomjs from the command line in a Linux terminal.
I have an html page whose body is rendered by some javascript code. I want to download that rendered html content using phantomjs.
I have no experience with phantomjs, but I have a bit of experience in shell scripting, so I tried to do this with curl. But since curl does not execute javascript, I only got the html of the original source; the rendered content wasn't downloaded. I heard that ruby mechanize may do this job, but I have no knowledge of ruby. On further investigation I found the command line tool phantomjs. How can I do this with phantomjs?
Please feel free to ask for any additional information I need to provide.
Unfortunately, that is not possible using just the PhantomJS command line. You have to use a Javascript file to actually accomplish anything with PhantomJS.
Here is a very simple version of a script you can use:
Code mostly copied from https://stackoverflow.com/a/12469284/4499924
printSource.js
var system = require('system');
var page = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
    // page.content is the source
    console.log(page.content);
    // need to call phantom.exit() to prevent from hanging
    phantom.exit();
});
To print the page source to standard output:
phantomjs printSource.js http://todomvc.com/examples/emberjs/
To save the page source to a file, redirect the output:
phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html
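One caveat: the page.open callback runs once the page has loaded, which may be before the page's own scripts have finished building the DOM. If the output looks incomplete, a variation like the one below might help. It is only a sketch under some assumptions: the file name printSourceDelayed.js, the helper parseArgs, the optional second argument, and the 1000 ms default delay are all made up for illustration, and a fixed delay is a crude heuristic rather than a guarantee that rendering is done. It also checks the load status so that a failed load exits with an error instead of printing an empty page.

```javascript
// printSourceDelayed.js (hypothetical file name) - a sketch, not a definitive implementation.

// Hypothetical helper: turn [script, url, delayMs] into an options object.
// The 1000 ms default is an assumed value; tune it for the page you are fetching.
function parseArgs(args) {
    var delay = parseInt(args[2], 10);
    return {
        url: args[1],
        delayMs: (isNaN(delay) || delay < 0) ? 1000 : delay
    };
}

// Only run the PhantomJS-specific part when the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    var options = parseArgs(system.args);

    if (!options.url) {
        system.stderr.writeLine('Usage: phantomjs printSourceDelayed.js <url> [delayMs]');
        phantom.exit(1);
    } else {
        page.open(options.url, function (status) {
            // status is the string 'success' or 'fail'
            if (status !== 'success') {
                system.stderr.writeLine('Failed to load ' + options.url);
                phantom.exit(1);
                return;
            }
            // Wait a little so client-side scripts can finish, then print the DOM as html
            window.setTimeout(function () {
                console.log(page.content);
                phantom.exit();
            }, options.delayMs);
        });
    }
}
```

For example, to wait two seconds before capturing the source:
phantomjs printSourceDelayed.js http://todomvc.com/examples/emberjs/ 2000 > ember.html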