Getting the final HTML, with JavaScript rendered, as a String in Java
Question
I want to fetch data from an HTML page (scrape it), but the reviews on the page are loaded by JavaScript. With a normal Java URL fetch I only get the original HTML, without the JavaScript executed. I want the final page, after the JavaScript has run.
Example: http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp
This page shows its comments in a Facebook plugin, which are fetched by JavaScript.
Similarly for this page:
http://www.imdb.com/title/tt0848228/reviews
How should I do this?
Recommended answer
Use PhantomJS: http://phantomjs.org
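var page = require('webpage').create();
page.open("http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp", function (status) {
    if (status !== "success") {
        console.log("Failed to load the page");
        phantom.exit(1);
    }
    // Give the page's scripts (e.g. the Facebook plugin) time to finish loading
    setTimeout(function () {
        // Where you want to save the screenshot
        page.render("screenshot.png");
        // page.content holds the HTML after JavaScript has run
        console.log(page.content);
        // You can access the content using jQuery, if the page itself loads it.
        // page.evaluate can only return serializable values, so return the text.
        var fbcomments = page.evaluate(function () {
            return $(".fb-comments iframe").contents().find(".postContainer").text();
        });
        console.log(fbcomments);
        phantom.exit();
    }, 10000);
});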
You have to run PhantomJS with the option --web-security=no to allow cross-domain interaction (i.e. for the Facebook iframe).
To communicate with other applications from PhantomJS, you can use a web server or make a POST request: https://github.com/ariya/phantomjs/blob/master/examples/post.js
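Since the question asks for the result as a Java String, another way to connect the two is to start PhantomJS as a child process from Java and read its standard output. The following is only a minimal sketch under some assumptions: the class name RenderedHtmlFetcher and the script name dump.js are hypothetical, dump.js is assumed to take the URL as its first argument and print page.content to standard output before exiting, the phantomjs binary is assumed to be on the PATH, and the output is assumed to be UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs PhantomJS as a child process and captures its output.
class RenderedHtmlFetcher {

    // Builds the command line: <phantomjs> --web-security=no <script> <url>
    static List<String> buildCommand(String phantomBinary, String scriptPath, String url) {
        return Arrays.asList(phantomBinary, "--web-security=no", scriptPath, url);
    }

    // Runs a command and returns everything it wrote to standard output.
    static String runAndCapture(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            // Send the child's stderr to our stderr so it does not mix with the HTML.
            builder.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process process = builder.start();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = process.getInputStream()) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }

            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Command exited with code " + exitCode);
            }
            // Assumption: the script writes UTF-8 text.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the command", e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java RenderedHtmlFetcher <url>");
            return;
        }
        // dump.js is a hypothetical PhantomJS script that prints page.content.
        String html = runAndCapture(buildCommand("phantomjs", "dump.js", args[0]));
        System.out.println(html.length() + " characters of rendered HTML");
    }
}
```

The returned String can then be passed to whatever HTML parser you already use. Reading standard output keeps the Java side simple; the web server or POST approach is probably a better fit if PhantomJS needs to stay running between requests.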