Using Phantom.js evaluate, how can I get the HTML of the page?

Question

page.evaluate(function() { return document; }, function(result){    
    console.log(result)                    
    next();
});

result is actually a huge object. I don't know the properties and attributes of that object. I just want the HTML of the page as you would see it in Chrome inspector.

From the look of the object, it seems that the HTML includes CSS and JavaScript, which is weird. The user should not see the CSS and JavaScript, because they are not the web page's HTML. Those are external files. I only want the HTML that the user would see.

Recommended answer

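document is an HTML document object, not a string. Values returned from evaluate are serialized when they leave the page context, so returning a DOM object does not give you its markup. To get the entire DOM as a string, return document.documentElement.outerHTML instead.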
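A minimal sketch of that approach, assuming the same callback-style Node bridge the question uses, where page.evaluate(fn, callback) runs fn inside the page and passes its return value to callback (as in older versions of the phantom npm module). The helper name getRenderedHtml is hypothetical; the point is that the page-side function returns the markup string rather than the document object.

```javascript
// Hypothetical helper: fetch the current DOM of a page as an HTML string.
// Assumes a callback-style bridge where page.evaluate(fn, callback)
// runs fn in the page context and hands its return value to callback.
function getRenderedHtml(page, callback) {
  page.evaluate(function () {
    // Runs inside the page: serialize the whole <html> element,
    // including any changes scripts have already made to the DOM.
    return document.documentElement.outerHTML;
  }, function (html) {
    // html is a plain string, safe to log or write to a file.
    callback(html);
  });
}
```

In the question's code this would replace the evaluate call, with console.log(html) and next() inside the callback.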

Outside of evaluate, you can read page.content instead; it is also a string.
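In a plain PhantomJS script (run with the phantomjs binary rather than through a Node bridge), page.content is a property you can read directly once page.open reports success, with no page-context code at all. A minimal sketch; the helper name dumpContent and its error-first callback are illustrative choices, not part of PhantomJS.

```javascript
// Hypothetical helper for a native PhantomJS script: open a URL, then
// pass page.content (the serialized current DOM) to an error-first callback.
function dumpContent(page, url, done) {
  page.open(url, function (status) {
    // PhantomJS reports 'success' or 'fail' as the load status.
    if (status !== 'success') {
      done(new Error('Failed to load ' + url));
      return;
    }
    // Read outside evaluate(): page.content is already a string.
    done(null, page.content);
  });
}
```

In PhantomJS itself you would pass require('webpage').create() as page, print the HTML with console.log, and call phantom.exit() inside done.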

I don't know what you mean by "HTML includes CSS and JavaScript" or "the web page's HTML". Are you referring to the difference between the page source and the DOM as modified by scripting? Both of the above give you the current DOM, not the original page source.
