Headless Chrome (Puppeteer) - how to get access to document node element?


Question

I'm using PhantomJS to parse some content and get some info from it (the max image size on a page, for example). I've decided to move to Puppeteer, and I've run into an issue: my functions that ran under PhantomJS worked with document node elements. In Puppeteer, as I understand it, it's impossible to return a node element from page.evaluate and other functions. Is there any other way to overcome this problem? Or do I have to use another library? Thank you!

Recommended answer

There are two environments to consider when using Puppeteer:

  1. The Node.js environment
  2. The page DOM environment

The Node.js environment is built on Google's V8 JavaScript engine, the engine used by Chrome.

The V8 documentation describes its relationship with the DOM:

JavaScript is most commonly used for client-side scripting in a browser, being used to manipulate Document Object Model (DOM) objects for example. The DOM is not, however, typically provided by the JavaScript engine but instead by a browser. The same is true of V8—Google Chrome provides the DOM. V8 does however provide all the data types, operators, objects and functions specified in the ECMA standard.

In other words, Node.js does not provide a DOM by default.

This means that Node.js cannot interpret DOM elements on its own.

This is where Puppeteer comes in.

The Puppeteer function page.evaluate() lets you evaluate an expression in the context of the current page, inside Chrome or Chromium, where the DOM is available.
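As a minimal sketch of this, the "max image size" task from the question can be done entirely inside page.evaluate(), with only plain data returned to Node.js. The helper names largestImage and run, and the choice to rank images by natural pixel area, are assumptions made for illustration; page is expected to be a Puppeteer Page.

```javascript
// Hypothetical helper: finds the largest <img> on the page by pixel area.
// `page` is assumed to be a Puppeteer Page that has already navigated.
async function largestImage(page) {
  // The callback runs inside the browser, where `document` exists.
  return page.evaluate(() => {
    let best = null;
    for (const img of document.images) {
      const area = img.naturalWidth * img.naturalHeight;
      if (best === null || area > best.area) {
        best = {
          src: img.src,
          width: img.naturalWidth,
          height: img.naturalHeight,
          area,
        };
      }
    }
    // A plain object (or null) is serializable, so it survives the trip back.
    return best;
  });
}

// Usage sketch: requires the `puppeteer` package and a URL of your choice.
async function run(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await largestImage(page));
  } finally {
    await browser.close();
  }
}
```

The DOM elements never leave the page here; only the numbers and the src string cross over to Node.js.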

The Puppeteer documentation describes what happens when you attempt to return a non-serializable value, such as a DOM element:

If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined.

Again, this is because Node.js cannot interpret DOM elements without help.

As a result, Puppeteer implements an ElementHandle class, which represents an in-page DOM element.

You can use elementHandle.$(), elementHandle.$$(), or elementHandle.$x() to return ElementHandles to Node.js.

An ElementHandle is a reference to the element inside the page rather than a copy of it, so Node.js can hold the handle and pass it back to Puppeteer functions that run in the page.
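A minimal sketch of working with handles, assuming page is a Puppeteer Page that has already loaded a document; the helper name imageSizes and the img selector are illustrative only. It collects handles with page.$$(), passes each one back into the page to read its size, and then disposes of it.

```javascript
// Hypothetical helper: returns the natural width and height of every <img>,
// holding a handle to each element on the Node.js side while measuring it.
// `page` is assumed to be a Puppeteer Page that has already navigated.
async function imageSizes(page) {
  const handles = await page.$$('img'); // array of ElementHandle
  const sizes = [];
  for (const handle of handles) {
    // Passed as an argument, the handle becomes the real element in the page.
    const size = await page.evaluate(
      (img) => ({ width: img.naturalWidth, height: img.naturalHeight }),
      handle
    );
    sizes.push(size);
    await handle.dispose(); // release the in-page reference once done
  }
  return sizes;
}
```

Calling await imageSizes(page) after page.goto(...) yields an array of plain { width, height } objects, while the elements themselves stay in the browser.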

Therefore, if you need to manipulate an element directly, do so inside page.evaluate(). If you need a representation of an element in Node.js, use page.$() or one of its related functions.
