Puppeteer: Converting circular structure to JSON Are you passing a nested JSHandle?


Question

I am trying to scrape a single-page website. Different combinations of selections lead to different search redirects. Inside the callback passed to page.evaluate, I wrote a for loop that clicks through the selections and then clicks the search button for each one. However, I get this error: Converting circular structure to JSON Are you passing a nested JSHandle?

Please help! My current code looks like this:

const res = await page.evaluate(async (i, courseCountArr, page) => {
    for (let j = 1; j < courseCountArr[i]; j++) {
        await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['1'].click() // click on school drop down
        await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${j}`].click() // click on each school option
        await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['2'].click() // click on subject drop down
        const subjectLen = document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a').length // length of the subject drop down
        for (let k = 1; k < subjectLen; k++) {
            await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${k}`].click() // click on each subject option
            document.getElementById('buttonSearch').click() //click on search button
            page.waitForSelector('.strong, .section-body')
            return document.querySelectorAll('.strong, .section-body').length
        }
    }
}, i, courseCountArr, page);

Recommended answer

Why the error happens

While you haven't shown enough code to reproduce the problem, here's a minimal reproduction that shows the likely pattern:

const puppeteer = require("puppeteer");

let browser;
(async () => {
  const html = `<ul><li>a</li><li>b</li><li>c</li></ul>`;
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

// ...
  const nestedHandle = await page.$$("li"); // $$ selects all matches
  await page.evaluate(els => {}, nestedHandle); // throws
// ...

})()
  .catch(err => console.error(err))
  .finally(() => browser?.close()) // browser is undefined if launch() failed
;

The output is:

TypeError: Converting circular structure to JSON
    --> starting at object with constructor 'BrowserContext'
    |     property '_browser' -> object with constructor 'Browser'
    --- property '_defaultContext' closes the circle Are you passing a nested JSHandle?
    at JSON.stringify (<anonymous>)

Why is this happening? All code inside the callback to page.evaluate (and its family: evaluateHandle, $eval, $$eval) is executed programmatically by Puppeteer inside the browser console. The browser console is a distinct environment from Node, where Puppeteer and the elementHandles live. To bridge the inter-process gap, the callback to evaluate, its parameters, and its return value are serialized and deserialized.
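The serialization failure itself can be reproduced without a browser. The following is a minimal sketch in plain Node; the browser and context objects are hypothetical stand-ins that only mimic the mutual references named in the stack trace above, not Puppeteer's real classes.

```javascript
// Hypothetical stand-ins for Puppeteer's Browser and BrowserContext objects.
// Only the mutual references matter for this demonstration.
const browser = { name: "Browser" };
const context = { name: "BrowserContext", _browser: browser };
browser._defaultContext = context; // closes the circle

// Try to serialize a value and report the outcome instead of throwing.
function trySerialize(value) {
  try {
    return { ok: true, json: JSON.stringify(value) };
  } catch (err) {
    return { ok: false, errorName: err.name, message: err.message };
  }
}

// An object graph with a cycle cannot be turned into JSON.
const circular = trySerialize(context);

// Plain data, such as an index and an array of numbers, serializes fine.
const plain = trySerialize({ i: 0, courseCountArr: [3, 5] });

console.log(circular.ok, circular.errorName);
console.log(plain.ok, plain.json);
```

Plain values such as i and courseCountArr survive the trip; an object like page, which holds references back to the browser, does not.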

The consequence is that you can't access any Node state from inside the browser, as you're attempting to do with page.waitForSelector('.strong, .section-body'). page lives in a totally different process from the browser. (As an aside, document.querySelectorAll and click() are purely synchronous, so there's no point in awaiting them.)
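To see why Node state is out of reach, here is a rough analogy in plain Node. It is not Puppeteer's actual transport; it assumes only that the callback crosses the boundary as source text, as described above. The nodeSideState variable is a hypothetical stand-in for something like page.

```javascript
// Build a callback that closes over a local variable, the way an evaluate
// callback might close over `page` or other Node-side state.
function makeCallback() {
  const nodeSideState = { label: "lives in Node" }; // hypothetical stand-in for `page`
  return () => typeof nodeSideState;
}

const callback = makeCallback();

// Simulate sending the callback to another environment:
// only its source text travels, not its closure.
const source = callback.toString();

// Rebuild the function from text. A function created with `new Function`
// sees only the global scope, so `nodeSideState` no longer exists for it.
const rebuilt = new Function(`return (${source})();`);

const inNode = callback();       // closure intact
const rebuiltResult = rebuilt(); // closure lost

console.log(inNode, rebuiltResult);
```

The rebuilt function still runs, but the variable it relied on is gone. Anything the callback needs must therefore be passed as a serializable argument or be obtained inside the page.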

Puppeteer elementHandles are complex structures that hook into the page's DOM; they can't be serialized and passed to the page as you're trying to do. Puppeteer has to perform the translation under the hood. Any elementHandle passed to evaluate (or that has .evaluate() called on it) is followed to the DOM node it represents in the browser, and that DOM node is what your evaluate callback is invoked with. As of the time of writing, Puppeteer can't do this with nested elementHandles (for example, an array of handles returned by page.$$).

In the above code, if you change .$$ to .$, you'll retrieve only the first <li>. This singular, non-nested elementHandle can be converted to an element:

    // ...
      const handle = await page.$("li");
      const val = await page.evaluate(el => el.innerText, handle);
      console.log(val); // => a
    // ...
    

Or:

    const handle = await page.$("li");
    const val = await handle.evaluate(el => el.innerText);
    console.log(val); // => a
    

Making this work for your example is a matter of one of the following: swapping the loop and the evaluate call so that you access courseCountArr[i] in Puppeteer land; unpacking the nested elementHandles into separate parameters to evaluate; or moving most of your in-browser click calls back to Puppeteer (depending on your use case and goals with the code).
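As a rough sketch of the first and third options, the loops can live in Node while each evaluate call receives only strings and numbers. The selectors are copied from the question; the helper names clickNth and countResults are hypothetical, and the sketch assumes the page stays on the same document after each search and that the subject options remain clickable between searches, neither of which can be confirmed from the question.

```javascript
// Selectors taken from the question's code.
const DROPDOWNS = ".btn-group > button, .bootstrap-select > button";
const OPTIONS = "div.bs-container > div.dropdown-menu > ul > li > a";
const RESULTS = ".strong, .section-body";

// Hypothetical helper: click the nth element matching a selector inside the page.
// Only a string and a number cross the Node/browser boundary.
const clickNth = (page, selector, n) =>
  page.evaluate(
    (sel, idx) => document.querySelectorAll(sel)[idx].click(),
    selector,
    n
  );

// Hypothetical helper: the loops run in Node, so `page` methods such as
// waitForSelector are called where `page` actually exists.
async function countResults(page, schoolCount) {
  const counts = [];
  for (let j = 1; j < schoolCount; j++) {
    await clickNth(page, DROPDOWNS, 1); // open the school dropdown
    await clickNth(page, OPTIONS, j);   // pick school j
    await clickNth(page, DROPDOWNS, 2); // open the subject dropdown
    const subjectLen = await page.$$eval(OPTIONS, els => els.length);
    for (let k = 1; k < subjectLen; k++) {
      await clickNth(page, OPTIONS, k); // pick subject k
      await page.click("#buttonSearch");
      await page.waitForSelector(RESULTS);
      counts.push(await page.$$eval(RESULTS, els => els.length));
    }
  }
  return counts;
}
```

From the Node side this would be called as const counts = await countResults(page, courseCountArr[i]);. Unlike the original, it collects a count for every combination instead of returning on the first inner iteration.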

You could apply the evaluate call to each elementHandle:

    const nestedHandles = await page.$$("li");
    
    for (const handle of nestedHandles) {
      const val = await handle.evaluate(el => el.innerText);
      console.log(val); // a b c
    }
    

To get an array of results, you could do:

    const nestedHandles = await page.$$("li");
    const vals = await Promise.all(
      nestedHandles.map(el => el.evaluate(el => el.innerText))
    );
    console.log(vals); // [ 'a', 'b', 'c' ]
    

You can also unpack the elementHandles into arguments for evaluate and use the (...els) parameter list in the callback:

    const nestedHandles = await page.$$("li");
    const vals = await page.evaluate((...els) =>
      els.map(e => e.innerText),
      ...nestedHandles
    );
    console.log(vals); // => [ 'a', 'b', 'c' ]
    

If you have other arguments in addition to the handles, you can do:

    const nestedHandle = await page.$$("li");
    const vals = await page.evaluate((foo, bar, ...els) => 
      els.map(e => e.innerText + foo + bar)
    , 1, 2, ...nestedHandle);
    console.log(vals); // => [ 'a12', 'b12', 'c12' ]
    

Or:

    const nestedHandle = await page.$$("li");
    const vals = await page.evaluate(({foo, bar}, ...els) => 
      els.map(e => e.innerText + foo + bar)
    , {foo: 1, bar: 2}, ...nestedHandle);
    console.log(vals); // => [ 'a12', 'b12', 'c12' ]
    

Another option may be to use $$eval, which selects multiple handles, then runs a callback in browser context with the array of selected elements as its parameter:

    const vals = await page.$$eval("li", els => 
      els.map(e => e.innerText)
    );
    console.log(vals); // => [ 'a', 'b', 'c' ]
    

This is probably cleanest if you're not doing anything else with the handles in Node.

Similarly, you can skip Puppeteer's handle APIs entirely and do the whole selection and manipulation in browser context:

    const vals = await page.evaluate(() =>
      [...document.querySelectorAll("li")].map(e => e.innerText)
    );
    console.log(vals); // => [ 'a', 'b', 'c' ]
    

(Note that getting the inner text is just a placeholder for whatever arbitrarily complex browser code you might have.)
