Puppeteer: Execution context was destroyed, most likely because of a navigation


Question

I am running into this problem with Puppeteer inside a for loop: when I go to another page to get data and then go back, I get this error:

Error "We have an error Error: the execution context was destroyed, probably because of a navigation."

It's a directory page that lists 15 companies per page, and I want to visit each company to get its information.

try {
    const browser = await puppeteer.launch({
        headless: false,
        devtools: true,
        defaultViewport: {
            width: 1100,
            height: 1000
        }
    });

    const page = await browser.newPage();
    await page.goto('MyLink');

    await page.waitForSelector('.list-firms');

    for (var i = 1; i < 10; i++) {

        const listeCompanies = await page.$$('.list-firms > div.firm');

        for (const companie of listeCompanies) {

            const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
            const link = await companie.$eval('.listing-body > h3 > a', link => link.href);

            await Promise.all([
                page.waitForNavigation(),
                page.goto(link),
                page.waitForSelector('.firm-panel'),
            ]);

            const info = await page.$eval('#info', e => e.innerText);

            const data = [{
                name: name,
                information: info,
            }];

            await page.goBack();

        }
        await Promise.all([
            page.waitForNavigation(),
            page.click('span.page > a[rel="next"]')
        ]);
    }
} catch (e) {
    console.log('We have error', e);
}

I only managed to get the data of the first company.

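Recommended answer

Problem

The error means that you are accessing data that has become stale or invalid because of a navigation. In your script, the error refers to the variable listeCompanies: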

const listeCompanies = await page.$$('.list-firms > div.firm');

You first use this variable in a loop, then you navigate via page.goto, and after that your loop tries to get the next item out of listeCompanies. But once the navigation has happened, the element handles in that variable no longer exist (they belonged to the previous document's execution context), so the error is thrown. That's also why the first iteration works.

There are multiple ways to fix this:

  1. Extract the data from your page at once (before using the loop)
  2. Use a second page to do the "loop navigation" so that your main page does not need to navigate
  3. "Refresh" your variable by re-executing the selector after calling page.goBack


Option 1: Extract the data before entering the loop

This is the cleanest way to do it. You extract the information from the first page in one go and then iterate over the extracted data. nameLinkList will be an array of objects holding the name and link values (e.g. [{name: '..', link: '..'}, {name: '..', link: '..'}]). There is also no need to call page.goBack at the end of the loop, because the data has already been extracted.

const nameLinkList = await page.$$eval(
    '.list-firms > div.firm',
    (firms => firms.map(firm => {
        const a = firm.querySelector('.listing-body > h3 > a');
        return {
            name: a.innerText,
            link: a.href
        };
    }))
);

for (const {name, link} of nameLinkList) {
    await Promise.all([
        page.waitForNavigation(),
        page.goto(link),
        page.waitForSelector('.firm-panel'),
    ]);

    const info = await page.$eval('#info', e => e.innerText);

    const data = [{
        name: name,
        information: info,
    }];
}
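
The question also pages through the directory with an outer loop. With Option 1 the browser ends up on a company page after the inner loop, so the "next" link can no longer be clicked there. One possible way around this, shown here only as a sketch, is to read the next page's URL together with the company list and navigate to it afterwards. The function name scrapeDirectory and the parameters startUrl and maxPages are made up for illustration, and the code assumes that the a[rel="next"] element is a real link with an href.

```javascript
// Sketch: Option 1 applied to the paginated directory from the question.
// `page` is a Puppeteer Page. `scrapeDirectory`, `startUrl` and `maxPages`
// are illustrative names, not part of the original code.
async function scrapeDirectory(page, startUrl, maxPages = 10) {
    const results = [];
    let listUrl = startUrl;

    for (let i = 0; i < maxPages && listUrl; i++) {
        await page.goto(listUrl);
        await page.waitForSelector('.list-firms');

        // Extract plain data (no element handles) before any navigation.
        const nameLinkList = await page.$$eval('.list-firms > div.firm', firms =>
            firms.map(firm => {
                const a = firm.querySelector('.listing-body > h3 > a');
                return { name: a.innerText, link: a.href };
            })
        );

        // Assumption: the "next" control is an <a> with an href.
        // $$eval passes an empty array when nothing matches, so the last
        // page yields null instead of throwing.
        const nextUrl = await page.$$eval('span.page > a[rel="next"]', links =>
            links.length > 0 ? links[0].href : null
        );

        for (const { name, link } of nameLinkList) {
            await page.goto(link);
            await page.waitForSelector('.firm-panel');
            const info = await page.$eval('#info', e => e.innerText);
            results.push({ name, information: info });
        }

        listUrl = nextUrl;
    }
    return results;
}
```

Because only strings are kept across navigations, nothing in this loop depends on an execution context that may have been destroyed.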

Option 2: Use a second page

In this case your browser will have two open pages. The first one is only used to read the data; the second one is used for navigation.

const page2 = await browser.newPage();
for (const companie of listeCompanies ){
    const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
    const link = await companie.$eval('.listing-body > h3 > a', link => link.href);

    await Promise.all([
        page2.goto(link),
        page2.waitForSelector('.firm-panel'),
    ]);

    const info = await page2.$eval('#info', e => e.innerText);
    // ...
}
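
For reference, the snippet above could be wrapped into a complete function roughly as follows. This is a sketch rather than the answer's exact code: the name scrapeWithSecondPage is hypothetical, the results are collected into an array instead of the per-iteration data variable, and the second page is closed in a finally block so it does not stay open if a company page fails to load.

```javascript
// Sketch of Option 2 as a full function. `browser` is a Puppeteer Browser,
// `page` is the already-loaded directory page. `scrapeWithSecondPage` is an
// illustrative name.
async function scrapeWithSecondPage(browser, page) {
    const results = [];
    // These handles stay valid because `page` itself never navigates.
    const listeCompanies = await page.$$('.list-firms > div.firm');
    const page2 = await browser.newPage();

    try {
        for (const companie of listeCompanies) {
            // Read name and link in one round trip to the browser.
            const { name, link } = await companie.$eval(
                '.listing-body > h3 > a',
                a => ({ name: a.innerText, link: a.href })
            );

            // All navigation happens on the second page.
            await page2.goto(link);
            await page2.waitForSelector('.firm-panel');

            const info = await page2.$eval('#info', e => e.innerText);
            results.push({ name, information: info });
        }
    } finally {
        await page2.close();
    }
    return results;
}
```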

Option 3: "Refresh" selectors

Here you simply re-execute your selector after going back to your "main page". Note that the for..of loop has to be changed to an index-based loop, because the array is replaced on every iteration.

let listeCompanies  = await page.$$('.list-firms > div.firm');
for (let i = 0; i < listeCompanies.length; i++){
    // ...

    await page.goBack();
    listeCompanies = await page.$$('.list-firms > div.firm');
}
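
With the elided loop body written out, Option 3 might look like the sketch below. The function name scrapeByRefreshing is hypothetical, and the extra waitForSelector after page.goBack is an added precaution (assuming the list may take a moment to render again), not something the original answer includes.

```javascript
// Sketch of Option 3 as a full function. `page` is a Puppeteer Page that
// already shows the directory list. `scrapeByRefreshing` is an illustrative name.
async function scrapeByRefreshing(page) {
    const firmSelector = '.list-firms > div.firm';
    const anchorSelector = '.listing-body > h3 > a';
    const results = [];

    let listeCompanies = await page.$$(firmSelector);
    for (let i = 0; i < listeCompanies.length; i++) {
        // Read everything needed from the handle before navigating away.
        const name = await listeCompanies[i].$eval(anchorSelector, a => a.innerText);
        const link = await listeCompanies[i].$eval(anchorSelector, a => a.href);

        await page.goto(link);
        await page.waitForSelector('.firm-panel');
        const info = await page.$eval('#info', e => e.innerText);
        results.push({ name, information: info });

        // Back on the list: the old handles are stale, so query fresh ones.
        await page.goBack();
        await page.waitForSelector(firmSelector);
        listeCompanies = await page.$$(firmSelector);
    }
    return results;
}
```

This relies on the list showing the same companies in the same order after going back; if the page content can change between visits, index i may point at a different company.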

I recommend going with option 1, as it also reduces the number of navigation requests and will therefore speed up your script.
