Puppeteer Execution context was destroyed, most likely because of a navigation


Question

I am facing this problem in Puppeteer: inside a for loop I go to another page to get data, and when I go back I get this error:

We have error Error: Execution context was destroyed, most likely because of a navigation.

It's a directory page that contains 15 companies per page, and I want to visit each company to get information.

try {
    const browser = await puppeteer.launch({
        headless: false,
        devtools: true,
        defaultViewport: {
            width: 1100,
            height: 1000
        }
    });

    const page = await browser.newPage();
    await page.goto('MyLink');

    await page.waitForSelector('.list-firms');

    for (var i = 1; i < 10; i++) {

        const listeCompanies = await page.$$('.list-firms > div.firm');

        for (const companie of listeCompanies) {

            const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
            const link = await companie.$eval('.listing-body > h3 > a', link => link.href);

            await Promise.all([
                page.waitForNavigation(),
                page.goto(link),
                page.waitForSelector('.firm-panel'),
            ]);

            const info = await page.$eval('#info', e => e.innerText);

            const data = [{
                name: name,
                information: info,
            }];

            await page.goBack();

        }
        await Promise.all([
            page.waitForNavigation(),
            page.click('span.page > a[rel="next"]')
        ]);
    }
} catch (e) {
    console.log('We have error', e);
}

I only managed to get the data of the first company.

Answer

Problem

The error means that you are accessing data which has become obsolete/invalid because of navigation. In your script the error references the variable listeCompanies:

const listeCompanies = await page.$$('.list-firms > div.firm');

You first use this variable in a loop, then you navigate via page.goto, and after that your loop tries to get the next item out of listeCompanies. But element handles are bound to the document that was loaded when they were created. Once the navigation has happened, that document's execution context is destroyed, the handles in the variable are no longer valid, and the error is thrown. That's also why the first iteration works.

Solution

There are multiple ways to fix this.

  1. Extract the data from your page at once (before using the loop)
  2. Use a second page to do the "loop navigation" so that your main page does not need to navigate
  3. "Refresh" your variable by re-executing the selector after calling page.goBack


Option 1: Extract the data before entering the loop

This is the cleanest way to do it. You extract the information in the first page at once and then iterate over your extracted data. The nameLinkList will be an array with the name and link values (e.g. [{name: '..', link: '..'}, {name: '..', link: '..'}]). There is also no need to call page.goBack at the end of the loop as the data is already extracted.

const nameLinkList = await page.$$eval(
    '.list-firms > div.firm',
    (firms => firms.map(firm => {
        const a = firm.querySelector('.listing-body > h3 > a');
        return {
            name: a.innerText,
            link: a.href
        };
    }))
);

for (const {name, link} of nameLinkList) {
    await Promise.all([
        page.waitForNavigation(),
        page.goto(link),
        page.waitForSelector('.firm-panel'),
    ]);

    const info = await page.$eval('#info', e => e.innerText);

    const data = [{
        name: name,
        information: info,
    }];
}
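
For reference, here is a minimal sketch of option 1 wrapped in a function that collects one record per company instead of overwriting data on every iteration. The function name scrapeCompanyInfos and the results array are illustrative and not part of the original script. It assumes page is a Puppeteer Page that already shows the listing, and it reuses the selectors from the question. It also relies on page.goto resolving once the navigation has finished, so no separate page.waitForNavigation call is used.

```javascript
// Sketch only: `scrapeCompanyInfos` is a hypothetical helper name.
// `page` is assumed to be a Puppeteer Page that currently shows the listing.
async function scrapeCompanyInfos(page) {
    // Read names and links as plain data, so nothing depends on element
    // handles once the page navigates away from the listing.
    const nameLinkList = await page.$$eval(
        '.list-firms > div.firm',
        firms => firms.map(firm => {
            const a = firm.querySelector('.listing-body > h3 > a');
            return { name: a.innerText, link: a.href };
        })
    );

    const results = [];
    for (const { name, link } of nameLinkList) {
        // Visit the company page and wait until its panel is present.
        await page.goto(link);
        await page.waitForSelector('.firm-panel');
        const information = await page.$eval('#info', e => e.innerText);
        results.push({ name, information });
    }
    return results;
}
```

Calling const results = await scrapeCompanyInfos(page); right after the listing has loaded should give you an array like [{name: '..', information: '..'}, ...].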

Option 2: Use a second page

In this case your browser will have two open pages. The first one will only be used to read the data, the second one is used for navigation.

const page2 = await browser.newPage();
for (const companie of listeCompanies) {
    const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
    const link = await companie.$eval('.listing-body > h3 > a', link => link.href);

    await Promise.all([
        page2.goto(link),
        page2.waitForSelector('.firm-panel'),
    ]);

    const info = await page2.$eval('#info', e => e.innerText);
    // ...
}
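
A slightly fuller sketch of option 2, under a few assumptions: the name scrapeWithSecondPage is made up for illustration, browser is a Puppeteer Browser, and listPage is a Page that already shows the listing. The helper tab is closed in a finally block so it does not stay open if one of the company pages fails to load.

```javascript
// Sketch only: `scrapeWithSecondPage` is a hypothetical helper name.
// `browser` is assumed to be a Puppeteer Browser and `listPage` a Page
// that currently shows the directory listing.
async function scrapeWithSecondPage(browser, listPage) {
    // These handles stay valid because listPage itself never navigates.
    const listeCompanies = await listPage.$$('.list-firms > div.firm');
    const detailPage = await browser.newPage();
    const results = [];
    try {
        for (const companie of listeCompanies) {
            const name = await companie.$eval('.listing-body > h3 > a', a => a.innerText);
            const link = await companie.$eval('.listing-body > h3 > a', a => a.href);

            // All navigation happens in the second tab.
            await detailPage.goto(link);
            await detailPage.waitForSelector('.firm-panel');
            const information = await detailPage.$eval('#info', e => e.innerText);
            results.push({ name, information });
        }
    } finally {
        // Close the helper tab even if one of the company pages fails.
        await detailPage.close();
    }
    return results;
}
```

Because the listing page stays where it is, the click on the "next" link from the question can still be issued on that page after the function returns.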

Option 3: "Refresh" selectors

Here you simply re-execute your selector after going back to your "main page". Note that the for..of loop has to be changed to an index-based loop, because we are replacing the array while iterating.

let listeCompanies = await page.$$('.list-firms > div.firm');
for (let i = 0; i < listeCompanies.length; i++) {
    // ...

    await page.goBack();
    listeCompanies = await page.$$('.list-firms > div.firm');
}
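
With the elided part filled in, option 3 could look roughly like the sketch below. The name scrapeWithRefresh is hypothetical, page is assumed to be a Puppeteer Page showing the listing, and the approach assumes the listing shows the same companies in the same order after going back, since the loop relies on the index i.

```javascript
// Sketch only: `scrapeWithRefresh` is a hypothetical helper name.
// Assumes the listing shows the same companies in the same order after
// page.goBack(), because the loop addresses them by index.
async function scrapeWithRefresh(page) {
    const selector = '.list-firms > div.firm';
    let listeCompanies = await page.$$(selector);
    const results = [];
    for (let i = 0; i < listeCompanies.length; i++) {
        const companie = listeCompanies[i];
        const name = await companie.$eval('.listing-body > h3 > a', a => a.innerText);
        const link = await companie.$eval('.listing-body > h3 > a', a => a.href);

        await page.goto(link);
        await page.waitForSelector('.firm-panel');
        const information = await page.$eval('#info', e => e.innerText);
        results.push({ name, information });

        // Return to the listing and fetch fresh handles; the old ones
        // belong to a document that no longer exists.
        await page.goBack();
        await page.waitForSelector(selector);
        listeCompanies = await page.$$(selector);
    }
    return results;
}
```

This visits every company with two navigations (there and back), which is why it is the slowest of the three options.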

I recommend going with option 1, as it also reduces the number of navigation requests and will therefore speed up your script.
