Puppeteer: Execution context was destroyed, most likely because of a navigation
I am running into this problem with Puppeteer inside a for loop: I navigate to another page to get data, and when I go back I get this error:
We have error Error: Execution context was destroyed, most likely because of a navigation.
It is a directory page that lists 15 companies per page, and I want to visit each company to get its information.
try {
  const browser = await pupputer.launch({
    headless: false,
    devtools: true,
    defaultViewport: {
      width: 1100,
      height: 1000
    }
  });
  const page = await browser.newPage();
  await page.goto('MyLink');
  await page.waitForSelector('.list-firms');
  for (var i = 1; i < 10; i++) {
    const listeCompanies = await page.$$('.list-firms > div.firm');
    for (const companie of listeCompanies) {
      const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
      const link = await companie.$eval('.listing-body > h3 > a', link => link.href);
      await Promise.all([
        page.waitForNavigation(),
        page.goto(link),
        page.waitForSelector('.firm-panel'),
      ]);
      const info = await page.$eval('#info', e => e.innerText);
      const data = [{
        name: name,
        information: info,
      }];
      await page.goBack();
    }
    await Promise.all([
      page.waitForNavigation(),
      page.click('span.page > a[rel="next"]')
    ]);
  }
} catch (e) {
  console.log('We have error', e);
}
I only managed to get the data of the first company.
Problem
The error means that you are accessing data that has become obsolete/invalid because of a navigation. In your script, the error refers to the variable listeCompanies:
const listeCompanies = await page.$$('.list-firms > div.firm');
You first use this variable in a loop, then you navigate via page.goto, and after that your loop tries to get the next item out of listeCompanies. The element handles in that variable belong to the document that was loaded when page.$$ ran. Once the navigation has happened, those elements no longer exist, so using the handles throws the error. That is also why the first iteration works.
Solution
There are multiple ways to fix this.
- Extract the data from your page at once (before using the loop)
- Use a second page to do the "loop navigation" so that your main page does not need to navigate
- "Refresh" your variable by re-executing the selector after calling page.goBack
Option 1: Extract the data before entering the loop
This is the cleanest way to do it. You extract the information from the first page at once and then iterate over the extracted data. nameLinkList will be an array of objects holding the name and link values (e.g. [{name: '..', link: '..'}, {name: '..', link: '..'}]). There is also no need to call page.goBack at the end of the loop, as the data has already been extracted.
const nameLinkList = await page.$$eval(
  '.list-firms > div.firm',
  firms => firms.map(firm => {
    const a = firm.querySelector('.listing-body > h3 > a');
    return {
      name: a.innerText,
      link: a.href
    };
  })
);
for (const {name, link} of nameLinkList) {
  // page.goto already waits for the navigation to finish
  await page.goto(link);
  await page.waitForSelector('.firm-panel');
  const info = await page.$eval('#info', e => e.innerText);
  const data = [{
    name: name,
    information: info,
  }];
}
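The question also pages through the directory by clicking a "next" link. With option 1 the main page ends up on the last company's detail page, so that link is no longer there to click. One possible way around this, shown as a rough sketch only, is to read the next-page URL as plain data together with the company links. The function name scrapeDirectory and the parameters startUrl and maxPages are made up for illustration; the selectors are the ones from the question.

```javascript
// Sketch: scrape several listing pages without keeping element handles
// across navigations. scrapeDirectory, startUrl and maxPages are
// illustrative names, not part of Puppeteer.
async function scrapeDirectory(page, startUrl, maxPages = 10) {
  const results = [];
  let listingUrl = startUrl;

  for (let n = 0; n < maxPages && listingUrl; n++) {
    await page.goto(listingUrl);
    await page.waitForSelector('.list-firms');

    // Plain strings survive a navigation, element handles do not.
    const nameLinkList = await page.$$eval('.list-firms > div.firm', firms =>
      firms.map(firm => {
        const a = firm.querySelector('.listing-body > h3 > a');
        return { name: a.innerText, link: a.href };
      })
    );

    // Read the next-page URL now, while the listing is still loaded.
    // $$eval is used so that a missing link yields null instead of throwing.
    const nextUrl = await page.$$eval('span.page > a[rel="next"]', links =>
      links.length > 0 ? links[0].href : null
    );

    for (const { name, link } of nameLinkList) {
      await page.goto(link);
      await page.waitForSelector('.firm-panel');
      const information = await page.$eval('#info', e => e.innerText);
      results.push({ name, information });
    }

    listingUrl = nextUrl;
  }
  return results;
}
```

Called as `const results = await scrapeDirectory(page, 'MyLink');`, this would collect one {name, information} object per company, assuming the "next" link carries a usable href.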
Option 2: Use a second page
In this case your browser will have two open pages. The first one will only be used to read the data, the second one is used for navigation.
const page2 = await browser.newPage();
for (const companie of listeCompanies) {
  const name = await companie.$eval('.listing-body > h3 > a', name => name.innerText);
  const link = await companie.$eval('.listing-body > h3 > a', link => link.href);
  await Promise.all([
    page2.goto(link),
    page2.waitForSelector('.firm-panel'),
  ]);
  const info = await page2.$eval('#info', e => e.innerText);
  // ...
}
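For completeness, here is how option 2 might look as a whole function. This is only a sketch: scrapeWithSecondPage is a made-up name, and closing the second page in a finally block is my own addition, not something the question requires. The handles stay valid because the first page never navigates while they are in use.

```javascript
// Sketch: the first page only supplies element handles, the second page
// does all the navigating. scrapeWithSecondPage is an illustrative name.
async function scrapeWithSecondPage(browser, page) {
  const results = [];
  const listeCompanies = await page.$$('.list-firms > div.firm');
  const page2 = await browser.newPage();
  try {
    for (const companie of listeCompanies) {
      // Read name and link in a single round trip to the browser.
      const { name, link } = await companie.$eval(
        '.listing-body > h3 > a',
        a => ({ name: a.innerText, link: a.href })
      );
      await page2.goto(link);
      await page2.waitForSelector('.firm-panel');
      const information = await page2.$eval('#info', e => e.innerText);
      results.push({ name, information });
    }
  } finally {
    // Close the helper page even if one of the detail pages fails.
    await page2.close();
  }
  return results;
}
```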
Option 3: "Refresh" selectors
Here you simply re-execute your selector after going back to your "main page". Note that the for..of loop has to be changed to an index-based loop, as we are replacing the array.
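let listeCompanies = await page.$$('.list-firms > div.firm');
for (let i = 0; i < listeCompanies.length; i++) {
  const companie = listeCompanies[i];
  // ... read name and link from companie, navigate, read the details
  await page.goBack();
  // The old handles are stale after the navigation, so query the list again
  listeCompanies = await page.$$('.list-firms > div.firm');
}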
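Filled in, option 3 could look roughly like the sketch below. The function name scrapeWithRefresh is hypothetical, and the code assumes the listing shows the same companies in the same order each time you come back to it; if the list can change between visits, the index i may point at a different company.

```javascript
// Sketch: re-query the element handles after every goBack().
// scrapeWithRefresh is an illustrative name, not part of Puppeteer.
async function scrapeWithRefresh(page) {
  const results = [];
  let listeCompanies = await page.$$('.list-firms > div.firm');

  for (let i = 0; i < listeCompanies.length; i++) {
    // This handle was created after the most recent navigation, so it is valid.
    const companie = listeCompanies[i];
    const { name, link } = await companie.$eval(
      '.listing-body > h3 > a',
      a => ({ name: a.innerText, link: a.href })
    );

    await page.goto(link);
    await page.waitForSelector('.firm-panel');
    const information = await page.$eval('#info', e => e.innerText);
    results.push({ name, information });

    // Return to the listing and replace the stale handles with fresh ones.
    await page.goBack();
    await page.waitForSelector('.list-firms');
    listeCompanies = await page.$$('.list-firms > div.firm');
  }
  return results;
}
```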
I recommend going with option 1, as it also reduces the number of necessary navigation requests and will therefore speed up your script.