Puppeteer - Protocol error (Page.navigate): Target closed

Question

As the sample code below shows, I'm using Puppeteer with a cluster of workers in Node to serve multiple website screenshot requests, each for a given URL:

const cluster = require('cluster');
const express = require('express');
const bodyParser = require('body-parser');
const puppeteer = require('puppeteer');

async function getScreenshot(domain) {
    let screenshot;
    const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'] });
    const page = await browser.newPage();

    try {
        await page.goto('http://' + domain + '/', { timeout: 60000, waitUntil: 'networkidle2' });
    } catch (error) {
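        try {
            // Retry once with a longer timeout. Note: as written, the screenshot
            // is only taken on this retry path.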
            await page.goto('http://' + domain + '/', { timeout: 120000, waitUntil: 'networkidle2' });
            screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
        } catch (error) {
            console.error('Connecting to: ' + domain + ' failed due to: ' + error);
        }
    }

    await page.close();
    await browser.close();

    return screenshot;
}

if (cluster.isMaster) {
    const numOfWorkers = require('os').cpus().length;
    for (let worker = 0; worker < numOfWorkers; worker++) {
        cluster.fork();
    }

    cluster.on('exit', function (worker, code, signal) {
        console.debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
        cluster.fork();
    });

    cluster.on('message', function (handler, msg) {
        console.debug('Worker: ' + handler.process.pid + ' has finished working on ' + msg.domain + '. Exiting...');
        if (cluster.workers[handler.id]) {
            cluster.workers[handler.id].kill('SIGTERM');
        }
    });
} else {
    const app = express();
    app.use(bodyParser.json());
    app.listen(80, function() {
        console.debug('Worker ' + process.pid + ' is listening to incoming messages');
    });

    app.post('/screenshot', (req, res) => {
        const domain = req.body.domain;

        getScreenshot(domain)
            .then((screenshot) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                res.status(200).json({ screenshot: screenshot });
            })
            .catch((error) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                res.status(500).json({ error: error });
            });
    });
}

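Some explanations:

  1. Each time a request arrives, a worker handles it and kills itself at the end.
  2. Each worker creates a new browser instance with a single page. If the page takes more than 60 seconds to load, the worker retries the load (in the same page, because some resources may already have been loaded) with a timeout of 120 seconds.
  3. Once finished, both the page and the browser are closed.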

My problem is that some legitimate domains produce errors that I can't explain:

Error: Protocol error (Page.navigate): Target closed.

Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

I read in some git issue (that I can't find now) that this can happen when the page redirects and adds 'www' at the start, but I'm hoping that's not the case. Is there something I'm missing?

Recommended answer

What "Target closed" means

When you launch a browser via puppeteer.launch, Puppeteer starts a browser and connects to it. From then on, any function you call on the opened browser (like page.goto) is sent to the browser via the Chrome DevTools Protocol. In this context, a target means a tab.

The Target closed exception is thrown when you try to run a function but the target (tab) has already been closed.

The error message has since been changed to be more informative. It now reads:

Error: Protocol error (Target.activateTarget): Session closed. Most likely the page has been closed.

Why does it happen

There are several reasons why this can happen.

  • You are using a resource that has already been closed

Most likely, you are seeing this message because you closed the tab or browser and are still trying to use it. A simple example:

const browser = await puppeteer.launch();
const page = await browser.newPage();

await browser.close();
await page.goto('http://www.google.com');

Here the browser is closed first, and the subsequent page.goto call produces the error message. Most of the time it will not be that obvious. For example, an error handler may already have closed the page during a cleanup task while your script is still crawling.
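One way to rule out this first cause is to make cleanup the last thing that can run. The following is a minimal sketch, not the asker's or the library's method: it restructures the question's getScreenshot so that browser.close() sits in a finally block and therefore only runs after every page call has settled. Passing the puppeteer module in as a parameter is an assumption made here so the function can be exercised with a stand-in object.

```javascript
// Sketch: close the browser only after all work on the page has settled.
// `puppeteer` is passed in (an assumption for testability); in real code it
// would be the object returned by require('puppeteer').
async function getScreenshot(puppeteer, domain) {
    const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
    });
    try {
        const page = await browser.newPage();
        const url = 'http://' + domain + '/';
        try {
            await page.goto(url, { timeout: 60000, waitUntil: 'networkidle2' });
        } catch (error) {
            // Retry once in the same page with a longer timeout.
            await page.goto(url, { timeout: 120000, waitUntil: 'networkidle2' });
        }
        // Awaiting here keeps the screenshot inside the try block, so the
        // finally block cannot close the browser while it is still running.
        return await page.screenshot({ type: 'png', encoding: 'base64' });
    } finally {
        // Closing the browser also closes its pages.
        await browser.close();
    }
}
```

With this shape, a failed navigation rejects the returned promise instead of being logged and swallowed, so the caller decides how to respond.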

  • The browser crashed or failed to initialize

I also run into this every few hundred requests. There is an issue about it on the puppeteer repository as well. It seems to happen when a lot of memory or CPU is in use. Maybe you are spawning a lot of browsers? In these cases the browser might crash or disconnect.
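If the browser process itself dies, retrying against the same instance cannot succeed, so a retry needs a fresh browser. The helper below is a hypothetical sketch of that idea: runWithFreshBrowser, launch, and work are names introduced here for illustration, where launch is assumed to be something like () => puppeteer.launch(...) and work receives the browser.

```javascript
// Sketch: retry a unit of work, launching a new browser for each attempt.
// `launch` and `work` are caller-supplied functions (hypothetical names).
async function runWithFreshBrowser(launch, work, attempts = 2) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        const browser = await launch();
        try {
            return await work(browser);
        } catch (error) {
            // For example "Target closed" after the browser crashed.
            lastError = error;
        } finally {
            // close() can itself reject if the process is already gone,
            // so its failure is deliberately ignored here.
            await browser.close().catch(() => {});
        }
    }
    throw lastError;
}
```

This does not address the underlying resource pressure. Limiting how many browsers run at the same time is still likely to matter more than retrying.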

I found no "silver bullet" solution to this problem. But you might want to check out the library puppeteer-cluster (disclaimer: I'm the author), which handles these kinds of error cases and lets you retry the URL when the error happens. It can also manage a pool of browser instances and would simplify your code.
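As a rough illustration of what that could look like, here is a minimal sketch based on the library's documented queue/task pattern. The function name screenshotDomains, the concurrency of 2, and the single retry are assumptions chosen for the example, and the Cluster class is passed in as a parameter (in real code it would come from require('puppeteer-cluster')).

```javascript
// Sketch: take screenshots through puppeteer-cluster with one retry per URL.
// Real usage: const { Cluster } = require('puppeteer-cluster');
async function screenshotDomains(Cluster, domains) {
    const results = new Map();

    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2, // assumed value; tune to available memory and CPU
        retryLimit: 1,     // retry a failed job once
        puppeteerOptions: {
            args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
        },
    });

    // Fired when a queued job throws.
    cluster.on('taskerror', (err, data) => {
        console.error('Connecting to: ' + data + ' failed due to: ' + err.message);
    });

    // The cluster creates and closes the page for each job.
    await cluster.task(async ({ page, data: domain }) => {
        await page.goto('http://' + domain + '/', { timeout: 60000, waitUntil: 'networkidle2' });
        results.set(domain, await page.screenshot({ type: 'png', encoding: 'base64' }));
    });

    for (const domain of domains) {
        cluster.queue(domain);
    }

    await cluster.idle();  // wait until the queue is empty
    await cluster.close();
    return results;
}
```

The sketch uses queue rather than execute because, per the library's README, jobs started with execute ignore retryLimit. Domains that still fail after the retry are simply absent from the returned map.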
