Puppeteer - Protocol error (Page.navigate): Target closed

Question

As the sample code below shows, I'm using Puppeteer with a cluster of workers in Node to serve multiple requests for website screenshots, each for a given URL:

const cluster = require('cluster');
const express = require('express');
const bodyParser = require('body-parser');
const puppeteer = require('puppeteer');

async function getScreenshot(domain) {
    let screenshot;
    const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'] });
    const page = await browser.newPage();

    try {
        // First attempt: 60-second timeout.
        await page.goto('http://' + domain + '/', { timeout: 60000, waitUntil: 'networkidle2' });
        screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
    } catch (error) {
        try {
            // Retry in the same page with a 120-second timeout.
            await page.goto('http://' + domain + '/', { timeout: 120000, waitUntil: 'networkidle2' });
            screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
        } catch (error) {
            console.error('Connecting to: ' + domain + ' failed due to: ' + error);
        }
    }

    await page.close();
    await browser.close();

    return screenshot;
}

if (cluster.isMaster) {
    const numOfWorkers = require('os').cpus().length;
    for (let worker = 0; worker < numOfWorkers; worker++) {
        cluster.fork();
    }

    cluster.on('exit', function (worker, code, signal) {
        console.debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
        cluster.fork();
    });

    cluster.on('message', function (handler, msg) {
        console.debug('Worker: ' + handler.process.pid + ' has finished working on ' + msg.domain + '. Exiting...');
        if (cluster.workers[handler.id]) {
            cluster.workers[handler.id].kill('SIGTERM');
        }
    });
} else {
    const app = express();
    app.use(bodyParser.json());
    app.listen(80, function() {
        console.debug('Worker ' + process.pid + ' is listening to incoming messages');
    });

    app.post('/screenshot', (req, res) => {
        const domain = req.body.domain;

        getScreenshot(domain)
            .then((screenshot) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                res.status(200).json({ screenshot: screenshot });
            })
            .catch((error) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                res.status(500).json({ error: error });
            });
    });
}

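Some explanation:

  1. Each time a request arrives, a worker processes it and is killed once it has finished
  2. Each worker creates a new browser instance with a single page. If the page takes more than 60 seconds to load, the worker tries to load it again with a 120-second timeout (in the same page, because some resources may already have been loaded)
  3. Once the page is done, the browser is closed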

My problem is that some legitimate domains produce errors that I can't explain:

Error: Protocol error (Page.navigate): Target closed.

Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

I read in a GitHub issue (which I can't find now) that this can happen when the page redirects and adds 'www' at the start, but I'm hoping that's not the case. Is there something I'm missing?

Recommended answer

What "Target closed" means

When you launch a browser via puppeteer.launch, Puppeteer starts a browser and connects to it. From then on, any function you execute on the opened browser (like page.goto) is sent to the browser via the Chrome DevTools Protocol. A target means a tab in this context.

The Target closed exception is thrown when you try to run a function on a target (tab) that has already been closed.

The error message was recently changed to give more meaningful information. It now reads:

Error: Protocol error (Target.activateTarget): Session closed. Most likely the page has been closed.


Why does it happen

There are multiple reasons why this could happen.

  • You are using a resource that was already closed

Most likely, you are seeing this message because you closed the tab or browser and are still trying to use it. To give a simple example:

const browser = await puppeteer.launch();
const page = await browser.newPage();

await browser.close();
await page.goto('http://www.google.com');

In this case the browser was closed and page.goto was called afterwards, which results in the error message. Most of the time, it will not be that obvious. Maybe an error handler already closed the page during a cleanup task while your script is still crawling.
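One way to rule out this cause is to keep every use of the page inside a single try block and to close the browser only in the matching finally. The sketch below assumes a hypothetical helper named withPage (it is not part of Puppeteer) that receives a launch function, for example () => puppeteer.launch(), so that cleanup runs exactly once, after the task has finished or failed:

```javascript
// Hypothetical helper, not part of Puppeteer: runs `task(page)` and closes the
// browser only after the task has settled, so nothing can use a closed target.
// `launch` is assumed to be a function such as () => puppeteer.launch().
async function withPage(launch, task) {
    const browser = await launch();
    try {
        const page = await browser.newPage();
        // Await the task here; returning the bare promise would let
        // `finally` close the browser while the task is still running.
        return await task(page);
    } finally {
        // Closing the browser also closes the pages that belong to it.
        await browser.close();
    }
}
```

With Puppeteer this could be called as withPage(() => puppeteer.launch(), async (page) => { await page.goto(url); return page.screenshot(); }), where url is whatever address you want to capture.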

  • The browser crashed or was unable to initialize

I also experience this every few hundred requests. There is an issue about this on the puppeteer repository as well. It seems to happen when you are using a lot of memory or CPU. Maybe you are spawning a lot of browsers? In these cases the browser might crash or disconnect.
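Because a crashed browser cannot be reused, one possible workaround is to retry the whole job with a fresh browser instance. The following is only a sketch: the helper name retryWithFreshBrowser, its attempts parameter, and the injected launch function are assumptions made for illustration and are not Puppeteer API.

```javascript
// Hypothetical helper: runs `job(page)` in a brand-new browser and, if it
// fails (for example with "Target closed"), tries again in another new browser.
// `launch` is assumed to be a function such as () => puppeteer.launch().
async function retryWithFreshBrowser(launch, job, attempts = 2) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        let browser;
        try {
            browser = await launch();
            const page = await browser.newPage();
            return await job(page);
        } catch (error) {
            // Remember the failure and fall through to the next attempt.
            lastError = error;
        } finally {
            if (browser) {
                try {
                    await browser.close();
                } catch (closeError) {
                    // The browser may already be gone after a crash; ignore.
                }
            }
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
}
```

This does not remove the underlying resource pressure, so limiting how many browsers run at the same time is still worth considering.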

I found no "silver bullet" solution to this problem. But you might want to check out the library puppeteer-cluster (disclaimer: I'm the author), which handles these kinds of error cases and lets you retry the URL when the error happens. It can also manage a pool of browser instances and would simplify your code as well.
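As a rough illustration of what that could look like, here is a minimal sketch built on puppeteer-cluster's documented launch, task, queue, idle, and close calls. The function name screenshotAll, the option values, and the optional second parameter (which only exists so a stand-in can be passed instead of the real library) are assumptions for illustration, not a recommended configuration.

```javascript
// Sketch only: needs `npm install puppeteer puppeteer-cluster` to run for real.
// `screenshotAll` is a hypothetical name; the second parameter defaults to the
// real library and is only there so a stand-in can be injected.
async function screenshotAll(urls, Cluster = require('puppeteer-cluster').Cluster) {
    const results = new Map();
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT, // one incognito context per job
        maxConcurrency: 2,                        // at most two jobs in parallel
        retryLimit: 1,                            // retry a failed job once
    });

    // Log errors thrown inside the task function.
    cluster.on('taskerror', (error, data) => {
        console.error('Screenshot of ' + data + ' failed: ' + error.message);
    });

    // The task runs once per queued URL, each time with a fresh page.
    await cluster.task(async ({ page, data: url }) => {
        await page.goto(url, { waitUntil: 'networkidle2' });
        results.set(url, await page.screenshot({ type: 'png', encoding: 'base64' }));
    });

    for (const url of urls) {
        cluster.queue(url);
    }

    await cluster.idle();  // wait until all queued jobs are done
    await cluster.close(); // shut down the browser
    return results;
}
```

URLs that still fail after the retry are simply missing from the returned map, so the caller can decide how to report them.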
