Is it safe to run multiple instances of Puppeteer at the same time?


Question

Is it safe to run multiple instances of Puppeteer at the same time, either at

  1. the process level (multiple node screenshot.js at the same time) or
  2. the script level (multiple puppeteer.launch() at the same time)?

What are the recommended settings/limits on parallel processes?

(In my tests, (1) seems to work fine, but I'm unsure how reliable Puppeteer's interactions with the single (?) instance of Chrome are. I haven't tried (2), but that seems less likely to work out.)

Recommended answer

It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network/disk/memory and task setup.
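Because the practical limit comes from your resources, one way to keep parallelism bounded is a small worker pool. The following is only a sketch under assumptions: `runWithLimit` and its parameters are hypothetical names, not part of Puppeteer, and each task stands in for whatever work you do per page or browser.

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight at once.
// Each task is a function returning a promise (for example, one that opens a
// page, takes a screenshot, and closes the page again).
async function runWithLimit(tasks, limit) {
    const results = new Array(tasks.length);
    let next = 0;

    // Each worker repeatedly claims the next unstarted task index.
    async function worker() {
        while (next < tasks.length) {
            const index = next++;
            results[index] = await tasks[index]();
        }
    }

    const workerCount = Math.min(limit, tasks.length);
    await Promise.all(Array.from({ length: workerCount }, worker));
    return results;
}
```

Results come back in the same order as the tasks. If any task rejects, the returned promise rejects too, so tasks that may fail should handle their own errors. A sensible value for `limit` depends on your machine and is best found by measuring.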

I crawled a few million pages, and from time to time (in my setup, every ~10,000 pages) Puppeteer would crash. Therefore, you should have a way to auto-restart the browser and retry the job.
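A restart-and-retry wrapper could look roughly like the sketch below. It is an illustration under assumptions, not the approach used in my crawler: `runWithRestart`, `launch`, `job`, and `maxAttempts` are hypothetical names, and `launch` is assumed to be something like `() => puppeteer.launch()` that resolves to an object with an async `close()` method.

```javascript
// Hypothetical helper: run `job(browser)`, relaunching the browser and
// retrying when the job throws (for example because the browser crashed).
// `launch` is any function returning a promise of a browser-like object
// with an async close() method, e.g. () => puppeteer.launch().
async function runWithRestart(launch, job, maxAttempts = 3) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const browser = await launch();
        try {
            return await job(browser);
        } catch (err) {
            lastError = err; // remember the failure and try again
        } finally {
            // Always dispose of this browser; a crashed one may fail to close.
            await browser.close().catch(() => {});
        }
    }
    throw lastError;
}
```

Each attempt gets a fresh browser, so a crashed instance is never reused. After `maxAttempts` failures the last error is rethrown so the caller can log or requeue the job. Errors thrown by `launch` itself are not retried here.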

You might want to check out puppeteer-cluster, which takes care of pooling the browser instances, crash detection, and restarting. (Disclaimer: I'm the author)

An example of creating a cluster:

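const { Cluster } = require('puppeteer-cluster');

(async () => {
    // create a cluster that handles 10 parallel browsers
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 10,
    });

    // Queue your jobs (one example)
    cluster.queue(async ({ page }) => {
        await page.goto('http://www.wikipedia.org');
        await page.screenshot({ path: 'wikipedia.png' });
    });

    // wait for all queued jobs to finish, then shut down the browsers
    await cluster.idle();
    await cluster.close();
})();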

This is just a minimal example. There are many more ways to use the cluster.
