Is it safe to run multiple instances of Puppeteer at the same time?
Question
Is it safe to run multiple instances of Puppeteer at the same time, either at
(1) the process level (multiple node screenshot.js at the same time), or
(2) the script level (multiple puppeteer.launch() at the same time)?
What are the recommended settings/limits on parallel processes?
(In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2), but that seems less likely to work out.)
Recommended answer
It's fine to run multiple browsers, contexts or even pages in parallel. The limits depend on your network, disk and memory, and on how your tasks are set up.
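Because the practical limit depends on the machine, it can help to cap how many tasks run at once and tune that number by experiment. The following is a minimal sketch of such a cap. `runWithLimit` is a hypothetical helper written for illustration, not part of Puppeteer, and it assumes each task is a function that returns a promise.

```javascript
// Run async task functions with at most `limit` of them in flight at once.
// `runWithLimit` is a hypothetical helper, not a Puppeteer API.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start no more workers than there are tasks.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results are in the same order as `tasks`
}
```

Each task could, for example, open a page, take a screenshot and close the page again. If any task rejects, the returned promise rejects with that error, so tasks that may fail should handle their own errors.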
I have crawled a few million pages, and from time to time (in my setup, roughly every 10,000 pages) Puppeteer crashes. You should therefore have a way to automatically restart the browser and retry the job.
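One way to restart and retry is sketched below. This is an illustration under stated assumptions, not the author's implementation: `runWithRestart` is a hypothetical helper, `launch` is any function that resolves to an object with a `close()` method (with Puppeteer it could be `() => puppeteer.launch()`), and `job` is a function that receives that object.

```javascript
// Run `job(browser)`; if it throws, launch a fresh browser and try again.
// `runWithRestart` is a hypothetical helper. `launch` is injected so the
// sketch does not depend on a particular library.
async function runWithRestart(launch, job, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let browser;
    try {
      browser = await launch();
      return await job(browser);
    } catch (err) {
      // Remember the failure; the next iteration starts a new browser.
      lastError = err;
    } finally {
      if (browser) {
        try {
          await browser.close();
        } catch (closeErr) {
          // Closing can fail if the browser has already crashed; ignore it.
        }
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Launching a new browser for every attempt is the simplest design, but it is also the slowest. A longer-running crawler would more likely keep one browser alive across many jobs and relaunch it only after a failure.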
You might want to check out puppeteer-cluster, which takes care of pooling the browser instances, detecting crashes and restarting the browser. (Disclaimer: I'm the author.)
An example of creating a cluster looks like this:
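const { Cluster } = require('puppeteer-cluster');

(async () => {
  // create a cluster that handles 10 parallel browsers
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 10,
  });

  // Queue your jobs (one example)
  cluster.queue(async ({ page }) => {
    await page.goto('http://www.wikipedia.org');
    await page.screenshot({ path: 'wikipedia.png' });
  });

  // wait for all queued jobs to finish, then shut the browsers down
  await cluster.idle();
  await cluster.close();
})();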
This is just a minimal example. There are many more ways to use the cluster.