Limit Chrome headless CPU and memory usage

Problem description

I am using Selenium to run headless Chrome with the following command:

system "LC_ALL=C google-chrome --headless --enable-logging --hide-scrollbars --remote-debugging-port=#{debug_port} --remote-debugging-address=0.0.0.0 --disable-gpu --no-sandbox --ignore-certificate-errors &"

However, headless Chrome appears to consume too much memory and CPU. Does anyone know how to limit the CPU and memory usage of headless Chrome, or whether there is a workaround?

Thanks in advance.

Solution

There has been a lot of discussion about the unpredictable CPU and memory consumption of headless Chrome sessions.

As per the discussion "Building headless for minimum cpu+mem usage", CPU and memory usage can be optimized as follows:

  • Using either a custom proxy or C++ ProtocolHandlers, you can return stub 1x1 pixel images or block images entirely.
  • The Chromium team is working on adding programmatic control over when frames are produced. Currently, headless Chrome still tries to render at 60 fps, which is rather wasteful. Many pages do need a few frames (maybe 10-20 fps) to render properly (due to usage of requestAnimationFrame and animation triggers), but the team expects a lot of CPU savings to be had here.
  • MemoryInfra should help you determine which component is the biggest consumer of memory in your setup. Example usage:

    $ headless_shell --remote-debugging-port=9222 --trace-startup=*,disabled-by-default-memory-infra http://www.chromium.org

  • Chromium will always use as many resources as are available to it. To limit its utilization effectively, look into using cgroups.
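As a rough illustration of the cgroups suggestion, the sketch below builds a command that starts Chrome inside a transient cgroup using systemd-run. It assumes a systemd-based Linux host with cgroup v2 and sufficient privileges to create a scope; the helper name cgroup_limited_command and the limit values (1G, 50%) are hypothetical and only for illustration.

```ruby
# Sketch: wrap the Chrome launch in a transient systemd scope so the kernel
# enforces memory and CPU limits through cgroups.
# Assumptions: systemd with cgroup v2; limit values are illustrative only.
def cgroup_limited_command(chrome_args, memory_max: "1G", cpu_quota: "50%")
  [
    "systemd-run", "--scope",
    "-p", "MemoryMax=#{memory_max}", # hard memory ceiling for the scope
    "-p", "CPUQuota=#{cpu_quota}",   # share of one CPU the scope may use
    "google-chrome", *chrome_args
  ]
end

debug_port = 9222
chrome_args = [
  "--headless",
  "--disable-gpu",
  "--remote-debugging-port=#{debug_port}"
]

command = cgroup_limited_command(chrome_args)
puts command.join(" ")

# To actually launch Chrome (not done here):
#   pid = Process.spawn(*command)
```

Building the command as an array and passing it to Process.spawn avoids shell quoting issues that the single-string form of system can run into.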


With the above in mind, here are some common best practices to adopt when running headless browsers in a production environment:

Fig: Volatile resource usage of Headless Chrome

  • Don't run a headless browser:

    By all accounts, if at all possible, just don't run a headless browser. Headless browsers are unpredictable and hungry. Almost everything you can do with a browser (save for interpolating and running JavaScript) can be done with simple Linux tools. There are libraries that offer elegant Node APIs for fetching data via HTTP requests and scraping, if that's your end goal.

  • Don't run a headless browser when you don't need to:

    Some users try to keep the browser open even when it is not in use, so that it is always available for connections. While this might be a good strategy to expedite session launch, it will only end in misery after a few hours. This is largely because browsers like to cache stuff and slowly eat more memory. Any time you're not actively using the browser, close it!

  • Parallelize with browsers, not pages:

    Since you should run a browser only when absolutely necessary, the next best practice is to run only one session through each browser. While you might save some overhead by parallelizing work through pages, if one page crashes it can bring down the entire browser with it. In addition, each page isn't guaranteed to be totally clean (cookies and storage might bleed through).

  • page.waitForNavigation:

    One of the most common issues observed is an action that triggers a page load, followed by the sudden loss of your script's execution. This is because actions that trigger a page load can often cause subsequent work to get swallowed. To get around this issue, you will generally have to invoke the page-loading action and immediately wait for the next page load.

  • Use Docker to contain it all:

    Chrome needs a lot of dependencies to run properly. Even after all of that is in place, there are things like fonts and phantom processes to worry about, so it's ideal to use some sort of container to contain it. Docker is almost custom-built for this task, as you can limit the amount of resources available and sandbox it. Create your own Dockerfile.

    To avoid running into zombie processes (which commonly happen with Chrome), you'll want to use something like dumb-init to start up properly.

  • Two different runtimes:

    There can be two JavaScript runtimes going on (Node and the browser). This is great for shareability, but it comes at the cost of confusion, since some page methods require you to pass in references explicitly (rather than relying on closures or hoisting).

    As an example, page.evaluate, deep down in the bowels of the protocol, literally stringifies the function and passes it into Chrome, so things like closures and hoisting won't work at all. If you need to pass references or values into an evaluate call, simply append them as arguments, which get handled properly.
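The Docker advice above (cap the available resources and start Chrome under a proper init process) might look roughly like the following sketch. It assumes an image whose entrypoint is the Chrome binary; the image name example/headless-chrome and the helper docker_chrome_command are hypothetical. Instead of dumb-init, the sketch uses Docker's built-in --init flag, which runs a small init process as PID 1 for the same zombie-reaping purpose.

```ruby
# Sketch: build a `docker run` command that caps memory and CPU for a
# containerized headless Chrome and reaps zombie processes via --init.
# Assumptions: the image's ENTRYPOINT is the Chrome binary; the image name
# and limit values are illustrative only.
def docker_chrome_command(image:, memory: "1g", cpus: "1.0", debug_port: 9222)
  [
    "docker", "run", "--rm",
    "--init",                            # small init as PID 1 reaps zombies
    "--memory=#{memory}",                # hard memory limit for the container
    "--cpus=#{cpus}",                    # number of CPUs the container may use
    "-p", "#{debug_port}:#{debug_port}", # expose the remote debugging port
    image,
    "--headless",
    "--disable-gpu",
    "--remote-debugging-address=0.0.0.0",
    "--remote-debugging-port=#{debug_port}"
  ]
end

command = docker_chrome_command(image: "example/headless-chrome")
puts command.join(" ")

# To actually start the container (not done here):
#   pid = Process.spawn(*command)
```

Stopping the container when the session ends also releases everything Chrome cached, which fits the earlier advice to close the browser whenever it is not in use.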

Reference: Observations running 2 million headless sessions
