How to speed up Puppeteer?

Problem description

A web page has a button, and Puppeteer must click it as soon as possible once it becomes visible. The button is not always visible, and it becomes visible for everyone at the same time. So I have to refresh constantly to find out when the button has become visible. I wrote the script below to do that:

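    const puppeteer = require('puppeteer')

    const url = 'https://example.com/' // placeholder: the page that shows #ourButton
    let counter = 0 // number of failed attempts, only used for logging

    async function pageRefresher(page, browser, url) {
        try {
            await page.goto(url, {waitUntil: 'networkidle2'})
            try {
                // give the button at most 10 ms to appear after the page has loaded
                await page.waitForSelector('#ourButton', {timeout: 10})
                await page.click('#ourButton')
                console.log('clicked!')
                await browser.close()
            } catch (error) {
                // button not there yet: log, count, and start over with a fresh goto
                console.log('catch2 ' + counter + ' ' + error)
                counter += 1
                await pageRefresher(page, browser, url)
            }
        } catch (error) {
            console.log('catch3 ' + error)
            await browser.close()
        }
    }

    (async () => {
        const browser = await puppeteer.launch({
            headless: true,
            args: ['--no-sandbox']
        })
        const page = await browser.newPage()
        await page.setViewport({ width: 1920, height: 1080 })

        // I am calling my pageRefresher method here
        await pageRefresher(page, browser, url)
    })()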

As you can see, my method is recursive. It goes to the page and looks for the button. If there is no button, it calls itself again to redo the same job until it finds and clicks the button.
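
The same job can be done with a loop instead of nested calls; in the recursive version every failed attempt leaves one more suspended pageRefresher call waiting on the next one. The sketch below is a hypothetical rewrite (pageRefresherLoop, maxAttempts and log are illustrative names, not from the question) that keeps the question's steps, goto with networkidle2, a 10 ms waitForSelector and the click, and assumes page is a Puppeteer Page. Closing the browser is left to the caller. It only removes the recursion; it does not make anything faster by itself.

```javascript
// Hypothetical loop version of the question's pageRefresher.
// Assumes `page` is a Puppeteer Page; `maxAttempts` and `log` are illustrative options.
async function pageRefresherLoop(page, url, { maxAttempts = Infinity, log = console.log } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Same navigation condition as the question (networkidle2).
    await page.goto(url, { waitUntil: 'networkidle2' });
    try {
      await page.waitForSelector('#ourButton', { timeout: 10 });
    } catch (error) {
      // Only a timeout means "button not there yet"; anything else is a real failure.
      if (error.name !== 'TimeoutError') throw error;
      log(`attempt ${attempt}: button not there yet`);
      continue; // next iteration instead of a recursive call
    }
    await page.click('#ourButton');
    return attempt; // number of page loads it took
  }
  return -1; // gave up after maxAttempts
}
```

A caller would typically wrap it in try/finally and call browser.close() in the finally block, which replaces the two browser.close() calls of the recursive version.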

It actually works right now, but it is slow. While this script runs, I open the same page in Chrome on my desktop and start refreshing it manually. I always win: I always click the button before Puppeteer does.

How can I speed up this process? A script should not lose to a human who only has manual controls like the F5 button.

Recommended answer

A script should not lose to a human who only has manual controls like the F5 button.

This happens because the rules Puppeteer follows are sometimes much stricter than what we would consider a "fully loaded webpage". As a human, you can decide immediately whether your desired element is already in the DOM (because you see it) or not (because you don't). For example, you can see that your button is not there even while the background image is still loading, or while the webfonts have not arrived yet and the fallback fonts are shown. Puppeteer, on the other hand, waits for specific events in the background (in your code, page.goto does not resolve until the networkidle2 condition is met) before it may either go to the catch block (timeout) or grab the desired element (waitForSelector succeeds). It really depends on the site you are visiting, but you can speed up how quickly your desired element is recognized.

Here are some examples and ideas for how you can achieve this.

1.) If your task doesn't need every network connection to finish, you can speed up page loading by replacing waitUntil: 'networkidle2' with waitUntil: 'domcontentloaded'. This event usually fires earlier, and by then #ourButton will already be present in the DOM, provided the button is part of the page's initial HTML rather than added later by JavaScript.

Possible waitUntil options for page.goto/page.reload:

  • load - consider navigation to be finished when the load event is fired.
  • domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
  • networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
  • networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.

You are beating the script because networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application, or you will later need data from third-party network connections, such as cookies), but if it is not mandatory, you will get better performance with domcontentloaded.
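
A minimal sketch of a single attempt built on domcontentloaded, assuming page is a Puppeteer Page; openAndCheck is a hypothetical name. It also passes visible: true to waitForSelector, since the question is about the button becoming visible rather than merely existing in the DOM. The 10 ms default timeout mirrors the question and is worth tuning for the site.

```javascript
// Hypothetical helper: one navigation + check, resolving at DOMContentLoaded.
// Assumes `page` is a Puppeteer Page.
async function openAndCheck(page, url, selector, { timeout = 10 } = {}) {
  // Do not wait for images, fonts or third-party requests to settle.
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  try {
    // visible: true waits until the element is displayed, not just present in the DOM.
    return await page.waitForSelector(selector, { visible: true, timeout });
  } catch (error) {
    if (error.name === 'TimeoutError') return null; // not visible within `timeout` ms
    throw error; // other failures are real errors
  }
}
```

Called in a loop, a null result means "not visible yet, try again", while a non-null result is an ElementHandle you can click with await handle.click().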

2.) Instead of repeatedly navigating to the same URL, you can call the page.reload method in a loop, e.g.:

    // first load: resolve as soon as the HTML has been parsed
    await page.goto(url, { waitUntil: 'domcontentloaded' })
    // page.$ returns null when no element matches the selector
    let selectorExists = await page.$('#ourButton')

    while (selectorExists === null) {
      await page.reload({ waitUntil: 'domcontentloaded' })
      console.log('reload')
      selectorExists = await page.$('#ourButton')
    }
    await page.click('#ourButton')
    // code goes on...

Its main benefit is that you can shorten and simplify your pageRefresher function. I also saw better performance: I did no benchmarking, but it felt much faster than re-opening the page.
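
If the loop from step 2 should eventually give up, and reuse the element handle it already found instead of querying the selector again for the click, it could be wrapped as below. This is a sketch assuming page is a Puppeteer Page; reloadUntilPresent and maxReloads are hypothetical names. Like the original loop, it only checks that the element exists (page.$), not that it is visible.

```javascript
// Hypothetical wrapper around the reload loop from step 2.
// Assumes `page` is a Puppeteer Page; `maxReloads` is an illustrative safety cap.
async function reloadUntilPresent(page, selector, { maxReloads = Infinity } = {}) {
  let handle = await page.$(selector); // null while the element is absent
  let reloads = 0;
  while (handle === null) {
    if (reloads >= maxReloads) return null; // give up
    await page.reload({ waitUntil: 'domcontentloaded' });
    reloads += 1;
    handle = await page.$(selector);
  }
  return handle; // an ElementHandle the caller can click
}
```

Usage would look like const button = await reloadUntilPresent(page, '#ourButton', { maxReloads: 500 }), followed by await button.click() when button is not null; the cap of 500 is only an example value.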

3.) If your task doesn't need every resource type, you can also speed up page loading by blocking images or CSS with the following script:

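    // abort every image request; all other requests proceed normally
    // (to drop CSS as well, also check for the 'stylesheet' resource type)
    await page.setRequestInterception(true)
    page.on('request', (request) => {
      if (request.resourceType() === 'image') request.abort()
      else request.continue()
    })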

[source]

List of resource types.
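
The filter from step 3 can be made configurable so that CSS and webfonts, both mentioned in the answer, are blocked as well. This is a sketch; makeRequestFilter and BLOCKED_TYPES are hypothetical names, while 'image', 'stylesheet' and 'font' are resource type strings that request.resourceType() can return.

```javascript
// Hypothetical set of resource types to drop; extends the image-only example.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Returns a handler for page.on('request', ...): aborts requests whose
// resourceType() is in `blocked` and lets every other request continue.
function makeRequestFilter(blocked = BLOCKED_TYPES) {
  return (request) => {
    if (blocked.has(request.resourceType())) request.abort();
    else request.continue();
  };
}
```

It is wired up the same way as the original: await page.setRequestInterception(true), then page.on('request', makeRequestFilter()). One caveat: if the site shows or hides the button through CSS, blocking 'stylesheet' changes what counts as visible, so in that case pass a set containing only 'image' and 'font'.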
