Puppeteer: Screenshot lazy images not working


Problem description


I can't seem to capture a screenshot of https://today.line.me/HK/pc successfully.

In my Puppeteer script, I also scroll to the bottom of the page and back up again to make sure the images are loaded, but for some reason it doesn't seem to work on the LINE URL above.

function wait(ms) {
  return new Promise(resolve => setTimeout(() => resolve(), ms));
}

const puppeteer = require('puppeteer');

async function run() {
  let browser = await puppeteer.launch({headless: false});
  let page = await browser.newPage();
  await page.goto('https://today.line.me/HK/pc', {waitUntil: 'load'});

  // Get the height of the rendered page
  const bodyHandle = await page.$('body');
  const { height } = await bodyHandle.boundingBox();
  await bodyHandle.dispose();

  // Scroll one viewport at a time, pausing to let content load
  const viewportHeight = page.viewport().height + 200;
  let viewportIncr = 0;
  while (viewportIncr + viewportHeight < height) {
    await page.evaluate(_viewportHeight => {
      window.scrollBy(0, _viewportHeight);
    }, viewportHeight);
    await wait(4000);
    viewportIncr = viewportIncr + viewportHeight;
  }

  // Scroll back to top
  await page.evaluate(_ => {
    window.scrollTo(0, 0);
  });

  // Some extra delay to let images load
  await wait(2000);

  await page.setViewport({ width: 1366, height: 768});
  await page.screenshot({ path: './image.png', fullPage: true });
}

run();

Solution

For anyone wondering, there are many strategies for rendering lazy-loaded images or assets in Puppeteer, but not all of them work equally well. Small implementation details in the website you're trying to screenshot can change the final result, so if you want an implementation that works well across many scenarios, you will need to isolate each generic case and address it individually.

I know this because I run a small Screenshot as a Service project (https://getscreenshot.rasterwise.com/) and I had to address many cases separately. This is a big part of the project, since there always seems to be something new to handle as new libraries and UI techniques come into use.

That being said, I think some rendering strategies have good coverage. Probably the best one is a combination of waiting and scrolling through the page like OP did, while also paying attention to the order of the operations. Here is a slightly modified version of OP's original code.

// Scroll and Wait Strategy

function waitFor(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function capturePage(browser, url) {
  // Load the page that you're trying to screenshot.
  const page = await browser.newPage();
  await page.goto(url, {waitUntil: 'load'}); // Waiting until 'networkidle2' could work better.

  // Set the viewport before scrolling
  await page.setViewport({ width: 1366, height: 768 });
  const calculatedVh = page.viewport().height;

  // Get the height of the page after navigating to it.
  // This strategy to calculate the height doesn't always work, though.
  const bodyHandle = await page.$('body');
  const box = await bodyHandle.boundingBox();
  await bodyHandle.dispose();
  // boundingBox() resolves to null when the element is not visible;
  // in that case fall back to the viewport height.
  const height = box ? box.height : calculatedVh;

  // Scroll viewport by viewport, allowing the content to load
  let vhIncrease = 0;
  while (vhIncrease + calculatedVh < height) {
    // Here we pass the calculated viewport height to the context
    // of the page and we scroll by that amount
    await page.evaluate(_calculatedVh => {
      window.scrollBy(0, _calculatedVh);
    }, calculatedVh);
    await waitFor(300);
    vhIncrease = vhIncrease + calculatedVh;
  }

  // Temporarily setting the viewport to the full height might reveal extra elements.
  // The measured height can be fractional, so round it up to a whole pixel.
  await page.setViewport({ width: 1366, height: Math.ceil(height) });

  // Wait for a little bit more
  await waitFor(1000);

  // Go back to the original viewport height, because the viewport
  // also affects the intended rendering.
  await page.setViewport({ width: 1366, height: calculatedVh });

  // Scroll back to the top of the page by using evaluate again.
  await page.evaluate(() => {
    window.scrollTo(0, 0);
  });

  // This captures the current viewport only; pass fullPage: true
  // to capture the whole page instead.
  return await page.screenshot({ type: 'png' });
}

Some key differences here are:

  • You want to set the viewport from the beginning and operate with that fixed viewport.

  • You can change the wait time and introduce arbitrary waits to experiment. Sometimes this reveals elements that are held up behind pending network events.

  • Changing the viewport to the full height of the page can also reveal elements as if you were scrolling. You can test this in a real browser by using a vertical monitor. However, make sure to go back to the original viewport height, because the viewport also affects the intended rendering.
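
One fragile spot in capturePage is the height measurement, which the code comments already flag as not always working. A possible alternative, sketched here as an assumption and not as something from the original answer, is to ask the document itself for the tallest height it reports. The helper name measureDocumentHeight is hypothetical; it takes the document as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns the tallest height the document reports.
// `doc` defaults to the page's global `document` when the function is
// run inside the page through page.evaluate.
function measureDocumentHeight(doc = document) {
  const body = doc.body || {};
  const root = doc.documentElement || {};
  const candidates = [
    body.scrollHeight,
    body.offsetHeight,
    root.scrollHeight,
    root.offsetHeight,
    root.clientHeight,
  ];
  // Ignore anything missing or non-numeric, then take the maximum.
  return Math.max(0, ...candidates.filter(n => Number.isFinite(n)));
}

// Possible usage inside capturePage, in place of the boundingBox() call:
//   const height = await page.evaluate(measureDocumentHeight);
```

Pages that scroll inside a nested container instead of the document will still report a short height here, so this is only one more case covered, not a general fix.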

One thing to understand here is that waiting alone won't necessarily trigger the loading of lazy assets. Scrolling through the full height of the document brings into the viewport those elements that only load once they are visible.
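
To see whether scrolling actually triggered the lazy assets, it can help to list the images that still haven't loaded before taking the screenshot. The sketch below is an illustration under assumptions: findPendingImages is a hypothetical helper, and it only looks at img elements, so lazy-loaded CSS backgrounds or iframes would not show up.

```javascript
// Hypothetical helper: given image-like objects, return the sources of
// those that have not finished loading or that loaded without any pixels
// (broken images, or placeholders whose real source was never set).
function findPendingImages(images) {
  return Array.from(images)
    .filter(img => !img.complete || img.naturalWidth === 0)
    .map(img => img.currentSrc || img.src || '(no src)');
}

// Possible usage with Puppeteer; the function is self-contained, so it
// can be handed to page.$$eval and run inside the page:
//   const pending = await page.$$eval('img', findPendingImages);
//   console.log('Images still not loaded:', pending);
```

If the list is non-empty after scrolling, that is a hint that longer waits or a different scroll pattern are worth trying for that particular site.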

Another caveat is that sometimes you need to wait a relatively long time for an asset to load, so in the example above you might need to experiment with how long you wait after each scroll. Also, as I mentioned, arbitrary waits in the general execution sometimes affect whether an asset loads or not.
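
Instead of tuning a fixed delay by hand, one option is to poll for a condition and give up after a timeout. This is a minimal sketch and not part of the original answer; pollUntil and its timeoutMs and intervalMs options are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: call `check` repeatedly, pausing `intervalMs`
// between attempts, until it returns a truthy value or `timeoutMs` has
// elapsed. Resolves to true on success and false on timeout.
async function pollUntil(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Possible usage after each scroll step, in place of a fixed wait:
//   await pollUntil(
//     () => page.$$eval('img', imgs => imgs.every(img => img.complete)),
//     { timeoutMs: 4000 }
//   );
```

The timeout keeps a single stubborn asset from blocking the capture forever. Puppeteer's own page.waitForFunction covers a similar need when the condition can be evaluated entirely inside the page.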

In general, when using Puppeteer for screenshots, you want to make sure that your logic resembles real user behavior. Your goal is to reproduce rendering scenarios as if someone were launching Chrome on their computer and navigating to that website.
