使用Puppeteer AWS Lambda遍历多个有效负载并拍摄多个屏幕截图 [英] Iterate over multiple payloads and take multiple screenshots with Puppeteer AWS Lambda

查看：94 发布时间：2021/4/13 18:39:32 javascript node.js amazon-web-services aws-lambda chromium

本文介绍了使用Puppeteer AWS Lambda遍历多个有效负载并拍摄多个屏幕截图的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我目前正在使用以下Puppeteer AWS Lambda Layer抓取30个URL，并在S3中创建和保存屏幕截图.目前，我发送了30个单独的有效负载，因此运行了30个AWS Lambda函数.

I am currently using the following Puppeteer AWS Lambda Layer to scrape 30 URLs and create and save screenshots in S3. At the moment, I send 30 individual payloads therefore running 30 AWS Lambda functions. https://github.com/shelfio/chrome-aws-lambda-layer

Each JSON payload contains a URL and an image file name that are sent every 2-3 seconds to API Gateway via a POST request. The first 6 or 9 Lambda functions in the list seem to run fine, then they start to fail with Navigation failed because browser has disconnected! as reported in AWS Cloudwatch.

So I am looking for an alternative solution, How could I edit the code below to batch screenshot a set of 30 URLs, by handling a single array of JSON payloads? (eg. For loop etc)

Here is my current code for generating individual AWS Lambda screenshots and sending to S3:

// src/capture.js

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");

process.setMaxListeners(0) // <== Important line - Fix MaxListerners Error

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// default browser viewport size
const defaultViewport = {
  width: 1920,
  height: 1080
};

// here starts our function!
exports.handler = async event => {

  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });
  console.log("Event URL string is ", event.url)

  const url = event.url;
  const domain = (new URL(url)).hostname.replace('www.', '');

  // open a new tab
  const page = await browser.newPage();

  // navigate to the page
  await page.goto(event.url);

  // take a screenshot
  const buffer = await page.screenshot()

  // upload the image using the current timestamp as filename
  const result = await s3
    .upload({
      Bucket: process.env.S3_BUCKET,
      Key: domain + `.png`,
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    })
    .promise();

  // return the uploaded image url
  return { url: result.Location };
};

Current Individual JSON Payload

{"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gavurin.com.png","url":"https://gavurin.com"}

解决方案

I tried to replicate the issue and modify the code to use loop.

While working on this issue, I found several things worth pointing out:

the lambda requires a lot of RAM (at least 1GB in my test, but more better). Using small amount of RAM lead to failures.
lambda timeout must be large to handle a number of URLs to screenshot.
your img from the JSON payload is not used at all. I did not modify this behavior, as I don't know if this is by design or not.
similar errors to yours were observed when running async for loop and/or not closing pages opened.
modified return value to output an array of s3 urls.
undefied URL

Modified code

Here is the modified code that worked in my tests using nodejs12.x runtime:

// src/capture.js

var URL = require('url').URL;

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");

process.setMaxListeners(0) // <== Important line - Fix MaxListerners Error

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// default browser viewport size
const defaultViewport = {
  width: 1920,
  height: 1080
};

// here starts our function!
exports.handler = async event => {

  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });
  
  const s3_urls = [];

  for (const e of event) {
    console.log(e);

    console.log("Event URL string is ", e.url)

    const url = e.url;
    const domain = (new URL(url)).hostname.replace('www.', '');

    // open a new tab
    const page = await browser.newPage();

    // navigate to the page
    await page.goto(e.url);

    // take a screenshot
    const buffer = await page.screenshot()

    // upload the image using the current timestamp as filename
    const result = await s3
      .upload({
        Bucket: process.env.S3_BUCKET,
        Key: domain + `.png`,
        Body: buffer,
        ContentType: "image/png",
        ACL: "public-read"
      })
      .promise();
      
      await page.close();
      
      s3_urls.push({ url: result.Location });
      
  }
  
  await browser.close();

  // return the uploaded image url
  return s3_urls;
};

Example playload

[
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gavurin.com.png","url":"https://gavurin.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/google.com.png","url":"https://google.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/amazon.com","url":"https://www.amazon.com"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/stackoverflow.com","url":"https://stackoverflow.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/duckduckgo.com","url":"https://duckduckgo.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/docs.aws.amazon.com","url":"https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-features.html"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/github.com","url":"https://github.com"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/github.com/shelfio/chrome-aws-lambda-layer","url":"https://github.com/shelfio/chrome-aws-lambda-layer"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gwww.youtube.com","url":"https://www.youtube.com"},   
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/w3docs.com","url":"https://www.w3docs.com"}       
]

Example output in S3

这篇关于使用Puppeteer AWS Lambda遍历多个有效负载并拍摄多个屏幕截图的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用Puppeteer AWS Lambda遍历多个有效负载并拍摄多个屏幕截图 [英] Iterate over multiple payloads and take multiple screenshots with Puppeteer AWS Lambda

问题描述

Modified code

Example playload

Example output in S3

相关文章

前端开发最新文章

热门教程

热门工具

登录关闭

使用Puppeteer AWS Lambda遍历多个有效负载并拍摄多个屏幕截图 [英] Iterate over multiple payloads and take multiple screenshots with Puppeteer AWS Lambda

问题描述

Modified code

Example playload

Example output in S3

相关文章

前端开发最新文章

热门教程

热门工具

登录 关闭

登录关闭