Iterate over multiple payloads and take multiple screenshots with Puppeteer AWS Lambda

Question

I am currently using the following Puppeteer AWS Lambda Layer to scrape 30 URLs and to create and save screenshots in S3: https://github.com/shelfio/chrome-aws-lambda-layer. At the moment, I send 30 individual payloads and therefore run 30 AWS Lambda functions.

Each JSON payload contains a URL and an image file name, and the payloads are sent every 2-3 seconds to API Gateway via a POST request. The first 6 or 9 Lambda functions in the list seem to run fine, then they start to fail with "Navigation failed because browser has disconnected!", as reported in AWS CloudWatch.

So I am looking for an alternative solution. How could I edit the code below to batch-screenshot a set of 30 URLs by handling a single array of JSON payloads (e.g. with a for loop)?

Here is my current code for generating individual AWS Lambda screenshots and sending to S3:

// src/capture.js

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");

process.setMaxListeners(0) // <== Important line - fixes the MaxListeners error

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// default browser viewport size
const defaultViewport = {
  width: 1920,
  height: 1080
};

// here starts our function!
exports.handler = async event => {

  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });
  console.log("Event URL string is ", event.url)

  const url = event.url;
  const domain = (new URL(url)).hostname.replace('www.', '');

  // open a new tab
  const page = await browser.newPage();

  // navigate to the page
  await page.goto(event.url);

  // take a screenshot
  const buffer = await page.screenshot()

  // upload the image, using the page's domain as the file name
  const result = await s3
    .upload({
      Bucket: process.env.S3_BUCKET,
      Key: domain + `.png`,
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    })
    .promise();

  // return the uploaded image url
  return { url: result.Location };
};

Current Individual JSON Payload

{"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gavurin.com.png","url":"https://gavurin.com"}

Answer

I tried to replicate the issue and modified the code to use a loop.

While working on this issue, I found several things worth pointing out:

  • The Lambda needs a lot of RAM (at least 1 GB in my tests, and more is better). Too little RAM led to failures.
  • The Lambda timeout must be long enough to cover all the URLs being screenshotted.
  • The img field from your JSON payload is not used at all. I did not change this behavior, as I don't know whether it is by design.
  • Errors similar to yours appeared when I ran the for loop asynchronously and/or did not close the pages that were opened.
  • I modified the return value to output an array of S3 URLs.
  • URL was undefined, so the code imports it explicitly (var URL = require('url').URL).
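
The points above about processing URLs one at a time, closing pages, and the timeout can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from this answer: captureSequentially, handlePage, and minRemainingMs are hypothetical names, and the only Lambda API relied on is context.getRemainingTimeInMillis(), which the Node.js runtime provides on the handler's context object.

```javascript
// Hypothetical helper (not part of the answer's code): visit payloads one at a
// time, always close the page, and skip the rest when time is running out.
async function captureSequentially(browser, payloads, handlePage, context, minRemainingMs = 15000) {
  const results = [];
  for (const payload of payloads) {
    // getRemainingTimeInMillis() comes from the Lambda Node.js context object.
    if (context.getRemainingTimeInMillis() < minRemainingMs) {
      results.push({ url: payload.url, status: "skipped" });
      continue;
    }
    const page = await browser.newPage();
    try {
      // handlePage is an assumed callback that does goto/screenshot/upload.
      const value = await handlePage(page, payload);
      results.push({ url: payload.url, status: "ok", value });
    } catch (err) {
      // One failing URL should not abort the whole batch.
      results.push({ url: payload.url, status: "failed", error: err.message });
    } finally {
      // Close the tab on every path, success or failure.
      await page.close();
    }
  }
  return results;
}
```

Awaiting each page inside a plain for...of loop keeps only one tab open at a time, which matches the observation that parallel loops and unclosed pages produced errors. The 15-second default for minRemainingMs is an arbitrary placeholder to tune.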

Modified code

Here is the modified code that worked in my tests using nodejs12.x runtime:

// src/capture.js

var URL = require('url').URL;

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");

process.setMaxListeners(0) // <== Important line - fixes the MaxListeners error

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// default browser viewport size
const defaultViewport = {
  width: 1920,
  height: 1080
};

// here starts our function!
exports.handler = async event => {

  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });
  
  const s3_urls = [];

  for (const e of event) {
    console.log(e);

    console.log("Event URL string is ", e.url)

    const url = e.url;
    const domain = (new URL(url)).hostname.replace('www.', '');

    // open a new tab
    const page = await browser.newPage();

    // navigate to the page
    await page.goto(e.url);

    // take a screenshot
    const buffer = await page.screenshot()

    // upload the image, using the page's domain as the file name
    const result = await s3
      .upload({
        Bucket: process.env.S3_BUCKET,
        Key: domain + `.png`,
        Body: buffer,
        ContentType: "image/png",
        ACL: "public-read"
      })
      .promise();
      
    // close the tab before moving on to the next URL
    await page.close();

    s3_urls.push({ url: result.Location });
  }
  
  await browser.close();

  // return the uploaded image urls
  return s3_urls;
};
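
The loop for (const e of event) assumes the handler receives the array itself, as it does on a direct invocation. Since the question posts to API Gateway, note that with a Lambda proxy integration the request body arrives as a JSON string in event.body instead. A minimal sketch of normalizing both shapes follows; toPayloadList is a hypothetical helper, not part of the code above, and it does not handle base64-encoded bodies.

```javascript
// Hypothetical helper: accept a bare array (direct invoke), a single payload
// object, or an API Gateway proxy event whose body is a JSON string.
function toPayloadList(event) {
  let data = event;
  if (data && typeof data.body === "string") {
    // Proxy integration: the POST body is delivered as a string.
    data = JSON.parse(data.body);
  }
  const list = Array.isArray(data) ? data : [data];
  // Keep only entries that carry a non-empty url string.
  return list.filter(p => p && typeof p.url === "string" && p.url.length > 0);
}
```

With this in place, the handler could loop over toPayloadList(event) rather than event, so a missing or malformed entry is dropped before a page is opened for it.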

Example payload

[
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gavurin.com.png","url":"https://gavurin.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/google.com.png","url":"https://google.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/amazon.com","url":"https://www.amazon.com"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/stackoverflow.com","url":"https://stackoverflow.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/duckduckgo.com","url":"https://duckduckgo.com"},
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/docs.aws.amazon.com","url":"https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-features.html"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/github.com","url":"https://github.com"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/github.com/shelfio/chrome-aws-lambda-layer","url":"https://github.com/shelfio/chrome-aws-lambda-layer"},  
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/gwww.youtube.com","url":"https://www.youtube.com"},   
    {"img":"https://s3screenshotbucket-useast1v5.s3.amazonaws.com/w3docs.com","url":"https://www.w3docs.com"}       
]
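
Two entries in this payload share the hostname github.com, so a key built from the hostname alone maps both to github.com.png and the second upload overwrites the first. One option, assuming the img field is meant to name the target object, is to derive the S3 key from it. This is a minimal sketch: keyFromImg is a hypothetical helper, and the rule of appending .png when it is missing is an assumption.

```javascript
const { URL } = require("url");

// Hypothetical helper: turn the payload's img URL into an S3 object key.
function keyFromImg(img) {
  // Drop the leading "/" of the path and decode any percent-escapes.
  const key = decodeURIComponent(new URL(img).pathname.slice(1));
  // Assumption: every screenshot key should end in .png.
  return key.toLowerCase().endsWith(".png") ? key : key + ".png";
}
```

The result could be passed as Key in the s3.upload call in place of domain + `.png`, which gives each payload entry its own object as long as the img values are distinct.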

Example output in S3 (image not reproduced)
