AWS Lambda function - convert PDF to Image

Problem description

I am developing an application where users can upload drawings in PDF format. Uploaded files are stored on S3. After upload, the files have to be converted to images. For this purpose I have created a Lambda function that downloads the file from S3 to the /tmp folder in the Lambda execution environment and then calls the ‘convert’ command from ImageMagick:

convert sourceFile.pdf targetFile.png
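A minimal sketch of that conversion step in Node.js, assuming ImageMagick's convert binary is on the PATH of the Lambda environment. It uses child_process.execFile instead of exec, so the two paths are passed as separate arguments rather than being concatenated into a /bin/sh command string. The function names buildConvertArgs and convertPdfToPng are hypothetical.

```javascript
const { execFile } = require('child_process');

// Argument list for: convert <source.pdf> <target.png>
function buildConvertArgs(sourceFile, targetFile) {
  return [sourceFile, targetFile];
}

// Run ImageMagick's convert directly, without going through /bin/sh.
// Assumes the "convert" binary can be found on the PATH.
function convertPdfToPng(sourceFile, targetFile, callback) {
  execFile('convert', buildConvertArgs(sourceFile, targetFile), (err, stdout, stderr) => {
    if (err) {
      // err.code is the exit status, or a string such as 'ENOENT' if convert is missing;
      // stderr usually carries the ImageMagick / Ghostscript message
      return callback(err, stderr);
    }
    callback(null, stderr);
  });
}
```

Usage: convertPdfToPng('/tmp/sourceFile.pdf', '/tmp/targetFile.png', (err, stderr) => { ... }).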

The Lambda runtime environment is Node.js 4.3. Memory is set to 128 MB and the timeout to 30 seconds.

Now the problem is that some files are converted successfully while others fail with the following error:

{ [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png convert: `%s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo" "-f/tmp/magick-tIe1MjeR" @ error/utility.c/SystemCommand/1890. convert: Postscript delegate failed `/tmp/sourceFile.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/678. convert: no images defined `/tmp/targetFile.png' @ error/convert.c/ConvertImageCommand/3046. ] killed: false, code: 1, signal: null, cmd: '/bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png' }

At first I did not understand why this happens. Then I tried to convert the problematic files on my local Ubuntu machine with the same command. This is the output from the terminal:

**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.

So the message was very clear, but the file gets converted to PNG anyway. If I first run 'convert source.pdf target.pdf' and then 'convert target.pdf image.png', the file is repaired and converted without any errors. On Lambda, however, this doesn't work.
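The two-step workaround described above (rewrite the PDF with convert, then rasterize the rewritten copy) can be sketched as two chained convert calls. This assumes convert is on the PATH; the function names and the intermediate path used in the usage line are hypothetical. As the question notes, the repair only happens when the underlying Ghostscript can repair the file, so this sketch does not by itself fix the Lambda case.

```javascript
const { execFile } = require('child_process');

// The two convert invocations, in order:
//   1. convert source.pdf repaired.pdf   (rewrites the PDF)
//   2. convert repaired.pdf image.png    (rasterizes the rewritten PDF)
function buildRepairSteps(sourcePdf, repairedPdf, targetPng) {
  return [
    [sourcePdf, repairedPdf],
    [repairedPdf, targetPng],
  ];
}

// Run the steps one after another, stopping at the first failure.
function repairAndConvert(sourcePdf, repairedPdf, targetPng, callback) {
  const steps = buildRepairSteps(sourcePdf, repairedPdf, targetPng);

  function runStep(i) {
    if (i === steps.length) {
      return callback(null);
    }
    execFile('convert', steps[i], (err, stdout, stderr) => {
      if (err) {
        // Report which step failed together with convert's stderr output
        return callback(new Error('convert step ' + (i + 1) + ' failed: ' + stderr));
      }
      runStep(i + 1);
    });
  }

  runStep(0);
}
```

Usage: repairAndConvert('/tmp/sourceFile.pdf', '/tmp/repaired.pdf', '/tmp/targetFile.png', (err) => { ... }).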

Since the same thing works in one environment but not in the other, my best guess is that the Ghostscript version is the problem. The version installed on the AMI is 8.70; on my local machine the Ghostscript version is 9.18.
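To confirm which Ghostscript the function actually uses, one option is to log it from inside Lambda. The sketch below runs gs --version, which prints just the version number (for example 8.70), and parses it so it can be compared with the local 9.18. The helper names parseGsVersion, isOlder and logGsVersion are hypothetical, and the comparison only tells you the versions differ, not that the version is the cause.

```javascript
const { execFile } = require('child_process');

// Parse the output of "gs --version" (e.g. "8.70\n") into numbers.
function parseGsVersion(output) {
  const match = /^(\d+)\.(\d+)/.exec(output.trim());
  if (!match) {
    throw new Error('Unexpected gs --version output: ' + output);
  }
  return { major: Number(match[1]), minor: Number(match[2]) };
}

// True if version a is older than version b.
function isOlder(a, b) {
  return a.major < b.major || (a.major === b.major && a.minor < b.minor);
}

// Log the version of the gs binary found on the PATH.
function logGsVersion(callback) {
  execFile('gs', ['--version'], (err, stdout) => {
    if (err) {
      return callback(err);
    }
    const version = parseGsVersion(stdout);
    console.log('Ghostscript ' + version.major + '.' + version.minor);
    callback(null, version);
  });
}
```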

My questions are:

  • Is the Ghostscript version the problem? Is this a bug in older versions of Ghostscript? If not, how can I tell Ghostscript (with or without ImageMagick) to repair or ignore errors like it does in my local environment?
  • If the old version is the problem, is it possible to build Ghostscript from source, create a Node.js module and then use that version of Ghostscript instead of the installed one?
  • Is there an easier way to convert a PDF to an image without using ImageMagick and Ghostscript?

UPDATE: Relevant part of the Lambda code:

var exec = require('child_process').exec;
var AWS = require('aws-sdk');
var fs = require('fs');
...

var localSourceFile = '/tmp/sourceFile.pdf';
var localTargetFile = '/tmp/targetFile.png';

var writeStream = fs.createWriteStream(localSourceFile);
writeStream.write(body);
writeStream.end();

writeStream.on('error', function (err) {
    console.log("Error writing data from s3 to tmp folder.");
    context.fail(err);
});

writeStream.on('finish', function () {
    var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;

    exec(cmd, function (err, stdout, stderr) {

        if (err) {
            console.log("Error executing convert command.");
            // return here so context.fail() is not called a second time below
            return context.fail(err);
        }

        if (stderr) {
            // convert exited with status 0 but wrote something to stderr
            console.log("Command executed successfully but returned error.");
            return context.fail(stderr);
        }

        // file converted successfully - do something...
    });
});

Recommended answer

You can find a compiled version of Ghostscript for Lambda in the following repository. Add its files to the zip file you upload to AWS Lambda as the function's source code.

https://github.com/sina-masnadi/lambda-ghostscript

This is an npm package to call Ghostscript functions:

https://github.com/sina-masnadi/node-gs

After copying the compiled Ghostscript files to your project and adding the npm package, you can use the executablePath('path to ghostscript') function to point the package to the compiled Ghostscript files that you added earlier.
