Thumbnail the first page of a pdf from a stream in GraphicsMagick


Question

I know how to use GraphicsMagick to make a thumbnail of the first page of a pdf if I have a pdf file and am running gm locally. I can just do this:

const gm = require("gm");

// "[0]" after the file name selects the first page of the PDF.
gm(pdfFileName + "[0]")
  .background("white")
  .flatten()
  .resize(200, 200)
  .write("output.jpg", (err) => {
    if (err) console.log(err);
  });

If I have a file called doc.pdf then passing doc.pdf[0] to gm works beautifully.
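For the file-name case, the page selector is just a suffix on the path. A minimal sketch, assuming a hypothetical helper called withPage (it is not part of gm) that builds the string passed to gm:

```javascript
// Hypothetical helper (not part of gm): append a page selector to a file name.
function withPage(fileName, pageIndex) {
  // Pages are zero-based, so 0 is the first page.
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  return `${fileName}[${pageIndex}]`;
}
```

With this, gm(withPage("doc.pdf", 0)) would be the same call as gm("doc.pdf[0]").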

But my problem is I am generating thumbnails on an AWS Lambda function, and the Lambda takes as input data streamed from a source S3 bucket. The relevant slice of my lambda looks like this:

// Download the image from S3, transform, and upload to a different S3 bucket.
async.waterfall([
  function download(next) {
    s3.getObject({
      Bucket: sourceBucket,
      Key: sourceKey
    },
    next);
  },

  function transform(response, next) {
    gm(response.Body).size(function(err, size) {       // <--- gm USED HERE
    .
    .
    .

Everything works, but for multipage PDFs, gm generates the thumbnail from the last page of the PDF. How do I get the [0] in there? I did not see a page selector in the gm documentation, as all of its examples use filenames, not streams. I believe there should be an API, but I have not found one.

(Note: the [0] is really important, not only because the last page of a multipage PDF is sometimes blank, but also because I noticed that when running gm on the command line with large PDFs, the command with [0] returns very quickly, while without the [0] the whole PDF is scanned. On AWS Lambda, it's important to finish quickly to save on resources and avoid timeouts!)

Recommended answer

You can use the .selectFrame() method, which is equivalent to specifying [0] directly in the file name.

In your code:

function transform(response, next) {
    gm(response.Body)
        .selectFrame(0)       // <--- select the first page
        .size(function(err, size) {
        .
        .
        .

Don't be confused by the name of the function. It works not only with frames for GIFs, but also with pages for PDFs.
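Putting the pieces together, here is a rough sketch of a first-page thumbnail built from an in-memory PDF. The function name thumbnailFirstPage and the gmLib parameter are assumptions made for illustration: gmLib stands for the function exported by the gm package, passed in so the helper can be exercised without GraphicsMagick installed. The 200x200 size and white background are taken from the question.

```javascript
// Sketch: thumbnail the first page of a PDF held in a Buffer.
// gmLib is assumed to be the function returned by require("gm").
function thumbnailFirstPage(gmLib, pdfBuffer, callback) {
  gmLib(pdfBuffer)
    .selectFrame(0)              // first page, like "doc.pdf[0]" for a file
    .background("white")         // PDFs often have a transparent background
    .flatten()
    .resize(200, 200)
    .toBuffer("jpg", callback);  // callback(err, jpegBuffer)
}
```

In the Lambda from the question, this would be called as thumbnailFirstPage(gm, response.Body, next), assuming the next step of the waterfall expects the JPEG buffer.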

Check out the source of this function on GitHub.

Credit to @BenFortune for his answer to a similar question about the first frame of a GIF. I took it as inspiration and tested this solution with PDFs, and it works.

Hope it helps.
