ImageMagick parallel conversion


Question

I want to render each page of a PDF as a JPG screenshot. To do this I am using ImageMagick's convert command on the command line.

I have to achieve the following:


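  1. Take a screenshot of each page of the PDF file.

  2. Resize each screenshot to 3 different sizes (small, med and preview).

  3. Store the different sizes in different folders (small, med and preview).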

I am using the following command, which works but is slow. How can I improve its execution time or run the commands in parallel?

convert -density 400 -quality 100 /input/test.pdf -resize '170x117>' -scene 1 /small/test_%d_small.jpg & convert -density 400 -quality 100 /input/test.pdf -resize '230x160>' -scene 1 /med/test_%d_med.jpg & convert -density 400 -quality 100 /input/test.pdf -resize '1310x650>' -scene 1 /preview/test_%d_preview.jpg

The same commands, split up for readability:

convert -density 400 -quality 100 /input/test.pdf -resize '170x117>' -scene 1 /small/test_%d_small.jpg

convert -density 400 -quality 100 /input/test.pdf -resize '230x160>' -scene 1 /med/test_%d_med.jpg

convert -density 400 -quality 100 /input/test.pdf -resize '1310x650>' -scene 1 /preview/test_%d_preview.jpg


Recommended answer

Updated Answer

I see you have long, multi-page documents, and while my original answer is good for quickly making multiple sizes of a single page, it doesn't address processing pages in parallel. So here is a way of doing it using GNU Parallel, which is available for free on OS X (using homebrew), installed on most Linux distros and also available for Windows - if you really must.

The code looks like this:

#!/bin/bash

shopt -s nullglob
shopt -s nocaseglob

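doPage(){
   # Expects the filename as first parameter and the zero-based page number as second
   # echo DEBUG: File: $1 Page: $2
   # Strip only the final extension, so "a.b.pdf" becomes "a.b"
   local noexten=${1%.*}
   # Read the page once, then write successively smaller images from the largest one
   convert -density 400 -quality 100 "$1[$2]"     \
      -resize 1310x650 -write "${noexten}-p-$2-large.jpg" \
      -resize 230x160  -write "${noexten}-p-$2-med.jpg"   \
      -resize 170x117  "${noexten}-p-$2-small.jpg"
}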

export -f doPage

# First, get list of all PDF documents
for d in *.pdf; do
   # Now get number of pages in this document - "pdfinfo" is probably quicker
   p=$(identify "$d" | wc -l)
   # Emit one "filename:page" line per page (pages are zero-based)
   for ((i=0;i<p;i++));do
      echo "$d:$i"
   done
done | parallel --eta --colsep ':' doPage {1} {2}

If you want to see how it works, remove the | parallel .... from the last line and you will see that the preceding loop just echoes a list of filenames and page numbers, one filename:page pair per line, into GNU Parallel. GNU Parallel then runs one process per CPU core by default; specify -j 8 if you want, say, 8 processes to run in parallel. Remove --eta if you don't want any updates on when the job is likely to finish.
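
To make the data flow concrete, here is a minimal sketch of what the loop feeds to GNU Parallel and how --colsep ':' splits each line into {1} and {2}. The helper names list_pages and split_job are hypothetical, the page count is passed in by hand instead of coming from identify, and the column split is emulated in plain bash so that the sketch runs without GNU Parallel installed.

```shell
#!/bin/bash

# Hypothetical helper: emit one "file:page" line per page, zero-based,
# in the same format as the echo inside the loop of the script above.
list_pages() {
   local file=$1 pages=$2 i
   for ((i = 0; i < pages; i++)); do
      echo "$file:$i"
   done
}

# Hypothetical helper: emulate --colsep ':' by splitting a line into the
# two fields that GNU Parallel would substitute for {1} and {2}.
split_job() {
   local file page
   IFS=: read -r file page <<< "$1"
   echo "doPage $file $page"
}

# Show the jobs that a 3-page document would generate.
list_pages report.pdf 3 | while read -r line; do
   split_job "$line"
done
```

With GNU Parallel installed, piping the output of list_pages into parallel -j 8 --colsep ':' doPage {1} {2} should run the same jobs, up to 8 at a time. Note that a filename containing a colon would be split in the wrong place, so such files would need a different separator.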

In the script's comment I mention that pdfinfo is probably faster than identify. If you have it available (it's part of the poppler package under homebrew on OS X), you can use it to get the number of pages in a PDF:

pdfinfo SomeDocument.pdf | awk '/^Pages:/ {print $2}'
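
As a rough sketch of how that could slot into the script, the helper below prefers pdfinfo and falls back to the identify approach when pdfinfo is not installed. The function names pages_from_pdfinfo and page_count are hypothetical, and the demonstration at the end parses canned pdfinfo-style text so that no PDF is needed to try it.

```shell
#!/bin/bash

# Extract the number from the "Pages:" line of pdfinfo-style text on stdin.
pages_from_pdfinfo() {
   awk '/^Pages:/ {print $2}'
}

# Hypothetical wrapper: prefer pdfinfo, fall back to counting identify's
# output lines (one line per page) when pdfinfo is not installed.
page_count() {
   if command -v pdfinfo >/dev/null 2>&1; then
      pdfinfo "$1" | pages_from_pdfinfo
   else
      identify "$1" | wc -l
   fi
}

# Demonstrate the parsing on canned text, so no PDF is required.
printf 'Title:          Sample\nPages:          12\nEncrypted:      no\n' | pages_from_pdfinfo
```

In the script above, p=$(page_count "$d") would then take the place of the identify line.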

Original Answer

Untested, but something along these lines, so that you only read the PDF once and then generate successively smaller images from the largest one:

convert -density 400 -quality 100 x.pdf \
   -resize 1310x650 -write large.jpg    \
   -resize 230x160  -write medium.jpg   \
   -resize 170x117  small.jpg

Unless you mean you have, say, a 50-page PDF and you want to do all 50 pages in parallel. If you do, say so, and I'll show you how to do that using GNU Parallel when I get up in 10 hours...
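
The question also asks for each size to go into its own folder, which the commands in this answer do not do, since they write everything next to the source PDF. One possible way to build the per-folder names is sketched below. The helper name out_path is hypothetical, the /preview, /med and /small folders are taken from the question and assumed to exist already, and the page number is shifted to start at 1 to match the -scene 1 numbering used there.

```shell
#!/bin/bash

# Hypothetical helper: build "/<size>/<name>_<page>_<size>.jpg", where
# <page> is 1-based even though the page index passed in is zero-based.
out_path() {
   local pdf=$1 index=$2 size=$3
   local name
   name=$(basename "${pdf%.*}")   # strip the directory and the final extension
   echo "/$size/${name}_$((index + 1))_${size}.jpg"
}

# Print the three targets for the first page (index 0) of the question's file.
for size in preview med small; do
   out_path /input/test.pdf 0 "$size"
done
```

Inside a per-page function, each output filename could then be replaced by a call such as "$(out_path "$1" "$2" med)".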
