ImageMagick parallel conversion
Question
I want to get a screenshot of each page of a PDF as a JPG. To do this I am using ImageMagick's convert command on the command line.
I have to achieve the following -
- Get a screenshot of each page of the PDF file.
- Resize each screenshot to 3 different sizes (small, medium and preview).
- Store the different sizes in different folders (small, med and preview).
I am using the following command, which works; however, it is slow. How can I improve its execution time, or execute the commands in parallel?
convert -density 400 -quality 100 /input/test.pdf -resize '170x117>' -scene 1 /small/test_%d_small.jpg & convert -density 400 -quality 100 /input/test.pdf -resize '230x160>' -scene 1 /med/test_%d_med.jpg & convert -density 400 -quality 100 /input/test.pdf -resize '1310x650>' -scene 1 /preview/test_%d_preview.jpg
The same command, split up for readability. Note that the > in the resize geometry must be quoted (or backslash-escaped), otherwise the shell interprets it as output redirection:

convert -density 400 -quality 100 /input/test.pdf -resize '170x117>' -scene 1 /small/test_%d_small.jpg
convert -density 400 -quality 100 /input/test.pdf -resize '230x160>' -scene 1 /med/test_%d_med.jpg
convert -density 400 -quality 100 /input/test.pdf -resize '1310x650>' -scene 1 /preview/test_%d_preview.jpg
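The one-liner above already relies on the shell's background-job operator (&) to overlap the three conversions. A minimal, ImageMagick-free sketch of that pattern, where the job function is just a placeholder standing in for one convert invocation:

```shell
#!/bin/bash
# Placeholder standing in for one of the three convert invocations.
job() { sleep 0.2; echo "done $1"; }

# Launch all three sizes as background jobs, then block until every one exits.
job small & job med & job preview &
wait
echo "all sizes written"
```

The three jobs run concurrently and wait returns only after all background jobs have exited; the "done" lines may appear in any order, which is why each output file should get a distinct name.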
Answer
Updated Answer
I see you have long, multi-page documents and, while my original answer is good for making multiple sizes of a single page quickly, it doesn't address doing pages in parallel. So here is a way of doing it using GNU Parallel, which is available for free for OS X (via homebrew), installed on most Linux distros, and also available for Windows - if you really must.
The code looks like this:
#!/bin/bash
shopt -s nullglob
shopt -s nocaseglob

doPage(){
   # Expecting filename as first parameter and page number as second
   # echo DEBUG: File: $1 Page: $2
   noexten=${1%%.*}
   convert -density 400 -quality 100 "$1[$2]" \
      -resize 1310x650 -write "${noexten}-p-$2-large.jpg" \
      -resize 230x160  -write "${noexten}-p-$2-med.jpg" \
      -resize 170x117  "${noexten}-p-$2-small.jpg"
}
export -f doPage

# First, get list of all PDF documents
for d in *.pdf; do
   # Now get number of pages in this document - "pdfinfo" is probably quicker
   p=$(identify "$d" | wc -l)
   for ((i=0;i<p;i++)); do
      echo "$d:$i"
   done
done | parallel --eta --colsep ':' doPage {1} {2}
If you want to see how it works, remove the | parallel ... from the last line and you will see that the preceding loop just echoes a list of filenames and a page-number counter into GNU Parallel. It will then run one process per CPU core, unless you specify, say, -j 8 to run 8 processes in parallel. Remove the --eta if you don't want any updates on when the command is likely to finish.
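A standalone sketch of what that loop emits, with a stand-in page counter in place of identify/pdfinfo (the file names and page counts here are invented purely for illustration):

```shell
#!/bin/bash
# Stand-in for "identify ... | wc -l" or pdfinfo: returns a fixed page count.
pages() { case "$1" in a.pdf) echo 2 ;; b.pdf) echo 3 ;; esac; }

# Emit one "filename:pagenumber" line per page, exactly as the real loop does.
for d in a.pdf b.pdf; do
   p=$(pages "$d")
   for ((i=0;i<p;i++)); do
      echo "$d:$i"
   done
done
```

GNU Parallel splits each of these lines on the colon (--colsep ':'), so {1} becomes the filename and {2} the page number passed to doPage.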
In the comments I allude to pdfinfo being faster than identify; if you have it available (it is part of the poppler package under homebrew on OS X), then you can use this to get the number of pages in a PDF:
pdfinfo SomeDocument.pdf | awk '/^Pages:/ {print $2}'
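The awk part can be checked without a PDF at hand; here it filters a fabricated pdfinfo-style output (the Title and Encrypted fields are just examples of the format):

```shell
# Simulated pdfinfo output piped through the same awk filter.
printf 'Title:     Demo\nPages:     42\nEncrypted: no\n' | awk '/^Pages:/ {print $2}'
# prints 42
```

Only the line beginning with "Pages:" matches the pattern, and $2 is its second whitespace-separated field, i.e. the page count.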
Original Answer
Untested, but something along these lines, so you only read the PDF in once and then generate successively smaller images from the largest one:
convert -density 400 -quality 100 x.pdf \
-resize 1310x650 -write large.jpg \
-resize 230x160 -write medium.jpg \
-resize 170x117 small.jpg
Unless you mean you have, say, a 50-page PDF and you want to do all 50 pages in parallel. If you do, say so, and I'll show you how to do that using GNU Parallel when I get up in 10 hours...