PDF - Remove White Margins

Question

I would like to know a way to remove white margins from a PDF file, just as Adobe Acrobat X Pro does. I understand it will not work with every PDF file.

I would guess that the way to do it is to find the text margins and then crop the page to them.

PyPdf is preferred.

iText finds text margins based on this code:

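import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextMarginFinder;

// Strokes a rectangle around the text area found on each page (iText 5 packages assumed).
public void addMarginRectangle(String src, String dest)
    throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // Write the stamped copy to the dest path.
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    TextMarginFinder finder;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        // Collect the bounding rectangle of all text on page i.
        finder = parser.processContent(i, new TextMarginFinder());
        PdfContentByte cb = stamper.getOverContent(i);
        cb.rectangle(finder.getLlx(), finder.getLly(),
            finder.getWidth(), finder.getHeight());
        cb.stroke();
    }
    stamper.close();
}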

Recommended answer

I'm not too familiar with PyPDF, but I know Ghostscript will be able to do this for you. Here are links to some other answers on similar questions:

  1. Convert PDF 2 sides per page to 1 side per page (SuperUser.com)
  2. Freeware to split a pdf's pages down the middle? (SuperUser.com)
  3. Cropping a PDF using Ghostscript 9.01 (StackOverflow.com)

The third answer is probably what made you say 'I understand it will not work with every PDF file'. It uses the pdfmark command to try to set the /CropBox in the PDF page objects.

The method of the first two answers will most likely succeed where the third one fails. It uses the PostScript snippet <</PageOffset [NNN MMM]>> setpagedevice to shift the PDF pages and place them on a (smaller) media size defined by the -gNNNNxMMMM parameter (which sets the device width and height in pixels).

If you understand the concept behind the first two answers, you'll easily be able to adapt the method used there to crop margins on all 4 edges of a PDF page:

An example command to crop a letter sized PDF (8.5x11in == 612x792pt) by half an inch (==36pt) on each of the 4 edges (command is for Windows):

gswin32c.exe ^
    -o cropped.pdf ^
    -sDEVICE=pdfwrite ^
    -g5400x7200 ^
    -c "<</PageOffset [-36 -36]>> setpagedevice" ^
    -f input.pdf

The resulting page size will be 7.5x10in (== 540x720pt). To do the same on Linux or Mac, use:

gs \
    -o cropped.pdf \
    -sDEVICE=pdfwrite \
    -g5400x7200 \
    -c "<</PageOffset [-36 -36]>> setpagedevice" \
    -f input.pdf
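
The arithmetic behind the -g and PageOffset values can be sketched in Python. This is only an illustration: the helper name crop_args is hypothetical, and it assumes 10 device pixels per point (a resolution of 720 dpi), which is what the 5400x7200 value for a 540x720pt page implies.

```python
def crop_args(page_w_pt, page_h_pt, left, bottom, right, top, dpi=720):
    """Return the -g option and the PageOffset snippet for cropping.

    All margins are in points. dpi=720 is an assumption taken from the
    example above (5400 px for 540 pt, i.e. 10 px per point).
    """
    new_w = page_w_pt - left - right
    new_h = page_h_pt - bottom - top
    if new_w <= 0 or new_h <= 0:
        raise ValueError("margins leave no page area")
    # -g expects device pixels, not points.
    px_w = round(new_w * dpi / 72)
    px_h = round(new_h * dpi / 72)
    geometry = f"-g{px_w}x{px_h}"
    # Shift the content left and down by the left and bottom margins.
    offset = f"<</PageOffset [{-left} {-bottom}]>> setpagedevice"
    return geometry, offset


geometry, offset = crop_args(612, 792, 36, 36, 36, 36)
print(geometry)
print(offset)
```

With unequal margins the same function applies: the left and bottom margins go into PageOffset, while the right and top margins only reduce the media size.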

Update: how to determine 'margins' using Ghostscript

A comment asked for 'automatic' determination of the white margins. You can use Ghostscript for this too. Its bbox device can determine the area covered by the (virtual) ink on each page (and hence, indirectly, the whitespace at each edge of the canvas).

Here is the command:

gs \
  -q -dBATCH -dNOPAUSE \
  -sDEVICE=bbox \
  input.pdf

Output (example):

 %%BoundingBox: 57 29 562 764
 %%HiResBoundingBox: 57.265030 29.347046 560.245045 763.649977
 %%BoundingBox: 57 28 561 667
 %%HiResBoundingBox: 57.265030 28.347046 560.245045 666.295011

The bbox device renders each PDF page in memory (without writing any output to disk) and then prints the BoundingBox and HiResBoundingBox info to stderr. You can modify the command like this to make the results easier to parse:

gs \
    -q -dBATCH -dNOPAUSE \
    -sDEVICE=bbox \
    input.pdf \
    2>&1 \
  | grep -v HiResBoundingBox

Output (example):

 %%BoundingBox: 57 29 562 764
 %%BoundingBox: 57 28 561 667
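
Because the bbox device writes to stderr, a script has to capture that stream rather than stdout. A minimal Python sketch of the same pipeline follows; the function names are hypothetical, and it assumes a Ghostscript executable named gs on the PATH (on Windows the name would be gswin32c.exe, as in the earlier command).

```python
import subprocess


def bbox_command(pdf_path):
    """Build the Ghostscript argument list for the bbox device."""
    return ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", pdf_path]


def run_bbox(pdf_path):
    """Run Ghostscript and return the %%BoundingBox lines from stderr."""
    result = subprocess.run(
        bbox_command(pdf_path),
        stdout=subprocess.DEVNULL,  # the bbox device produces no useful stdout
        stderr=subprocess.PIPE,     # the bounding boxes arrive on stderr
        text=True,
        check=True,
    )
    # Keep the integer boxes only, like `grep -v HiResBoundingBox`.
    return [
        line.strip()
        for line in result.stderr.splitlines()
        if line.strip().startswith("%%BoundingBox:")
    ]


# Usage (requires Ghostscript to be installed):
#     for line in run_bbox("input.pdf"):
#         print(line)
```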

This tells you...

  • ...that the lower left corner of the content rectangle of Page 1 is at coordinates [57 29] and its upper right corner is at [562 764]
  • ...that the lower left corner of the content rectangle of Page 2 is at coordinates [57 28] and its upper right corner is at [561 667]

This means:

  • Page 1 has 57pt of whitespace on the left edge (72pt == 1in == 25.4mm).
  • Page 1 has 29pt of whitespace on the bottom edge.
  • Page 2 has 57pt of whitespace on the left edge.
  • Page 2 has 28pt of whitespace on the bottom edge.
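
Reading these values off by hand does not scale to long documents. The parsing step might look like the following Python sketch, where parse_bboxes is a hypothetical helper and the sample text mirrors the example output above.

```python
import re

# Matches the integer BoundingBox lines only; HiResBoundingBox lines are skipped
# because "%%" must be followed directly by "BoundingBox:".
BBOX_RE = re.compile(
    r"^\s*%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s*$"
)


def parse_bboxes(text):
    """Return one (llx, lly, urx, ury) tuple per page, in page order."""
    boxes = []
    for line in text.splitlines():
        match = BBOX_RE.match(line)
        if match:
            boxes.append(tuple(int(group) for group in match.groups()))
    return boxes


sample = """\
 %%BoundingBox: 57 29 562 764
 %%HiResBoundingBox: 57.265030 29.347046 560.245045 763.649977
 %%BoundingBox: 57 28 561 667
 %%HiResBoundingBox: 57.265030 28.347046 560.245045 666.295011
"""

for page, (llx, lly, urx, ury) in enumerate(parse_bboxes(sample), start=1):
    # The lower left corner gives the left and bottom whitespace directly.
    print(f"Page {page}: left {llx}pt, bottom {lly}pt")
```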

As you can see from this simple example already, the whitespace is not exactly the same for each page. Depending on your needs (you likely want the same size for each page of a multi-page PDF, no?), you have to work out what are the minimum margins for each edge across all pages of the document.
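
One way to work out the minimum margins is to take the union of all per-page content rectangles: the smallest lower left and the largest upper right coordinates. The sketch below uses the hypothetical name union_bbox and assumes every page has the same size and origin.

```python
def union_bbox(boxes):
    """Return the smallest rectangle that contains every page's content.

    boxes is a list of (llx, lly, urx, ury) tuples in points. Cropping all
    pages to this rectangle removes only the whitespace common to all pages.
    """
    if not boxes:
        raise ValueError("no bounding boxes given")
    return (
        min(box[0] for box in boxes),  # smallest left margin
        min(box[1] for box in boxes),  # smallest bottom margin
        max(box[2] for box in boxes),  # content reaching furthest right
        max(box[3] for box in boxes),  # content reaching furthest up
    )


boxes = [(57, 29, 562, 764), (57, 28, 561, 667)]
print(union_bbox(boxes))
```

For the two example pages the common rectangle runs from [57 28] to [562 764], so the shared left and bottom margins are 57pt and 28pt.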

Now what about the whitespace on the right and top edges? To calculate that, you need to know the original size of each page. The simplest way to determine this is the pdfinfo utility. Example command for a 5-page PDF:

pdfinfo \
  -f 1 \
  -l 5 \
  input.pdf \
| grep "Page "

Output (example):

Page    1 size: 612 x 792 pts (letter)
Page    2 size: 612 x 792 pts (letter)
Page    3 size: 595 x 842 pts (A4)
Page    4 size: 842 x 1191 pts (A3)
Page    5 size: 612 x 792 pts (letter)

This will help you determine the required canvas size and the required (maximum) white margins of the top and right edges of each of your new PDF pages.
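
Combining the page size with the bounding box gives all four margins. The following Python sketch uses the hypothetical helpers parse_page_sizes and margins, and it assumes each page's lower left corner sits at the origin [0 0], so the right margin is the page width minus urx and the top margin is the page height minus ury.

```python
import re

# Matches pdfinfo lines such as "Page    1 size: 612 x 792 pts (letter)".
PAGE_SIZE_RE = re.compile(r"^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts")


def parse_page_sizes(text):
    """Map page number to (width, height) in points."""
    sizes = {}
    for line in text.splitlines():
        match = PAGE_SIZE_RE.match(line)
        if match:
            sizes[int(match.group(1))] = (
                float(match.group(2)),
                float(match.group(3)),
            )
    return sizes


def margins(page_size, bbox):
    """Return (left, bottom, right, top) whitespace in points."""
    width, height = page_size
    llx, lly, urx, ury = bbox
    return (llx, lly, width - urx, height - ury)


pdfinfo_output = """\
Page    1 size: 612 x 792 pts (letter)
Page    2 size: 612 x 792 pts (letter)
"""

sizes = parse_page_sizes(pdfinfo_output)
print(margins(sizes[1], (57, 29, 562, 764)))
print(margins(sizes[2], (57, 28, 561, 667)))
```

For Page 1 of the example this yields 50pt on the right (612 - 562) and 28pt at the top (792 - 764).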

These calculations can all be scripted too, of course.

But if all pages of your PDFs share a single page size, or if they are 1-page documents, it is all much easier to get done...
