Remove Layers/Background from PDF in PHP/Bash/C#


Question

I have some PDF files that I need to modify using a PHP script. I can also call exec(), so I can use pretty much anything that runs on CentOS.

When opened in Adobe Acrobat Pro X, the PDF files show 2 layers in the "Layers" panel:


  1. Background

  2. Color

When I disable both of these layers, I end up with black-and-white text and images (the text is not vector, though; it's a scanned document).

I want to disable these layers, and any other similar layers found in the PDFs, using PHP and/or C# or any command-line tool.

Other useful information:

When I run pdfimages (provided with XPDF) on my PDFs, it extracts exactly what I need removed from each page...

Update with further information:
I modified the PDFSharp sample here: http://www.pdfsharp.net/wiki/ExportImages-sample.ashx

I changed:

Line 28: ExportImage(xObject, ref imageCount);

To:

PdfObject obj = xObject.Elements.GetObject("/OC");

Console.WriteLine(obj);

I got the following output in the console for each image:
<< /Name Background /Type /OCG >>
<< /OCGs [ 2234 0 R ] /P /AllOff /Type /OCMD >>
<< /Name Text Color /Type /OCG >>

This is actually the layer information. The PDFSharp documentation for the /OC key says:


Before the image is processed, its visibility is determined based on this entry. If it is determined to be invisible, the entire image is skipped, as if there were no Do operator to invoke it.
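To make the second entry in that console output easier to read: an /OCMD (membership dictionary) does not name a layer itself. It points at one or more /OCG layers and uses /P to say how their on/off states combine. The sketch below is based on my reading of the optional content section of the PDF specification, not on anything in PDFSharp or iTextSharp; the OcmdVisibility class and its IsVisible method are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a PDFSharp or iTextSharp API): evaluates the
// visibility policy (/P) of an optional content membership dictionary.
public static class OcmdVisibility
{
    // policy:    the /P value; the specification's default is "AnyOn".
    // ocgStates: one flag per group listed in /OCGs, true = ON, false = OFF.
    public static bool IsVisible(string policy, IEnumerable<bool> ocgStates)
    {
        List<bool> states = ocgStates.ToList();

        // An empty /OCGs array places no restriction on visibility.
        if (states.Count == 0) return true;

        switch (policy)
        {
            case "AllOn":  return states.All(on => on);
            case "AnyOn":  return states.Any(on => on);
            case "AnyOff": return states.Any(on => !on);
            case "AllOff": return states.All(on => !on);
            default: throw new ArgumentException("Unknown /P value: " + policy);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // << /OCGs [ 2234 0 R ] /P /AllOff >> with group 2234 ON, then OFF.
        Console.WriteLine(OcmdVisibility.IsVisible("AllOff", new[] { true }));  // False
        Console.WriteLine(OcmdVisibility.IsVisible("AllOff", new[] { false })); // True
    }
}
```

If that reading is right, an image tagged with /P /AllOff is drawn only while the referenced layer is off, which would fit the scanned black-and-white content appearing once the layers are disabled.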

So now, how do I modify the /OC value to something that will make these layers invisible?

Recommended answer

After long hours of experimenting, I found a way! I'm posting the code in case someone finds it helpful in the future:

using System;
using System.IO;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace LayerHide {

    class MainClass
    {
        public static void Main (string[] args)
        {
            // Open the source PDF and stamp the changes into a new file.
            PdfReader reader = new PdfReader("test.pdf");
            PdfStamper stamp = new PdfStamper(reader, new FileStream("test2.pdf", FileMode.Create));

            // Fetch the layers (optional content groups) defined in the document.
            Dictionary<string, PdfLayer> layers = stamp.GetPdfLayers();

            // Switch every layer off so it is hidden when the file is opened.
            foreach (KeyValuePair<string, PdfLayer> entry in layers)
            {
                entry.Value.On = false;
            }

            // Closing the stamper writes test2.pdf; then release the reader.
            stamp.Close();
            reader.Close();
        }
    }
}
