Programmatically change the color of a black box in a PDF file?


Problem description

I have a PDF file generated by Microsoft Word. The user has specified a "highlight" color of black to make the text look like a black box (and make the text look like it's been redacted). I'd like to change the black boxes to yellow so that the text is highlighted instead.

Ideally, I'd like to do this in Python.

Thanks!

Solution

Option 1: If a commercial library is an option, you can easily implement this with Amyuni PDF Creator .Net. The C# code would look like this:

using System.Collections;
using System.IO;
using Amyuni.PDFCreator;

// Open a PDF document
FileStream testfile = new FileStream("test1.pdf", FileMode.Open, FileAccess.Read, FileShare.Read);
IacDocument document = new IacDocument(null);
document.Open(testfile, "");

// Get the first page
IacPage page1 = document.GetPage(1);

// Get all graphic objects on the page
IacAttribute attribute = page1.AttributeByName("Objects");

// listobj is an ArrayList of objects
ArrayList listobj = (ArrayList)attribute.Value;

foreach (IacObject iacObj in listobj)
{
    // If the object is a rectangle (frame) and its background color is black, set it to yellow
    if ((IacObjectType)iacObj.AttributeByName("ObjectType").Value == IacObjectType.acObjectTypeFrame
        && (int)iacObj.Attribute("BackColor").Value == 0)
    {
        iacObj.Attribute("BackColor").Value = 0x00FFFF; // Yellow
    }
}

// Saving the modified document is not shown here.

I suppose you could translate this to IronPython.
The usual disclaimer applies to this suggestion.

Option 2: If a commercial library is not an option and you are not developing a commercial closed-source application, you could try a bit of unreliable hacking on the page content using iText:

You can try decoding the page content (see ContentByteUtils class in iText for details), inserting a color selection operator before every fill operator, then resave the file. For more details on these operators see the TABLE 4.10 Path-painting operators of the Adobe PDF reference document.
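As a rough illustration of the "decode, edit, re-encode" step in Python: page content streams are commonly compressed with the /FlateDecode filter, which is zlib/deflate. The sketch below assumes the raw stream bytes have already been pulled out of the file by some PDF library, that /FlateDecode is the only filter, and that no predictor is in use. The function names are made up for this example.

```python
import zlib


def decode_flate(raw: bytes) -> str:
    """Decompress a /FlateDecode content stream into operator text.

    latin-1 is used so that every byte maps to exactly one character
    and the data survives a round trip unchanged.
    """
    return zlib.decompress(raw).decode("latin-1")


def encode_flate(text: str) -> bytes:
    """Compress edited operator text back into /FlateDecode bytes."""
    return zlib.compress(text.encode("latin-1"))


if __name__ == "__main__":
    original = "0 0 0 rg 25 175 175 -150 re f"
    raw = encode_flate(original)          # stands in for bytes read from a PDF
    print(decode_flate(raw))
```

After editing, the stream's /Length entry also has to match the new compressed size; a PDF library normally takes care of that when it saves the file.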

Operator f: fills the path, using the nonzero winding number rule to determine the region to fill (see "Nonzero Winding Number Rule" on page 232).

Operator rg: sets the nonstroking color space to DeviceRGB, and sets the nonstroking color to the specified value.

Operator q: saves the current graphics state.

Operator Q: restores the saved graphics state.
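The rg operator takes three components in the range 0.0 to 1.0 rather than the 0 to 255 values most tools display. A small Python helper (the name `rg_operator` is hypothetical, used only for illustration) shows the conversion:

```python
def rg_operator(red: int, green: int, blue: int) -> str:
    """Build an 'rg' operator string from 8-bit RGB components (0-255)."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("RGB components must be in the range 0-255")
    # PDF color components are real numbers between 0.0 and 1.0.
    parts = [f"{value / 255:g}" for value in (red, green, blue)]
    return " ".join(parts) + " rg"


if __name__ == "__main__":
    print(rg_operator(255, 255, 0))  # yellow
    print(rg_operator(0, 0, 0))      # black
```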

So if you have a sequence of operators on your page:

0.0 0.0 0.0 rg % Set nonstroking color to black
25 175 175 -150 re % Construct rectangular path
f % Fill path

It should become:

0.0 0.0 0.0 rg % Set nonstroking color to black
25 175 175 -150 re % Construct rectangular path
q % Save the current graphics state
1.0 1.0 0.0 rg % Set nonstroking color to yellow
f % Fill path
Q % Restore the saved graphics state
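Since the question asks for Python, here is a minimal sketch of that rewrite operating on already-decoded content stream text. It follows the operator order shown above, but narrows it to fills whose current nonstroking color is black (set via rg or g), so other fills are left alone. It assumes a simple stream of whitespace-separated tokens with no comments, string literals, or inline images, and it ignores the fill-and-stroke operators (B, b and their variants). The function names are made up for this example.

```python
FILL_OPS = {"f", "F", "f*"}
BLACK = (0.0, 0.0, 0.0)
YELLOW_RG = "1 1 0 rg"


def _is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def recolor_black_fills(content: str) -> str:
    """Wrap every fill painted in black with q / yellow rg / Q."""
    out = []
    operands = []   # numeric operands seen since the last operator
    fill = BLACK    # the initial nonstroking color in PDF is black
    saved = []      # fill colors saved by q and restored by Q
    for token in content.split():
        if _is_number(token):
            operands.append(float(token))
            out.append(token)
            continue
        if token == "rg" and len(operands) >= 3:
            fill = tuple(operands[-3:])
        elif token == "g" and operands:
            fill = (operands[-1],) * 3
        elif token == "q":
            saved.append(fill)
        elif token == "Q" and saved:
            fill = saved.pop()
        if token in FILL_OPS and fill == BLACK:
            out.extend(["q", YELLOW_RG, token, "Q"])
        else:
            out.append(token)
        operands = []
    return " ".join(out)


if __name__ == "__main__":
    print(recolor_black_fills("0 0 0 rg 25 175 175 -150 re f"))
```

Real content streams usually need a proper tokenizer from a PDF library; the whitespace split here is only good enough to show the idea.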

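Some remarks:
- This approach will turn every filled non-text drawing yellow (shapes built from lines, curves, etc., but not raster images), and it will also draw as yellow any text that is drawn on the page using the same drawing operators as other PDF drawings.
- Form XObjects (XForms) and annotations used on the page will not be processed.
- Strictly speaking, the PDF specification only allows path construction operators between the start of a path and the operator that paints it, so placing q and rg between re and f relies on the viewer being lenient. A stricter variant puts q and the yellow rg before the re operator.
- If the documents you will process are produced by the same tool in the same way, you may just test a few files and see how it goes.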

Important: This is just an untested idea off the top of my head; it may work, or it may not.
