PDF page-stream optimizer library?

Question

Has anyone written a library (or just a program) that optimizes the contents of PDF page streams? I am talking about things like "delete q...Q blocks that have no overall effect", "merge adjacent BT...ET blocks", "track the graphics state and delete operators that set something to the value it already has", maybe even "reorder drawing operations to minimize graphics state changes, when this can be done without changing the appearance of the page". I ain't picky as to implementation language, but open source is very much preferred, as I may need to hack it up for my particular needs.

Here is a small fragment of an example of what I would like done. R's "grid" graphics + its PDF backend generate ridiculous numbers of pointless operations, like this:

1 J 1 j q
Q q
Q q
Q q
Q q
Q q
Q q
Q q
Q q
Q q
BT
0.000 0.000 0.000 rg
/F2 1 Tf 12.00 0.00 -0.00 12.00 168.43 14.40 Tm [(T) 120 (ask)] TJ
ET
Q q
BT
0.000 0.000 0.000 rg
/F2 1 Tf 0.00 12.00 -12.00 0.00 19.42 205.26 Tm 
[(Quer) -15 (ies per min) 10 (ute)] TJ
ET
Q q
Q q 23.02 489.60 26.53 0.00 re W n
Q q
Q q 23.02 489.60 26.53 0.00 re W n
Q q
Q q
Q q
[...]

This could be squashed down to

1 J 1 j
BT
/F2 1 Tf
12 0 0 12 168.43 14.40 Tm [(T) 120 (ask)] TJ
0 12 -12 0 19.42 205.26 Tm [(Quer) -15 (ies per min) 10 (ute)] TJ
ET

and possibly even further with more sophisticated use of the text operators, which I can't do in my head.

Answer

That looks remarkably like the PDF output of iText's PdfGraphics2D interface, in a worse-than-usual case. The usual case isn't so hot either, but it's not THAT bad.

If I'm right, the answer is still no, but you can write one yourself, as you clearly have no fear of content streams:

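// iText's own ByteBuffer class, not java.nio.ByteBuffer
ByteBuffer internalBuf = myPdfContentByte.getInternalBuffer();

// magic() is your own rewriter: content stream in, optimized content stream out
String newContents = magic( internalBuf.toString() );

// replace the buffer's contents with the rewritten stream
internalBuf.reset();
internalBuf.append( newContents );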

magic() is a tad nebulous, but writing code to remove "q Q" pairs should be trivial. Yanking clipping regions with nothing inside them (line-line-line W n) shouldn't be all that much harder with a bit of regex.
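
A minimal sketch of such a magic(), assuming the plain regex approach described above is good enough: it repeatedly deletes a "q" immediately followed by "Q", and any "q <path> W n Q" block that installs a clip but paints nothing, until the stream stops changing. The class name ContentStreamCleanup is hypothetical, and because nothing here parses the stream, a string operand that happens to contain "q Q" would be mangled.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the answer's magic(): naive regex cleanup of a
// content stream. It does not parse strings, so a text operand that happens to
// contain "q Q" would be damaged; a real tool should tokenize first.
final class ContentStreamCleanup {

    // "q" directly followed by "Q": save + restore with nothing in between.
    // (?<!\S) and (?!\S) require whitespace (or start/end) around both tokens.
    private static final Pattern EMPTY_SAVE_RESTORE =
            Pattern.compile("(?<!\\S)q\\s+Q(?!\\S)");

    // Unambiguous number syntax, which keeps regex backtracking cheap.
    private static final String NUMBER = "[-+]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)";

    // "q <path> W n Q" (or "W* n"): a clip is installed and thrown away without
    // painting anything, e.g. "q 23.02 489.60 26.53 0.00 re W n Q".
    private static final Pattern EMPTY_CLIP = Pattern.compile(
            "(?<!\\S)q\\s+(?:(?:" + NUMBER + "\\s+)*(?:re|m|l|c|v|y|h)\\s+)+"
            + "W\\*?\\s+n\\s+Q(?!\\S)");

    static String magic(String contents) {
        String previous;
        do {
            previous = contents;
            // Deleting the match outright is safe: the whitespace around it stays,
            // so the neighbouring tokens remain separated.
            contents = EMPTY_SAVE_RESTORE.matcher(contents).replaceAll("");
            contents = EMPTY_CLIP.matcher(contents).replaceAll("");
        } while (!contents.equals(previous));   // nested pairs need several passes
        return contents.trim();
    }
}
```

Dropped into the iText snippet, this would be called as ContentStreamCleanup.magic( internalBuf.toString() ). It leaves alone a "Q q" that separates real drawing, such as the one between the two text objects in the question, because that "Q" genuinely restores state.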

Getting rid of the line cap/line join settings (J & j) when they aren't used would be harder. Ditto with combining text blocks or dumping redundant changes to the fill/stroke colors, font and size, etc.
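
Dumping redundant state changes needs an actual tokenizer plus a stack of graphics states, since "Q" restores whatever "q" saved. The sketch below is one way to do it under stated assumptions: RedundantStateFilter is a hypothetical name, it needs Java 9+ for Map.of, and it only tracks rg, RG, g, G, k, K, Tf, w, J and j, dropping each when it would set exactly the value already in effect. Text state such as the font set by Tf belongs to the graphics state and survives ET, so a repeated "/F2 1 Tf" in a later text object is redundant unless a "Q" in between restored an older state. The starting state is treated as unknown, so the first "0 0 0 rg" is kept even though the default fill colour is already black.

```java
import java.math.BigDecimal;
import java.util.*;
import java.util.regex.*;

// Hypothetical helper (not part of iText): drops state-setting operators whose
// new value equals the value already in effect. Assumes no inline images
// (BI/ID/EI), at most one level of nested parentheses inside strings, and an
// unknown starting state, so the first setting of each value is always kept.
final class RedundantStateFilter {
    private static final String NUM = "[-+]?(?:\\d+(?:\\.\\d*)?|\\.\\d+)";
    private static final Pattern TOKEN = Pattern.compile(
            "%[^\\r\\n]*"                                              // comment
            + "|\\((?:\\\\.|[^\\\\()]|\\((?:\\\\.|[^\\\\()])*\\))*\\)"  // (literal string)
            + "|<<|>>|<[^>]*>|[\\[\\]{}]"                               // dicts, <hex>, brackets
            + "|/?[^\\s()<>\\[\\]{}/%]+",                              // names, numbers, operators
            Pattern.DOTALL);

    // Tracked operators and the piece of graphics state each one overwrites.
    private static final Map<String, String> SLOT = Map.of(
            "g", "fill", "rg", "fill", "k", "fill",
            "G", "stroke", "RG", "stroke", "K", "stroke",
            "Tf", "font", "w", "width", "J", "cap", "j", "join");

    // Operators that change a slot in ways this sketch does not model: forget it.
    private static final Map<String, String> FORGET = Map.of(
            "cs", "fill", "sc", "fill", "scn", "fill",
            "CS", "stroke", "SC", "stroke", "SCN", "stroke");

    private static boolean isOperand(String t) {
        return t.matches(NUM) || "/([]{}<>".indexOf(t.charAt(0)) >= 0
                || t.equals("true") || t.equals("false") || t.equals("null");
    }

    // Numbers are compared by value, so "0.000" and "0" count as equal.
    private static String norm(String t) {
        return t.matches(NUM) ? new BigDecimal(t).stripTrailingZeros().toPlainString() : t;
    }

    static String filter(String contents) {
        StringBuilder out = new StringBuilder();
        Deque<Map<String, String>> saved = new ArrayDeque<>();
        Map<String, String> state = new HashMap<>();   // slot -> "op normalised operands"
        List<String> operands = new ArrayList<>();
        Matcher m = TOKEN.matcher(contents);
        while (m.find()) {
            String t = m.group();
            if (t.startsWith("%")) continue;           // drop comments
            if (isOperand(t)) { operands.add(t); continue; }
            boolean keep = true;
            if (t.equals("q")) {
                saved.push(new HashMap<>(state));      // q saves a snapshot
            } else if (t.equals("Q")) {                // Q restores it (unknown if unbalanced)
                state = saved.isEmpty() ? new HashMap<>() : saved.pop();
            } else if (t.equals("gs")) {
                state.clear();                         // an ExtGState may set any of these
            } else if (FORGET.containsKey(t)) {
                state.remove(FORGET.get(t));
            } else if (SLOT.containsKey(t)) {
                StringBuilder key = new StringBuilder(t);
                for (String o : operands) key.append(' ').append(norm(o));
                // put() returns the previous value; equal means this operator is redundant
                keep = !key.toString().equals(state.put(SLOT.get(t), key.toString()));
            }
            if (keep) {
                for (String o : operands) out.append(o).append(' ');
                out.append(t).append('\n');
            }
            operands.clear();
        }
        for (String o : operands) out.append(o).append(' ');   // trailing operands, if any
        return out.toString();
    }
}
```

Slots are compared by operator and operands, so "0 g" followed by "0 0 0 rg" is kept even though both select black; that errs on the side of never changing the page's appearance.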

"Sophisticated use of the text operators" is going to start looking like black magic compiler optimization in short order.
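
Even combining text blocks needs a safety check, because "BT" resets the text matrix to the identity. A conservative sketch, assuming a hypothetical TextObjectMerger class (Java 9+ for Set.of): it drops an "ET BT" pair only when the second text object sets "Tm" before any operator that positions or shows text, refuses to touch streams that use "Tr" (clipping render modes apply their clip at "ET"), and ignores transparency details such as the text knockout flag.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical helper: merges directly adjacent text objects ("... ET BT ...").
// BT resets the text matrix to the identity, so an ET/BT pair is only dropped
// when the second object sets Tm before any operator that uses the text matrix.
// Assumes no inline images and at most one level of nested parentheses in
// strings; streams using Tr are returned untouched, because clipping render
// modes accumulate a clip that is applied at ET.
final class TextObjectMerger {
    private static final Pattern TOKEN = Pattern.compile(
            "%[^\\r\\n]*"                                              // comment
            + "|\\((?:\\\\.|[^\\\\()]|\\((?:\\\\.|[^\\\\()])*\\))*\\)"  // (literal string)
            + "|<<|>>|<[^>]*>|[\\[\\]{}]"                               // dicts, <hex>, brackets
            + "|/?[^\\s()<>\\[\\]{}/%]+",                              // names, numbers, operators
            Pattern.DOTALL);

    // Operators that position or show text (or end the object) using the text matrix.
    private static final Set<String> USES_MATRIX =
            Set.of("Td", "TD", "T*", "Tj", "TJ", "'", "\"", "ET");

    static String merge(String contents) {
        List<String> toks = new ArrayList<>();
        Matcher m = TOKEN.matcher(contents);
        while (m.find()) {
            if (!m.group().startsWith("%")) toks.add(m.group());   // drop comments
        }
        if (toks.contains("Tr")) return contents;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < toks.size(); i++) {
            boolean adjacent = toks.get(i).equals("ET")
                    && i + 1 < toks.size() && toks.get(i + 1).equals("BT");
            if (adjacent && "Tm".equals(firstMatrixOp(toks, i + 2))) {
                i++;                                   // skip this ET and the next BT
                continue;
            }
            out.add(toks.get(i));
        }
        return String.join(" ", out);
    }

    // First Tm or matrix-dependent operator at or after 'from', or null if none.
    private static String firstMatrixOp(List<String> toks, int from) {
        for (int j = from; j < toks.size(); j++) {
            String t = toks.get(j);
            if (t.equals("Tm") || USES_MATRIX.contains(t)) return t;
        }
        return null;
    }
}
```

Since "q" and "Q" are not allowed inside a text object, the question's two TJ runs only become mergeable this way once the "Q q" between them is gone; the repeated "rg" and "Tf" in the second object are then a job for a state tracker rather than for this pass.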

And if this does happen to be iText, we'd all appreciate it if you'd share your code. We'll cheerfully accept just about any PdfGraphics2D output clean up, I assure you.
