Manipulating a PDF file

Question

I would like to read a PDF file as text (PostScript), add new objects to the file structure, and save the final output as a new PDF. However, if I simply copy the PDF's PostScript content and paste it into a newly created PDF file (opened with encoding='ansi'), the resulting file doesn't work.

I am fairly sure this is an encoding issue, but I am not sure what I should do to end up with a valid PDF file after manipulating the original PostScript content.

Here is the piece of code that didn't work for me:

pdf_file = open('Input.pdf', 'r', encoding='ansi').read()
pdf_file_bytes = bytearray(pdf_file, 'ansi')
pdf_file = open('Output_bytes.pdf', 'wb').write(pdf_file_bytes)

As I said, the output PDF is not valid!

Recommended answer

A PDF file is a complex format consisting of various objects. Unless you understand the low-level syntax of the PDF specification thoroughly, it will be difficult, if not impossible, to arbitrarily replace some bytes with others and still end up with a valid PDF file.
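One likely reason the snippet in the question produces a broken file is that opening a file in text mode lets Python translate line endings, which silently changes the bytes. The sketch below is only an illustration under assumptions: `sample` is a made-up stand-in for real PDF content, and `latin-1` is used in place of `'ansi'` so that the example runs on any platform. It compares a text-mode round trip with a binary-mode one.

```python
import os
import tempfile

# Hypothetical stand-in for PDF content: a header, a binary comment line,
# and CRLF line endings, which real PDF files may contain.
sample = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n<< >>\r\nendobj\r\n"


def roundtrip_text(path):
    # Text mode: Python translates "\r\n" to "\n" while reading.
    # latin-1 is a portable stand-in for the 'ansi' codec in the question.
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()
    return text.encode("latin-1")


def roundtrip_binary(path):
    # Binary mode: the bytes come back exactly as stored on disk.
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Input.pdf")
    with open(path, "wb") as f:
        f.write(sample)
    text_copy = roundtrip_text(path)
    binary_copy = roundtrip_binary(path)

print(text_copy == sample)    # False: every "\r\n" was collapsed to "\n"
print(binary_copy == sample)  # True: nothing was altered
```

Reading with `'rb'` and writing with `'wb'` keeps an unmodified copy byte-for-byte identical. It does not make hand-edited content valid, because any change in length still shifts the byte offsets that the file records internally.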

More to the point, what are you trying to accomplish? There may be a high-level way of doing it that doesn't involve manipulating PDF syntax directly, for example if you need to modify a font, add an annotation, or set the PDF version. Otherwise, if you actually need to modify PDF syntax, you need to use a library capable of dealing with low-level objects.
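To give a feel for the bookkeeping such a library takes care of, here is a minimal sketch that assembles a tiny PDF by hand using only the standard library. The helper `build_pdf` and the three object bodies are hypothetical and written for this illustration; they are not part of any library, and the result is a bare skeleton that is not guaranteed to open in every viewer. The point is that the cross-reference (xref) table stores the byte offset of every object, so inserting even one byte without rebuilding the table leaves the offsets pointing at the wrong place.

```python
def build_pdf(objects):
    """Assemble object bodies into PDF bytes with a matching xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is 20 bytes long
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out), offsets


# Hypothetical minimal document: a catalog, a page tree, and one empty page.
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Resources << >> >>",
]
pdf, offsets = build_pdf(objects)

# Every recorded offset points at the start of the matching "N 0 obj".
print(all(pdf.startswith(b"%d 0 obj" % n, off)
          for n, off in enumerate(offsets, start=1)))  # True

# Insert one byte after the header without updating the xref table.
shifted = pdf[:9] + b"\n" + pdf[9:]
print(shifted.startswith(b"1 0 obj", offsets[0]))  # False: offset is stale
```

A library that works with low-level PDF objects recomputes these offsets, or appends an incremental update section, whenever objects are added or changed. That is why editing the file as plain text rarely produces a valid result.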
