Check whether a PDF file is valid with Python

Views: 150 | Published: 2020/11/5 19:09:48 | Tags: python, file, pdf


Question

I receive a file via an HTTP upload and need to be sure it is a PDF file. The programming language is Python, but that should not matter.

I thought of the following solutions:

(1) Check whether the first bytes of the data are "%PDF". This is not a very good check, but it prevents users from accidentally uploading other files.

(2) Try libmagic (the "file" command in the shell uses it). This does exactly the same check as in (1).

(3) Take a library and try to read the page count out of the file. If the library is able to read a page count, the file should be a valid PDF. Problem: I don't know of a Python library that can do this.

So, does anybody have a solution, whether a library or another trick?
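The first idea in the question, checking the leading bytes for "%PDF", can be sketched as follows. This is only a minimal illustration: the name looks_like_pdf and the choice to rewind the stream afterwards are assumptions made for this example, not part of the original question.

```python
import io

PDF_MAGIC = b"%PDF"  # marker named in the question


def looks_like_pdf(stream) -> bool:
    """Return True if a seekable binary stream starts with the PDF marker."""
    start = stream.tell()
    head = stream.read(len(PDF_MAGIC))
    stream.seek(start)  # rewind so later code can still read the whole upload
    return head == PDF_MAGIC
```

As the question notes, this only guards against accidental uploads of other file types. A file can start with "%PDF" and still be corrupt further on.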

Recommended answer

The two most commonly used PDF libraries for Python are:

pyPdf
ReportLab

Both are pure Python, so they should be easy to install and should work across platforms.

With pyPdf it would probably be as simple as doing:

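from pyPdf import PdfFileReader
# open() replaces the Python 2-only file() builtin; the reader raises an error if it cannot parse the file
doc = PdfFileReader(open("upload.pdf", "rb"))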

That should be enough, but if you want to check further, doc now provides the getDocumentInfo() and getNumPages() methods (also available as the documentInfo and numPages properties).
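The page-count check can be wrapped in a small helper. The sketch below swaps in pypdf, the maintained successor of pyPdf, because pyPdf itself targets Python 2. It assumes pypdf is installed. The helper name count_pages and the reader_cls parameter (which allows a different reader class to be substituted) are hypothetical and exist only for this illustration.

```python
def count_pages(stream, reader_cls=None):
    """Return the page count of a binary PDF stream, or None if parsing fails."""
    if reader_cls is None:
        # Assumption: the pypdf package is installed (pip install pypdf).
        from pypdf import PdfReader as reader_cls
    try:
        return len(reader_cls(stream).pages)
    except Exception:
        # Parsers raise many different error types on malformed input;
        # this sketch treats any of them as "not a valid PDF".
        return None
```

A caller would open the upload in binary mode, pass the file object to count_pages, and treat a None result as an invalid file.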

As Carl answered, pdftotext is also a good solution, and it would probably be faster on very large documents (especially ones with many cross-references). However, it might be a little slower on small PDFs because of the system overhead of forking a new process.
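The pdftotext approach can be sketched with the standard subprocess module. This assumes the pdftotext command-line tool (shipped with Poppler or Xpdf) is on the PATH. The function name pdftotext_accepts and the 30-second timeout are arbitrary choices made for this example.

```python
import subprocess


def pdftotext_accepts(path, tool="pdftotext", timeout=30):
    """Return True if the external tool converts the file without error."""
    try:
        result = subprocess.run(
            [tool, path, "-"],  # "-" sends the extracted text to stdout
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Covers a missing tool or a hung conversion; note that this sketch
        # does not distinguish "tool unavailable" from "invalid PDF".
        return False
    return result.returncode == 0
```

Each call starts a new process, which is the per-file overhead the answer mentions for small PDFs.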
