Get the number of pages in a PDF document

Views: 850 | Published: 2020/5/25 3:52:08 | Tags: php, pdf


Question

I have spent many hours searching for a fast and easy, but above all accurate, way to get the number of pages in a PDF document. I work for a graphic printing and reproduction company that handles a lot of PDFs, so the number of pages in a document must be known precisely before it is processed. The PDF documents come from many different clients, so they are not generated with the same application and do not all use the same compression method.

Here are some of the answers I found insufficient or simply NOT working:

Imagick requires a lot of installation work and an Apache restart. When I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1 page for every document (I haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages() and identifyImage() methods.

FPDI is easy to use and install (just extract the files and call a PHP script), BUT many compression techniques are not supported by FPDI. In that case it returns an error:

FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.

Opening a stream and searching with a regular expression:

This opens the PDF file as a stream and searches for some kind of string containing the page count or something similar.

$f = "test1.pdf";
// Open in binary mode and check the handle before reading from it
$stream = fopen($f, "rb");
if(!$stream)
    return 0;

$content = fread($stream, filesize($f));
fclose($stream);
if(!$content)
    return 0;

$count = 0;
// Regular Expressions found by Googling (all linked to SO answers):
$regex  = "/\/Count\s+(\d+)/";
$regex2 = "/\/Page\W*(\d+)/";
$regex3 = "/\/N\s+(\d+)/";

// $matches[1] holds the captured numbers; take the largest one
if(preg_match_all($regex, $content, $matches))
    $count = max(array_map('intval', $matches[1]));

return $count;

/\/Count\s+(\d+)/ (looks for /Count <number>) doesn't work because only a few documents have the parameter /Count inside, so most of the time it returns nothing. Source.
/\/Page\W*(\d+)/ (looks for /Page<number>) doesn't get the number of pages; what it matches is mostly some other data. Source.
/\/N\s+(\d+)/ (looks for /N <number>) doesn't work either, because a document can contain multiple values of /N, and most of them, if not all, are not the page count. Source.
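To make the behaviour of the first pattern concrete, here is a minimal sketch run against made-up PDF fragments. The function name `maxCountInRawPdf` and both sample strings are hypothetical and exist only for illustration. It assumes that when the page tree is stored as plain text, the largest /Count value is often the total, and that when the dictionary sits inside a compressed object stream (which newer PDF versions allow), the raw bytes contain no /Count text at all, so the search finds nothing.

```php
<?php
// Hypothetical helper: largest "/Count <n>" found in raw PDF bytes, or null.
function maxCountInRawPdf(string $raw): ?int
{
    // preg_match_all() returns 0 when nothing matches and false on error
    if (!preg_match_all('/\/Count\s+(\d+)/', $raw, $matches)) {
        return null; // nothing readable, e.g. the dictionary is compressed
    }
    // $matches[1] holds the captured digit strings
    return max(array_map('intval', $matches[1]));
}

// Made-up fragment with a nested page tree: a root node plus two child nodes
$plain = "<< /Type /Pages /Count 13 /Kids [2 0 R 3 0 R] >>\n"
       . "<< /Type /Pages /Count 8 >>\n<< /Type /Pages /Count 5 >>";

// Made-up stand-in for a file whose page tree is inside a compressed stream
$compressed = "<< /Type /ObjStm /Filter /FlateDecode >> stream ... endstream";

var_dump(maxCountInRawPdf($plain));      // int(13)
var_dump(maxCountInRawPdf($compressed)); // NULL
```

Returning null rather than 0 keeps "no match" distinguishable from a real count, which matters when the result decides how a print job is handled.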

So, what does work reliably and accurately?

See the answer below.

Recommended answer

A simple command line executable called: pdfinfo.

It is downloadable for Linux and Windows. You download a compressed file containing several small PDF-related programs. Extract it somewhere.

One of those files is pdfinfo (or pdfinfo.exe for Windows). Here is an example of the data it returns when run on a PDF document:

Title:          test1.pdf
Author:         John Smith
Creator:        PScript5.dll Version 5.2.2
Producer:       Acrobat Distiller 9.2.0 (Windows)
CreationDate:   01/09/13 19:46:57
ModDate:        01/09/13 19:46:57
Tagged:         yes
Form:           none
Pages:          13    <-- This is what we need
Encrypted:      no
Page size:      2384 x 3370 pts (A0)
File size:      17569259 bytes
Optimized:      yes
PDF version:    1.6

I haven't seen a PDF document for which it returned a wrong page count (yet). It is also really fast: even with big documents of 200+ MB, the response time is just a few seconds or less.

There is an easy way of extracting the page count from the output, shown here in PHP:

// Make a function for convenience
function getPDFPages($document)
{
    // Pick the executable for the current platform (adjust the paths).
    // PHP_OS_FAMILY requires PHP 7.2 or later.
    $cmd = (PHP_OS_FAMILY === 'Windows')
        ? "C:\\path\\to\\pdfinfo.exe"  // Windows
        : "/path/to/pdfinfo";          // Linux

    // Parse entire output
    // escapeshellarg() quotes the file name, so names with spaces are safe
    exec("$cmd " . escapeshellarg($document), $output);

    // Iterate through lines
    $pagecount = 0;
    foreach($output as $op)
    {
        // Extract the number from the "Pages:" line
        if(preg_match("/^Pages:\s*(\d+)/i", $op, $matches) === 1)
        {
            $pagecount = intval($matches[1]);
            break;
        }
    }

    return $pagecount;
}

// Use the function
echo getPDFPages("test 1.pdf");  // Output: 13
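If more than the page count is needed, the same output can be turned into a key/value array. The following is only a sketch: the helper name `parsePdfinfoOutput` is hypothetical, and the sample lines are copied from the pdfinfo output shown earlier rather than produced by a real run.

```php
<?php
// Hypothetical helper: turn pdfinfo's "Key:   value" lines into an array.
function parsePdfinfoOutput(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only; values such as dates contain colons
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $info[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $info;
}

// Sample lines copied from the output shown earlier
$sample = [
    "Title:          test1.pdf",
    "CreationDate:   01/09/13 19:46:57",
    "Pages:          13",
    "Page size:      2384 x 3370 pts (A0)",
];

$info = parsePdfinfoOutput($sample);
echo (int) $info['Pages'], "\n";   // 13
echo $info['CreationDate'], "\n";  // 01/09/13 19:46:57
```

The `$output` array filled by exec() can be passed to this helper directly, since exec() already delivers one line per element.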

Of course, this command line tool can be used from any other language that can parse the output of an external program, but I use it in PHP.

I know it's not pure PHP, but external programs are way better at PDF handling (as seen in the question).
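One thing to watch when relying on an external program is that it can fail: the binary may be missing, or the file may be unreadable or damaged. A possible variant that reports this instead of silently returning 0 is sketched below. The function name `getPDFPagesOrNull` and its `$pdfinfo` parameter are hypothetical, and the sketch assumes pdfinfo exits with a non-zero status on errors, which is worth confirming for the build you install.

```php
<?php
// Hypothetical variant: returns the page count, or null when pdfinfo fails.
// $pdfinfo is the path to the executable (an assumption; adjust as needed).
function getPDFPagesOrNull(string $document, string $pdfinfo = 'pdfinfo'): ?int
{
    $output = [];
    $exitCode = 0;
    // escapeshellarg() guards against spaces and shell metacharacters;
    // 2>&1 folds error messages into $output instead of printing them
    $command = escapeshellarg($pdfinfo) . ' ' . escapeshellarg($document) . ' 2>&1';
    exec($command, $output, $exitCode);

    if ($exitCode !== 0) {
        return null; // missing binary, unreadable or damaged file, ...
    }
    foreach ($output as $line) {
        if (preg_match('/^Pages:\s*(\d+)/', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return null; // ran without error but printed no "Pages:" line
}

$pages = getPDFPagesOrNull('test 1.pdf');
echo $pages === null ? "could not read page count\n" : "$pages pages\n";
```

Keeping null separate from a numeric result lets the caller stop a job when the count is unknown, rather than processing a document as if it had zero pages.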

I hope this helps people, because I spent a whole lot of time trying to find a solution, and I have seen a lot of questions about PDF page counts in which I didn't find the answer I was looking for. That's why I asked this question and answered it myself.
