Using R, how can someone count the number of pages in a PDF file?

Question

I have about a hundred long PDF files in a directory and would like to know whether R can count how many pages are in each file. My operating system is Windows 8.

Here is the link to a 10-page PDF file, in case this helps you test your solution. MWE pdf file

It appears to be possible to count PDF pages with Python (python solution), but I don't know that language. Other solutions have been discussed on SO using, e.g., ImageMagick and C#.

Recommended answer

I'm working on a Windows 7 machine, but my experiences on Windows 8 make me think it should work just as well for you.

I wasn't able to compile the Rpoppler package, and as hrbrmstr points out, it's probably not worth fighting. If you have 7-Zip, you can extract the poppler tools for Windows. I've extracted them to the location C:\poppler. Once they are there, I can do the following:

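library(magrittr)  # provides the %>% pipe used below

file_name <- "C:/[file_path]/whitepaper-pdfprimer.pdf"

# Return the page count that poppler's pdfinfo reports for one PDF file.
pdf_pages <- function(file_name){
  # shQuote() keeps a path that contains spaces as a single argument
  pages <- system2("C:/poppler/bin/pdfinfo.exe",
                   args = shQuote(file_name),
                   stdout = TRUE)
  # Keep only the "Pages:" line, strip the label, and convert to a number
  pages[grepl("^Pages:", pages)] %>%
    gsub("Pages:", "", .) %>%
    as.numeric()
}

pdf_pages(file_name)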
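
To make the parsing step easier to follow without poppler installed, here is a minimal sketch that runs the same idea on a hand-written sample of pdfinfo-style lines. The sample text and the helper name parse_page_count are illustrative assumptions, not real output from the linked file and not part of the original answer.

```r
# Hand-written sample of pdfinfo-style output lines (illustrative only).
sample_output <- c(
  "Title:          PDF Primer",
  "Producer:       Example Producer",
  "Pages:          10",
  "Page size:      612 x 792 pts (letter)"
)

# Hypothetical helper: extract the page count from pdfinfo-style lines.
parse_page_count <- function(info_lines) {
  # Anchor at the line start so "Pages:" inside another field is ignored.
  page_line <- grep("^Pages:", info_lines, value = TRUE)
  # Return NA when the expected line is missing or ambiguous.
  if (length(page_line) != 1) return(NA_real_)
  # Drop the label and padding, then convert the remainder to a number.
  as.numeric(sub("^Pages:\\s*", "", page_line))
}

parse_page_count(sample_output)
```

Returning NA for a missing "Pages:" line is a design choice for this sketch: it keeps a later vapply(..., numeric(1)) call from failing on a file that pdfinfo could not read.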

And if you have a vector of file names you want to pass:

vapply(file_names, pdf_pages, numeric(1))
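
Since the question is about a whole directory, one way to build file_names and collect the results might look like the sketch below. The helper name count_pdf_pages and its counter argument are hypothetical; counter stands for any function that maps one file path to a page count, such as the pdf_pages() function above.

```r
# Hypothetical helper: count pages for every PDF in a directory.
# `counter` is any function taking one file path and returning a number.
count_pdf_pages <- function(dir_path, counter) {
  # Collect full paths of files ending in .pdf (case-insensitive).
  file_names <- list.files(dir_path, pattern = "\\.pdf$",
                           full.names = TRUE, ignore.case = TRUE)
  # Apply the counter to each path; each call must return one number.
  pages <- vapply(file_names, counter, numeric(1), USE.NAMES = FALSE)
  # One row per file: bare file name plus its page count.
  data.frame(file = basename(file_names), pages = pages,
             stringsAsFactors = FALSE)
}
```

Calling count_pdf_pages() with your own directory path and pdf_pages as the counter should then give one row per PDF file.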

Credit to @hrbrmstr for pointing out the poppler tools (I'd never heard of them until today).