How to read the current page number of a PDF document using PDFBox


Question

Page numbers in a PDF come in different variations: some PDFs number their initial pages with Roman numerals such as i, ii, and the later pages with 1, 2, and so on. I found a function in PDFBox to get the desired page, page.get(pagenumber). The problem with this function is that when I write get(1), it returns a page by its position in the document (one that may be numbered ii), not the page with page number 2. Is there any way to obtain the page whose page number in the PDF is, say, 2, rather than the second page overall?

Recommended answer

Although the title mentions PDFBox, you also added the itext tag, so let me show you how to extract the page labels using iText:

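// src identifies the PDF to open, for example its file path
PdfReader reader = new PdfReader(src);
// One label per page, in document order
String[] labels = PdfPageLabels.getPageLabels(reader);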

Now you have a String array that could contain, for example:

labels[0] = "i";
labels[1] = "ii";
labels[2] = "iii";
labels[3] = "iv";
labels[4] = "1";
labels[5] = "2";
labels[6] = "3";
and so on...

Now you can put these values in a HashMap, with the label as the key and index + 1 as the page number, if you want to know which physical page corresponds to the page label "2".
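The HashMap step could be sketched roughly as follows. This is only an illustration, not part of iText: the class name PageLabelIndex and the method buildIndex are hypothetical, and the sample array stands in for the result of PdfPageLabels.getPageLabels(reader). It also assumes that when a label occurs more than once, the first physical page carrying it is the one you want.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an iText class): maps page labels to 1-based physical page numbers.
class PageLabelIndex {

    // labels[i] is assumed to be the label of physical page i + 1.
    static Map<String, Integer> buildIndex(String[] labels) {
        Map<String, Integer> index = new HashMap<>();
        if (labels == null) {
            // No labels available: return an empty index instead of failing.
            return index;
        }
        for (int i = 0; i < labels.length; i++) {
            // Keep the first page that carries a given label if labels repeat.
            index.putIfAbsent(labels[i], i + 1);
        }
        return index;
    }

    public static void main(String[] args) {
        // Sample data standing in for PdfPageLabels.getPageLabels(reader).
        String[] labels = {"i", "ii", "iii", "iv", "1", "2", "3"};
        Map<String, Integer> index = buildIndex(labels);
        // Look up the physical page for the label "2".
        System.out.println("Label 2 is physical page " + index.get("2")); // prints "Label 2 is physical page 6"
    }
}
```

A lookup for a label that does not exist returns null, so callers should check for that before using the result as a page number.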
