Get the page number from document outline (bookmarks)

Question

I am using the itext7 library to manipulate some existing PDFs. For some reason, I am not able to get the page number from the outline. I assume I should get it from the PdfDestination, but I cannot find a matching method in any of its subclasses.

PdfDocument pdfDoc =  new PdfDocument(new PdfReader("example.pdf"));
var root = pdfDoc.GetOutlines(false);
foreach (PdfOutline ol in root.GetAllChildren()) {
    Console.WriteLine(ol.GetTitle());
    PdfDestination d =  ol.GetDestination();
    // how to get the page number from the destination object
}

In iText5 I used SimpleBookmark.GetBookmark(reader), which returned a list of dictionaries containing a "Page" entry, but this functionality seems to have been removed in iText7.

Edit:
I had a look at the .NET implementation of PdfExplicitDestination.getDestinationPage() on GitHub (https://github.com/itext/itext7-dotnet/blob/develop/itext/itext.kernel/itext/kernel/pdf/navigation/PdfExplicitDestination.cs; the Java version is the same). I don't understand the purpose of this method's parameter. If I pass in null and call ToString() on the result, it seems to work on PDFs that use only one level in the outline hierarchy. By "work" I mean that it returns the zero-indexed page number as a string. For other PDFs the code does not find the page number (not even for the first level).

PdfDocument pdfDoc =  new PdfDocument(new PdfReader("example.pdf"));
var root = pdfDoc.GetOutlines();
foreach (PdfOutline ol in root.GetAllChildren()) {
    Console.WriteLine(ol.GetTitle());
    var d = ol.GetDestination();
    if (d is PdfExplicitDestination) {
        string PageNoStr = d.GetDestinationPage(null).ToString();
        // this is the content of the method (less the ToString()):
        //string PageNoStr = ((PdfArray)d.GetPdfObject()).Get(0).ToString();
        int pageNo;
        if (Int32.TryParse(PageNoStr, out pageNo)) {
            Console.WriteLine("Page is " + pageNo);
        } else  {
            Console.WriteLine("Error page");
        }    
    }
}

So I am still trying to figure this out.

Recommended answer

Regarding the levels of the outline hierarchy: to traverse the whole hierarchy, you have to check each PdfOutline for children and traverse them recursively.

The names parameter that confused you is responsible for resolving named destinations. It is needed to get page numbers correctly in the general case, because a PDF document may contain explicit as well as named destinations. To get the names map, you can use pdfDocument.getCatalog().getNameTree(PdfName.Dests).getNames();
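
To make the difference between the two kinds of destination concrete, here is a minimal, library-free sketch. The Destination class, the resolvePage method, and the use of plain integers for pages are all hypothetical stand-ins chosen for illustration; they are not iText types, and iText returns a page object rather than a number. The sketch only shows why a lookup without a names map can succeed for explicit destinations and still fail for named ones, which may be what happens when null is passed.

```java
import java.util.HashMap;
import java.util.Map;

class DestinationSketch {

    /** Hypothetical stand-in for a destination; not an iText type. */
    static final class Destination {
        final Integer explicitPage; // set when the destination points at a page directly
        final String name;          // set when the destination is only a name to look up

        private Destination(Integer explicitPage, String name) {
            this.explicitPage = explicitPage;
            this.name = name;
        }

        static Destination explicit(int page) {
            return new Destination(page, null);
        }

        static Destination named(String name) {
            return new Destination(null, name);
        }
    }

    /** Returns the page for a destination, or -1 when a name cannot be resolved. */
    static int resolvePage(Destination destination, Map<String, Integer> names) {
        if (destination.explicitPage != null) {
            return destination.explicitPage; // explicit: no lookup needed
        }
        // Named: the page is only known through the names map.
        Integer page = (names == null) ? null : names.get(destination.name);
        return (page == null) ? -1 : page;
    }

    public static void main(String[] args) {
        Map<String, Integer> names = new HashMap<>();
        names.put("chapter2", 7);

        System.out.println(resolvePage(Destination.explicit(3), null));        // explicit, no map
        System.out.println(resolvePage(Destination.named("chapter2"), names)); // named, with map
        System.out.println(resolvePage(Destination.named("chapter2"), null));  // named, no map
    }
}
```

In this model the explicit destination resolves even without a map, while the named destination resolves only when the map is supplied.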

To find the page number for a page object, use pdfDocument.getPageNumber(PdfDictionary).

Overall, a method that walks through the outlines might look like this:

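// Recursively prints the title and page number of every outline entry.
void walkOutlines(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument) {
    if (outline.getDestination() != null) {
        // The names map lets getDestinationPage resolve named destinations.
        PdfObject page = outline.getDestination().getDestinationPage(names);
        System.out.println(outline.getTitle() + ": page " +
                pdfDocument.getPageNumber((PdfDictionary) page));
    }
    // Descend into nested outline levels.
    for (PdfOutline child : outline.getAllChildren()) {
        walkOutlines(child, names, pdfDocument);
    }
}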

The entry point that calls this method on the outline root:

PdfNameTree destsTree = pdfDocument.getCatalog().getNameTree(PdfName.Dests);
PdfOutline root = pdfDocument.getOutlines(false);
walkOutlines(root, destsTree.getNames(), pdfDocument);

Please note that the code sample is in Java, but the C# version should be similar, apart from method-name casing (for example GetOutlines instead of getOutlines) and IDictionary instead of Map.
