How to associate a search catalog file (.pdx) with a PDF document

Question

Using a .NET application, I am trying to create a PDF "table of contents" that references other files, like the kind one would distribute on a DVD.

For this purpose, I need a search index and catalog so that full-text search works across documents. I have been able to automate building the index by copying an "old" .pdx file (the directory structure is always the same) and then calling JavaScript from C#:

var js = $@"catalog.getIndex(""{pdxFilePath}"").build('alert(""Hello"")', true)";

formFields.ExecuteThisJavascript(js);

But how can I associate the .pdx file with my .pdf document so that it gets loaded automatically?

In Acrobat, this is set in the "advanced" document properties.

However, this setting is not accessible via the info or metadata properties of the document. Apparently it is stored somewhere else, but I don't know enough about the PDF format to figure out how to access this data.

Any help would be highly appreciated. I could use either the Adobe SDK/JavaScript API or some other library (for instance, I know we already have an Aspose license).

Recommended answer

Answering my own question here: I was able to solve this using PdfSharp.

The following code is compatible with PdfSharp 1.50.4845-RC2a.

pdxFile should be the name of the .pdx file, including the file extension (e.g. "catalog.pdx"). I have only tested this with .pdx files located in the same folder as the PDF document, but I would assume that relative paths in general should work.

No guarantees that this is a perfect solution, as I lack a deeper understanding of the PDF format, but it at least seems to work.

    private void SetSearchCatalog(PdfDocument doc, string pdxFile)
    {
        // File specification dictionary that points at the .pdx file.
        var indexDict = new PdfDictionary(doc);
        indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
        indexDict.Elements["/Type"] = new PdfName("/Filespec");

        // One entry of the index list: the file specification plus the index type name.
        var indexArrayItemDict = new PdfDictionary(doc);
        indexArrayItemDict.Elements["/Index"] = indexDict;
        indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");

        var indexArray = new PdfArray(doc, indexArrayItemDict);

        // The /Search dictionary holds the list of associated indexes.
        var searchDict = new PdfDictionary(doc);
        searchDict.Elements["/Indexes"] = indexArray;

        // Attach the /Search dictionary to the document catalog (the PDF root object).
        doc.Internals.Catalog.Elements["/Search"] = searchDict;
    }
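
For completeness, here is a minimal sketch of how the method might be called and how the result could be checked. It is an illustration, not part of the tested solution: the file names `contents.pdf`, `contents-with-index.pdf` and `catalog.pdx`, the class name `SearchCatalogDemo`, and the helper `ReadSearchCatalog` are hypothetical. `SetSearchCatalog` is repeated as a static method only so that the example compiles on its own, and the whole thing assumes the PdfSharp package is referenced.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class SearchCatalogDemo
{
    public static void Main()
    {
        // Hypothetical file names, chosen for illustration only.
        const string inputPath = "contents.pdf";
        const string outputPath = "contents-with-index.pdf";
        const string pdxFile = "catalog.pdx";

        // Modify mode is required so the document can be saved again.
        PdfDocument doc = PdfReader.Open(inputPath, PdfDocumentOpenMode.Modify);
        SetSearchCatalog(doc, pdxFile);
        doc.Save(outputPath);

        // Reopen the saved file and read the entry back as a sanity check.
        PdfDocument saved = PdfReader.Open(outputPath, PdfDocumentOpenMode.ReadOnly);
        Console.WriteLine(ReadSearchCatalog(saved) ?? "(no search index found)");
    }

    // Same logic as the method above, made static for this standalone sketch.
    private static void SetSearchCatalog(PdfDocument doc, string pdxFile)
    {
        var indexDict = new PdfDictionary(doc);
        indexDict.Elements["/F"] = new PdfString(pdxFile, PdfStringEncoding.RawEncoding);
        indexDict.Elements["/Type"] = new PdfName("/Filespec");

        var indexArrayItemDict = new PdfDictionary(doc);
        indexArrayItemDict.Elements["/Index"] = indexDict;
        indexArrayItemDict.Elements["/Name"] = new PdfName("/PDX");

        var searchDict = new PdfDictionary(doc);
        searchDict.Elements["/Indexes"] = new PdfArray(doc, indexArrayItemDict);

        doc.Internals.Catalog.Elements["/Search"] = searchDict;
    }

    // Hypothetical helper: returns the /F value of the first index entry,
    // or null if the document has no /Search dictionary or no indexes.
    private static string ReadSearchCatalog(PdfDocument doc)
    {
        PdfDictionary search = doc.Internals.Catalog.Elements.GetDictionary("/Search");
        PdfArray indexes = search?.Elements.GetArray("/Indexes");
        if (indexes == null || indexes.Elements.Count == 0)
            return null;

        PdfDictionary entry = indexes.Elements.GetDictionary(0);
        PdfDictionary fileSpec = entry?.Elements.GetDictionary("/Index");
        return fileSpec?.Elements.GetString("/F");
    }
}
```

Saving to a separate output file keeps the original document untouched while experimenting. Whether Acrobat actually picks up the index should still be confirmed by opening the saved file and checking the advanced document properties.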
