Reading hyperlinks from a PDF file

Views: 1059 · Published: 2015/11/24 15:43:48 · Tags: c# .net pdf itextsharp

Question

I'm trying to read a PDF file and get all of the hyperlinks from it. I'm using iTextSharp for C# .NET.

    PdfReader reader = new PdfReader("test.pdf");
    List<PdfAnnotation.PdfImportedLink> list = reader.GetLinks(36);

The GetLinks method returns a list with a lot of information about the links, but it does not return the value I want: the hyperlink string. I know for certain that there are hyperlinks on page 36.

Recommended answer

PdfReader.GetLinks() is only meant to be used with links internal to the document, not external hyperlinks. Why? I don't know.

The code below is based on code I wrote earlier (http://stackoverflow.com/questions/6578316/editing-hyperlink-and-anchors-in-pdf-using-itextsharp/6599734#6599734), but I've limited it to links stored in the PDF as a PdfName.URI. It's possible to store the link as JavaScript that ultimately does the same thing, and there are probably other types, but you'll need to detect those yourself. I don't believe there's anything in the spec that says a link actually needs to be a URI; it's just implied. So the code below returns a string that you can (probably) convert to a URI on your own.

    //Requires: using System.Collections.Generic; using iTextSharp.text.pdf;
    private static List<string> GetPdfLinks(string file, int page)
    {
        List<string> Ret = new List<string>();

        //Open our reader
        PdfReader R = new PdfReader(file);
        try
        {
            //Get the requested page (page numbers start at 1)
            PdfDictionary PageDictionary = R.GetPageN(page);

            //Get all of the annotations for the current page
            PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);

            //No annotations means no links, so return the empty list
            if ((Annots == null) || (Annots.Size == 0))
                return Ret;

            //Loop through each annotation
            foreach (PdfObject A in Annots.ArrayList)
            {
                //Resolve a possible indirect reference to the actual annotation dictionary
                PdfDictionary AnnotationDictionary = PdfReader.GetPdfObject(A) as PdfDictionary;
                if (AnnotationDictionary == null)
                    continue;

                //Make sure this annotation is a link (also safe if SUBTYPE is missing)
                if (!PdfName.LINK.Equals(AnnotationDictionary.Get(PdfName.SUBTYPE)))
                    continue;

                //Get the ACTION for the current annotation; GetAsDict resolves
                //indirect references and returns null if there is no action
                PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                if (AnnotationAction == null)
                    continue;

                //Test if it is a URI action (there are tons of other types of actions, some of which
                //might mimic URI, such as JavaScript, but those need to be handled separately)
                if (PdfName.URI.Equals(AnnotationAction.Get(PdfName.S)))
                {
                    PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
                    if (Destination != null)
                        Ret.Add(Destination.ToString());
                }
            }
        }
        finally
        {
            //Release the file handle even if something above throws
            R.Close();
        }

        return Ret;
    }

And to call it:

        string myfile = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Output.pdf");
        List<string> Links = GetPdfLinks(myfile, 1);
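
The answer notes that the returned strings can (probably) be converted to URIs on your own. A minimal sketch of that step, assuming you only care about absolute URIs and are happy to skip anything that does not parse; the class and method names (`LinkConversion`, `ToAbsoluteUris`) are hypothetical and not part of iTextSharp:

```csharp
using System;
using System.Collections.Generic;

public static class LinkConversion
{
    // Hypothetical helper (not an iTextSharp API): keeps only the strings
    // that parse as absolute URIs and skips everything else.
    public static List<Uri> ToAbsoluteUris(IEnumerable<string> links)
    {
        List<Uri> result = new List<Uri>();

        // Treat a missing list the same as an empty one
        if (links == null)
            return result;

        foreach (string link in links)
        {
            Uri parsed;
            // TryCreate returns false instead of throwing on malformed input
            if (Uri.TryCreate(link, UriKind.Absolute, out parsed))
                result.Add(parsed);
        }

        return result;
    }
}
```

With the call above, this could be used as `List<Uri> Uris = LinkConversion.ToAbsoluteUris(Links);`. Using `Uri.TryCreate` rather than the `Uri` constructor is a deliberate choice here, since nothing guarantees that a PDF's URI action holds a well-formed URI.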
