Add links in PDF

Question

I have several PDFs that were generated with Microsoft Word. I want to:

  1. Find matches in the PDF text using a regular expression.
  2. Turn the matched text into a link to an external URL.
  3. Save the new version of the PDF.

If I were doing this in HTML, it would look like this:

<!-- before: -->
This is the text to match.

<!-- after: -->
This is the text to <a href="http://www.match.com/" target="_blank">match</a>.

How can I do this to a PDF?

I'd prefer Python, but I'm open to alternatives.

Edit: I don't have access to the original Word documents. I need to manipulate the PDFs themselves. I'm looking for a technique using a Python PDF library (or something similar in another language).

Edit 2: I understand that the source code of a PDF doesn't contain literal strings. I'm wondering if there's an approach that could do something like: (1) extract the text, (2) find matches, and (3) for each match, draw a clickable box around the position of the text in the original PDF. The closest I've come is PyPDF2's addLink(), but that adds internal links in the PDF, not links to external URLs.
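
The three steps described in Edit 2 can be sketched in plain Python. This is only a sketch under assumptions: it presumes some extraction tool has already produced, for each page, a list of words with bounding boxes in PDF coordinates. The `Word` record and the `find_link_rectangles` helper are hypothetical names, not part of PyPDF2 or any other library, and the sample data is made up. It tests the regular expression against individual words only, so matches that span several words would need extra handling.

```python
import re
from collections import namedtuple

# Hypothetical record for one extracted word: its text plus its bounding box
# in PDF points, as [llX, llY, urX, urY].
Word = namedtuple("Word", ["text", "rect"])


def find_link_rectangles(pages, pattern):
    """Return {page_index: [rect, ...]} for every word matching `pattern`.

    `pages` maps a zero-based page index to a list of Word records.
    """
    regex = re.compile(pattern)
    found = {}
    for page_index, words in pages.items():
        rects = [list(w.rect) for w in words if regex.search(w.text)]
        if rects:
            found[page_index] = rects
    return found


# Made-up sample data standing in for real extraction output.
sample_pages = {
    0: [Word("text", [200, 156, 225, 171]), Word("match.", [255, 156, 277, 171])],
    1: [Word("nothing", [100, 400, 150, 415])],
}
link_rects = find_link_rectangles(sample_pages, r"\bmatch\b")
```

Each rectangle found this way is a candidate clickable box for step (3).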

Recommended answer

I have solved this. Appreciate anyone cleaning up any errors. https://github.com/JohnMulligan/PyPDF2/tree/URI-linking

Because Kurt answered most of parts 1 and 2, I'm going to restrict my answer to part 3 of the original question: how to add external links to a PDF. (I have a fully working answer to 1 & 2, but it's inelegant. If people want it, I'll post that, too.)

My branch of PyPDF2 adds an addURI() method that works the same way as the package's original addLink().

Specifically, start with a rectangles dictionary that has page numbers as keys:

rectangles_dictionary = {
    0: {'key1': [255, 156, 277, 171], 'key2': [293, 446, 323, 461]},
    1: {'key2': [411, 404, 443, 419]},
}

(Rectangle format is [llX, llY, urX, urY].) This assigns two rectangles to the first page (key 0) and one rectangle to the second page (key 1).
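
Some extraction tools report boxes measured from the top-left corner of the page, whereas [llX, llY, urX, urY] is measured from the bottom-left corner, which is the default origin in PDF. A minimal conversion sketch, assuming the input box has the form (x0, top, x1, bottom) and the page height in points is known; the function name is hypothetical:

```python
def to_pdf_rect(x0, top, x1, bottom, page_height):
    """Convert a top-left-origin box to [llX, llY, urX, urY] (bottom-left origin)."""
    return [x0, page_height - bottom, x1, page_height - top]


# Example: a box spanning 621 to 636 points below the top edge of a
# 792-point-tall (US Letter) page.
rect = to_pdf_rect(255, 621, 277, 636, page_height=792)
```

If the extraction tool already uses bottom-left coordinates, no conversion is needed.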

Add a URLs dictionary that uses those keys to look up the URLs to assign:

destinations_dictionary = {'key1':'url1','key2':'url2'}

We can then add the appropriate links to all those rectangle zones:

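# Assumed aliases: the original snippet used reader/writer without showing the import.
from PyPDF2 import PdfFileReader as reader, PdfFileWriter as writer

def make_pdf(rectangles_dictionary, destinations_dictionary):
    result = writer()

    with open('pdfs/input_pdf.pdf', 'rb') as source:
        pdf = reader(source)

        # Copy every page of the input PDF into the output document.
        for pagenum in range(pdf.getNumPages()):
            result.addPage(pdf.getPage(pagenum))

        # Each name maps to a single [llX, llY, urX, urY] rectangle.
        for pagenum, rectangles in rectangles_dictionary.items():
            for name, rectangle in rectangles.items():
                destination = destinations_dictionary[name]
                # addURI() comes from the URI-linking branch linked above.
                result.addURI(pagenum, destination, rectangle)

        # Write while the input file is still open, since pages are read lazily.
        with open('pdfs/output_pdf.pdf', 'wb') as output:
            result.write(output)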

There are cleaner ways to build those dictionaries (with JSON, for example), but for my implementation this was the most efficient way.
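
As one possible version of the JSON idea, the two dictionaries could live in a single JSON document. JSON object keys are always strings, so the page numbers have to be converted back to int after loading. The JSON layout and the `load_link_config` helper are assumptions made for illustration, not something from the original answer.

```python
import json

# Hypothetical JSON layout mirroring the two dictionaries above.
raw = """
{
  "rectangles": {"0": {"key1": [255, 156, 277, 171], "key2": [293, 446, 323, 461]},
                 "1": {"key2": [411, 404, 443, 419]}},
  "destinations": {"key1": "url1", "key2": "url2"}
}
"""


def load_link_config(text):
    """Parse the JSON text and restore integer page-number keys."""
    data = json.loads(text)
    rectangles = {int(page): rects for page, rects in data["rectangles"].items()}
    return rectangles, data["destinations"]


rectangles_dictionary, destinations_dictionary = load_link_config(raw)
```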

The key, of course, is this line:

result.addURI(pagenum, destination, rectangle)

Here pagenum is an int, destination is a str, and rectangle is a list.
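
For context, an external link in a PDF is a Link annotation on the page whose action has type URI, so a method like addURI() has to attach an object of roughly this shape to the page. The sketch below only renders that object as PDF syntax text to show the structure. It is not the code from the branch, the function name is hypothetical, and the output is not a complete PDF file on its own.

```python
def link_annotation_source(uri, rect):
    """Render a URI link annotation as PDF object syntax (illustration only)."""
    # Escape characters that are special inside a PDF literal string.
    escaped = uri.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    rect_text = " ".join(str(n) for n in rect)
    return (
        "<< /Type /Annot /Subtype /Link "
        "/Rect [{rect}] /Border [0 0 0] "
        "/A << /S /URI /URI ({uri}) >> >>"
    ).format(rect=rect_text, uri=escaped)


annotation = link_annotation_source("http://www.match.com/", [255, 156, 277, 171])
```

The /Rect entry is the clickable box in [llX, llY, urX, urY] form, and /Border [0 0 0] keeps the box invisible.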
