How to extract all links from a PDF file?

Views: 559 Published: 2020/5/25 4:08:13 Tags: python pdf pypdf

Problem description

According to the PDF standard, links are stored in annotations (section 12.5.6.5 of the specification). Extracting the address from there is easy; see "Extracting links to pages in another PDF from PDF using Python or other method". Very often, though, links do not appear as special objects in the document but as plain text, like "http://blah-blah.com". How do I extract links not only from annotations but also from the text itself? I could search the whole text for strings like "http://", but is there a better solution? PDF editors highlight text links too; how do they know that a piece of text is a hyperlink?
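To make the two sources of links concrete, here is a minimal sketch that combines them: it reads `/URI` actions from link annotations and also scans each page's extracted text with a regular expression. It assumes the third-party `pypdf` package (one of the question's tags); the names `URL_RE`, `find_urls`, and `extract_pdf_links` are hypothetical helpers invented for this illustration, and the pattern is a simple heuristic rather than a full URL grammar. Viewers that highlight plain-text links most likely apply a similar pattern match to the page text, although that is an assumption and not something the question confirms.

```python
import re

# Heuristic pattern (an assumption, not a full URL grammar): an http(s):// or
# www. prefix followed by everything up to whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def find_urls(text):
    """Return URL-like substrings of plain text, in order, without duplicates."""
    found = []
    for match in URL_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)]}")
        if url and url not in found:
            found.append(url)
    return found


def extract_pdf_links(path):
    """Collect URLs from link annotations and from the visible text of a PDF."""
    from pypdf import PdfReader  # third-party; imported lazily so find_urls works without it

    reader = PdfReader(path)
    links = []
    for page in reader.pages:
        # 1) Links stored as annotations (section 12.5.6.5 of the specification).
        if "/Annots" in page:
            for ref in page["/Annots"]:
                annot = ref.get_object()
                if annot.get("/Subtype") == "/Link" and "/A" in annot:
                    action = annot["/A"]
                    if "/URI" in action:
                        links.append(str(action["/URI"]))
        # 2) Links that only appear as plain text on the page.
        links.extend(find_urls(page.extract_text() or ""))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(links))
```

Calling `extract_pdf_links("document.pdf")` (the file name is a placeholder) would return a list of URL strings. One known limitation of this approach: a URL that wraps across two lines in the PDF is extracted as separate text fragments, so the regular expression only catches the first part.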
