How to get the selected text from an embedded PDF in a web page?

Question

Here's an example of a PDF document from which I need to extract the user's selection: http://www.ada.gov/hospcombrprt.pdf. If we look at the page source, we will see something like:

<html>
  <body marginwidth="0" marginheight="0" style="background-color: rgb(38,38,38)">  
     <embed width="100%" height="100%" name="plugin"        
     src="http://www.ada.gov/hospcombrprt.pdf" type="application/pdf">
  </body>
</html>

How can we get the user's selection from this embedded PDF?

I found a post about extracting the whole text from a PDF document, and another post similar to mine, which says that this is not possible.

But there should be some way out. Perhaps it is possible to extract the whole text and then somehow determine what has been selected? Or to determine the selection from the mouse cursor position on the mouse-down and mouse-up events? I would appreciate any ideas.
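The second idea (deriving the selection from the mouse-down and mouse-up positions) could be sketched roughly as follows. This is only an illustration under strong assumptions: that the text has already been extracted together with a bounding box for each piece, that those boxes are in the same coordinate space as the mouse events, and that the page script can observe mouse events over the document at all, which may not hold for a plugin-rendered `embed`. The `TextItem` shape, the `textInDragRect` helper, and the sample data are all hypothetical.

```typescript
// Hypothetical shape: one piece of extracted text with its bounding box,
// expressed in the same coordinate space as the mouse events.
interface TextItem {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Returns the text of every item whose box overlaps the rectangle spanned
// by the mouse-down and mouse-up points, joined in input order.
function textInDragRect(items: TextItem[], down: Point, up: Point): string {
  // Normalize so the drag direction does not matter.
  const left = Math.min(down.x, up.x);
  const right = Math.max(down.x, up.x);
  const top = Math.min(down.y, up.y);
  const bottom = Math.max(down.y, up.y);

  return items
    .filter(
      (it) =>
        it.x < right &&
        it.x + it.width > left &&
        it.y < bottom &&
        it.y + it.height > top
    )
    .map((it) => it.text)
    .join(" ");
}

// Made-up sample data, for illustration only.
const sampleItems: TextItem[] = [
  { text: "alpha", x: 10, y: 10, width: 50, height: 12 },
  { text: "beta", x: 65, y: 10, width: 15, height: 12 },
  { text: "gamma", x: 85, y: 10, width: 60, height: 12 },
  { text: "delta", x: 10, y: 30, width: 35, height: 12 },
];

// A drag across the right part of the first line.
console.log(textInDragRect(sampleItems, { x: 60, y: 8 }, { x: 150, y: 20 })); // prints "beta gamma"
```

A rectangle test like this is cruder than a real text selection, which follows reading order across lines, so it should be read as a starting point rather than a working solution.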

Recommended answer

I doubt this is possible, and if it is, there will be no generic solution, since every PDF viewer is different.

Not everyone uses Adobe's own Acrobat plugin. Foxit is popular. Both are plugins that most likely do not expose an interface for accessing this information.

Some browsers, such as Chrome and Firefox, now provide a built-in PDF viewer, which works completely differently from the plugins.

Also, are you accessing a PDF on a different domain? In that case, the same-origin policy would prevent access to such information anyway.
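To make the same-origin point concrete: two URLs share an origin only when their scheme, host, and port all match. The sketch below uses the standard `URL` class to compare origins. The `isSameOrigin` helper and both page URLs are hypothetical, since the address of the embedding page is not given in the question. This only tells you whether the policy would apply; it is not a way around it.

```typescript
// True when both URLs have the same scheme, host, and port (the "origin").
function isSameOrigin(pageUrl: string, resourceUrl: string): boolean {
  return new URL(pageUrl).origin === new URL(resourceUrl).origin;
}

const pdfUrl = "http://www.ada.gov/hospcombrprt.pdf";

// Hypothetical embedding page on another domain: different origin.
console.log(isSameOrigin("https://example.com/viewer.html", pdfUrl)); // prints false

// Hypothetical embedding page on the same scheme and host as the PDF.
console.log(isSameOrigin("http://www.ada.gov/index.html", pdfUrl)); // prints true
```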

And finally, you need to consider that not every user likes using (or is even allowed to use) a PDF browser plugin, so your "solution" won't work in those cases.

One more point: the fact that you are using the vastly outdated embed element instead of object suggests that you are working with very old knowledge.

You may need to take a step back and really reconsider what you are trying to do here. What is the bigger picture? What are you trying to achieve?
