Description of each constant specified in PdfName, since I need to be able to retrieve both images and text at the same time


Question

I am having trouble retrieving images and text from a PDF file at the same time. I am able to get the images and the text, but not both together (which raises the question of whether to render the image first or the text first, for example in my panel control). Could you help me understand what each constant in PdfName means? I tried PdfName.ALL, but it returns null, whereas PdfName.RESOURCES returns ProcSet, Font and XObject. I used XObject for the images, but what are ProcSet and Font (could Font be the style of the text)? Is there a PdfName.TEXT for retrieving text?

Thanks in advance.

Recommended answer

First of all,


"I am having trouble retrieving images and text from a PDF file at the same time"

For this task you should use the iText(Sharp) parser API. In iTextSharp you essentially implement IRenderListener (an interface whose methods are called for each (bitmap) image and text fragment in a content stream) and process the page contents with it:

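using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Open the document and wrap it in a content parser.
PdfReader reader = new PdfReader(...);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
int pageNumber = [... the number of the page you are interested in; may be a loop variable ...];

// The listener receives a callback for every text fragment and bitmap image on the page.
IRenderListener listener = new [... your IRenderListener implementation ...];
parser.ProcessContent(pageNumber, listener);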

You ask


"whether to render the image first or the text first, for example in my panel control"

The IRenderListener methods also receive information on the location of the bitmap or text fragment in question, so each item can be placed by its coordinates rather than by the order in which it was retrieved.
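As a rough illustration, a listener along the following lines could collect text and images in a single pass together with their positions. This is only a sketch against the iTextSharp 5.x parser API; the PageFragment and TextAndImageListener types are hypothetical names introduced here, and the image position is taken from the translation part of the image matrix, which is the lower-left corner only for unrotated images.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

// Hypothetical record type: one piece of page content plus where it sits.
public class PageFragment
{
    public bool IsImage;
    public string Text;        // set for text fragments
    public byte[] ImageBytes;  // set for image fragments
    public float X;            // position in PDF user space units
    public float Y;
}

// Hypothetical listener that records text and images in one pass.
public class TextAndImageListener : IRenderListener
{
    public readonly List<PageFragment> Fragments = new List<PageFragment>();

    public void BeginTextBlock() { }
    public void EndTextBlock() { }

    public void RenderText(TextRenderInfo renderInfo)
    {
        // Start of the baseline of this text chunk.
        Vector start = renderInfo.GetBaseline().GetStartPoint();
        Fragments.Add(new PageFragment
        {
            IsImage = false,
            Text = renderInfo.GetText(),
            X = start[Vector.I1],
            Y = start[Vector.I2]
        });
    }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return;

        // The image matrix maps the unit square to the image area on the page;
        // I31/I32 hold its translation.
        Matrix ctm = renderInfo.GetImageCTM();
        Fragments.Add(new PageFragment
        {
            IsImage = true,
            ImageBytes = image.GetImageAsBytes(),
            X = ctm[Matrix.I31],
            Y = ctm[Matrix.I32]
        });
    }
}
```

An instance of this class would be passed to parser.ProcessContent(pageNumber, listener); afterwards its Fragments list holds both kinds of content for that page.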

For ideas on how the text fragments may be combined in your listener, you may want to look at the SimpleTextExtractionStrategy and LocationTextExtractionStrategy implementations that ship with iTextSharp.
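To make the idea of combining fragments by location concrete, here is a deliberately simplified, library-independent sketch. It is not how LocationTextExtractionStrategy is implemented (that class also accounts for text orientation and spacing); it only sorts items top to bottom and then left to right, assuming an unrotated page where larger y values are higher up. PlacedItem and ReadingOrder are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item: anything that was found on the page at a known position.
public class PlacedItem
{
    public string Label;
    public float X;
    public float Y;
}

public static class ReadingOrder
{
    // PDF user space has its origin at the bottom left, so a larger Y is
    // higher on the page: sort by descending Y, then by ascending X.
    public static List<PlacedItem> Sort(IEnumerable<PlacedItem> items)
    {
        return items
            .OrderByDescending(i => i.Y)
            .ThenBy(i => i.X)
            .ToList();
    }
}
```

In practice, fragments on the same line rarely share exactly the same y value, so a real implementation would probably group items whose y values differ by less than some tolerance before sorting by x.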

If you insist on doing it manually, though...


"Could you help me understand what each constant in PdfName means?"

You can find the definitions of what the names map to in the PDF specification, ISO 32000-1:2008, a copy of which Adobe has made available online.


"PdfName.RESOURCES returns ProcSet, Font and XObject. I used XObject for the images, but what are ProcSet and Font (could Font be the style of the text)?"

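The contents of page resource dictionaries are explained in section 7.8.3 of the specification. In short, Font maps resource names to the font dictionaries used by text-showing operators, XObject maps names to external objects such as images and form XObjects, and ProcSet is a legacy array of procedure set names that current viewers ignore.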
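For the manual route, the following sketch lists the XObject and Font entries of one page's resource dictionary using iTextSharp 5.x low-level classes. ResourceDump is a hypothetical helper name, and the sketch assumes the page carries its own Resources entry; resources inherited from a parent page tree node are not followed here.

```csharp
using System;
using iTextSharp.text.pdf;

public static class ResourceDump
{
    // Prints the XObject and Font entries of one page's resource dictionary.
    public static void Print(PdfReader reader, int pageNumber)
    {
        PdfDictionary page = reader.GetPageN(pageNumber);
        PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
        if (resources == null) return; // possibly inherited; not handled in this sketch

        PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);
        if (xobjects != null)
        {
            foreach (PdfName name in xobjects.Keys)
            {
                PdfStream stream = xobjects.GetAsStream(name);
                if (stream == null) continue;
                // Subtype is /Image for bitmaps and /Form for form XObjects.
                PdfName subtype = stream.GetAsName(PdfName.SUBTYPE);
                Console.WriteLine("XObject {0}: subtype {1}", name, subtype);
            }
        }

        PdfDictionary fonts = resources.GetAsDict(PdfName.FONT);
        if (fonts != null)
        {
            foreach (PdfName name in fonts.Keys)
            {
                PdfDictionary font = fonts.GetAsDict(name);
                if (font == null) continue;
                Console.WriteLine("Font {0}: base font {1}", name, font.GetAsName(PdfName.BASEFONT));
            }
        }
    }
}
```

Note that the Font entries only describe the fonts; the text itself lives in the page content stream, which is why the parser API is the more practical way to retrieve it.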


"Is there a PdfName.TEXT for retrieving text?"

You'll find how text is represented in page content streams and XObjects in section 9.
