iText GetTextFromPage exception with inline image

Views: 102 Posted: 2018/11/16 17:30:32 Tags: c# pdf itextsharp itext

Problem description

I have the same problem that was discussed here and was never resolved. My goal is to extract the text from an existing PDF file. For one particular PDF, which I cannot share as a sample, I get the error message "Could not find image data or EI". The following code works for other PDFs:

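using System.Diagnostics;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Extract the text of page 1, ordered by position on the page.
string fileURI = "C:\\Test\\Sample.pdf";
PdfReader reader = new PdfReader(fileURI);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
Debug.WriteLine(s);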

I am using iTextSharp 5.5.0 and tried changing found == 1 to found <= 1, as suggested in other posts. It did not help.

Would it help to remove all images from the PDF? I really only need the text. Which iText calls could help me with this?
