While extracting text from a PDF file using iTextSharp, I get this error: "Could not find image data or EI"
Problem description
While extracting text from a PDF file using iTextSharp, I get this error: "Could not find image data or EI"
This error occurs only on pages that contain nothing but images.
Could the cause be that I am trying to extract text without first checking whether the page has any text content?
Recommended answer
Inline images are not specified very well in the PDF specification. The image data should be contained between the ID and EI operators, but the image data itself may contain the bytes "EI".
iText(Sharp) reads image data until it encounters <whitespace>EI<whitespace>. However, some PDFs end inline image data with just EI<whitespace>, with no whitespace before the EI. For those inline images, iText(Sharp) throws this exception.
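To make the ambiguity concrete, here is a small Python sketch of the two terminator rules described above. It is an illustrative model only, not iText(Sharp) code; the function `find_ei` and its `require_leading_ws` flag are hypothetical names introduced for this example.

```python
# Illustrative model only; this is not how iText(Sharp) is implemented.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "  # whitespace bytes defined by the PDF specification


def find_ei(data: bytes, require_leading_ws: bool = True) -> int:
    """Return the index of the 'E' in the terminating EI, or -1 if none is found."""
    for i in range(len(data) - 2):
        if data[i:i + 2] != b"EI":
            continue
        if data[i + 2] not in PDF_WHITESPACE:
            continue  # both rules require whitespace after "EI"
        if require_leading_ws and (i == 0 or data[i - 1] not in PDF_WHITESPACE):
            continue  # strict rule: whitespace is also required before "EI"
        return i
    return -1


well_formed = b"\x12\x34\x56 EI\nQ"   # <whitespace>EI<whitespace>
embedded = b"\x12EI\x34 EI\n"         # "EI" also occurs inside the image data
problematic = b"\x12\x34\x56EI\nQ"    # no whitespace before EI

print(find_ei(well_formed))                            # strict rule finds the terminator
print(find_ei(embedded))                               # skips the embedded "EI"
print(find_ei(problematic))                            # strict rule fails
print(find_ei(problematic, require_leading_ws=False))  # lenient rule succeeds
```

The third call is the situation that produces the exception: the only "EI" in the stream has no whitespace in front of it, so a strict search never finds an end marker.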
If this is the issue with your PDF, you can probably fix it by changing found == 1 to found <= 1 in InlineImageUtils.ParseInlineImageSamples(), here:
http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/src/core/iTextSharp/text/pdf/parser/InlineImageUtils.cs#l337
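The effect of that one-character change can be sketched with a simplified Python model of a counter-based scanner. This is not the library's source: it assumes that `found` counts how many characters of the `<whitespace>EI<whitespace>` sequence have been matched so far, and the names `parse_inline_image_samples`, `lenient`, and `InlineImageParseError` are hypothetical.

```python
PDF_WHITESPACE = b"\x00\t\n\x0c\r "  # whitespace bytes defined by the PDF specification


class InlineImageParseError(Exception):
    """Hypothetical stand-in for the exception iText(Sharp) throws."""


def parse_inline_image_samples(stream: bytes, lenient: bool = False) -> bytes:
    """Return the image bytes that precede the terminating EI operator."""
    image = bytearray()    # bytes confirmed to be image data
    pending = bytearray()  # bytes that may turn out to be part of the terminator
    found = 0              # assumed meaning: how much of <ws> E I <ws> has matched

    for ch in stream:
        is_ws = ch in PDF_WHITESPACE
        # The suggested fix: accept 'E' even when no whitespace preceded it.
        e_allowed = found <= 1 if lenient else found == 1
        if found == 0 and is_ws:
            found = 1
            pending.append(ch)
        elif e_allowed and ch == ord("E"):
            found = 2
            pending.append(ch)
        elif found == 1 and is_ws:
            # The earlier whitespace was image data; this one may start the terminator.
            image += pending
            pending = bytearray([ch])
        elif found == 2 and ch == ord("I"):
            found = 3
            pending.append(ch)
        elif found == 3 and is_ws:
            return bytes(image)
        else:
            # False alarm: everything pending was image data after all.
            image += pending
            pending.clear()
            image.append(ch)
            found = 0
    raise InlineImageParseError("Could not find image data or EI")


# Strict mode fails on data that ends with EI<whitespace> only; lenient mode accepts it.
print(parse_inline_image_samples(b"\x12\x34 EI\n"))
print(parse_inline_image_samples(b"\x12\x34EI\n", lenient=True))
```

One trade-off of the looser check in this model: if the image data itself happens to contain "EI" followed by whitespace, the scanner stops there and truncates the image, so the change is best treated as a workaround for the affected files.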