When I extract text from a PDF file using iText, I get values from previous pages
Problem description
I am trying to extract a block of text from a specific location on each page of a multi-page PDF file.
I know the location of the text, and I can extract it correctly from the first page. However, on every page after the first, the extracted text seems to accumulate.
For example, if the text value on page 1 is "A", on page 2 is "B", and on page 3 is "C", then my output string in each iteration of my for loop is:
Loop 1: output = A
Loop 2: output = B A
Loop 3: output = C B A
I am using iTextSharp in my project, which is written in C#.
Any help would be appreciated.
var reader = new PdfReader(foregroundFile);
RectangleJ customerIdRectangle = new RectangleJ(0, 495, 108, 27);
RenderFilter[] filters = new RenderFilter[1];
LocationTextExtractionStrategy regionFilter = new LocationTextExtractionStrategy();
filters[0] = new RegionTextRenderFilter(customerIdRectangle);
// The strategy is created once, outside the loop, and reused for every page
FilteredTextRenderListener strategy = new FilteredTextRenderListener(regionFilter, filters);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    string output = "";
    output = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
    Console.WriteLine(output);
}
Recommended answer
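LocationTextExtractionStrategy does not reset itself between calls: every text chunk it receives is added to an internal list, and GetResultantText() builds its result from that whole list. PdfTextExtractor.GetTextFromPage returns that result, so reusing one strategy instance across pages gives you the current page's text plus everything collected from earlier pages. Create a new strategy for every page instead, by adapting your code like this: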
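using System;
using System.util;                  // RectangleJ (iTextSharp 5.x)
using iTextSharp.text.pdf;          // PdfReader
using iTextSharp.text.pdf.parser;   // text extraction classes

var reader = new PdfReader(foregroundFile);
RectangleJ customerIdRectangle = new RectangleJ(0, 495, 108, 27);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    // Build a fresh filter chain and strategy for each page, so the strategy
    // only ever sees the text chunks of the page currently being processed.
    RenderFilter[] filters = new RenderFilter[1];
    LocationTextExtractionStrategy regionFilter = new LocationTextExtractionStrategy();
    filters[0] = new RegionTextRenderFilter(customerIdRectangle);
    FilteredTextRenderListener strategy = new FilteredTextRenderListener(regionFilter, filters);
    string output = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
    Console.WriteLine(output);
}
reader.Close();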
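To see why a shared instance accumulates text, here is a self-contained C# simulation with no iTextSharp dependency. AccumulatingStrategy, ExtractionDemo and their methods are hypothetical stand-ins: the strategy keeps every chunk it receives and builds its result from all of them, the way LocationTextExtractionStrategy does, and ExtractionDemo.GetTextFromPage plays the role of PdfTextExtractor.GetTextFromPage. It is a sketch of the mechanism under those assumptions, not of iTextSharp's actual internals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an extraction strategy: like
// LocationTextExtractionStrategy, it never clears what it has collected.
public class AccumulatingStrategy
{
    private readonly List<string> chunks = new List<string>();

    // Called for each piece of text found while a page is processed.
    public void AddChunk(string text)
    {
        chunks.Add(text);
    }

    // Builds the result from every chunk received so far, not only the
    // current page's. Joining with a space is a simplification.
    public string GetResultantText()
    {
        return string.Join(" ", chunks);
    }
}

public static class ExtractionDemo
{
    // Hypothetical stand-in for PdfTextExtractor.GetTextFromPage: feeds one
    // page's text to the strategy and returns the strategy's whole result.
    public static string GetTextFromPage(string pageText, AccumulatingStrategy strategy)
    {
        strategy.AddChunk(pageText);
        return strategy.GetResultantText();
    }

    // Mirrors the question: one strategy instance shared by all pages.
    public static List<string> ExtractWithSharedStrategy(string[] pages)
    {
        var results = new List<string>();
        var strategy = new AccumulatingStrategy();
        foreach (var page in pages)
        {
            results.Add(GetTextFromPage(page, strategy));
        }
        return results;
    }

    // Mirrors the answer: a new strategy instance for every page.
    public static List<string> ExtractWithFreshStrategy(string[] pages)
    {
        var results = new List<string>();
        foreach (var page in pages)
        {
            results.Add(GetTextFromPage(page, new AccumulatingStrategy()));
        }
        return results;
    }
}
```

With pages "A", "B" and "C", ExtractWithSharedStrategy returns "A", "A B", "A B C", while ExtractWithFreshStrategy returns "A", "B", "C". The real LocationTextExtractionStrategy also sorts its chunks by their position on the page before joining them, so the order of the accumulated values can differ from this sketch (the question shows "C B A").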