PDF extraction not complete

Problem description

I'm trying to extract text from the PDF file http://www.filedropper.com/copy_1, but I get less than half of the text on a page. I'm using iTextSharp:

PdfReader reader = new PdfReader(file);
string currentText =  PdfTextExtractor.GetTextFromPage(reader, 1);

I have also tried SimpleTextExtractionStrategy instead of the default LocationTextExtractionStrategy:

string currentText = PdfTextExtractor.GetTextFromPage(reader, 1, new SimpleTextExtractionStrategy());

The file was originally generated by Microsoft Reporting Services (to which I don't have access), and I have extracted a single page from it to test the text extraction.

Can anyone help?

Recommended answer

Try this:

PdfReader reader = new PdfReader(file);
StringBuilder currentText = new StringBuilder();
// Page numbers in iTextSharp are 1-based.
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    currentText.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}

and then perform whatever operation you want on "currentText".
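
For reference, a self-contained version of the loop above might look like the following sketch. It assumes iTextSharp 5.x is referenced by the project; the class name PdfTextDump, the method name ExtractAllText, and the command-line path argument are hypothetical names introduced only for illustration. It creates a new extraction strategy for each page, because a strategy object accumulates the text it has seen. It only shows the extraction calls in a complete program and is not claimed to recover text that the strategies above miss.

```csharp
using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PdfTextDump
{
    // Hypothetical helper: returns the text of every page in the PDF at 'path'.
    static string ExtractAllText(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            StringBuilder text = new StringBuilder();
            // Page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A strategy accumulates text, so use a fresh one per page.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
            return text.ToString();
        }
        finally
        {
            // Release the underlying file handle.
            reader.Close();
        }
    }

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path of the PDF under test.
        Console.WriteLine(ExtractAllText(args[0]));
    }
}
```

Swapping LocationTextExtractionStrategy for SimpleTextExtractionStrategy in the loop gives the variant tried in the question.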
