How to read text from a PDF file in C#
Question
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Extracts the text of a PDF with iTextSharp and re-splits it into
// fixed-size "pages" of 2000 characters, separated by <br /> tags.
public string GetPDFText(string pdfPath)
{
    const int charsPerPage = 2000;   // characters per output page
    PdfReader reader = new PdfReader(pdfPath);
    StringWriter output = new StringWriter();
    string text = string.Empty;      // extracted text not yet written out
    int pageNumber = 1;
    try
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text += PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy());
            int fullPages = text.Length / charsPerPage;
            for (int j = 0; j < fullPages; j++)
            {
                output.WriteLine("Page " + pageNumber + "<br />" + text.Substring(charsPerPage * j, charsPerPage) + "<br /><br />");
                pageNumber++;
            }
            // Keep only the remainder that did not fill a whole page.
            text = text.Substring(fullPages * charsPerPage);
        }
        // Write the leftover text, which the loop above never emits.
        if (text.Length > 0)
        {
            output.WriteLine("Page " + pageNumber + "<br />" + text + "<br /><br />");
        }
    }
    finally
    {
        reader.Close();
    }
    return output.ToString();
}
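The fixed-size pagination in the method above does not depend on the PDF library, so it can be checked on its own. The following is a minimal sketch of that step only, not part of the original post; the names `TextPaginator` and `Split` are hypothetical and chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): isolates the
// "split text into fixed-size pages" step so it can be tested without a PDF.
public static class TextPaginator
{
    // Splits text into consecutive chunks of at most charsPerPage characters.
    // The final chunk holds whatever remains and may be shorter.
    public static List<string> Split(string text, int charsPerPage)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (charsPerPage <= 0) throw new ArgumentOutOfRangeException(nameof(charsPerPage));

        var pages = new List<string>();
        for (int start = 0; start < text.Length; start += charsPerPage)
        {
            int length = Math.Min(charsPerPage, text.Length - start);
            pages.Add(text.Substring(start, length));
        }
        return pages;
    }
}

public static class Program
{
    public static void Main()
    {
        // Ten characters at four per page: two full pages plus a remainder.
        List<string> pages = TextPaginator.Split("abcdefghij", 4);
        for (int i = 0; i < pages.Count; i++)
        {
            Console.WriteLine("Page " + (i + 1) + ": " + pages[i]);
        }
    }
}
```

With the sample input this prints three pages ("abcd", "efgh", "ij"), which shows that the trailing remainder has to be emitted as a page of its own.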
Recommended answer
Check out
PDFsharp - http://pdfsharp.codeplex.com/
and
PDFLib - http://pdflib.codeplex.com/
Cheers,
Edo