Reading a PDF File using iText5 for .NET
Question
I'm using C# as my programming platform and iTextSharp to read PDF content. I used the code below to read the content, but it seems to read the text page by page.
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(object Filename)
{
    StringBuilder text = new StringBuilder();
    PdfReader reader = null;
    try
    {
        reader = new PdfReader((string)Filename);
        // iText page numbers start at 1.
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Create a fresh strategy for each page.
            ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
            // The result is already a .NET string; re-encoding it through
            // Encoding.Default would only risk losing non-ASCII characters.
            text.Append(PdfTextExtractor.GetTextFromPage(reader, page, its));
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    finally
    {
        // Close the reader even if extraction throws.
        if (reader != null) reader.Close();
    }
    return text.ToString();
}
Can anyone help me write code that reads the PDF content line by line?
Recommended answer
Try this: use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy. It adds newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and consume it line by line.
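The suggestion above could be sketched roughly as follows, assuming iTextSharp 5.x is referenced in the project. The class name PdfLineReader and the helpers ReadLines and SplitLines are hypothetical names introduced only for this illustration. Dropping empty lines and tolerating "\r\n" are also assumptions, not something the answer requires.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted text on "\n" (and "\r\n", as a precaution),
    // discarding empty entries. Discarding blanks is an assumption.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the text lines of every page, in page order.
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            // iText page numbers start at 1.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page, as in the question's code.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            // Close the reader even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

A caller might then loop over the result, for example foreach (string line in PdfLineReader.ReadLines("input.pdf")) Console.WriteLine(line); where "input.pdf" is a placeholder path. How well the extracted lines match the visual lines depends on how the PDF positions its text, so the output is worth checking on a real document.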