Read first-name information in a PDF file
Question
The .pdf files have boxes for first name, last name, etc.
If possible, please show me C# code that lets me read the text inside the boxes for these sections (first name, last name, etc.).
I have looked into PDFsharp and can now read the size, author, etc., but I am not sure how to read the sections I described above.
Any thoughts, please?
Thanks
Recommended answer
This is an example of how to get all the text of a PDF file with iTextSharp.
// Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public string ReadPdfFile(string filename)
{
    StringBuilder fullText = new StringBuilder();
    // Open the document once and reuse the same reader for every page.
    PdfReader pdfReader = new PdfReader(filename);
    try
    {
        // iTextSharp page numbers are 1-based.
        for (int nPage = 1; nPage <= pdfReader.NumberOfPages; nPage++)
        {
            // Use a fresh strategy per page so text does not accumulate across pages.
            ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
            // GetTextFromPage already returns a .NET string, so no encoding round-trip is needed.
            string s = PdfTextExtractor.GetTextFromPage(pdfReader, nPage, its);
            fullText.Append(s);
        }
    }
    finally
    {
        pdfReader.Close();
    }
    return fullText.ToString();
}
To achieve what you want to do, you have to create your own extraction strategy. I once wrote one that gets text by its position, based on LocationTextExtractionStrategy (it is in the iTextSharp source code).
You should create your own TextExtractionStrategy with your own conditions.
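As a rough sketch of the position-based idea (not the custom strategy mentioned above, which is not shown here), iTextSharp 5 also lets you wrap LocationTextExtractionStrategy in a FilteredTextRenderListener with a RegionTextRenderFilter, so only text drawn inside a given rectangle is returned. The class name PdfRegionReader, the method name ReadRegion and any coordinates you pass in are hypothetical; you would have to find the rectangle of each box yourself.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, for illustration only.
public static class PdfRegionReader
{
    // Returns the text drawn inside 'region' on the given page (1-based).
    // 'region' is given in PDF page coordinates (points).
    public static string ReadRegion(string filename, int pageNumber, Rectangle region)
    {
        PdfReader reader = new PdfReader(filename);
        try
        {
            // Keep only the text chunks that fall inside the rectangle.
            RenderFilter filter = new RegionTextRenderFilter(region);
            ITextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
            return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call might look like PdfRegionReader.ReadRegion("form.pdf", 1, new Rectangle(100, 700, 300, 720)), where the file name and the four coordinates (lower-left x, lower-left y, upper-right x, upper-right y) are placeholders. If the boxes turn out to be real form fields rather than drawn text, reader.AcroFields.GetField with the field name may be a more direct route.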
I hope this was useful.
Mauro.