Read firstname information in a PDF file


Problem description

The .pdf files have boxes for firstname, lastname, etc.
If possible, please let me know the C# code that would let me read the text inside the boxes for these sections (firstname, lastname, etc.).
I have looked into PDFsharp and can now read the size, author, etc., but I am not sure how to read the sections I described above.
Any thoughts, please?
Thanks

Recommended answer

This is an example of how to get all the text of a PDF file using iTextSharp.

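using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(string filename)
{
	// Open the document once and reuse the same reader for every page.
	PdfReader pdfReader = new PdfReader(filename);
	StringBuilder fullText = new StringBuilder();

	try
	{
		// iTextSharp page numbers start at 1.
		for (int nPage = 1; nPage <= pdfReader.NumberOfPages; nPage++)
		{
			// Use a fresh strategy per page so earlier pages' text is not carried over.
			ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
			string s = PdfTextExtractor.GetTextFromPage(pdfReader, nPage, its);

			// Pass the text through the system default code page into UTF-8;
			// characters the default code page cannot represent may be lost.
			s = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
			fullText.Append(s);
		}
	}
	finally
	{
		// Release the file even if extraction throws.
		pdfReader.Close();
	}

	return fullText.ToString();
}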



To achieve what you want, you have to create your own extraction strategy. I once wrote one that gets the text by its position, based on LocationTextExtractionStrategy (which is in the iTextSharp source code).
You should create your own TextExtractionStrategy with your own conditions.
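
As a rough illustration of extracting text by position, the sketch below does not write a strategy from scratch. It wraps LocationTextExtractionStrategy in a region filter so that only text inside a given rectangle is returned. It assumes iTextSharp 5.x; the method name ReadTextInBox, the file name, and the box coordinates are hypothetical, and you would have to find the real coordinates of each box in your own PDF.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfBoxReader
{
	// Hypothetical helper: returns the text found inside "box" on one page.
	public static string ReadTextInBox(string filename, int pageNumber, Rectangle box)
	{
		PdfReader reader = new PdfReader(filename);
		try
		{
			// Only text that falls within the rectangle is passed on
			// to the location-based strategy.
			RenderFilter[] filters = { new RegionTextRenderFilter(box) };
			ITextExtractionStrategy strategy =
				new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);

			return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
		}
		finally
		{
			reader.Close();
		}
	}
}

// Example call (coordinates are made up; units are PDF points,
// measured from the bottom-left corner of the page):
// Rectangle firstNameBox = new Rectangle(100f, 700f, 300f, 720f);
// string firstName = PdfBoxReader.ReadTextInBox("form.pdf", 1, firstNameBox);
```

If the conditions you need are more than a simple rectangle, the same idea applies with your own ITextExtractionStrategy implementation in place of the filtered listener.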

I hope this was useful.

Mauro.

