Find a string and its location in a PDF file using iTextSharp in ASP.NET C#
Problem description
I am trying to find a string and its location in a PDF using iTextSharp in ASP.NET C#, so that I can edit it. So far, with the help available on Google, I have been unable to do it. This is my current code: it reads the text chunk by chunk, but it does not find the required text. Any help is appreciated, thanks.
using System;
using System.Collections.Generic;
using System.Linq;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Pairs a matched piece of text with its bounding rectangle on the page.
public class RectAndText
{
    public iTextSharp.text.Rectangle Rect;
    public String Text;

    public RectAndText(iTextSharp.text.Rectangle rect, String text)
    {
        this.Rect = rect;
        this.Text = text;
    }
}

public class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
{
    // Locations of every match found so far.
    public List<RectAndText> myPoints = new List<RectAndText>();

    public String TextToSearchFor { get; set; }
    public System.Globalization.CompareOptions CompareOptions { get; set; }

    public MyLocationTextExtractionStrategy(String textToSearchFor, System.Globalization.CompareOptions compareOptions = System.Globalization.CompareOptions.None)
    {
        this.TextToSearchFor = textToSearchFor;
        this.CompareOptions = compareOptions;
    }

    // Called once per text chunk; the search below only looks inside that single chunk.
    public override void RenderText(TextRenderInfo renderInfo)
    {
        base.RenderText(renderInfo);

        // Look for the search text inside this chunk only.
        var startPosition = System.Globalization.CultureInfo.CurrentCulture.CompareInfo.IndexOf(renderInfo.GetText(), this.TextToSearchFor, this.CompareOptions);
        if (startPosition < 0)
        {
            return;
        }

        // Take the per-character render info for the matched substring.
        var chars = renderInfo.GetCharacterRenderInfos().Skip(startPosition).Take(this.TextToSearchFor.Length).ToList();
        var firstChar = chars.First();
        var lastChar = chars.Last();

        // Bottom-left of the first character and top-right of the last one bound the match.
        var bottomLeft = firstChar.GetDescentLine().GetStartPoint();
        var topRight = lastChar.GetAscentLine().GetEndPoint();
        var rect = new iTextSharp.text.Rectangle(
            bottomLeft[Vector.I1],
            bottomLeft[Vector.I2],
            topRight[Vector.I1],
            topRight[Vector.I2]
        );

        this.myPoints.Add(new RectAndText(rect, this.TextToSearchFor));
    }
}
Calling code
string thisDir = System.Web.Hosting.HostingEnvironment.MapPath("~/");
var testFile = thisDir + "example.pdf";

var t = new MyLocationTextExtractionStrategy("searchstring"); // need to search for this string

// Run the strategy over page 1; matches are collected in t.myPoints.
using (var r = new PdfReader(testFile))
{
    var ex = PdfTextExtractor.GetTextFromPage(r, 1, t);
}

foreach (var p in t.myPoints)
{
    Console.WriteLine(string.Format("Found text {0} at {1}x{2}", p.Text, p.Rect.Left, p.Rect.Bottom));
}
Recommended answer
This can easily be managed in iText7 using RegexBasedLocationExtractionStrategy. This class can be constructed with a regular expression and reports the locations of the text matching the expression. Even if you cannot switch to iText7, you can still have a look at the source code and see how we implemented it.
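As a rough illustration of the approach described above, here is a minimal sketch of how RegexBasedLocationExtractionStrategy might be used from C# with the iText7 .NET package. The class name PdfTextFinder and the method FindText are hypothetical names made up for this example, and the sketch assumes that the iText7 kernel package is referenced and that the PDF path and page number are valid.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper for illustration; not part of iText itself.
public static class PdfTextFinder
{
    // Returns the bounding rectangles of all regex matches on one page.
    public static List<Rectangle> FindText(string pdfPath, int pageNumber, string regex)
    {
        var found = new List<Rectangle>();

        // The strategy collects text chunks while the page is processed.
        var strategy = new RegexBasedLocationExtractionStrategy(regex);

        using (var pdfDoc = new PdfDocument(new PdfReader(pdfPath)))
        {
            // Feed the page content through the strategy.
            var processor = new PdfCanvasProcessor(strategy);
            processor.ProcessPageContent(pdfDoc.GetPage(pageNumber));

            // Ask the strategy for the locations that matched the expression.
            foreach (IPdfTextLocation location in strategy.GetResultantLocations())
            {
                found.Add(location.GetRectangle());
            }
        }

        return found;
    }
}
```

With the values from the question, this could be called as PdfTextFinder.FindText(testFile, 1, System.Text.RegularExpressions.Regex.Escape("searchstring")), after which GetX() and GetY() on each returned rectangle give its lower-left corner. Escaping the search text is a precaution so that a literal string containing regex metacharacters is not interpreted as a pattern.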