Search for a particular word in a PDF using iTextSharp

Question

This is my first post on Stack Overflow.

I have a PDF file on my system drive. I want to write a C# program that uses iTextSharp (a reference to Itextsharp.dll) to search that PDF for a particular word, say "StackOverFlow". If the PDF contains the word "StackOverFlow", the program should return true; otherwise it should return false.

I have looked through many articles but haven't found a solution yet. :-(

What I have tried so far:

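using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Returns the text of every page of the PDF, concatenated.
public string ReadPdfFile(string fileName)
{
    StringBuilder text = new StringBuilder();

    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);

        // iTextSharp numbers pages from 1.
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

            // Round-trips the text through Encoding.Default. On .NET Framework that is
            // the system ANSI code page, so characters outside it can be altered or lost.
            currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
            text.Append(currentText);
        }
        pdfReader.Close();
    }
    return text.ToString();
}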

Thanks in advance, Sabya Dev

Recommended answer

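The following method works. It returns the list of page numbers (starting at 1) on which the text is found. An empty list means the PDF does not contain the text, so checking whether the returned list's Count is greater than 0 gives the true/false result the question asks for.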

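using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Returns the 1-based numbers of the pages whose text contains searchText.
public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                // Case-sensitive substring test: "StackOverFlowing" also matches.
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
        }
        finally
        {
            pdfReader.Close(); // release the file even if extraction throws
        }
    }
    return pages;
}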
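The Contains check in the answer is a case-sensitive substring match, so a page containing "StackOverFlowing" also counts as a hit. If "particular word" should mean a whole word, one option is to move the matching into a small helper that works on the page texts the answer's loop already extracts. The sketch below is hypothetical: the class PdfTextSearch and its methods FindPages and ContainsWord are illustrative names, not part of iTextSharp. It assumes that extracted text may break a phrase across lines, so any whitespace in the search term matches any run of whitespace in the page text, and it can optionally ignore case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an iTextSharp API): the matching half of the answer's
// loop, kept free of iTextSharp so it can be tested on plain strings.
public static class PdfTextSearch
{
    // Builds a pattern that matches the term as a whole word or phrase: not preceded
    // or followed by a word character, with any whitespace run between its words.
    private static Regex BuildPattern(string term, bool ignoreCase)
    {
        string[] parts = term.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
            throw new ArgumentException("Search term must not be empty.", nameof(term));

        string body = string.Join(@"\s+", parts.Select(Regex.Escape));
        RegexOptions options = ignoreCase
            ? RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
            : RegexOptions.None;
        return new Regex(@"(?<!\w)" + body + @"(?!\w)", options);
    }

    // Returns the 1-based numbers of the pages whose text contains the term,
    // mirroring the List<int> returned by the answer's ReadPdfFile.
    public static List<int> FindPages(IReadOnlyList<string> pageTexts, string term, bool ignoreCase = false)
    {
        Regex pattern = BuildPattern(term, ignoreCase);
        var pages = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            if (pattern.IsMatch(pageTexts[i] ?? string.Empty))
                pages.Add(i + 1);
        }
        return pages;
    }

    // The true/false result the question asks for.
    public static bool ContainsWord(IReadOnlyList<string> pageTexts, string term, bool ignoreCase = false)
        => FindPages(pageTexts, term, ignoreCase).Count > 0;
}
```

To use it with iTextSharp, add the result of PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy) for each page to a List<string>, then call PdfTextSearch.ContainsWord(texts, "StackOverFlow") for true/false or FindPages for the page list. The lookarounds (?<!\w) and (?!\w) are used instead of \b so that terms beginning or ending with punctuation, such as "C#", are still treated as whole words.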
