How to read and find a particular word in a PDF document in ASP.NET using C#?


Problem description

I tried using iTextSharp to read a PDF document and find a word in it, but I was unable to get the correct result. Is there another way to implement this?

Solution

The following method works fine. It returns the list of page numbers on which the text is found.

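// Usings needed at the top of the file (iTextSharp 5.x namespaces).
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Returns the 1-based page numbers whose extracted text contains searchText.
public List<int> ReadPdfFile(string fileName, string searchText)
{
    List<int> pages = new List<int>();
    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                // A fresh strategy per page keeps earlier pages' text out of this page's result.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                // Contains is case-sensitive and also matches inside longer words.
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            pdfReader.Close();
        }
    }
    return pages;
}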
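
Because the question asks for a particular word, a plain substring test may report pages where the term only appears inside a longer word or in a different case. The following is a minimal sketch of a whole-word, case-insensitive check on the extracted page text. It is not part of iTextSharp: the class name `PageTextSearch` and the method `ContainsWord` are hypothetical names introduced here for illustration, and it assumes the page text has already been extracted as a string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageTextSearch
{
    // Hypothetical helper (not an iTextSharp API): returns true when `word`
    // occurs in `pageText` as a whole word, ignoring case.
    public static bool ContainsWord(string pageText, string word)
    {
        if (string.IsNullOrEmpty(pageText) || string.IsNullOrEmpty(word))
        {
            return false;
        }

        // Regex.Escape keeps characters such as '.', '+' or '#' in the term literal.
        // The lookarounds require that no word character touches the match on
        // either side, which also works for terms ending in a symbol, e.g. "C#".
        string pattern = @"(?<!\w)" + Regex.Escape(word) + @"(?!\w)";
        return Regex.IsMatch(
            pageText,
            pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

To try it, call `PageTextSearch.ContainsWord` with the page text and the search term in place of the `Contains` call in the method above. Whether whole-word matching is what you want depends on the documents involved; text extracted from a PDF can split or hyphenate words across lines, in which case neither approach will find them.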



Read the answer here: http://stackoverflow.com/questions/17485548/search-particular-word-in-pdf-using-itextsharp


Here is the main idea: Using iTextSharp to extract Text from PDF files

