How to read text from a PDF file in C#

Question

// At the top of the file (iTextSharp 5.x):
// using System.IO;
// using iTextSharp.text.pdf;
// using iTextSharp.text.pdf.parser;

public string GetPDFText(string pdfPath)
{
    const int PerPageText = 2000; // characters per output "page"
    PdfReader reader = new PdfReader(pdfPath);
    StringWriter output = new StringWriter();
    string _text = string.Empty; // extracted text not yet written out
    int PageNumber = 1;

    try
    {
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Append this PDF page's text to the leftover from the previous page.
            _text += PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy());

            // Number of complete 2000-character chunks available (integer division).
            int _subpage = _text.Length / PerPageText;
            for (int j = 0; j < _subpage; j++)
            {
                output.WriteLine("Page " + PageNumber + "<br />" + _text.Substring(PerPageText * j, PerPageText) + "<br /><br />");
                PageNumber++;
            }

            // Keep only the incomplete tail for the next iteration.
            _text = _text.Substring(_subpage * PerPageText);
        }

        // Flush the final, shorter chunk; otherwise the end of the document is lost.
        if (_text.Length > 0)
        {
            output.WriteLine("Page " + PageNumber + "<br />" + _text + "<br /><br />");
        }
    }
    finally
    {
        reader.Close();
    }

    return output.ToString();
}
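To check the paging logic without a PDF at hand, the chunking step can be pulled out into a pure function. This is a minimal sketch under stated assumptions: `PdfTextPaging`, `SplitIntoChunks` and `FormatPages` are hypothetical names (not part of iTextSharp), and it assumes the extracted text is already available as a single string. Unlike a loop that only emits full-size chunks, it keeps the trailing remainder as a final, shorter chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not an iTextSharp API): splits already-extracted
// text into fixed-size chunks and formats them as "Page N" blocks.
public static class PdfTextPaging
{
    // Returns the text cut into chunks of at most chunkSize characters.
    // The last chunk holds the remainder and may be shorter.
    public static List<string> SplitIntoChunks(string text, int chunkSize)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<string>();
        for (int start = 0; start < text.Length; start += chunkSize)
        {
            // Math.Min keeps the final, shorter chunk instead of dropping it.
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
        }
        return chunks;
    }

    // Formats chunks with the same "Page N<br />...<br /><br />" markup as GetPDFText.
    public static string FormatPages(IEnumerable<string> chunks)
    {
        var sb = new StringBuilder();
        int pageNumber = 1;
        foreach (string chunk in chunks)
        {
            sb.AppendLine("Page " + pageNumber + "<br />" + chunk + "<br /><br />");
            pageNumber++;
        }
        return sb.ToString();
    }
}
```

For example, `SplitIntoChunks("abcdefg", 3)` yields "abc", "def" and "g"; with the real extractor you would pass the concatenated page text and a chunk size of 2000.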

Recommended answer

Check out

PDFsharp - http://pdfsharp.codeplex.com/
and
PDFLib - http://pdflib.codeplex.com/

Cheers,
Edo

