How to extract formatted text from PDF in C#

Views: 146 Published: 2019/6/8 4:11:03 Tags: C# PDF Text itextsharp

Question

Hello experts,

I am developing a web-based application through which users upload their PDF documents. I need to extract several details from each PDF and, after analysing the data, show the result on a web page. I have searched a lot and found several articles that helped me extract text using iTextSharp and PDFBox, as well as many similar questions asked on CodeProject and Stack Overflow.

So far I can get the text page by page, but it is not formatted, so I cannot work with the data extracted from the PDF. Is there any way to extract the text line by line and column by column?

Thank you

Recommended answer

using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Reads every page of the PDF at `path` and returns the concatenated text.
public string ReadPdfFile(string path)
{
    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(path);

    try
    {
        // iTextSharp page numbers start at 1.
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a fresh strategy per page so earlier pages are not repeated.
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            text.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy));
        }
    }
    finally
    {
        // Release the file even if extraction throws.
        pdfReader.Close();
    }

    string result = text.ToString();
    txtInput.Text = result; // txtInput: a text control assumed to exist on the calling form
    return result;
}
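
The answer above returns each page as one block of text, which does not by itself give the line-by-line, column-by-column layout the question asks for. iTextSharp 5 also ships LocationTextExtractionStrategy, which orders text by its position on the page and may already be enough for simple layouts. For tabular data, one possible approach is to collect each text chunk together with its baseline coordinates and then group the chunks yourself. The sketch below shows only that grouping step in plain C#. TextChunk, LineGrouper, and the yTolerance threshold are hypothetical names introduced for illustration, not part of iTextSharp, and the coordinates are assumed to come from elsewhere (for example from a custom render listener that reads each chunk's baseline).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one piece of text plus the start of its baseline in PDF
// user space (origin at the bottom-left of the page, y grows upward).
public sealed class TextChunk
{
    public float X { get; }
    public float Y { get; }
    public string Text { get; }

    public TextChunk(float x, float y, string text)
    {
        X = x;
        Y = y;
        Text = text;
    }
}

public static class LineGrouper
{
    // Groups chunks into rows (top to bottom) and orders each row left to right.
    // yTolerance is an assumed threshold in points: a chunk joins the current
    // row if its baseline is within this distance of the row's first chunk.
    public static List<List<string>> GroupIntoRows(
        IEnumerable<TextChunk> chunks, float yTolerance = 2f)
    {
        var rows = new List<List<TextChunk>>();
        float currentY = 0f;

        // Visit chunks from the top of the page downward.
        foreach (TextChunk chunk in chunks.OrderByDescending(c => c.Y))
        {
            if (rows.Count == 0 || Math.Abs(currentY - chunk.Y) > yTolerance)
            {
                rows.Add(new List<TextChunk>());
                currentY = chunk.Y;
            }
            rows[rows.Count - 1].Add(chunk);
        }

        // Within each row, order by x so cells read left to right.
        return rows
            .Select(row => row.OrderBy(c => c.X).Select(c => c.Text).ToList())
            .ToList();
    }
}
```

For example, passing the chunks ("Amount" at x=200, y=700), ("Item" at x=50, y=700.5), ("Paper" at x=50, y=680) and ("12.50" at x=200, y=680) yields two rows: Item, Amount and then Paper, 12.50. A fixed tolerance is a simplification; documents with mixed font sizes or rotated text would likely need a more careful rule.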
