Extracting text from PDFs in C#

Views: 135 Published: 2016/9/8 18:56:55 Tags: c# pdf text extract

Question

Pretty simply, I need to rip text out of multiple PDFs (quite a lot actually) in order to analyse the contents before sticking it in an SQL database.

I've found some pretty sketchy free C# libraries that sort of work (the best one uses iTextSharp), but there are umpteen formatting errors, some characters are scrambled, and a lot of the time there are spaces (' ') EVERYWHERE: inside words, between every letter, huge blocks of them taking up several lines. It all seems a bit random.

Is there any easy way of doing this that I'm completely overlooking (quite likely!), or is it a bit of an arduous task that involves reliably converting the extracted byte values into letters?

Cheers,

Duncan
