Read data from PDF files using C#

Question

I have five PDFs containing around 38,000 objective questions. I want to build an application that imports these questions, saves them to a database, and then gives the user an interface for choosing a question with its four answer options. I used iTextSharp to read the PDFs, both as a single chunk and line by line. The extracted content is scattered, and I cannot find a pattern that lets me separate each question from its four options. Is there a better way to import data from PDFs? The content in the PDFs is in tabular format.

Here are snapshots of the PDF and the resulting string.
Input PDF file
Resulting string in the window

Recommended answer

See my previous answer [^] to a similar question.

