I want to read a set of PDF files (a collection) in C#


Problem description

I have developed a WinForms application that reads the content of a PDF file as a string, but it can only read one file at a time. I want to read a set of PDF files (a collection), and I don't know how to give the PdfReader class the path of a whole folder. I think a foreach loop might be useful here.

My code is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.IO;
using System.Collections;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace test
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
   
        public static string ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();

                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
                    
                }

                return text.ToString();
            }

        }

        private void button1_Click(object sender, EventArgs e)
        {
            Form1.ExtractTextFromPdf(@"D:\Data Sets\Enron\168.pdf");
        }
    }
}



My requirement is to read all the PDF files in a folder. For example, the path would be @"D:\Data Sets\Enron", where the Enron folder contains a set of PDF files. The program should pick up one PDF file at a time and read its content. I think a foreach loop might be useful, but I want it to read every PDF file in the Enron folder.

Recommended answer

This snippet shows how to get all the PDF file names in your directory and then iterate over them:
string pathName = @"D:\Data Sets\Enron";

string[] pdfFileNames = Directory.GetFiles(pathName, "*.pdf");

foreach(string pdfFileName in pdfFileNames)
{
    ExtractTextFromPdf(pdfFileName);
}
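
The loop above discards the string returned by ExtractTextFromPdf. If the extracted text of each file is needed afterwards, one option is to collect it in a dictionary keyed by file name. The following is only a minimal sketch under stated assumptions: the class name PdfFolderReader and the method ReadAll are hypothetical names introduced for illustration, the extraction function is passed in as a delegate so the helper does not depend on iTextSharp, and skipping files that fail to parse is an assumed policy rather than something the original answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the original answer).
public static class PdfFolderReader
{
    // Returns a map from full file path to the text extracted from that file.
    // extractText is any function that turns a PDF path into a string,
    // for example the ExtractTextFromPdf method from the question.
    public static Dictionary<string, string> ReadAll(
        string folder, Func<string, string> extractText)
    {
        var results = new Dictionary<string, string>();

        // EnumerateFiles yields matching file names lazily instead of
        // building the whole array up front as GetFiles does.
        foreach (string pdfFileName in Directory.EnumerateFiles(folder, "*.pdf"))
        {
            try
            {
                results[pdfFileName] = extractText(pdfFileName);
            }
            catch (Exception ex)
            {
                // Assumed policy: report the failure and continue with the
                // remaining files rather than aborting the whole batch.
                Console.Error.WriteLine("Skipping " + pdfFileName + ": " + ex.Message);
            }
        }

        return results;
    }
}
```

From the button handler in the question, this could be called as PdfFolderReader.ReadAll(@"D:\Data Sets\Enron", Form1.ExtractTextFromPdf), after which each entry's Value holds the text of one PDF.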

