I want to read a set of PDF files (a collection) in C#
Question
I have developed a WinForms application that reads the content of a PDF file
into a string, but it can only read one file at a time. I want to read a whole
set (collection) of PDF files, and I don't know how to pass a folder path to the
PDF reader class. I think a foreach loop might be useful here.
My code is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.IO;
using System.Collections;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace test
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Returns the text of every page of the PDF at `path` as one string.
        public static string ExtractTextFromPdf(string path)
        {
            using (PdfReader reader = new PdfReader(path))
            {
                StringBuilder text = new StringBuilder();
                // Page numbers are 1-based.
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
                }
                return text.ToString();
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // Reads a single hard-coded file; the returned text is not used here.
            Form1.ExtractTextFromPdf(@"D:\Data Sets\Enron\168.pdf");
        }
    }
}
My requirement is to read all the PDF files in a folder. For example, the path
would be @"D:\Data Sets\Enron", where the Enron folder contains a set of PDF
files; the program should pick up one PDF file at a time and read its content.
I think a foreach loop might be useful, but I want to read every PDF file in
the Enron folder.
Recommended answer
This snippet shows how to get the names of all PDF files in your directory
and then iterate over them:
string pathName = @"D:\Data Sets\Enron";
string[] pdfFileNames = Directory.GetFiles(pathName, "*.pdf");
foreach (string pdfFileName in pdfFileNames)
{
    ExtractTextFromPdf(pdfFileName);
}
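The loop above calls ExtractTextFromPdf but throws away the returned string. If you want to keep the text of each file, one option is to collect the results in a dictionary keyed by file path. The following is only a sketch: PdfFolderReader and ReadAll are hypothetical names that do not appear in the original post, and the extraction function is passed in as a delegate so the helper itself does not depend on iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original question or answer.
public static class PdfFolderReader
{
    // Runs `extract` on every *.pdf file directly inside `folder`
    // and returns a map of file path -> extracted text.
    public static Dictionary<string, string> ReadAll(
        string folder, Func<string, string> extract)
    {
        var results = new Dictionary<string, string>();
        foreach (string pdfFileName in Directory.GetFiles(folder, "*.pdf"))
        {
            try
            {
                results[pdfFileName] = extract(pdfFileName);
            }
            catch (Exception ex)
            {
                // Assumption: one unreadable file should not abort the batch,
                // so it is reported and skipped.
                Console.Error.WriteLine(
                    "Skipping " + pdfFileName + ": " + ex.Message);
            }
        }
        return results;
    }
}
```

Inside button1_Click this could be called as PdfFolderReader.ReadAll(@"D:\Data Sets\Enron", ExtractTextFromPdf), since the method from the question already matches the Func<string, string> signature. Subfolders are not searched; if that is needed, Directory.GetFiles has an overload that takes SearchOption.AllDirectories.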