How to convert PDF to text file in iTextSharp

Views: 133 Published: 2016/10/5 21:51:37 Tags: c# pdf itextsharp

Question

I have to retrieve the text from a PDF file, but with the following code I only get an empty text file.

for (int i = 0; i < n; i++)
{
    pagenumber = i + 1;
    filename = pagenumber.ToString();
    while (filename.Length < digits) filename = "0" + filename;
    filename = "_" + filename;
    filename = splitFile + name + filename;
    // step 1: creation of a document-object
    document = new Document(reader.GetPageSizeWithRotation(pagenumber));
    // step 2: we create a writer that listens to the document
    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(filename + ".pdf", FileMode.Create));

    // step 3: we open the document
    document.Open();

    PdfContentByte cb = writer.DirectContent;
    PdfImportedPage page = writer.GetImportedPage(reader, pagenumber);
    int rotation = reader.GetPageRotation(pagenumber);
    if (rotation == 90 || rotation == 270)
    {
        cb.AddTemplate(page, 0, -1f, 1f, 0, 0, reader.GetPageSizeWithRotation(pagenumber).Height);
    }
    else
    {
        cb.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
    }
    // step 5: we close the document

    document.Close();
    PDFParser parser = new PDFParser();
    parser.ExtractText(filename + ".pdf", filename + ".txt");
}

What am I doing wrong, and how should I extract text from a PDF?

Recommended answer

For text extraction with iTextSharp, take a current version of that library and use

PdfTextExtractor.GetTextFromPage(reader, pageNumber);

Beware: some 5.3.x versions have a bug in the text extraction code, which has since been fixed in trunk. You might therefore want to check out the trunk revision.
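As a rough sketch of how the call above might be used to produce a text file, the following reads each page with PdfTextExtractor.GetTextFromPage and writes the combined text to disk. It assumes iTextSharp 5.x; the class name PdfTextDump, the method name ExtractToFile, and the UTF-8 output encoding are illustrative choices and not part of the original answer.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextDump
{
    // Hypothetical helper: extracts the text of every page of pdfPath
    // and writes it to txtPath as UTF-8.
    public static void ExtractToFile(string pdfPath, string txtPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            StringBuilder text = new StringBuilder();

            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }

            File.WriteAllText(txtPath, text.ToString(), Encoding.UTF8);
        }
        finally
        {
            // Release the underlying file handle even if extraction fails.
            reader.Close();
        }
    }
}
```

To get one text file per page, as the code in the question attempts, the File.WriteAllText call could be moved inside the loop with a per-page file name. Note that a PDF whose pages contain only scanned images has no text to extract, so the output would still be empty in that case.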
