Reading a PDF file with iTextSharp


Question

Hi friends,

I need to read a PDF file using iTextSharp, so please share some code for that.

My application has many files on one page, and I generate a PDF page from them. I want to get the values from those files and store them in my SQL database.

If anyone knows how to do this, please give me some code or a hint.

Recommended answer

What do you mean by "read the PDF file"? I'm not kidding when I ask this, because it's important to understand that a PDF file isn't a structured file. In other words, you can't retrieve a paragraph, for instance, just by reading some strings. Also, do you want to consider image data as well? About the best you can do is something like this:
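using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ParsePdf(string fileName)
{
  if (!File.Exists(fileName))
    throw new FileNotFoundException("PDF file not found.", fileName);
  using (PdfReader reader = new PdfReader(fileName))
  {
    StringBuilder sb = new StringBuilder();

    // iTextSharp page numbers are 1-based.
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
      // Use a fresh strategy per page; a reused SimpleTextExtractionStrategy
      // keeps accumulating the text of earlier pages.
      ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
      string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
      if (!string.IsNullOrWhiteSpace(text))
      {
        // The returned string is already Unicode, so no re-encoding is needed.
        // AppendLine keeps the text of consecutive pages from running together.
        sb.AppendLine(text);
      }
    }

    return sb.ToString();
  }
}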

This uses a simple reader provided by iTextSharp to read the text out. There's no attempt to create anything like paragraphs out of this.
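Since the goal in the question is to store values in a SQL database, the extracted text still has to be split into fields. The following is only a minimal sketch of that step, assuming (hypothetically) that each value appears on its own line as "Label: value"; real PDFs often will not be that tidy. The names PdfFieldParser and ParseFields are made up for illustration and are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;

public static class PdfFieldParser
{
    // Assumption: the extracted text contains one "Label: value" pair per line.
    // Lines without a colon (or with nothing before it) are skipped.
    public static Dictionary<string, string> ParseFields(string extractedText)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(extractedText))
            return fields;

        string[] lines = extractedText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Split on the first colon only, so values such as "10:30" stay intact.
            int colon = line.IndexOf(':');
            if (colon <= 0)
                continue;

            string label = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            if (label.Length > 0)
                fields[label] = value; // a repeated label overwrites the earlier value
        }

        return fields;
    }
}
```

The resulting dictionary could then be passed to whatever data-access code the application already uses, ideally as parameters of a parameterized INSERT rather than by concatenating the values into the SQL string.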


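using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Note: PdfDocument, PdfReader.Open and PdfDocumentOpenMode here are PDFsharp types,
// not iTextSharp. The code copies the pages of the input files into one new document.
class Program
{
    static void Main(string[] args)
    {
        // pdffile1 is a placeholder for the input PDF path; define it before running.
        string[] strings = { pdffile1 };
        // Save needs a full file name, not just a folder; "merged.pdf" is a hypothetical name.
        ReadPdf("D:\\merged.pdf", strings);
    }

    private static void ReadPdf(string outputFilePath, string[] pdfFiles)
    {
        if (pdfFiles.Length > 0)
        {
            Console.WriteLine("Reading Pdf file.....");
            PdfDocument outputPDFDocument = new PdfDocument();

            foreach (string pdfFile in pdfFiles)
            {
                // Import mode is required to copy pages into another document.
                PdfDocument inputPDFDocument = PdfReader.Open(pdfFile, PdfDocumentOpenMode.Import);
                outputPDFDocument.Version = inputPDFDocument.Version;
                foreach (PdfPage page in inputPDFDocument.Pages)
                {
                    outputPDFDocument.AddPage(page);
                }
            }
            outputPDFDocument.Save(outputFilePath);
            Console.WriteLine("Successfully Completed the pdf documents");
            Console.WriteLine("File path is: {0}", outputFilePath);
            Console.ReadLine();
        }
        else
        {
            Console.WriteLine("Please Give the pdf file path before read & save");
        }
    }
}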


Refer to this simple example:
http://www.dotnetspider.com/forum/298029-Print-out-PDF-file.aspx

