Extract image from PDF using iTextSharp


Problem description


I am trying to extract all the images from a PDF using iTextSharp, but I can't seem to get past this one hurdle.

The line System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS); fails with the error "Parameter is not valid".

I think it works when the image is a bitmap, but not when it is in any other format.

Here is my code (sorry for the length):

    // Requires: using System; using System.Collections.Generic; using System.IO;
    // plus a reference to iTextSharp 5.x (types below are fully qualified).
    private void Form1_Load(object sender, EventArgs e)
    {
        // File.ReadAllBytes guarantees the whole file is read, unlike a single FileStream.Read call.
        byte[] data = File.ReadAllBytes(@"reader.pdf");

        List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

        iTextSharp.text.pdf.PdfReader PDFReaderObj = null;

        try
        {
            iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(data);
            PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);

            // Walk every object in the cross-reference table.
            for (int i = 0; i < PDFReaderObj.XrefSize; i++)
            {
                iTextSharp.text.pdf.PdfObject PDFObj = PDFReaderObj.GetPdfObject(i);

                if (PDFObj != null && PDFObj.IsStream())
                {
                    iTextSharp.text.pdf.PdfStream PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                    iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                    // Keep only image XObjects (/Subtype /Image).
                    if (subtype != null && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                    {
                        // Raw stream bytes, still encoded with the stream's /Filter.
                        byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStremObj);

                        if (bytes != null)
                        {
                            try
                            {
                                // GDI+ needs the stream to stay open for the lifetime of the Image.
                                System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes);
                                MS.Position = 0;

                                // "Parameter is not valid" is thrown here.
                                System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS);
                                ImgList.Add(ImgPDF);
                            }
                            catch (Exception)
                            {
                                // Streams that FromStream cannot decode are skipped.
                            }
                        }
                    }
                }
            }
        }
        finally
        {
            // Release the reader even if parsing fails; exceptions keep their original stack trace.
            if (PDFReaderObj != null)
                PDFReaderObj.Close();
        }
    } //Form1_Load
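The hypothesis above may have the cause reversed. Success likely depends on the stream's /Filter rather than on the image being a bitmap.

PdfReader.GetStreamBytesRaw returns the stream data still encoded:

* For a /DCTDecode stream, that data is a complete JPEG file, which Image.FromStream can usually open.
* For /FlateDecode or /CCITTFaxDecode streams, it is compressed pixel samples with no file header.
* For /JPXDecode streams, it is JPEG 2000, which GDI+ does not read.

In those last cases FromStream throws "Parameter is not valid". The following is a minimal sketch that is not part of iTextSharp. ImageSniffer.DetectFormat is a hypothetical helper that checks a few well-known file signatures, so streams GDI+ cannot open can be skipped before FromStream is called.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp or .NET): guesses whether a byte
// array starts with the signature of an image file format that GDI+ can open.
public static class ImageSniffer
{
    // Returns a short format name, or null when no known signature is found
    // (e.g. FlateDecode or CCITTFaxDecode data, which carry no file header).
    public static string DetectFormat(byte[] bytes)
    {
        if (bytes == null || bytes.Length < 4)
            return null;

        // JPEG: SOI marker FF D8 followed by the start of another marker (FF).
        if (bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF)
            return "jpeg";

        // PNG: 0x89 'P' 'N' 'G'.
        if (bytes[0] == 0x89 && bytes[1] == 0x50 && bytes[2] == 0x4E && bytes[3] == 0x47)
            return "png";

        // GIF: "GIF8" (GIF87a or GIF89a).
        if (bytes[0] == (byte)'G' && bytes[1] == (byte)'I' && bytes[2] == (byte)'F' && bytes[3] == (byte)'8')
            return "gif";

        // TIFF: little-endian "II*\0" or big-endian "MM\0*".
        if ((bytes[0] == 0x49 && bytes[1] == 0x49 && bytes[2] == 0x2A && bytes[3] == 0x00) ||
            (bytes[0] == 0x4D && bytes[1] == 0x4D && bytes[2] == 0x00 && bytes[3] == 0x2A))
            return "tiff";

        // BMP: "BM" (only two bytes, so false positives are possible).
        if (bytes[0] == (byte)'B' && bytes[1] == (byte)'M')
            return "bmp";

        return null;
    }
}
```

In the loop above, the FromStream call could be guarded with ImageSniffer.DetectFormat(bytes) != null instead of relying on the empty catch. The signature list is deliberately short, and a JPEG that passes the check can still fail in GDI+ (CMYK JPEGs, for example).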

Solution

I have used this library in the past with no problems. It should be exactly what you're after.

http://www.winnovative-software.com/PdfImgExtractor.aspx
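If you would rather stay with iTextSharp than switch libraries, you have to decode the non-JPEG streams yourself. iTextSharp offers two starting points (verify both against your version):

* PdfReader.GetStreamBytes(PRStream) applies the stream's filters for you.
* iTextSharp 5.x includes iTextSharp.text.pdf.parser.PdfImageObject, whose GetDrawingImage() method handles common cases.

Below is a minimal sketch of what that decoding amounts to for /FlateDecode, the most common non-JPEG case. It rests on these assumptions:

* .NET 6 or later, which provides ZLibStream.
* The stream has no /DecodeParms predictor.
* /Width, /Height, /BitsPerComponent and the colour-space component count have already been read from the image dictionary.

FlateImageData is a hypothetical name. The result is a raw sample buffer, not an image file. It still has to be copied into a Bitmap (for example with LockBits) rather than passed to Image.FromStream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for /FlateDecode image streams. Assumes no /DecodeParms
// predictor; PNG-predicted data would carry an extra filter byte per row.
public static class FlateImageData
{
    // Expected size of the decoded sample buffer. Each row starts on a byte
    // boundary, so the bits per row are rounded up to whole bytes.
    public static int ExpectedLength(int width, int height, int components, int bitsPerComponent)
    {
        int bytesPerRow = (width * components * bitsPerComponent + 7) / 8;
        return bytesPerRow * height;
    }

    // Inflates zlib-wrapped data, which is the form GetStreamBytesRaw returns
    // for a /FlateDecode stream. ZLibStream requires .NET 6 or later.
    public static byte[] Inflate(byte[] raw)
    {
        using (var input = new MemoryStream(raw))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zlib.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

For example, Inflate(bytes).Length can be compared with ExpectedLength(...) as a sanity check. A mismatch usually means a predictor or some other filter is in play, and the sketch does not handle that.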
