Extract Image from a particular page in PDF


Problem description

I want to extract an image from a PDF file. The code below extracts a JPEG image from the PDF without problems, but it scans the entire document. How can I extract images from a particular page only, e.g. page 1 or some other page? I don't want to read the whole PDF to search for the image.

Any suggestions?

Code to extract the image:

        // Requires: using System; using System.Collections.Generic; using System.IO;
        //           using iTextSharp.text.pdf;
        private List<System.Drawing.Image> ExtractImages(string pdfSourcePath)
        {
            List<System.Drawing.Image> imgList = new List<System.Drawing.Image>();
            PdfReader reader = new PdfReader(new RandomAccessFileOrArray(pdfSourcePath), null);

            try
            {
                // Walk every object in the cross-reference table, i.e. the whole document.
                for (int i = 0; i < reader.XrefSize; i++)
                {
                    PdfObject pdfObj = reader.GetPdfObject(i);
                    if (pdfObj == null || !pdfObj.IsStream()) { continue; }

                    PdfStream stream = (PdfStream)pdfObj;
                    PdfObject subtype = stream.Get(PdfName.SUBTYPE);
                    if (!PdfName.IMAGE.Equals(subtype)) { continue; }

                    byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)stream);
                    if (bytes == null) { continue; }

                    try
                    {
                        using (MemoryStream ms = new MemoryStream(bytes))
                        using (System.Drawing.Image decoded = System.Drawing.Image.FromStream(ms))
                        {
                            // Copy into a Bitmap so the image stays valid after the stream is disposed.
                            imgList.Add(new System.Drawing.Bitmap(decoded));
                        }
                    }
                    catch (ArgumentException)
                    {
                        // The raw bytes are not in a format GDI+ can decode (e.g. not a JPEG); skip them.
                    }
                }
            }
            finally
            {
                // Close the reader even if an exception escapes the loop.
                reader.Close();
            }

            // The caller can display a result, e.g. pictureBox1.Image = imgList[0];
            return imgList;
        }

Solution

I don't have iTextSharp 4.0 available at the moment, so this code targets 5.2, but it should work just fine with the older version, too. This code is an almost direct lift from this post here, so please see that post and its responses if you have further questions. As I said in the comments above, your code looks at all of the images from the document's perspective, while the code that I linked to goes page by page.

Please read all of the comments in the other post, especially this one, which explains that this ONLY works for JPG images. PDF supports many different types of images, so unless you know that you're only dealing with JPGs you'll need to add a good deal more code. See this post and this post for some hints.

        // Requires: using System; using System.Drawing.Imaging; using System.IO;
        //           using System.Linq; using iTextSharp.text.pdf;
        // This snippet is meant to live inside a void method (note the early returns).
        string testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Doc1.pdf");
        string outputPath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        int pageNum = 1;
        int imageIndex = 0;

        PdfReader pdf = new PdfReader(testFile);
        try {
            // Look only at the resources of the requested page, not the whole document.
            PdfDictionary pg = pdf.GetPageN(pageNum);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            if (res == null) { return; }
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            if (xobj == null) { return; }

            if (!Directory.Exists(outputPath))
                Directory.CreateDirectory(outputPath);

            foreach (PdfName name in xobj.Keys) {
                PdfObject obj = xobj.Get(name);
                if (!obj.IsIndirect()) { continue; }
                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                // Skip XObjects that are not images (e.g. form XObjects).
                if (!PdfName.IMAGE.Equals(type)) { continue; }
                int xrefIndex = ((PRIndirectReference)obj).Number;
                PdfObject pdfObj = pdf.GetPdfObject(xrefIndex);
                PdfStream pdfStrem = (PdfStream)pdfObj;
                byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);
                if (bytes == null) { continue; }
                using (MemoryStream memStream = new MemoryStream(bytes))
                using (System.Drawing.Image img = System.Drawing.Image.FromStream(memStream)) {
                    // Include an index in the file name so several images on one page do not overwrite each other.
                    string path = Path.Combine(outputPath, String.Format(@"{0}_{1}.jpg", pageNum, imageIndex++));
                    EncoderParameters parms = new EncoderParameters(1);
                    parms.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);
                    var jpegEncoder = ImageCodecInfo.GetImageEncoders().ToList().Find(x => x.FormatID == ImageFormat.Jpeg.Guid);
                    img.Save(path, jpegEncoder, parms);
                }
            }
        } finally {
            pdf.Close();
        }
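
Because the approach above only works for JPG images, it can help to check the raw bytes before handing them to `System.Drawing.Image.FromStream`. The following is a minimal sketch, not part of iTextSharp: `JpegSniffer` and `LooksLikeJpeg` are hypothetical names introduced here for illustration. It assumes that the raw stream bytes of a JPEG-encoded image begin with the standard JPEG start-of-image marker (`FF D8`) followed by another marker byte (`FF`).

```csharp
// Hypothetical helper (not part of iTextSharp): checks for the JPEG signature.
public static class JpegSniffer
{
    // A JPEG stream starts with the SOI marker FF D8, followed by
    // the first byte (FF) of the next marker.
    public static bool LooksLikeJpeg(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 3
            && bytes[0] == 0xFF
            && bytes[1] == 0xD8
            && bytes[2] == 0xFF;
    }
}
```

Inside the loop, a line such as `if (!JpegSniffer.LooksLikeJpeg(bytes)) { continue; }` placed before the `MemoryStream` is created would skip streams that are not JPEG data instead of letting `Image.FromStream` fail on them. Images stored in other encodings still need dedicated decoding code.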
