How to extract images from a PDF using iText7 C#

Question

I used the approach below to extract images from a PDF, but the subtype is always null. I am working with the iText7 library, which is the new version. If anybody has worked with the new library, please give suggestions.

    public static string ExtractImageFromPDF(string sourcePdf)
    {            
        PdfReader reader = new PdfReader(sourcePdf);
        try
        {
            PdfDocument document = new PdfDocument(reader);

            for (int pageNumber = 1; pageNumber <= document.GetNumberOfPages(); pageNumber++)
            {
                PdfDictionary obj = (PdfDictionary)document.GetPdfObject(pageNumber);

                if (obj != null && obj.IsStream())
                {
                    PdfDictionary pd = (PdfDictionary)obj;
                    if (pd.ContainsKey(PdfName.Subtype) && pd.Get(PdfName.Subtype).ToString() == "/Image")
                    {
                        string filter = pd.Get(PdfName.Filter).ToString();
                        string width = pd.Get(PdfName.Width).ToString();
                        string height = pd.Get(PdfName.Height).ToString();
                        string bpp = pd.Get(PdfName.BitsPerComponent).ToString();
                        string extent = ".";
                        byte[] img = null;
                        switch (filter)
                        {
                            case "/FlateDecode":
                                byte[] arr = FlateDecodeFilter.FlateDecode(null, true);
                                Bitmap bmp = new Bitmap(Int32.Parse(width), Int32.Parse(height), PixelFormat.Format24bppRgb);
                                BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), ImageLockMode.WriteOnly,
                                    PixelFormat.Format24bppRgb);
                                Marshal.Copy(arr, 0, bmd.Scan0, arr.Length);
                                bmp.UnlockBits(bmd);
                                bmp.Save("d:\\pdf\\bmp1.png", ImageFormat.Png);
                                break;
                            case "/CCITTFaxDecode":
                                break;
                            default:
                                break;
                        }
                    }
                }
            }
        }
        catch
        {
            throw;
        }
        return "";
    }

Answer

The idea of your approach is to check every indirect object in the document, test whether it is an image XObject, and extract the contained image data if it is.

Actually, though, you only iterate over the values 1..document.GetNumberOfPages() as object numbers, i.e. only over a fraction of the indirect objects of your document!

Indeed, a PDF contains more indirect objects than pages, usually very many more.

So iterate up to document.GetNumberOfPdfObjects() - 1 instead.
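
A minimal sketch of that change, under a few assumptions that are not part of the question: the class name PdfImageExtractor, the method name ExtractImages, the outputDir parameter, and the output file naming are hypothetical, and the image bytes are obtained through iText 7's PdfImageXObject wrapper rather than by decoding the stream by hand. It walks all indirect objects instead of the page count and writes out every stream whose /Subtype is /Image.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Xobject;

public static class PdfImageExtractor   // hypothetical helper class
{
    // Writes every image XObject found in sourcePdf into outputDir
    // and returns the number of images written.
    public static int ExtractImages(string sourcePdf, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int written = 0;

        PdfDocument document = new PdfDocument(new PdfReader(sourcePdf));
        try
        {
            // Iterate over object numbers, not page numbers.
            int objectCount = document.GetNumberOfPdfObjects();
            for (int objNum = 1; objNum < objectCount; objNum++)
            {
                PdfObject obj = document.GetPdfObject(objNum);
                if (obj == null || !obj.IsStream())
                    continue;

                PdfStream stream = (PdfStream)obj;
                // Compare names directly instead of comparing ToString() output.
                if (!PdfName.Image.Equals(stream.GetAsName(PdfName.Subtype)))
                    continue;

                PdfImageXObject image = new PdfImageXObject(stream);
                byte[] bytes = image.GetImageBytes();                  // decoded where iText supports it
                string extension = image.IdentifyImageFileExtension(); // e.g. "png" or "jpg"

                // Hypothetical naming scheme: one file per object number.
                string path = Path.Combine(outputDir, "image" + objNum + "." + extension);
                File.WriteAllBytes(path, bytes);
                written++;
            }
        }
        finally
        {
            document.Close();
        }
        return written;
    }
}
```

A call such as PdfImageExtractor.ExtractImages("input.pdf", "out") would then write one file per image XObject. GetImageBytes and IdentifyImageFileExtension can throw for filters or color spaces that iText does not handle, so real code may want to catch that per image and carry on with the next object.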
