Extracting image from PDF with /CCITTFaxDecode filter

Question

I have a PDF that was generated by scanning software. The PDF has one TIFF image per page. I want to extract the TIFF image from each page.

I am using iTextSharp and I have successfully found the images and can get back the raw bytes from the PdfReader.GetStreamBytesRaw method. The problem is, as many before me have discovered, iTextSharp does not contain a PdfReader.CCITTFaxDecode method.
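For readers following along, the lookup described above can be sketched roughly as follows. This is only an illustration, assuming the iTextSharp 5.x API; the class name CcittImageFinder and its Find method are made up for this example, and streams whose /Filter is an array rather than a single name are skipped for brevity.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class CcittImageFinder
{
    // Returns the image dictionary and the raw (still CCITT-encoded) bytes of
    // every image XObject whose /Filter is the single name /CCITTFaxDecode.
    public static List<KeyValuePair<PdfDictionary, byte[]>> Find(string pdfPath)
    {
        var found = new List<KeyValuePair<PdfDictionary, byte[]>>();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                PdfDictionary pg = reader.GetPageN(page);
                var res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
                if (res == null) continue;
                var xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
                if (xobj == null) continue;

                foreach (PdfName name in xobj.Keys)
                {
                    PdfObject obj = xobj.Get(name);
                    if (!obj.IsIndirect()) continue;

                    // XObjects are streams, and a stream is also a dictionary.
                    var pd = (PdfDictionary)PdfReader.GetPdfObject(obj);
                    PdfObject subtype = PdfReader.GetPdfObject(pd.Get(PdfName.SUBTYPE));
                    PdfObject filter = PdfReader.GetPdfObject(pd.Get(PdfName.FILTER));

                    // Assumption: /Filter is a single name, not an array of filters.
                    if (PdfName.IMAGE.Equals(subtype) && PdfName.CCITTFAXDECODE.Equals(filter))
                    {
                        byte[] raw = PdfReader.GetStreamBytesRaw((PRStream)pd);
                        found.Add(new KeyValuePair<PdfDictionary, byte[]>(pd, raw));
                    }
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return found;
    }
}
```

Each returned pair holds the dictionary that /Width, /Height and /BitsPerComponent can be read from, together with the undecoded stream bytes.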

What else do I know? Even without iTextSharp I can open the PDF in Notepad and find the streams with /Filter /CCITTFaxDecode, and I know from the /DecodeParms entry that it is using CCITT Group 4 encoding.
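For reference, the encoding scheme is selected by the /K entry of the filter's /DecodeParms dictionary: a negative value means Group 4, zero (the default when /K is absent) means one-dimensional Group 3, and a positive value means mixed one- and two-dimensional Group 3. A minimal sketch of that mapping, with the class and method names invented for illustration:

```csharp
using System;

// Hypothetical helper illustrating how /K maps to a CCITT scheme.
public static class CcittScheme
{
    // TIFF Compression tag values: 3 = CCITT T.4 (Group 3), 4 = CCITT T.6 (Group 4).
    public static int TiffCompressionForK(int k)
    {
        return k < 0 ? 4 : 3;
    }

    // Human-readable description of the scheme selected by /K.
    public static string Describe(int k)
    {
        if (k < 0) return "Group 4 (pure two-dimensional)";
        if (k == 0) return "Group 3 (one-dimensional)";
        return "Group 3 (mixed one- and two-dimensional)";
    }
}
```

A stream with /K -1, for example, is Group 4 data, which corresponds to Compression.CCITTFAX4 in LibTiff.NET.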

Does anyone out there know how I can get the CCITTFaxDecode-filtered images out of my PDF?

Cheers, Kahu

Solution

Actually, vbcrlfuser's answer did help me, but the code was not quite correct for the current version of BitMiracle.LibTiff.NET that I was able to download. In the current version, the equivalent code looks like this:

using iTextSharp.text.pdf;
using BitMiracle.LibTiff.Classic;

...
      // pd is the image stream's dictionary; raw holds the bytes returned by PdfReader.GetStreamBytesRaw.
      // The verbatim string (@"...") stops "\t" from being read as a tab character.
      Tiff tiff = Tiff.Open(@"C:\test.tif", "w");
      tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
      tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
      tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
      tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
      tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
      // Write the still-compressed Group 4 data unchanged as a single strip.
      tiff.WriteRawStrip(0, raw, raw.Length);
      tiff.Close();

Using the above code, I finally got a valid TIFF file at C:\test.tif. Thank you, vbcrlfuser!
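If pulling in a TIFF library is not an option, the same idea (wrapping the untouched Group 4 bytes in a TIFF container) can probably be done by hand. The following is a sketch of that alternative, not the method used in the answer above. The class name CcittTiffWrapper is hypothetical, and the code assumes a single strip, one bit per pixel, and the PDF default of /BlackIs1 false; if the image comes out inverted, the Photometric value likely needs to be 1 instead of 0.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds a minimal little-endian TIFF around raw Group 4 data.
public static class CcittTiffWrapper
{
    private const ushort TypeShort = 3;  // TIFF field type SHORT (16-bit)
    private const ushort TypeLong = 4;   // TIFF field type LONG (32-bit)
    private const ushort EntryCount = 9; // number of IFD entries written below

    public static byte[] Wrap(byte[] raw, int width, int height)
    {
        // header (8) + entry count (2) + entries (12 bytes each) + next-IFD offset (4)
        uint dataOffset = 8 + 2 + EntryCount * 12 + 4;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms)) // BinaryWriter always writes little-endian
        {
            w.Write((byte)'I');              // "II" = little-endian byte order
            w.Write((byte)'I');
            w.Write((ushort)42);             // TIFF magic number
            w.Write((uint)8);                // offset of the first IFD

            w.Write(EntryCount);
            // Entries must appear in ascending tag order.
            WriteEntry(w, 256, TypeLong, (uint)width);       // ImageWidth
            WriteEntry(w, 257, TypeLong, (uint)height);      // ImageLength
            WriteEntry(w, 258, TypeShort, 1);                // BitsPerSample
            WriteEntry(w, 259, TypeShort, 4);                // Compression = CCITT Group 4
            WriteEntry(w, 262, TypeShort, 0);                // Photometric = WhiteIsZero (assumed)
            WriteEntry(w, 273, TypeLong, dataOffset);        // StripOffsets
            WriteEntry(w, 277, TypeShort, 1);                // SamplesPerPixel
            WriteEntry(w, 278, TypeLong, (uint)height);      // RowsPerStrip = whole image
            WriteEntry(w, 279, TypeLong, (uint)raw.Length);  // StripByteCounts
            w.Write((uint)0);                // no further IFDs

            w.Write(raw);                    // the compressed data, unchanged
            w.Flush();
            return ms.ToArray();
        }
    }

    // Writes one 12-byte IFD entry holding a single value.
    private static void WriteEntry(BinaryWriter w, ushort tag, ushort type, uint value)
    {
        w.Write(tag);
        w.Write(type);
        w.Write((uint)1); // value count
        w.Write(value);   // SHORT values occupy the low two bytes in little-endian order
    }
}
```

Usage would look something like File.WriteAllBytes(@"C:\test.tif", CcittTiffWrapper.Wrap(raw, width, height)), with width and height taken from the image dictionary's /Width and /Height entries.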
