Compress existing PDF using C# programming using freeware libraries


Question

I have been searching Google extensively for a way to compress an existing PDF (reduce its file size). My constraints are:


  1. I can't use a standalone application, because the compression needs to be done by a C# program.

  2. I can't use any paid library, because my clients don't want to go over budget. So a paid library is definitely a no.

I did my homework over the last 2 days and tried solutions using iTextSharp and BitMiracle, but to no avail: the former reduces the file size by only 1%, and the latter is paid.

I also came across PDFcompressNET and pdftk, but I wasn't able to find their .dll files.

The PDF is an insurance policy with 2-3 images (black and white) and around 70 pages, amounting to about 5 MB.

I need the output as a PDF only (it can't be in any other format).

Recommended answer

Here's an approach to do this (it should work regardless of the toolkit you use):

If you have a 24-bit RGB or 32-bit CMYK image, do the following:


  • Determine whether the image is really what it claims to be. If it's CMYK, convert it to RGB. If it's RGB but really gray, convert it to gray. If it's gray or paletted and has only 2 real colors, convert it to 1-bit. If it's gray and there is relatively little gray variation, consider converting it to 1-bit with a suitable binarization technique.
  • Measure the image dimensions in relation to how the image is placed on the page. If it's 300 dpi or greater, consider resampling it to a smaller size, depending on its bit depth. For example, you can probably go from 300 dpi gray or RGB to 200 dpi and not lose too much detail.
  • If you have an RGB image that is really color, consider palettizing it.
  • Examine the contents of the image to see if you can make it more compressible. For example, if you run through a color/gray image and find a lot of colors that cluster, consider smoothing them. If it's gray or black and white and contains a number of specks, consider despeckling.
  • Choose your final compression wisely. JPEG2000 can do better than JPEG. JBIG2 does much better than G4. Flate is probably the best non-destructive compression for gray. Most implementations of JPEG2000 and JBIG2 are not free.
  • If you're a rock star, try to segment the image and break it into areas that are really black and white and areas that are really color.
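The first two checks in the list can be sketched in plain C# without any imaging toolkit. This is only an illustration under stated assumptions: the class name `ImageChecks`, its method names, and the buffer layouts (interleaved 8-bit RGB, 8-bit gray) are hypothetical and not part of any library mentioned here, and the dpi calculation assumes the PDF default of 72 user-space units per inch.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp, dotImage, or any other library.
public static class ImageChecks
{
    // True when every pixel of an interleaved 8-bit RGB buffer has R, G and B
    // within `tolerance` of each other, i.e. the image carries no real color.
    public static bool IsReallyGray(byte[] rgb, int tolerance = 0)
    {
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("Expected interleaved RGB triples.", nameof(rgb));

        for (int i = 0; i < rgb.Length; i += 3)
        {
            int max = Math.Max(rgb[i], Math.Max(rgb[i + 1], rgb[i + 2]));
            int min = Math.Min(rgb[i], Math.Min(rgb[i + 1], rgb[i + 2]));
            if (max - min > tolerance)
                return false;
        }
        return true;
    }

    // Number of distinct values in an 8-bit gray buffer.
    // Two or fewer suggests the image can be stored as 1-bit.
    public static int CountGrayLevels(byte[] gray)
    {
        var seen = new bool[256];
        int count = 0;
        foreach (byte v in gray)
        {
            if (!seen[v])
            {
                seen[v] = true;
                count++;
            }
        }
        return count;
    }

    // Effective resolution of an image as placed on the page: pixel width
    // divided by the placed width in inches (assumes 72 points per inch).
    public static double EffectiveDpi(int pixelWidth, double placedWidthInPoints)
    {
        if (placedWidthInPoints <= 0)
            throw new ArgumentOutOfRangeException(nameof(placedWidthInPoints));
        return pixelWidth / (placedWidthInPoints / 72.0);
    }
}
```

For example, a scan 2550 pixels wide placed across a 612-point (8.5 inch) page gives `EffectiveDpi(2550, 612)` of 300, which makes it a candidate for resampling. A small non-zero `tolerance` in `IsReallyGray` may be worth trying for scans that picked up a slight color cast, at the cost of discarding that cast.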

That said, if you can do all of this well in an unsupervised manner, you have a commercial product in its own right.

I will say that you can do most of this with Atalasoft dotImage (disclaimers: it's not free; I work there; I've written nearly all the PDF tools; I used to work on Acrobat).

One particular way to do that with dotImage is to pull out all the pages that are image-only, recompress them, and save them out to a new PDF. Then build a new PDF by taking all the pages from the original document and replacing the image-only ones with the recompressed pages, and save again. It's not that hard.

List<int> pagesToReplace = new List<int>();
PdfImageCollection pagesToEncode = new PdfImageCollection();

using (Document doc = new Document(sourceStream, password)) {

    for (int i=0; i < doc.Pages.Count; i++) {
        Page page = doc.Pages[i];
        if (page.SingleImageOnly) {
            pagesToReplace.Add(i);
            // a PDF image encapsulates an image and compression parameters
            PdfImage image = ProcessImage(sourceStream, doc, page, i);
            // add the re-processed image itself (not the page index) to the collection
            pagesToEncode.Add(image);
        }
    }

    PdfEncoder encoder = new PdfEncoder();
    encoder.Save(tempOutStream, pagesToEncode, null); // re-encoded pages
    tempOutStream.Seek(0, SeekOrigin.Begin);

    sourceStream.Seek(0, SeekOrigin.Begin);
    PdfDocument finalDoc = new PdfDocument(sourceStream, password);
    PdfDocument replacementPages = new PdfDocument(tempOutStream);

    for (int i=0; i < pagesToReplace.Count; i++) {
         finalDoc.Pages[pagesToReplace[i]] = replacementPages.Pages[i];
    }

    finalDoc.Save(finalOutputStream);
}



What's missing here is ProcessImage(). ProcessImage will either rasterize the page (in which case you don't need to know how the image was scaled to fit on the PDF page) or extract the image (and track the transformation matrix applied to it), and then go through the steps listed above. This is non-trivial, but it's doable.
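As one illustration of a step ProcessImage would need, here is a minimal sketch of reducing an 8-bit gray image to 1-bit. The answer does not name a binarization technique; Otsu's method is swapped in here as one common choice, and the class `Binarizer`, its method names, and the raw-buffer layout are assumptions rather than part of dotImage or any other library.

```csharp
using System;

// Hypothetical helper showing one possible binarization step.
public static class Binarizer
{
    // Otsu's method: choose the threshold that maximizes the between-class
    // variance of the dark (<= threshold) and light (> threshold) pixels.
    public static int OtsuThreshold(byte[] gray)
    {
        if (gray.Length == 0)
            throw new ArgumentException("Empty image.", nameof(gray));

        var histogram = new long[256];
        foreach (byte v in gray) histogram[v]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * histogram[i];

        long darkCount = 0;
        double darkSum = 0, bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            darkCount += histogram[t];
            if (darkCount == 0) continue;        // no dark pixels yet
            long lightCount = total - darkCount;
            if (lightCount == 0) break;          // no light pixels left

            darkSum += (double)t * histogram[t];
            double darkMean = darkSum / darkCount;
            double lightMean = (sumAll - darkSum) / lightCount;
            double diff = darkMean - lightMean;
            double variance = (double)darkCount * lightCount * diff * diff;

            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    // Pack to 1 bit per pixel, most significant bit first, with each row
    // padded to a whole byte. A set bit means "lighter than the threshold".
    public static byte[] ToOneBit(byte[] gray, int width, int height, int threshold)
    {
        if (gray.Length != width * height)
            throw new ArgumentException("Buffer size does not match dimensions.", nameof(gray));

        int stride = (width + 7) / 8;
        var packed = new byte[stride * height];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (gray[y * width + x] > threshold)
                    packed[y * stride + x / 8] |= (byte)(0x80 >> (x % 8));
            }
        }
        return packed;
    }
}
```

The packed buffer is roughly an eighth of the raw 8-bit size before any G4 or JBIG2 encoding is applied. A global threshold like this tends to struggle with unevenly lit scans, so a local or adaptive technique may suit some documents better.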
