将 PDF 转换为多页 tiff(第 4 组) [英] Converting PDF to multipage tiff (Group 4)

查看:48
本文介绍了将 PDF 转换为多页 tiff(第 4 组)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试转换由 org.apache.pdfbox.pdmodel.PDDocument 类和 icafe 库(

左侧显示ghostscript"的输出,右侧显示icafe"的输出.可以看出,至少在这种情况下,icafe"的输出优于ghostscript"的输出.

使用 CCITTFAX4 压缩,"ghostscript" 中的文件大小为 2.22M,"icafe" 中的文件大小为 2.08M.考虑到在创建黑白输出时使用了抖动,两者都不太好.事实上,不同的压缩算法将创建更小的文件大小.例如,使用 LZW,icafe"的相同输出只有 634K,如果使用 DEFLATE 压缩,则输出文件大小下降到 582K.

I'm trying to convert PDFs as represented by the org.apache.pdfbox.pdmodel.PDDocument class and the icafe library (https://github.com/dragon66/icafe/) to a multipage tiff with group 4 compression and 300 dpi. The sample code works for me for 288 dpi but strangely NOT for 300 dpi, the exported tiff remains just white. Has anybody an idea what the issue is here?

The sample pdf which I use in the example is located here: http://www.bergophil.ch/a.pdf

import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import cafe.image.ImageColorType;
import cafe.image.ImageParam;
import cafe.image.options.TIFFOptions;
import cafe.image.tiff.TIFFTweaker;
import cafe.image.tiff.TiffFieldEnum.Compression;
import cafe.io.FileCacheRandomAccessOutputStream;
import cafe.io.RandomAccessOutputStream;

public class Pdf2TiffConverter {
    public static void main(String[] args) {
        String pdf = "a.pdf";
        PDDocument pddoc = null;
        try {
            pddoc = PDDocument.load(pdf);
        } catch (IOException e) {
        }

        try {
            savePdfAsTiff(pddoc);
        } catch (IOException e) {
        }
    }

    private static void savePdfAsTiff(PDDocument pdf) throws IOException {
        BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
        for (int i = 0; i < images.length; i++) {
            PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
                    .get(i);
            BufferedImage image;
            try {
//              image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
                image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
                images[i] = image;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        FileOutputStream fos = new FileOutputStream("a.tiff");
        RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
                fos);
        ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
        ImageParam[] param = new ImageParam[1];
        TIFFOptions tiffOptions = new TIFFOptions();
        tiffOptions.setTiffCompression(Compression.CCITTFAX4);
        builder.imageOptions(tiffOptions);
        builder.colorType(ImageColorType.BILEVEL);
        param[0] = builder.build();
        TIFFTweaker.writeMultipageTIFF(rout, param, images);
        rout.close();
        fos.close();
    }
}

Or is there another library to write multi-page TIFFs?

EDIT:

Thanks to dragon66 the bug in icafe is now fixed. In the meantime I experimented with other libraries and also with invoking ghostscript. As I think ghostscript is very reliable as id is a widely used tool, on the other hand I have to rely that the user of my code has an ghostscript-installation, something like this:

   /**
 * Converts a given pdf as specified by its path to an tiff using group 4 compression
 *
 * @param pdfFilePath The absolute path of the pdf
 * @param tiffFilePath The absolute path of the tiff to be created
 * @param dpi The resolution of the tiff
 * @throws MyException If the conversion fails
 */
private static void convertPdfToTiffGhostscript(String pdfFilePath, String tiffFilePath, int dpi) throws MyException {
    // location of gswin64c.exe
    String ghostscriptLoc = context.getGhostscriptLoc();

    // enclose src and dest. with quotes to avoid problems if the paths contain whitespaces
    pdfFilePath = """ + pdfFilePath + """;
    tiffFilePath = """ + tiffFilePath + """;

    logger.debug("invoking ghostscript to convert {} to {}", pdfFilePath, tiffFilePath);
    String cmd = ghostscriptLoc + " -dQUIET -dBATCH -o " + tiffFilePath + " -r" + dpi + " -sDEVICE=tiffg4 " + pdfFilePath;
    logger.debug("The following command will be invoked: {}", cmd);

    int exitVal = 0;
    try {
        exitVal = Runtime.getRuntime().exec(cmd).waitFor();
    } catch (Exception e) {
        logger.error("error while converting to tiff using ghostscript", e);
        throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR, e);
    }
    if (exitVal != 0) {
        logger.error("error while converting to tiff using ghostscript, exitval is {}", exitVal);
        throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR);
    }
}

I found that the produced tif from ghostscript strongly differs in quality from the tiff produced by icafe (the group 4 tiff from ghostscript looks greyscale-like)

解决方案

It's been a while since the question was asked and I finally find time and a wonderful ordered dither matrix which allows me to give some details on how "icafe" can be used to get similar or better results than calling external ghostscript executable. Some new features were added to "icafe" recently such as better quantization and ordered dither algorithms which is used in the following example code.

Here the sample pdf I am going to use is princeCatalogue. Most of the following code is from the OP with some changes due to package name change and more ImageParam control settings.

import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import com.icafe4j.image.ImageColorType;
import com.icafe4j.image.ImageParam;
import com.icafe4j.image.options.TIFFOptions;
import com.icafe4j.image.quant.DitherMethod;
import com.icafe4j.image.quant.DitherMatrix;
import com.icafe4j.image.tiff.TIFFTweaker;
import com.icafe4j.image.tiff.TiffFieldEnum.Compression;
import com.icafe4j.io.FileCacheRandomAccessOutputStream;
import com.icafe4j.io.RandomAccessOutputStream;

public class Pdf2TiffConverter {
    public static void main(String[] args) {
        String pdf = "princecatalogue.pdf";
        PDDocument pddoc = null;
        try {
            pddoc = PDDocument.load(pdf);
        } catch (IOException e) {
        }

        try {
            savePdfAsTiff(pddoc);
        } catch (IOException e) {
        }
    }

    private static void savePdfAsTiff(PDDocument pdf) throws IOException {
        BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
        for (int i = 0; i < images.length; i++) {
            PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
                    .get(i);
            BufferedImage image;
            try {
//              image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
                image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
                images[i] = image;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        FileOutputStream fos = new FileOutputStream("a.tiff");
        RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
                fos);
        ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
        ImageParam[] param = new ImageParam[1];
        TIFFOptions tiffOptions = new TIFFOptions();
        tiffOptions.setTiffCompression(Compression.CCITTFAX4);
        builder.imageOptions(tiffOptions);
        builder.colorType(ImageColorType.BILEVEL).ditherMatrix(DitherMatrix.getBayer8x8Diag()).applyDither(true).ditherMethod(DitherMethod.BAYER);
        param[0] = builder.build();
        TIFFTweaker.writeMultipageTIFF(rout, param, images);
        rout.close();
        fos.close();
    }
}

For ghostscript, I used command line directly with the same parameters provided by the OP. The screenshots for the first page of the resulted TIFF images are showing below:

The lefthand side shows the output of "ghostscript" and the righthand side the output of "icafe". It can be seen, at least in this case, the output from "icafe" is better than the output from "ghostscript".

Using CCITTFAX4 compression, the file size from "ghostscript" is 2.22M and the file size from "icafe" is 2.08M. Both are not so good given the fact dither is used while creating the black and white output. In fact, a different compression algorithm will create way smaller file size. For example, using LZW, the same output from "icafe" is only 634K and if using DEFLATE compression the output file size went down to 582K.

这篇关于将 PDF 转换为多页 tiff(第 4 组)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆