How to get a byte array from an iText PdfReader


Question

How can I get a byte array from an iText PdfReader?

float width = 8.5f * 72;
float height = 11f * 72;
float tolerance = 1f;

PdfReader reader = new PdfReader("source.pdf");

for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    Rectangle cropBox = reader.getCropBox(i);
    float widthToAdd = width - cropBox.getWidth();
    float heightToAdd = height - cropBox.getHeight();
    if (Math.abs(widthToAdd) > tolerance || Math.abs(heightToAdd) > tolerance)
    {
        float[] newBoxValues = new float[] { 
            cropBox.getLeft() - widthToAdd / 2,
            cropBox.getBottom() - heightToAdd / 2,
            cropBox.getRight() + widthToAdd / 2,
            cropBox.getTop() + heightToAdd / 2
        };
        PdfArray newBox = new PdfArray(newBoxValues);

        PdfDictionary pageDict = reader.getPageN(i);
        pageDict.put(PdfName.CROPBOX, newBox);
        pageDict.put(PdfName.MEDIABOX, newBox);
    }
}

From the code above, I need to get a byte array from the reader object. How?

1) Not working: I get an empty byte array.

OutputStream out = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(reader, out);
stamper.close();

byte byteArray[] = (((ByteArrayOutputStream)out).toByteArray()); 

2) Not working: I get java.io.IOException: Error: Header doesn't contain versioninfo

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    outputStream.write(reader.getPageContent(i));
}
PDDocument pdDocument = new PDDocument().load(outputStream.toByteArray());

Is there any other way to get a byte array from a PdfReader?

Recommended answer

Let's look at the question from a different angle. It seems to me that you want to render a PDF page by page. If so, then your question is all wrong. Extracting the page content stream is not sufficient, as I already indicated: no renderer will be able to render such a stream, because you don't pass any of the resources it needs, such as fonts, Form XObjects, and Image XObjects.

If you want to render separate pages from a PDF, you need to burst the document into separate, full-blown single-page PDF documents. Each of these single-page documents needs to contain all the information necessary to render its page. This isn't memory friendly: suppose you have a 100 KByte document of 10 pages where every page shows the same 80 KByte logo. You'll end up with 10 documents of at least 80 KByte each; times 10, that is already 800 KByte, much more than the original 10-page document, in which a single Image XObject is shared by all 10 pages.
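The arithmetic behind that estimate can be written out as a small sketch. The class and method names here are hypothetical, and the figure is only a lower bound under the assumption that every single-page file carries its own full copy of the shared resource:

```java
// Hypothetical helper: lower bound on the combined size of the burst files,
// assuming each single-page file embeds its own copy of the shared resource.
class BurstSizeEstimate {

    static long minTotalKBytes(int pageCount, long sharedResourceKBytes) {
        return (long) pageCount * sharedResourceKBytes;
    }

    public static void main(String[] args) {
        // 10 pages, each needing its own copy of an 80 KByte logo
        long total = minTotalKBytes(10, 80);
        System.out.println(total + " KByte"); // prints "800 KByte"
    }
}
```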

You need to do something like this:

PdfReader reader = new PdfReader("source.pdf");
int n = reader.getNumberOfPages();
reader.close();
ByteArrayOutputStream baos;
PdfStamper stamper;
for (int i = 0; i < n; ) {
    // selectPages() changes the reader, so open a fresh one for each page
    reader = new PdfReader("source.pdf");
    reader.selectPages(String.valueOf(++i));
    baos = new ByteArrayOutputStream();
    stamper = new PdfStamper(reader, baos);
    stamper.close();
    doSomethingWithBytes(baos.toByteArray());
}

In this case, baos.toByteArray() will contain the bytes of a valid PDF file. This wasn't the case in any of your (awkward) attempts.
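One quick way to see the difference is to look at the first bytes of each array. A complete PDF file starts with a "%PDF-" header followed by a version number, whereas concatenated page content streams start with drawing operators. That missing header is presumably what the "Header doesn't contain versioninfo" error in the second attempt is about. The sketch below uses only the Java standard library. The class and method names are hypothetical, and it is a strict sanity check on the leading bytes, not a validation of the PDF:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks only whether the bytes begin with the PDF header marker.
class PdfHeaderCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean startsWithPdfHeader(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample data: the start of a full file vs. a bare content stream
        byte[] fullFile = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] contentStream = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);

        System.out.println(startsWithPdfHeader(fullFile));      // prints "true"
        System.out.println(startsWithPdfHeader(contentStream)); // prints "false"
    }
}
```

Calling startsWithPdfHeader(baos.toByteArray()) inside the loop above would be a cheap check before handing the bytes to another library.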
