From Excel Embedded Object to Base64 String in XML


Question

I have an Excel sheet that lets users click on specific cells and attach/embed files, typically in .pdf and .jpg format. I've read the Busy Developers' Guide on how to read embedded files using Apache POI, but I don't think I'm actually reading the correct data: when I save a file locally, or encode and then decode it for testing, the file is reported as corrupt and will not open.

Here's some code:

for (PackagePart pPart : workbook.getAllEmbedds()) {
    // Read the raw bytes of the embedded part (oleObjectN.bin)
    InputStream inputStream = pPart.getInputStream();
    byte[] bytes = IOUtils.toByteArray(inputStream);
    inputStream.close();

    // Base64-encode the bytes for the XML payload
    byte[] encoded = Base64.encodeBase64(bytes);

    attachmentFile.setValue(encoded);

    JAXBElement<Base64Binary> item = ncObjectFactory.createBinaryBase64Object(attachmentFile);

    // Attach the encoded object and its metadata to the outgoing XML
    attachment.getBinaryObject().add(item);
    attachment.getBinaryFormatID().add(idType);
    attachment.getBinaryDescriptionText().add(attachmentTextType);
    attachmentsType.getAttachment().add(attachment);
}

The code above gets the data into Base64 for my XML. However, when I decode it in a test script, I can't open the resulting files: Adobe reports that the file is corrupt or was not saved correctly.
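One way to rule out the Base64 step itself as the source of the corruption is a round-trip check on the bytes. The following is a minimal sketch under stated assumptions: it uses `java.util.Base64` from the JDK rather than the commons-codec `Base64` class used in the code above, and the class and method names are hypothetical, not from the original post.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper for illustration; not part of the original post.
final class Base64RoundTrip {

    /** Encodes raw bytes as a Base64 string (standard alphabet, with padding). */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes a Base64 string back into the raw bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    /** Returns true when encoding and then decoding yields identical bytes. */
    static boolean survivesRoundTrip(byte[] raw) {
        return Arrays.equals(raw, decode(encode(raw)));
    }

    public static void main(String[] args) {
        // Stand-in payload; in practice this would be the bytes read from the embedded part.
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println("round trip intact: " + survivesRoundTrip(sample));
    }
}
```

If the round trip is intact but the decoded file still will not open, the bytes fed into the encoder are the more likely culprit than the encoding.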

As I iterate through getAllEmbedds(), I get oleObject1.bin, oleObject2.bin, oleObject3.bin, and so on. I believe these are binary versions of my embedded files, so how do I convert them back to their original format so they can be opened locally or on another machine?

My overall goal is to place the embedded objects into an XML document as Base64BinaryObjects and send that XML to another system, which can then pull the files out for review. My current issue is that once the files are retrieved from the XML, they won't open because they are corrupt, damaged, or in the wrong format.

Update: Looking deeper into the oleObject.bin files, I see that some sort of wrapper is added around the original file, so there are extra bytes (?) at the front and end of it. When I open the file in Adobe, it is reported as corrupt because %PDF cannot be found within the first 1024 bytes. So my question becomes: how do I remove the wrapper and/or the bytes at the beginning of the file?
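The symptom described in the update can be checked directly on the extracted bytes before encoding them. The sketch below is an illustration only, with hypothetical class and method names: it tests whether a byte array starts with the 8-byte OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1) and whether the %PDF marker lies within the first 1024 bytes, which is the check the update attributes to Adobe.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
final class EmbeddedFileSniffer {

    // Signature at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private static final byte[] PDF_MARKER = "%PDF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true when the data begins with the OLE2 compound-file signature. */
    static boolean startsWithOle2Header(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the offset of "%PDF" if it lies fully within the first 1024 bytes, else -1. */
    static int pdfMarkerOffset(byte[] data) {
        int lastStart = Math.min(data.length, 1024) - PDF_MARKER.length;
        for (int start = 0; start <= lastStart; start++) {
            boolean match = true;
            for (int j = 0; j < PDF_MARKER.length; j++) {
                if (data[start + j] != PDF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }
}
```

A buffer for which `startsWithOle2Header` is true and `pdfMarkerOffset` is -1 matches the situation in the update: the bytes are still an OLE container rather than the original PDF.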

Recommended Answer

I was able to figure this out for oleObject.bin files. The problem is that the *.bin file wraps the original file in an OLE header, and when I tried to read the file in Adobe, I got an error. So I had to either remove the added header or figure out how to get the content without it. Here's what worked for me:

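// Open the oleObjectN.bin part as an OLE2 file system instead of reading it raw
POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
// Read the "CONTENTS" entry, which in this case held the file without the OLE header
TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));

byte[] bytes = IOUtils.toByteArray(stream);
stream.close();
String encoded = Base64.encodeBase64String(bytes);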
