From Excel Embedded Object to Base64 String in XML

Question

I have an Excel sheet that allows users to click on specific cells and attach/embed files. These files are typically in .pdf and .jpg format. I've read the Busy Developers' Guide on how to read embedded files using Apache POI, but I don't think I'm actually reading the correct file: when I save the file locally, or encode and then decode it for testing, the file is reported as corrupt and will not open.

Here is some code:

for (PackagePart pPart : workbook.getAllEmbedds()) {
    // Read the raw bytes of the embedded part (for OLE objects this is oleObjectN.bin)
    byte[] bytes;
    try (InputStream inputStream = pPart.getInputStream()) {
        bytes = IOUtils.toByteArray(inputStream);
    }

    // Base64-encode the bytes for the XML payload
    byte[] encoded = Base64.encodeBase64(bytes);
    attachmentFile.setValue(encoded);

    JAXBElement<Base64Binary> item = ncObjectFactory.createBinaryBase64Object(attachmentFile);

    attachment.getBinaryObject().add(item);
    attachment.getBinaryFormatID().add(idType);
    attachment.getBinaryDescriptionText().add(attachmentTextType);
    attachmentsType.getAttachment().add(attachment);
}

The above code gets the content into Base64 for my XML. However, when I decode it in a test script, I am unable to open the files: Adobe reports that the file is corrupt or was not saved correctly.
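One way to rule out the encoding step as the cause is to check that bytes survive a Base64 round trip unchanged. The following is a minimal sketch using the JDK's java.util.Base64 rather than the Commons Codec class from the snippet above; the class name, method name, and sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    /** Encodes the bytes to Base64 text, decodes them again, and reports whether they came back unchanged. */
    public static boolean survivesRoundTrip(byte[] original) {
        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(original, decoded);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the bytes read from an embedded part
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(survivesRoundTrip(sample));
    }
}
```

If this check passes for the bytes read from the part, the encoding is not what damages the file, and the problem is more likely in which bytes are being read in the first place.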

As I iterate through getAllEmbedds(), I get oleObject1.bin, oleObject2.bin, oleObject3.bin, etc. I believe these are binary versions of my embedded files, so how do I convert them back to their original format so they can be opened locally or on another machine?

My overall goal is to place the embedded objects into an XML document as Base64BinaryObjects and send that XML to another system so it can pull the files out for review. My current issue is that once the files are retrieved from the XML, they won't open because they are corrupt, damaged, or not in the correct format.

Update: Looking deeper into the oleObject.bin files, I see that some sort of wrapper is added to the original file, so there are extra bytes at the front and end of the original file. When I open the file in Adobe, it is reported as corrupt because %PDF cannot be found within the first 1024 bytes. So my question becomes: how do I remove the wrapper and/or the bytes at the beginning of the file?
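To confirm that a part is still wrapped before encoding it, the leading bytes can be inspected. The sketch below is only a diagnostic, not a way to unwrap anything. It assumes the wrapper is an OLE2 compound file, which always begins with the bytes D0 CF 11 E0 A1 B1 1A E1, and it looks for %PDF within the first 1024 bytes as described above. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EmbeddedSignature {
    // First eight bytes of an OLE2 compound file
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] PDF_MARKER = "%PDF".getBytes(StandardCharsets.US_ASCII);
    // JPEG files start with FF D8 FF
    private static final byte[] JPEG_MAGIC = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    /** Returns the first index at which pattern occurs entirely within the first limit bytes, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int limit) {
        int lastStart = Math.min(data.length, limit) - pattern.length;
        for (int i = 0; i <= lastStart; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Classifies the bytes as "OLE2 container", "PDF", "JPEG", or "unknown" from their leading bytes. */
    public static String describe(byte[] data) {
        // Check the wrapper first: a wrapped PDF must not be reported as a plain PDF
        if (indexOf(data, OLE2_MAGIC, OLE2_MAGIC.length) == 0) {
            return "OLE2 container";
        }
        if (indexOf(data, PDF_MARKER, 1024) >= 0) {
            return "PDF";
        }
        if (indexOf(data, JPEG_MAGIC, JPEG_MAGIC.length) == 0) {
            return "JPEG";
        }
        return "unknown";
    }
}
```

Calling describe(bytes) on the array read from each part before encoding would show whether the payload is the original file or still an OLE2 container.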

Recommended answer

I was able to figure this out for oleObject.bin files. The problem is that the *.bin file adds an OLE header to the original file, so when I tried to read the file via Adobe, I got an error. I had to either remove the added header or figure out how to get the content without the header. Here's what worked for me:

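// Open the OLE2 container that wraps the embedded file
POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
// "CONTENTS" is the entry that held the original file bytes in my case
TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));

byte[] bytes = IOUtils.toByteArray(stream);
stream.close();
String encoded = Base64.encodeBase64String(bytes);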
