How to find out what makes POI corrupt an xlsx/xlsm file


Problem description

I have the issue that Apache POI "corrupts" an xlsm/xlsx file merely by reading it and writing it back, e.g. with the following code:

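import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class Snippet {
    public static void main(String[] args) throws Exception {

        String str1 = "c:/tmp/spreadsheet.xlsm";
        String str2 = "c:/tmp/spreadsheet_poi.xlsm";

        // open the file and save it unchanged under a new name;
        // try-with-resources closes the streams and the workbook even on failure
        try (FileInputStream in = new FileInputStream(new File(str1));
             XSSFWorkbook wb = new XSSFWorkbook(in);
             FileOutputStream out = new FileOutputStream(str2)) {
            wb.write(out);
        }
    }
}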

Once you open spreadsheet_poi.xlsm in Excel, you get an error like the following:

"We found a problem with some content in xxx. Do you want us to try to recover as much as we can..."

If you answer yes, you end up with a recovery log that could look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <logFileName>error145040_01.xml</logFileName>
    <summary>Errors were detected in file 'C:\tmp\spreadsheet_poi.xlsm'</summary>
    <repairedParts>
        <repairedPart>Repaired Part: /xl/worksheets/sheet4.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet5.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet8.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
    </repairedParts>
</recoveryLog>

What is the best approach to debug the issue in more detail, e.g. to find out what makes POI "corrupt" the file?

Recommended answer

Eventually I found that the best approach for debugging this consists of two steps:

  1. Open the affected workbook as a ZIP archive (e.g. with 7zip) and format the affected sheets with an XML editor (e.g. Notepad++ > Plugins > XML Tools > Pretty print (XML only - with line breaks)). After saving the files and updating the xlsm file, you get the "real" line numbers in the Excel error log. An alternative (which I haven't tried, but which should work according to the POI mailing list): use OOXMLPrettyPrint (https://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/ooxml/dev/) to format the file and then reopen it in Excel.
  2. If the real line numbers don't help yet, compare the sheet XML files of the original xlsx file with those of the file saved by POI. You'll notice that the attributes differ and that their order is different as well. To compare them properly I used Beyond Compare with "Additional File Formats" (see https://weblogs.asp.net/lorenh/comparing-xml-files-with-beyond-compare-3-brilliant for more information). There may be another diff tool that is equally good.
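If no XML editor is at hand, step 1 can also be approximated with the JDK alone. The following is only a sketch, not the OOXMLPrettyPrint tool mentioned above: it copies the workbook ZIP and re-indents every part under xl/worksheets/ so that Excel's recovery log can point at a meaningful line. The class name, method names and file paths are made up for illustration. Indenting can alter whitespace inside text-bearing elements, so the formatted copy should be treated as a debugging artifact only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, requires Java 9+ (InputStream.readAllBytes).
class SheetPrettyPrinter {

    // Re-serialize one XML document with line breaks between elements.
    static byte[] prettyPrint(byte[] xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                              new StreamResult(out));
        return out.toByteArray();
    }

    // Copy a workbook ZIP entry by entry, formatting only the worksheet parts.
    static void copyWithFormattedSheets(InputStream in, OutputStream out) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                byte[] data = zin.readAllBytes(); // reads up to the end of the current entry
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    data = prettyPrint(data);
                }
                zout.putNextEntry(new ZipEntry(name));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed paths: the file written by POI and a formatted copy of it.
        try (InputStream in = Files.newInputStream(Paths.get("c:/tmp/spreadsheet_poi.xlsm"));
             OutputStream out = Files.newOutputStream(Paths.get("c:/tmp/spreadsheet_poi_formatted.xlsm"))) {
            copyWithFormattedSheets(in, out);
        }
    }
}
```

Opening the formatted copy in Excel should then produce a recovery log whose line numbers refer to the indented sheet XML. Running the same method over the original workbook yields two formatted files whose sheet parts are easier to diff for step 2.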

In my case the problem was that POI somehow changed the dimension setting from

<dimension ref="A1:XFD147"/>

to

<dimension ref="A1:XFE147"/>

(XFE being a non-existent column; XFD is the last column Excel supports). I fixed it by removing the many empty columns in the original xlsx file.
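This particular symptom can also be checked for mechanically. The sketch below scans the worksheet parts of a workbook for their dimension element and flags references that exceed Excel's limits of 16384 columns (XFD) and 1048576 rows. The class name, method names and path are illustrative, and the regular expressions assume the unprefixed `<dimension ref="..."/>` form shown above.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, requires Java 9+ (InputStream.readAllBytes).
class DimensionCheck {

    static final int MAX_COLUMN = 16384;   // column XFD
    static final int MAX_ROW = 1_048_576;

    private static final Pattern DIMENSION = Pattern.compile("<dimension\\s+ref=\"([^\"]+)\"");
    private static final Pattern CELL = Pattern.compile("\\$?([A-Z]{1,3})\\$?([0-9]{1,7})");

    // Convert column letters to a 1-based index (A = 1, Z = 26, AA = 27, ...).
    static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    // True if every cell in a reference such as "A1:XFD147" lies inside the sheet limits.
    static boolean isValidRef(String ref) {
        for (String cell : ref.split(":")) {
            Matcher m = CELL.matcher(cell);
            if (!m.matches()) {
                return false;
            }
            int row = Integer.parseInt(m.group(2));
            if (columnIndex(m.group(1)) > MAX_COLUMN || row < 1 || row > MAX_ROW) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Assumed path: the file written by POI.
        try (ZipInputStream zip = new ZipInputStream(
                Files.newInputStream(Paths.get("c:/tmp/spreadsheet_poi.xlsm")))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue;
                }
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = DIMENSION.matcher(xml);
                if (m.find()) {
                    String ref = m.group(1);
                    System.out.println(name + ": " + ref
                            + (isValidRef(ref) ? " ok" : " OUT OF RANGE"));
                }
            }
        }
    }
}
```

This only covers the dimension element. Other causes of the same Excel error would still need the line-number and diff approach described above.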
