Apache POI much quicker using HSSF than XSSF - what next?


Question


I've been having some issues with parsing .xlsx files with Apache POI - I am getting java.lang.OutOfMemoryError: Java heap space in my deployed app. I'm only processing files under 5MB and around 70,000 rows, so my suspicion from reading a number of other questions is that something is amiss.

As suggested in this comment I decided to run SSPerformanceTest.java with the suggested variables to see if there is anything wrong with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):

1) HSSF 50000 50 1: Elapsed 1 seconds

2) SXSSF 50000 50 1: Elapsed 5 seconds

3) XSSF 50000 50 1: Elapsed 15 seconds

The FAQ specifically says:

If you can't run that with 50,000 rows and 50 columns in all of HSSF, XSSF and SXSSF in under 3 seconds (ideally a lot less!), the problem is with your environment.

Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (with 50000 rows and 50 columns) takes around 15 seconds - the same amount of time it took to write the file.
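(Editorial note: the POI CSV-conversion example mentioned above reads the workbook through the XSSF SAX event API rather than loading the whole sheet as a DOM, which is the usual low-memory way to read .xlsx with stock POI. A minimal sketch of that approach, assuming POI on the classpath; the file name and the `RowPrinter` handler are illustrative, not from the original post:)

```java
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class StreamingXlsxRead {

    // Illustrative handler: receives cells one at a time via SAX callbacks,
    // so no full row/sheet model is ever built in memory.
    static class RowPrinter implements SheetContentsHandler {
        public void startRow(int rowNum) { System.out.print(rowNum + ": "); }
        public void endRow(int rowNum) { System.out.println(); }
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            System.out.print(formattedValue + " | ");
        }
        public void headerFooter(String text, boolean isHeader, String tagName) { }
    }

    public static void main(String[] args) throws Exception {
        try (OPCPackage pkg = OPCPackage.open("big-file.xlsx")) {  // illustrative path
            XSSFReader reader = new XSSFReader(pkg);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            StylesTable styles = reader.getStylesTable();

            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler(
                    new XSSFSheetXMLHandler(styles, strings, new RowPrinter(), false));

            // Each sheet's XML is streamed through the handler, not parsed into a DOM.
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    parser.parse(new InputSource(sheet));
                }
            }
        }
    }
}
```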

Is something wrong with my environment, and if so how do I investigate further?

Stats from VisualVM show the heap used shooting up to 1.2GB during the processing. Surely this is far too high, given that it is an extra gigabyte of heap on top of what was in use before processing began?

Note: The heap space exception mentioned above only happens in production (on Google App Engine) and only for .xlsx files, however the tests mentioned in this question have all been run on my development machine with -Xmx2g. I'm hoping that if I can fix the problem on my development setup it will use less memory when I deploy.

Stack trace from app engine:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
    at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3211)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)

Solution

I was facing the same kind of issue reading bulky .xlsx files with Apache POI, and I came across

excel-streaming-reader-github

This library serves as a wrapper around POI's streaming read API while preserving the syntax of the standard POI API.

This library can help you to read large files.
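(A usage sketch along the lines of that library's README; the file name is a placeholder, and the `rowCacheSize`/`bufferSize` values shown are just the README defaults, not tuned figures:)

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import com.monitorjbl.xlsx.StreamingReader;

public class StreamingReaderDemo {
    public static void main(String[] args) throws Exception {
        try (InputStream is = new FileInputStream("big-file.xlsx");  // placeholder path
             // Returns a Workbook backed by a streaming reader: only a sliding
             // window of rows is kept in memory instead of the whole sheet.
             Workbook workbook = StreamingReader.builder()
                     .rowCacheSize(100)   // rows held in memory at a time
                     .bufferSize(4096)    // bytes buffered from the input stream
                     .open(is)) {
            for (Sheet sheet : workbook) {
                for (Row row : sheet) {
                    for (Cell cell : row) {
                        System.out.print(cell.getStringCellValue() + "\t");
                    }
                    System.out.println();
                }
            }
        }
    }
}
```

Note that the iteration code is the standard POI usermodel loop, which is the point of the wrapper: existing read-only code can switch to streaming by changing only how the Workbook is opened.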
