Apache POI much quicker using HSSF than XSSF - what next?
Question
I've been having some issues parsing .xlsx files with Apache POI - I am getting java.lang.OutOfMemoryError: Java heap space in my deployed app. I'm only processing files under 5MB and around 70,000 rows, so my suspicion from reading a number of other questions is that something is amiss.

As suggested in this comment (http://stackoverflow.com/questions/26067405/heap-space-error-with-apache-poi-xssf#comment40844721_26067405), I decided to run SSPerformanceTest.java with the suggested variables to see if there is anything wrong with my code or setup. The results show a significant difference between HSSF (.xls) and XSSF (.xlsx):

1) HSSF 50000 50 1: Elapsed 1 seconds
2) SXSSF 50000 50 1: Elapsed 5 seconds
3) XSSF 50000 50 1: Elapsed 15 seconds
The FAQ specifically says:
If you can't run that with 50,000 rows and 50 columns in all of HSSF, XSSF and SXSSF in under 3 seconds (ideally a lot less!), the problem is with your environment.
Next, it says to run XLS2CSV.java, which I have done. Feeding in the XSSF file generated above (with 50,000 rows and 50 columns) takes around 15 seconds - the same amount of time it took to write the file.

Is something wrong with my environment, and if so, how do I investigate further?
Stats from VisualVM show the heap usage shooting up to 1.2GB during the processing. Surely this is way too high, considering that's an extra gig on top of the heap compared to before processing began?
Note: the heap space exception mentioned above only happens in production (on Google App Engine), and only for .xlsx files; however, the tests mentioned in this question have all been run on my development machine with -Xmx2g. I'm hoping that if I can fix the problem on my development setup, it will use less memory when I deploy.

Stack trace from App Engine:
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.store.Cur.createElementXobj(Cur.java:260)
    at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.startElement(Cur.java:2997)
    at org.apache.xmlbeans.impl.store.Locale$SaxHandler.startElement(Locale.java:3211)
    at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportStartTag(Piccolo.java:1082)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseAttributesNS(PiccoloLexer.java:1802)
    at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseOpenTagNS(PiccoloLexer.java:1521)
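The stack trace points at XMLBeans building an in-memory tree for the sheet XML (Cur.createElementXobj), which is where XSSF's usermodel spends its heap. An event-driven (SAX) parse, by contrast, visits each element once and retains nothing, so memory stays flat no matter how large the document is. A minimal stdlib sketch of that difference - the <sheet>/<row> XML and the SaxRowCount class are invented stand-ins for illustration, not POI's actual format or API:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRowCount {
    // Counts <row> elements with a SAX handler: one callback per element,
    // nothing retained, so heap usage is constant regardless of document size.
    public static int countRows(String xml) {
        final int[] rows = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String qName, Attributes atts) {
                        if ("row".equals(qName)) rows[0]++;
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return rows[0];
    }

    public static void main(String[] args) {
        // A toy "sheet" with 70,000 rows, roughly the row count from the question.
        StringBuilder sb = new StringBuilder("<sheet>");
        for (int i = 0; i < 70_000; i++) sb.append("<row><c>v</c></row>");
        sb.append("</sheet>");
        System.out.println(countRows(sb.toString())); // prints 70000
    }
}
```

This is the approach POI itself exposes for large .xlsx files via its SAX-based event API, as opposed to the DOM-style usermodel that produced the trace above.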
Solution

I was facing the same kind of issue reading bulky .xlsx files with Apache POI when I came across this library, which serves as a wrapper around the streaming API while preserving the syntax of the standard POI API. This library can help you read large files.
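The link to the library did not survive here, but the pattern the answer describes - a forward-only row iterator layered over a streaming XML parser, so the familiar for-each loop survives while only the current row lives on the heap - can be sketched with the JDK's StAX API. StreamingSheet and its toy XML format are hypothetical illustrations under that assumption, not the real library's API:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical wrapper: pulls one "row" at a time off a streaming parser,
// so only the current row is ever on the heap, yet callers still write an
// ordinary for-each loop - the idea behind streaming spreadsheet readers.
public class StreamingSheet implements Iterable<List<String>> {
    private final XMLStreamReader reader;

    public StreamingSheet(InputStream in) {
        try {
            this.reader = XMLInputFactory.newFactory().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Iterator<List<String>> iterator() {
        return new Iterator<List<String>>() {
            private List<String> next = advance();

            // Reads forward until one complete <row> has been assembled.
            private List<String> advance() {
                try {
                    List<String> row = null;
                    while (reader.hasNext()) {
                        int ev = reader.next();
                        if (ev == XMLStreamConstants.START_ELEMENT && "row".equals(reader.getLocalName())) {
                            row = new ArrayList<>();
                        } else if (ev == XMLStreamConstants.CHARACTERS && row != null) {
                            row.add(reader.getText());
                        } else if (ev == XMLStreamConstants.END_ELEMENT && "row".equals(reader.getLocalName())) {
                            return row; // hand back one row; earlier rows are already garbage
                        }
                    }
                    return null;
                } catch (XMLStreamException e) {
                    throw new RuntimeException(e);
                }
            }

            @Override public boolean hasNext() { return next != null; }

            @Override public List<String> next() {
                List<String> current = next;
                next = advance();
                return current;
            }
        };
    }

    public static void main(String[] args) {
        String xml = "<sheet><row><c>a</c><c>b</c></row><row><c>c</c></row></sheet>";
        StreamingSheet sheet = new StreamingSheet(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (List<String> row : sheet) System.out.println(row); // prints [a, b] then [c]
    }
}
```

A real .xlsx reader has more to do (shared strings, cell references, ZIP entries), but the memory behaviour comes entirely from this pull-parse-and-discard loop.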